AleoNet / snarkOS

A Decentralized Operating System for ZK Applications

Home Page: http://snarkos.org


[Bug] Malicious validator can send fake BlockResponse to block honest validators from processing messages

elderhammer opened this issue · comments

πŸ› Bug Report

  1. Block.Transactions limits the maximum number of transactions to 1048575 in the deserialization logic:
    https://github.com/AleoNet/snarkVM/blob/454d555a0ee1478dce5fa1822b64525a361b6b27/ledger/block/src/transactions/bytes.rs#L29-L32
        // Ensure the number of transactions is within bounds.
        if num_txs as usize > Self::MAX_TRANSACTIONS {
            return Err(error("Failed to read transactions: too many transactions"));
        }
  2. A malicious validator can construct a BlockResponse message filled with a large number of fake transactions.
  3. The victim's message-processing logic gets stuck deserializing the large number of fake transactions:
    // Perform the deferred non-blocking deserialization of the blocks.
    let blocks = blocks.deserialize().await.map_err(|error| anyhow!("[BlockResponse] {error}"))?;
  4. The victim's block synchronization may stop because it is stuck on deserialization; see ProvableHQ#6.
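The snarkVM check quoted above follows the standard length-prefix pattern: read the claimed count and reject it before doing any per-transaction work. A minimal self-contained sketch of that pattern (the function name and wire format here are illustrative, not the snarkVM code):

```rust
use std::io::{self, Read};

// Illustrative cap; snarkVM's actual constant is Transactions::<N>::MAX_TRANSACTIONS.
const MAX_TRANSACTIONS: usize = 1_048_575;

// Read a little-endian u32 length prefix and bounds-check the claimed count
// *before* any per-transaction deserialization work is attempted.
fn read_transaction_count(mut reader: impl Read) -> io::Result<usize> {
    let mut buf = [0u8; 4];
    reader.read_exact(&mut buf)?;
    let num_txs = u32::from_le_bytes(buf) as usize;
    // Reject early: a malicious peer can put any count in the prefix.
    if num_txs > MAX_TRANSACTIONS {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "Failed to read transactions: too many transactions",
        ));
    }
    Ok(num_txs)
}

fn main() {
    // A count within bounds is accepted.
    assert_eq!(read_transaction_count(&[10u8, 0, 0, 0][..]).unwrap(), 10);
    // A count above the cap is rejected before any transaction is parsed.
    assert!(read_transaction_count(&u32::MAX.to_le_bytes()[..]).is_err());
    println!("ok");
}
```

Note that the check only bounds the *claimed* count; as the issue describes, a count just under the cap still forces up to 1048575 deserialization attempts.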

Your Environment

snarkOS Version: cf83035

I think there are two problems to solve:

  1. Refer to the changes in #3304 and move the deserialization work onto a rayon thread.
  2. Appropriately reduce MAX_TRANSACTIONS (I recall that each certificate can hold at most 50 transactions).
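The first proposal amounts to moving the CPU-bound deserialization off the thread that processes inbound messages. A minimal sketch of that idea, using a plain std thread as a stand-in for the rayon task used in #3304 (the function name `deserialize_blocks` and its body are hypothetical placeholders, not snarkOS APIs):

```rust
use std::thread;

// Stand-in for the expensive, CPU-bound work: deserializing a (possibly huge)
// batch of transactions. Hypothetical placeholder, not a snarkOS function.
fn deserialize_blocks(raw: Vec<u8>) -> Result<usize, String> {
    // Pretend each byte is one "transaction" to parse.
    Ok(raw.len())
}

fn main() {
    let raw = vec![0u8; 1_000];
    // Hand the deserialization to another thread so the message-processing
    // loop stays responsive even if a peer sent a maliciously large batch.
    // In snarkOS this would be a rayon task (as in #3304) or tokio's
    // spawn_blocking rather than a raw std thread.
    let handle = thread::spawn(move || deserialize_blocks(raw));
    // ...the processing loop could keep handling other messages here...
    let parsed = handle.join().unwrap().unwrap();
    assert_eq!(parsed, 1_000);
    println!("parsed {parsed} transactions off-thread");
}
```

This changes where the work happens but not how much work an attacker can demand, which is why the second proposal (a tighter cap) is raised alongside it.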

@joske, can you confirm whether this is an issue (and whether the related PR resolves it)?

I think #3304 should avoid the node getting stuck on big blocks. That said, there's still a deadlock in sync that we haven't found yet.

I don't know about the MAX_TRANSACTIONS


I've looked at #3304, which changes node/router/src/inbound.rs, but the problem described here is in the BFT module:

Event::BlockResponse(block_response) => {
    // If a sync sender was provided, then process the block response.
    if let Some(sync_sender) = self.sync_sender.get() {
        // Retrieve the block response.
        let BlockResponse { request, blocks } = block_response;
        // Perform the deferred non-blocking deserialization of the blocks.
        let blocks = blocks.deserialize().await.map_err(|error| anyhow!("[BlockResponse] {error}"))?;
        // Ensure the block response is well-formed.
        blocks.ensure_response_is_well_formed(peer_ip, request.start_height, request.end_height)?;
        // Send the blocks to the sync module.
        if let Err(e) = sync_sender.advance_with_sync_blocks(peer_ip, blocks.0).await {
            warn!("Unable to process block response from '{peer_ip}' - {e}");
        }
    }
    Ok(())
}

The value of MAX_TRANSACTIONS is hardcoded to 1048575, which means a fake BlockResponse may be filled with up to 1048575 transactions, making deserialization extremely time-consuming.
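To make the scale concrete: a batch whose 4-byte length prefix claims the maximum passes the bounds check, so the victim would attempt roughly a million deserializations before any validity check can reject the block. A tiny illustration (`claimed_count` is a hypothetical helper, not a snarkOS API):

```rust
// The hardcoded cap discussed above.
const MAX_TRANSACTIONS: u32 = 1_048_575;

// Decode the 4-byte little-endian transaction count of a serialized batch.
// Hypothetical helper for illustration only.
fn claimed_count(prefix: [u8; 4]) -> usize {
    u32::from_le_bytes(prefix) as usize
}

fn main() {
    // A prefix claiming exactly the maximum is accepted by the bounds check,
    // committing the victim to up to 1_048_575 deserialization attempts.
    let prefix = MAX_TRANSACTIONS.to_le_bytes();
    assert_eq!(claimed_count(prefix), 1_048_575);
    println!("prefix claims {} transactions", claimed_count(prefix));
}
```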

it's probably a good idea to defer this deserialization to rayon in Gateway too, yes.

On the MAX_TRANSACTIONS front, there are two limiters:

  1. The protocol limit allows up to 2^20 transactions, per our Merkle tree standard.
  2. The BFT worst-case formula is MAX_NUM_VALIDATORS * NUM_TRANSACTIONS_IN_PROPOSAL * MAX_GC_ROUNDS, as defined by Transactions::<N>::MAX_ABORTED_TRANSACTIONS. In testnet, this value is 100 * 50 * 100 = 500,000, which is already large enough that the problem persists.
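The two bounds above can be sanity-checked directly; the constants below are the testnet parameters quoted in the comment, used here purely for the arithmetic:

```rust
// Testnet parameters as quoted above (assumed values, for illustration).
const MAX_NUM_VALIDATORS: u64 = 100;
const NUM_TRANSACTIONS_IN_PROPOSAL: u64 = 50;
const MAX_GC_ROUNDS: u64 = 100;

// BFT worst-case number of transactions that can land in one block.
fn bft_worst_case() -> u64 {
    MAX_NUM_VALIDATORS * NUM_TRANSACTIONS_IN_PROPOSAL * MAX_GC_ROUNDS
}

fn main() {
    // BFT limiter: 100 * 50 * 100 = 500,000 transactions.
    assert_eq!(bft_worst_case(), 500_000);
    // Protocol (Merkle tree) limiter: 2^20 = 1,048,576 leaves.
    assert_eq!(1u64 << 20, 1_048_576);
    // Even the tighter BFT bound leaves room for half a million fake
    // transactions, so lowering the cap alone does not remove the DoS vector.
    println!("BFT worst case: {} transactions", bft_worst_case());
}
```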

I would say that this problem needs a more fundamental fix/avoidance than just lowering the size check.