AleoNet / snarkOS

A Decentralized Operating System for ZK Applications

Home Page: http://snarkos.org

[Bug] Malicious peers can still send spam connection attempts

elderhammer opened this issue

🐛 Bug Report

Malicious peers can still send spam connection attempts.

Steps to Reproduce

  1. After completing the TCP connection, the malicious peer sends a malformed ChallengeRequest message.
  2. Because the ChallengeRequest message fails to deserialize, the handshake exits before ensure_peer_is_allowed is executed:
    // Construct the stream.
    let mut framed = Framed::new(stream, MessageCodec::<N>::handshake());
    /* Step 1: Receive the challenge request. */
    // Listen for the challenge request message.
    let peer_request = expect_message!(Message::ChallengeRequest, framed, peer_addr);
    // Obtain the peer's listening address.
    *peer_ip = Some(SocketAddr::new(peer_addr.ip(), peer_request.listener_port));
    let peer_ip = peer_ip.unwrap();
    // Knowing the peer's listening address, ensure it is allowed to connect.
    if let Err(forbidden_message) = self.ensure_peer_is_allowed(peer_ip) {
        return Err(error(format!("{forbidden_message}")));
    }
  3. Because of the early exit, the malicious peer is never recorded in restricted_peers.
  4. The malicious peer repeats this process.
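A minimal, self-contained model of steps 2 and 3 may help. It is a sketch under stated assumptions, not snarkOS's handshake code: Responder, handle, HandshakeError, max_failures and the frame_ok flag (standing in for "the ChallengeRequest deserialized") are all hypothetical. The idea it illustrates is that a failure occurring before ensure_peer_is_allowed can still be charged to the source IP (ignoring the ephemeral port), so repeated attempts are rejected before any frame is read.

```rust
use std::collections::HashMap;
use std::net::{IpAddr, SocketAddr};

/// Hypothetical reasons for rejecting an inbound handshake.
#[derive(Debug, PartialEq)]
enum HandshakeError {
    /// The source IP has too many recorded failures.
    Restricted,
    /// The first frame (the ChallengeRequest) could not be decoded.
    Malformed,
}

/// Hypothetical responder state: failed handshakes counted per IP, port ignored.
struct Responder {
    failures: HashMap<IpAddr, u32>,
    max_failures: u32,
}

impl Responder {
    fn new(max_failures: u32) -> Self {
        Self { failures: HashMap::new(), max_failures }
    }

    fn is_restricted(&self, ip: IpAddr) -> bool {
        self.failures.get(&ip).map_or(false, |&n| n >= self.max_failures)
    }

    /// `frame_ok` stands in for "the ChallengeRequest deserialized successfully".
    fn handle(&mut self, peer_addr: SocketAddr, frame_ok: bool) -> Result<(), HandshakeError> {
        let ip = peer_addr.ip();
        // Check the IP before reading anything from the stream.
        if self.is_restricted(ip) {
            return Err(HandshakeError::Restricted);
        }
        if !frame_ok {
            // Record the failure before the early return, so repeats are caught.
            *self.failures.entry(ip).or_insert(0) += 1;
            return Err(HandshakeError::Malformed);
        }
        Ok(())
    }
}

fn main() {
    let mut responder = Responder::new(3);
    let ip: IpAddr = "8.217.215.190".parse().unwrap();
    // Same IP, a new ephemeral port each time, always a malformed first frame.
    let results: Vec<_> = [33850u16, 33854, 39244, 39252, 41792]
        .iter()
        .map(|&port| responder.handle(SocketAddr::new(ip, port), false))
        .collect();
    // prints "[Err(Malformed), Err(Malformed), Err(Malformed), Err(Restricted), Err(Restricted)]"
    println!("{:?}", results);
}
```

Keying on the IP alone is what makes the repeats visible, since each attempt arrives from a new ephemeral port; the trade-off for nodes sharing an IP is discussed further down the thread.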

Test log:

2024-06-12T03:44:07.113190Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: Received a connection from 8.217.215.190:33850
2024-06-12T03:44:07.113636Z DEBUG tcp{name="0"}: snarkos_node_tcp::protocols::handshake: shaking hands with 8.217.215.190:33850 as the Responder
2024-06-12T03:44:07.113672Z DEBUG snarkos_node_router::handshake: Received a connection request from '8.217.215.190:33850'
2024-06-12T03:44:07.113809Z ERROR tcp{name="0"}: snarkos_node_tcp::protocols::handshake: handshake with 8.217.215.190:33850 failed: frame size too big
2024-06-12T03:44:07.113905Z ERROR tcp{name="0"}: snarkos_node_tcp::tcp: Failed to connect with 8.217.215.190:33850: frame size too big

2024-06-12T03:44:10.994371Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: Received a connection from 8.217.215.190:33854
2024-06-12T03:44:10.994650Z DEBUG tcp{name="0"}: snarkos_node_tcp::protocols::handshake: shaking hands with 8.217.215.190:33854 as the Responder
2024-06-12T03:44:10.994683Z DEBUG snarkos_node_router::handshake: Received a connection request from '8.217.215.190:33854'
2024-06-12T03:44:10.994847Z ERROR tcp{name="0"}: snarkos_node_tcp::protocols::handshake: handshake with 8.217.215.190:33854 failed: frame size too big
2024-06-12T03:44:10.994948Z ERROR tcp{name="0"}: snarkos_node_tcp::tcp: Failed to connect with 8.217.215.190:33854: frame size too big

2024-06-12T03:44:14.616966Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: Received a connection from 8.217.215.190:39244
2024-06-12T03:44:14.617241Z DEBUG tcp{name="0"}: snarkos_node_tcp::protocols::handshake: shaking hands with 8.217.215.190:39244 as the Responder
2024-06-12T03:44:14.617273Z DEBUG snarkos_node_router::handshake: Received a connection request from '8.217.215.190:39244'
2024-06-12T03:44:14.617414Z ERROR tcp{name="0"}: snarkos_node_tcp::protocols::handshake: handshake with 8.217.215.190:39244 failed: frame size too big
2024-06-12T03:44:14.617512Z ERROR tcp{name="0"}: snarkos_node_tcp::tcp: Failed to connect with 8.217.215.190:39244: frame size too big

2024-06-12T03:44:18.483406Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: Received a connection from 8.217.215.190:39252
2024-06-12T03:44:18.483771Z DEBUG tcp{name="0"}: snarkos_node_tcp::protocols::handshake: shaking hands with 8.217.215.190:39252 as the Responder
2024-06-12T03:44:18.483837Z DEBUG snarkos_node_router::handshake: Received a connection request from '8.217.215.190:39252'
2024-06-12T03:44:18.483961Z ERROR tcp{name="0"}: snarkos_node_tcp::protocols::handshake: handshake with 8.217.215.190:39252 failed: frame size too big
2024-06-12T03:44:18.484060Z ERROR tcp{name="0"}: snarkos_node_tcp::tcp: Failed to connect with 8.217.215.190:39252: frame size too big

2024-06-12T03:44:22.113255Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: Received a connection from 8.217.215.190:41792
2024-06-12T03:44:22.113530Z DEBUG tcp{name="0"}: snarkos_node_tcp::protocols::handshake: shaking hands with 8.217.215.190:41792 as the Responder
2024-06-12T03:44:22.113562Z DEBUG snarkos_node_router::handshake: Received a connection request from '8.217.215.190:41792'
2024-06-12T03:44:22.113682Z ERROR tcp{name="0"}: snarkos_node_tcp::protocols::handshake: handshake with 8.217.215.190:41792 failed: frame size too big
2024-06-12T03:44:22.113777Z ERROR tcp{name="0"}: snarkos_node_tcp::tcp: Failed to connect with 8.217.215.190:41792: frame size too big

2024-06-12T03:44:25.996292Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: Received a connection from 8.217.215.190:41804
2024-06-12T03:44:25.996409Z DEBUG tcp{name="0"}: snarkos_node_tcp::protocols::handshake: shaking hands with 8.217.215.190:41804 as the Responder
2024-06-12T03:44:25.996426Z DEBUG snarkos_node_router::handshake: Received a connection request from '8.217.215.190:41804'
2024-06-12T03:44:25.996457Z ERROR tcp{name="0"}: snarkos_node_tcp::protocols::handshake: handshake with 8.217.215.190:41804 failed: frame size too big
2024-06-12T03:44:25.996482Z ERROR tcp{name="0"}: snarkos_node_tcp::tcp: Failed to connect with 8.217.215.190:41804: frame size too big

2024-06-12T03:44:29.631485Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: Received a connection from 8.217.215.190:41810
2024-06-12T03:44:29.631779Z DEBUG tcp{name="0"}: snarkos_node_tcp::protocols::handshake: shaking hands with 8.217.215.190:41810 as the Responder
2024-06-12T03:44:29.631812Z DEBUG snarkos_node_router::handshake: Received a connection request from '8.217.215.190:41810'
2024-06-12T03:44:29.631933Z ERROR tcp{name="0"}: snarkos_node_tcp::protocols::handshake: handshake with 8.217.215.190:41810 failed: frame size too big
2024-06-12T03:44:29.632034Z ERROR tcp{name="0"}: snarkos_node_tcp::tcp: Failed to connect with 8.217.215.190:41810: frame size too big

Expected Behavior

Modify the restriction logic to prevent malicious peers from spamming connection attempts.

Your Environment

snarkOS Version: cd73c74

The BFT module also has the same problem:

// Construct the stream.
let mut framed = Framed::new(stream, EventCodec::<N>::handshake());
/* Step 1: Receive the challenge request. */
// Listen for the challenge request message.
let peer_request = expect_event!(Event::ChallengeRequest, framed, peer_addr);
// Ensure the address is not the same as this node.
if self.account.address() == peer_request.address {
    return Err(error("Skipping request to connect to self".to_string()));
}
// Obtain the peer's listening address.
*peer_ip = Some(SocketAddr::new(peer_addr.ip(), peer_request.listener_port));
let peer_ip = peer_ip.unwrap();
// Knowing the peer's listening address, ensure it is allowed to connect.
if let Err(forbidden_message) = self.ensure_peer_is_allowed(peer_ip) {
    return Err(error(format!("{forbidden_message}")));
}

I checked the source code and found that the limit is only checked for actively established connections, not for passively accepted (inbound) ones.
As a result, spam on passive connections cannot be limited; it consumes the node's TCP connection capacity and prevents it from connecting to honest nodes.

The experiment is as follows:

  1. Set the victim node's MAXIMUM_NUMBER_OF_PEERS to 6 (for testing purposes)
  2. The malicious node performs a heartbeat every 200 ms and connects to the victim node on each heartbeat. During the handshake, it sleeps for 3100 ms before sending its ChallengeRequest. Seven malicious nodes are run at the same time.
    Test code: https://github.com/elderhammer/snarkOS/tree/spam_connection
// Send a challenge request to the peer.
debug!("Send challenge request, port: {}, type: {}, address: {}, nonce: {}", self.local_ip().port(), self.node_type, self.address(), our_nonce);
let our_request = ChallengeRequest::new(self.local_ip().port(), self.node_type, self.address(), our_nonce);
sleep(Duration::from_millis(3100)).await;
send(&mut framed, peer_addr, Message::ChallengeRequest(our_request)).await?;
  3. The victim node applies a 3 s timeout to the handshake and drops the connection once it expires, but it does not record or flag the malicious node's connection spam. Because the malicious connections are never identified and prohibited, the victim node's TCP connection capacity is exhausted, as the following log shows:
2024-06-14T03:21:06.544429Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: Received a connection from 61.244.157.125:54914
2024-06-14T03:21:06.544499Z  WARN tcp{name="0"}: snarkos_node_tcp::tcp: Maximum number of active & pending connections (6) reached
2024-06-14T03:21:06.544508Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: Rejecting the connection from 61.244.157.125:54914
2024-06-14T03:21:06.624104Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: Received a connection from 61.244.157.125:54918
2024-06-14T03:21:06.624149Z  WARN tcp{name="0"}: snarkos_node_tcp::tcp: Maximum number of active & pending connections (6) reached
2024-06-14T03:21:06.624158Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: Rejecting the connection from 61.244.157.125:54918
2024-06-14T03:21:09.146751Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: Received a connection from 61.244.157.125:54930
2024-06-14T03:21:09.147094Z DEBUG snarkos_node_router::handshake: Received a connection request from '61.244.157.125:54930'
2024-06-14T03:21:09.766330Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: Received a connection from 61.244.157.125:54936
2024-06-14T03:21:09.766380Z  WARN tcp{name="0"}: snarkos_node_tcp::tcp: Maximum number of active & pending connections (6) reached
2024-06-14T03:21:09.766390Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: Rejecting the connection from 61.244.157.125:54936
2024-06-14T03:21:09.845530Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: Received a connection from 61.244.157.125:54938
2024-06-14T03:21:09.845569Z  WARN tcp{name="0"}: snarkos_node_tcp::tcp: Maximum number of active & pending connections (6) reached
2024-06-14T03:21:09.845575Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: Rejecting the connection from 61.244.157.125:54938
2024-06-14T03:21:12.360380Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: Received a connection from 61.244.157.125:46188
2024-06-14T03:21:12.360797Z DEBUG snarkos_node_router::handshake: Received a connection request from '61.244.157.125:46188'
2024-06-14T03:21:12.977674Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: Received a connection from 61.244.157.125:46192
2024-06-14T03:21:12.977735Z  WARN tcp{name="0"}: snarkos_node_tcp::tcp: Maximum number of active & pending connections (6) reached
2024-06-14T03:21:12.977744Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: Rejecting the connection from 61.244.157.125:46192
2024-06-14T03:21:13.070124Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: Received a connection from 61.244.157.125:46200
2024-06-14T03:21:13.070177Z  WARN tcp{name="0"}: snarkos_node_tcp::tcp: Maximum number of active & pending connections (6) reached
2024-06-14T03:21:13.070186Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: Rejecting the connection from 61.244.157.125:46200
2024-06-14T03:21:15.587735Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: Received a connection from 61.244.157.125:46208
2024-06-14T03:21:15.588129Z DEBUG snarkos_node_router::handshake: Received a connection request from '61.244.157.125:46208'
2024-06-14T03:21:16.205362Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: Received a connection from 61.244.157.125:46224
2024-06-14T03:21:16.205404Z  WARN tcp{name="0"}: snarkos_node_tcp::tcp: Maximum number of active & pending connections (6) reached
2024-06-14T03:21:16.205413Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: Rejecting the connection from 61.244.157.125:46224
2024-06-14T03:21:16.280948Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: Received a connection from 61.244.157.125:46236
2024-06-14T03:21:16.280981Z  WARN tcp{name="0"}: snarkos_node_tcp::tcp: Maximum number of active & pending connections (6) reached
2024-06-14T03:21:16.280999Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: Rejecting the connection from 61.244.157.125:46236
2024-06-14T03:21:18.801257Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: Received a connection from 61.244.157.125:46244
2024-06-14T03:21:18.801634Z DEBUG snarkos_node_router::handshake: Received a connection request from '61.244.157.125:46244'
2024-06-14T03:21:19.434504Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: Received a connection from 61.244.157.125:46246
2024-06-14T03:21:19.434537Z  WARN tcp{name="0"}: snarkos_node_tcp::tcp: Maximum number of active & pending connections (6) reached
2024-06-14T03:21:19.434547Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: Rejecting the connection from 61.244.157.125:46246
  4. Run an honest node that tries to connect to the victim node; the connection almost always fails, as the following log shows:
2024-06-14T02:50:17.616461Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: establishing connection with 127.0.0.1:4135; the peer is connected on port 39170
2024-06-14T02:50:17.616803Z DEBUG tcp{name="0"}: snarkos_node_tcp::protocols::handshake: shaking hands with 127.0.0.1:4135 as the Initiator
2024-06-14T02:50:17.617102Z ERROR tcp{name="0"}: snarkos_node_tcp::protocols::handshake: handshake with 127.0.0.1:4135 failed: '127.0.0.1:4135' disconnected before sending "Message::ChallengeResponse"
2024-06-14T02:50:17.617174Z ERROR tcp{name="0"}: snarkos_node_tcp::tcp: Unable to initiate a connection with 127.0.0.1:4135: '127.0.0.1:4135' disconnected before sending "Message::ChallengeResponse"
2024-06-14T02:50:42.617276Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: establishing connection with 127.0.0.1:4135; the peer is connected on port 44426
2024-06-14T02:50:42.617516Z DEBUG tcp{name="0"}: snarkos_node_tcp::protocols::handshake: shaking hands with 127.0.0.1:4135 as the Initiator
2024-06-14T02:50:42.617751Z ERROR tcp{name="0"}: snarkos_node_tcp::protocols::handshake: handshake with 127.0.0.1:4135 failed: '127.0.0.1:4135' disconnected before sending "Message::ChallengeResponse"
2024-06-14T02:50:42.617811Z ERROR tcp{name="0"}: snarkos_node_tcp::tcp: Unable to initiate a connection with 127.0.0.1:4135: '127.0.0.1:4135' disconnected before sending "Message::ChallengeResponse"
2024-06-14T02:51:07.617859Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: establishing connection with 127.0.0.1:4135; the peer is connected on port 59288
2024-06-14T02:51:07.618123Z DEBUG tcp{name="0"}: snarkos_node_tcp::protocols::handshake: shaking hands with 127.0.0.1:4135 as the Initiator
2024-06-14T02:51:07.618328Z ERROR tcp{name="0"}: snarkos_node_tcp::protocols::handshake: handshake with 127.0.0.1:4135 failed: '127.0.0.1:4135' disconnected before sending "Message::ChallengeResponse"
2024-06-14T02:51:07.618383Z ERROR tcp{name="0"}: snarkos_node_tcp::tcp: Unable to initiate a connection with 127.0.0.1:4135: '127.0.0.1:4135' disconnected before sending "Message::ChallengeResponse"
2024-06-14T02:51:32.619341Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: establishing connection with 127.0.0.1:4135; the peer is connected on port 59124
2024-06-14T02:51:32.619622Z DEBUG tcp{name="0"}: snarkos_node_tcp::protocols::handshake: shaking hands with 127.0.0.1:4135 as the Initiator
2024-06-14T02:51:32.619852Z ERROR tcp{name="0"}: snarkos_node_tcp::protocols::handshake: handshake with 127.0.0.1:4135 failed: '127.0.0.1:4135' disconnected before sending "Message::ChallengeResponse"
2024-06-14T02:51:32.619904Z ERROR tcp{name="0"}: snarkos_node_tcp::tcp: Unable to initiate a connection with 127.0.0.1:4135: '127.0.0.1:4135' disconnected before sending "Message::ChallengeResponse"
2024-06-14T02:51:57.620353Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: establishing connection with 127.0.0.1:4135; the peer is connected on port 48370
2024-06-14T02:51:57.620669Z DEBUG tcp{name="0"}: snarkos_node_tcp::protocols::handshake: shaking hands with 127.0.0.1:4135 as the Initiator
2024-06-14T02:51:57.620922Z ERROR tcp{name="0"}: snarkos_node_tcp::protocols::handshake: handshake with 127.0.0.1:4135 failed: '127.0.0.1:4135' disconnected before sending "Message::ChallengeResponse"
2024-06-14T02:51:57.620979Z ERROR tcp{name="0"}: snarkos_node_tcp::tcp: Unable to initiate a connection with 127.0.0.1:4135: '127.0.0.1:4135' disconnected before sending "Message::ChallengeResponse"
2024-06-14T02:52:22.621120Z DEBUG tcp{name="0"}: snarkos_node_tcp::tcp: establishing connection with 127.0.0.1:4135; the peer is connected on port 49138
2024-06-14T02:52:22.621373Z DEBUG tcp{name="0"}: snarkos_node_tcp::protocols::handshake: shaking hands with 127.0.0.1:4135 as the Initiator
2024-06-14T02:52:22.621566Z ERROR tcp{name="0"}: snarkos_node_tcp::protocols::handshake: handshake with 127.0.0.1:4135 failed: '127.0.0.1:4135' disconnected before sending "Message::ChallengeResponse"
2024-06-14T02:52:22.621616Z ERROR tcp{name="0"}: snarkos_node_tcp::tcp: Unable to initiate a connection with 127.0.0.1:4135: '127.0.0.1:4135' disconnected before sending "Message::ChallengeResponse"

I've taken a look at this and I agree, this isn't ideal. However, finding a good solution isn't straightforward and isn't only a technical decision. For now, here are some thoughts and observations (cc @vicsn).

On the gateway side, we don't have any notion of a restricted peer. Designing one raises questions about how to handle malicious behaviour in the validator set (aka slashing) and this is something we don't have a fleshed out solution for yet.

On the router side, we do have a restricted peer set. Unfortunately, as you point out in your description, we check that set after the ChallengeRequest is read from the network. This is not accidental: we rely on that message to learn the peer's listening port. We have the IP from the connection but being on the responder side, the port the peer opens the connection on will be ephemeral (i.e. different with every new connection attempt) and so isn't a reliable way to identify the peer. We could restrict the IP without considering the port but what if there are multiple nodes behind the same proxy? In other words, we may need to make a tradeoff here.

Overall, I'd be in favour of integrating these cases into the design for slashing so we have a consistent solution over both network stacks. Edit: it was brought to my attention that this may be a non-attributable attack and so slashing for it may not be ideal.

Thank you for the thorough review, @niklaslong! Yeah, this won't be fixable with slashing. We could tackle the Router issue first and assume good intent from validators for now.

We could restrict the IP without considering the port but what if there are multiple nodes behind the same proxy?

I may be missing some networking knowledge, but if your peers are listening on the same IP as a malicious spamming peer, I would assume it's their fault? And we should just block the whole IP? Could you briefly look into whether Bitcoin/Ethereum block entire IP addresses?

Ethereum seems to ban the IP address; that is feasible in our case if we're fine with the implications this has for proxied pools of nodes.

Sounds good @niklaslong

@elderhammer unfortunately without demonstration that 1-2 attackers can actually fully DoS a validator, this is not a valid P1/P2, but I can imagine a small reward is appropriate for the great bug find and discussion.

@vicsn What does '1-2 attackers' mean? Can you elaborate?
The previous test log confirms that an attack exhausting the victim's connections can be carried out using only different ports on the same IP.

The num_connecting function counts distinct IP+port pairs (SocketAddr values), not distinct IPs.

/// A set of connections that have not been finalized yet.
connecting: Mutex<HashSet<SocketAddr>>,

snarkOS/node/tcp/src/tcp.rs

Lines 147 to 150 in 09aa62b

/// Returns the number of connections that are currently being set up.
pub fn num_connecting(&self) -> usize {
    self.connecting.lock().len()
}

snarkOS/node/tcp/src/tcp.rs

Lines 409 to 425 in 09aa62b

/// Checks whether the `Tcp` can handle an additional connection.
fn can_add_connection(&self) -> bool {
    // Retrieve the number of connected peers.
    let num_connected = self.num_connected();
    // Retrieve the maximum number of connected peers.
    let limit = self.config.max_connections as usize;
    if num_connected >= limit {
        warn!(parent: self.span(), "Maximum number of active connections ({limit}) reached");
        false
    } else if num_connected + self.num_connecting() >= limit {
        warn!(parent: self.span(), "Maximum number of active & pending connections ({limit}) reached");
        false
    } else {
        true
    }
}
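To make the counting point concrete, here is a simplified, hypothetical model of the pending-connection bookkeeping quoted above; it mirrors the logic of can_add_connection but is not the actual Tcp type. Because connecting is keyed by SocketAddr, one IP using different ports can fill every slot. The per-IP cap (max_pending_per_ip, can_add_connection_from) is an illustrative assumption only, not something snarkOS has or the thread proposes.

```rust
use std::collections::HashSet;
use std::net::{IpAddr, SocketAddr};

/// Simplified model of the quoted `Tcp` bookkeeping (hypothetical).
struct Tcp {
    connected: usize,
    connecting: HashSet<SocketAddr>,
    max_connections: usize,
    /// Hypothetical extra knob: pending handshakes allowed per source IP.
    max_pending_per_ip: usize,
}

impl Tcp {
    /// Same outcome as the quoted `can_add_connection`: counts SocketAddrs, not IPs.
    fn can_add_connection(&self) -> bool {
        self.connected + self.connecting.len() < self.max_connections
    }

    /// Hypothetical variant that also caps pending handshakes per source IP.
    fn can_add_connection_from(&self, ip: IpAddr) -> bool {
        let pending_from_ip = self.connecting.iter().filter(|addr| addr.ip() == ip).count();
        self.can_add_connection() && pending_from_ip < self.max_pending_per_ip
    }
}

fn new_tcp(max_pending_per_ip: usize) -> Tcp {
    Tcp { connected: 0, connecting: HashSet::new(), max_connections: 6, max_pending_per_ip }
}

/// Simulates `attempts` handshakes from `ip` on consecutive ephemeral ports,
/// returning how many were admitted into the pending set.
fn spam(tcp: &mut Tcp, ip: IpAddr, first_port: u16, attempts: u16) -> usize {
    let mut admitted = 0;
    for port in first_port..first_port + attempts {
        if tcp.can_add_connection_from(ip) {
            tcp.connecting.insert(SocketAddr::new(ip, port));
            admitted += 1;
        }
    }
    admitted
}

fn main() {
    let attacker: IpAddr = "61.244.157.125".parse().unwrap();
    let honest: IpAddr = "127.0.0.1".parse().unwrap();

    // Without a per-IP cap, a single IP fills every pending slot.
    let mut current = new_tcp(usize::MAX);
    println!("admitted: {}", spam(&mut current, attacker, 46188, 10)); // admitted: 6
    println!("honest peer accepted: {}", current.can_add_connection_from(honest)); // false

    // With a hypothetical cap of one pending handshake per IP, slots remain free.
    let mut capped = new_tcp(1);
    println!("admitted: {}", spam(&mut capped, attacker, 46188, 10)); // admitted: 1
    println!("honest peer accepted: {}", capped.can_add_connection_from(honest)); // true
}
```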

What does '1-2 attackers' mean? Can you elaborate?

In other words: my understanding is that P1/P2 rewards are reserved for cases where a single attacker can bring down a validator. But given that validators have a hardcoded set of friendly peers, if they additionally connect with a single malicious peer or even a few malicious peers, it would cost them some resources and hurt connectivity, but it's not clear it would actually halt the network.

Ok, I get it.

validators have a hardcoded set of friendly peers

Do you mean the peer addresses set by the startup command?
If so, unfortunately this will not prevent spam, because the attack takes place before ensure_peer_is_allowed and verify_challenge_request are reached.

Let me share my experiment:

  • The current committee consists of 4 validators.
  • An honest validator is about to restart.
  • Before the restart, the malicious validator stopped participating in consensus and used this vulnerability to attack the committee members (as a validator, it knows the other validators' addresses).
  • After the restart, the honest validator cannot establish a connection with the other validators, which are being spammed, and therefore cannot participate in the consensus process.
  • Since this honest validator does not participate in the consensus process, only 2 validators in the network are still participating.
  • The consensus process could not reach a quorum and the network stopped generating blocks.
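The quorum arithmetic behind this experiment can be checked with a small sketch. It assumes equally staked validators and the usual BFT rule that a quorum needs strictly more than two thirds of the committee (2f + 1 out of 3f + 1); the actual snarkOS committee may weight validators by stake, so treat this as a simplification rather than the node's real threshold logic.

```rust
/// Smallest number of equally staked validators forming a quorum, i.e.
/// strictly more than two thirds of the committee (2f + 1 when n = 3f + 1).
fn quorum(committee_size: u64) -> u64 {
    committee_size * 2 / 3 + 1
}

fn main() {
    let committee: u64 = 4;
    let malicious = 1; // stopped participating and spams the others
    let isolated = 1; // honest validator that restarted and cannot reconnect
    let active = committee - malicious - isolated;
    let needed = quorum(committee);
    // prints "quorum = 3, active = 2, can commit = false"
    println!("quorum = {}, active = {}, can commit = {}", needed, active, active >= needed);
}
```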

This experiment shows that it is possible for the network to be disrupted by spam attacks.
Attack strategy:

  • Block validators' new connections through spam
  • Restarted or new validators cannot participate in consensus, so the consensus process fails to reach a quorum

Please correct me if I missed anything in my experiment.

Thank you for correcting me, @elderhammer. Reformulating my understanding of the assumptions for a valid attack on validators:

  1. It can be performed by anyone who has access to a validator's (public) IP
  2. It messes with a victim's ability to reconnect, which would mainly be a problem after a restart.

However, it's also very observable and preventable if the validator in question is, e.g., behind a well-calibrated firewall or has good alerts on its number of connections. Given that the most vulnerable moment is during a restart, the main impact is a delay in reconnecting.

Updating my personal recommendation to P2 / High

This issue is not especially difficult to avoid, but all the solutions are somewhat "heavy-handed"; here they are, ordered by effectiveness and performance (descending):

  1. OS-level (or even higher) network restrictions
    • this is the ultimate solution; the downside is that setting it up would have to be the responsibility of node operators, and the exact details would be at their discretion; that being said, this is likely to be utilized regardless
  2. turning off the node's listener for some amount of time whenever the connection limit is reached or some heuristic is triggered
    • this is the strictest measure that can be applied on the node level; it would not be possible to connect to the node at all during that time (which would also guard it against a DDoS vector), but it would remain free to start its own connections; precise conditions required to trigger it would need to be established in order to not do it needlessly
  3. IP (without port) banning on the node level
    • a fairly simple approach that can be applied at the very beginning of the node-level handshake; the question is if this would supplement or supersede the current restricted peer handling, and sadly it does nothing against a DDoS vector
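Option 2 from the list above can be sketched as a small gate in front of inbound connection handling. Everything here (ListenerGate, accept_inbound, the cooldown length and the limit_reached trigger) is hypothetical; a real node would more likely stop polling or drop its TCP listener rather than reject connections one by one, and the trigger heuristic is exactly the open question noted in that option.

```rust
use std::time::{Duration, Instant};

/// Hypothetical gate: stop accepting inbound connections for `cooldown`
/// once the limit is hit. Outbound connections are not affected by it.
struct ListenerGate {
    cooldown: Duration,
    paused_until: Option<Instant>,
}

impl ListenerGate {
    fn new(cooldown: Duration) -> Self {
        Self { cooldown, paused_until: None }
    }

    /// Called when an inbound connection arrives. `limit_reached` stands in for
    /// `!can_add_connection()` or any other trigger heuristic.
    fn accept_inbound(&mut self, now: Instant, limit_reached: bool) -> bool {
        if let Some(until) = self.paused_until {
            if now < until {
                return false; // Listener is "off": reject without any handshake.
            }
            self.paused_until = None; // Cooldown elapsed; resume listening.
        }
        if limit_reached {
            self.paused_until = Some(now + self.cooldown);
            return false;
        }
        true
    }
}

fn main() {
    let start = Instant::now();
    let at = |secs: u64| start + Duration::from_secs(secs);
    let mut gate = ListenerGate::new(Duration::from_secs(10));

    println!("{}", gate.accept_inbound(at(0), false)); // true: below the limit
    println!("{}", gate.accept_inbound(at(1), true)); // false: limit hit, pause starts
    println!("{}", gate.accept_inbound(at(5), false)); // false: still paused
    println!("{}", gate.accept_inbound(at(11), false)); // true: cooldown elapsed
}
```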

Yes, I also think it should be up to the node operator to handle this by limiting network connections as appropriate.
So, should I close this issue now? @ljedrz

@vicsn your call; we should probably adjust the README to include recommendations regarding network config, but we can also be stricter on the node level.

I'm a fan of option 3, IP (without port) banning on the node level, because it's good to make node operators' lives easy, simple and predictable behaviour is good, and Bitcoin/Ethereum also take this approach. In addition, I'll follow up with people to make sure there are good recommendations for node operators regarding OS-level network restrictions.

Let's keep this issue open until someone has the time to implement option 3.

the question is if this would supplement or supersede the current restricted peer handling

My current intuition says "supplement", but I think it's largely up to the implementor's discretion to decide what makes the most sense.

I would agree with @vicsn that we should take option 3 seriously. IMO option 1 should still be done, but the mere existence of a ban list signals to malicious parties that nodes have mechanisms to protect themselves.

A banlist approach is a good starting point for DoS protection and can easily be expanded further if it becomes necessary.
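As a starting point for such a ban list, here is a hypothetical sketch keyed by IpAddr (port ignored, as in option 3) with time-based expiry, so that honest nodes sharing an IP with an attacker are not blocked forever. BanList, ban, is_banned and the ban duration are illustrative assumptions; how it would interact with restricted_peers is left open, as in the discussion.

```rust
use std::collections::HashMap;
use std::net::{IpAddr, SocketAddr};
use std::time::{Duration, Instant};

/// Hypothetical IP ban list: the port is ignored, and entries expire.
struct BanList {
    ban_duration: Duration,
    banned_until: HashMap<IpAddr, Instant>,
}

impl BanList {
    fn new(ban_duration: Duration) -> Self {
        Self { ban_duration, banned_until: HashMap::new() }
    }

    fn ban(&mut self, ip: IpAddr, now: Instant) {
        self.banned_until.insert(ip, now + self.ban_duration);
    }

    /// Meant to run right after `accept()`, before any handshake frame is read.
    fn is_banned(&mut self, addr: SocketAddr, now: Instant) -> bool {
        let ip = addr.ip();
        match self.banned_until.get(&ip).copied() {
            Some(until) if now < until => true,
            Some(_) => {
                // The ban has expired; forget the entry.
                self.banned_until.remove(&ip);
                false
            }
            None => false,
        }
    }
}

fn main() {
    let start = Instant::now();
    let mut bans = BanList::new(Duration::from_secs(600));
    let attacker: IpAddr = "8.217.215.190".parse().unwrap();
    bans.ban(attacker, start);

    // A new ephemeral port does not help the banned IP...
    let later = start + Duration::from_secs(1);
    println!("{}", bans.is_banned(SocketAddr::new(attacker, 41810), later)); // true
    // ...but the ban lapses after `ban_duration`.
    let expired = start + Duration::from_secs(601);
    println!("{}", bans.is_banned(SocketAddr::new(attacker, 41812), expired)); // false
}
```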