Transaction that is added to a node's pool may not be propagated to the pools of its peer nodes
nano-adhara opened this issue
Introduction
In certain situations, a transaction that is added to a node's pool (of sequenced type) may not be propagated to the pools of its peer nodes, resulting in the transaction being present only in the transaction receptor node's pool rather than being distributed across all nodes' pools.
Summary
We are using the following five-node setup for a private Besu network with the QBFT protocol:
- A non-validator node that doesn't participate in the consensus protocol.
- One node acting as a validator.
Users connect to the non-validator node to send transactions, and it propagates them to the validator so that they are included in new blocks.
Start from a scenario where all the nodes have a transaction (let's call it "tx_a") in their pools, as shown in the first image.
If the transaction is dropped from the nodes' pools for any reason, the pools become empty, as we can see in the second image.
Then, if the transaction is sent again to the non-validator, it is added to that node's pool, but it is not added to the validator's pool, as we can see in the third image.
Detailed description
We have observed two problematic scenarios related to how the nodes handle their internal caches:
- When a client sends a transaction to the non-validator node, and that node has previously dropped the same transaction, the node does not propagate it to its peers. If the transaction never reaches a validator, it will never be mined.
- When a peer propagates a transaction to the validator node, and the validator has already seen that transaction, the transaction is not promoted to the transaction pool. If it is not promoted to the pool, it will never be mined.
Both scenarios are problematic when the transaction is valid but the node dropped it from the pool before it was mined, e.g. because the tx-pool-retention-hours limit was exceeded.
We were able to add that transaction again to the validator’s pool only by restarting all the nodes and sending the transaction again.
This looks like a bug in which internal caches are not updated properly when transactions are dropped or evicted from the pool, preventing those transactions from being included in new blocks: they are neither propagated by the non-validator node nor accepted by the validator.
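The suspected behavior can be illustrated with a minimal sketch. This is NOT Besu's actual implementation; the `TxPool` class, its `seen` set, and the method names here are hypothetical, chosen only to show how an "already seen" cache that is not cleaned up on eviction would reject a resent transaction:

```python
# Hypothetical sketch of the suspected bug (not Besu's real code): the
# "already seen" cache is not updated when a transaction is evicted from
# the pool, so a valid resubmitted transaction is silently rejected.

class TxPool:
    def __init__(self):
        self.pool = set()   # transactions pending inclusion in a block
        self.seen = set()   # hashes of transactions already processed

    def add(self, tx_hash):
        if tx_hash in self.seen:
            return False        # filtered as "already seen", never promoted
        self.seen.add(tx_hash)
        self.pool.add(tx_hash)
        return True

    def evict(self, tx_hash):
        # e.g. tx-pool-retention-hours expired; note: self.seen is NOT updated
        self.pool.discard(tx_hash)

pool = TxPool()
assert pool.add("tx_a") is True    # first submission is accepted
pool.evict("tx_a")                 # dropped from the pool, cache untouched
assert pool.add("tx_a") is False   # resubmission rejected: the reported bug
```

This matches the observed symptom: after eviction, the transaction exists in neither pool, yet every node still treats it as already known.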
Versions
This issue occurs in the latest version of Besu as of the time of writing (v24.3.3). We believe it occurs in previous versions as well.
Steps to reproduce
This is an explanation of how we were able to reproduce the bug:
1. Set up at least one non-validator node and at least one validator node with a QBFT network configuration.
2. Send a new transaction using a higher nonce than the expected one, so a nonce gap is created and the transaction remains in the pool of both the non-validator and validator nodes.
3. Force the transaction to be dropped from the pool (for example, wait for the tx-pool-retention-hours limit to expire).
4. Resend the transaction to the non-validator node. You will see that it does not appear in the validator's pool: it is neither propagated by the non-validator nor accepted by the validator, because it is filtered as an already-seen transaction.
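To verify the last step, each node's pool can be queried over JSON-RPC (the TXPOOL API is enabled in both configs below). A minimal sketch of building the request, assuming the RPC ports 8545 and 8585 from those configs and Besu's `txpool_besuTransactions` method:

```python
import json

def rpc_payload(method, params=None, req_id=1):
    """Build a JSON-RPC 2.0 request body for a Besu node."""
    return json.dumps(
        {"jsonrpc": "2.0", "method": method, "params": params or [], "id": req_id}
    )

# RPC ports taken from the node configs in this issue
NODES = {"non-validator": 8545, "validator": 8585}

for name, port in NODES.items():
    body = rpc_payload("txpool_besuTransactions")
    # POST this body to http://127.0.0.1:<port> (e.g. with curl or requests)
    # and compare the returned transaction lists between the two nodes.
    print(f"{name} (port {port}): {body}")
```

After resending the transaction, the non-validator's response should contain it while the validator's response should not, confirming the missing propagation.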
Expected behavior
✅ = already happening
❌ = not happening
1. The user sends a transaction, which is received by the non-validator node and added to its pool. ✅
2. The transaction is propagated to the validator nodes. ✅
3. The validator nodes accept the transaction and add it to their pools. ✅
4. Every node drops the transaction from its pool. ✅
5. The user resends the transaction, which is received by the non-validator node and added to its pool. ✅
6. The same as step 2. ❌
7. The same as step 3. ❌
Node execution arguments
--tx-pool-max-size=5
--tx-pool=sequenced
--tx-pool-limit-by-account-percentage=1
Nodes config
Non-validator node
# Network
p2p-host="127.0.0.1"
p2p-port=1232
max-peers=42
rpc-http-enabled=true
rpc-http-api=["ETH","NET","WEB3","IBFT","QBFT","TXPOOL","ADMIN"]
host-whitelist=["*"]
rpc-http-cors-origins=["all"]
rpc-http-host="0.0.0.0"
rpc-http-port=8545
rpc-ws-enabled=true
rpc-ws-host="0.0.0.0"
rpc-ws-port=30303
# Mining
miner-enabled=true
miner-coinbase="0xfe3b557e8fb62b89f4916b721be55ceb828dbd73"
min-gas-price="0"
revert-reason-enabled=true
metrics-category=[ "ETHEREUM", "BLOCKCHAIN","EXECUTORS","JVM","NETWORK","PEERS","PROCESS","KVSTORE_ROCKSDB","KVSTORE_ROCKSDB_STATS","RPC","SYNCHRONIZER", "TRANSACTION_POOL" ]
metrics-enabled=true
metrics-host="0.0.0.0"
metrics-port=9095
Validator node config
# Network
p2p-host="127.0.0.1"
p2p-port=1234
max-peers=42
rpc-http-enabled=true
rpc-http-api=["ETH","NET","WEB3","IBFT","QBFT","TXPOOL"]
host-whitelist=["*"]
rpc-http-cors-origins=["all"]
rpc-http-host="0.0.0.0"
rpc-http-port=8585
rpc-ws-enabled=true
rpc-ws-host="0.0.0.0"
rpc-ws-port=30305
# Mining
miner-enabled=true
miner-coinbase="0xfe3b557e8fb62b89f4916b721be55ceb828dbd73"
min-gas-price="0"
revert-reason-enabled=true
metrics-category=[ "ETHEREUM", "BLOCKCHAIN","EXECUTORS","JVM","NETWORK","PEERS","PROCESS","KVSTORE_ROCKSDB","KVSTORE_ROCKSDB_STATS","RPC","SYNCHRONIZER", "TRANSACTION_POOL" ]
metrics-enabled=true
metrics-host="0.0.0.0"
metrics-port=9097
genesis.json
{
"config": {
"muirGlacierBlock": 0,
"chainId": 44844,
"contractSizeLimit": 2147483647,
"qbft": {
"blockperiodseconds": 1,
"epochlength": 30000,
"requesttimeoutseconds": 10
}
},
"nonce": "0x0",
"timestamp": "0x58ee40ba",
"gasLimit": "0x5F5E100",
"difficulty": "0x1",
"mixHash": "0x63746963616c2062797a616e74696e65206661756c7420746f6c6572616e6365",
"coinbase": "0x0000000000000000000000000000000000000000",
"alloc": {
"fe3b557e8fb62b89f4916b721be55ceb828dbd73": {
"privateKey": "8f2a55949038a9610f50fb23b5883af3b4ecb3c3bb792cbcefbd1542c692be63",
"comment": "private key and this comment are ignored. In a real chain, the private key should NOT be stored",
"balance": "0xad78ebc5ac6200000"
},
"627306090abaB3A6e1400e9345bC60c78a8BEf57": {
"privateKey": "c87509a1c067bbde78beb793e6fa76530b6382a4c0241e5e4a9ec0a0f44dc0d3",
"comment": "private key and this comment are ignored. In a real chain, the private key should NOT be stored",
"balance": "90000000000000000000000"
},
"f17f52151EbEF6C7334FAD080c5704D77216b732": {
"privateKey": "ae6ae8e5ccbfb04590405997ee2d52d2b330726137b875053c36d94e974d162f",
"comment": "private key and this comment are ignored. In a real chain, the private key should NOT be stored",
"balance": "90000000000000000000000"
}
},
"extraData": "0xf87aa00000000000000000000000000000000000000000000000000000000000000000f85494792fc5093a85bd8fb52c781aefee7da96d2180cf9414275b2f4cefb4c72f12ca32ce0044578923e1b694fcbc96c1e8a673b7cdc333c6687f07fa2c28befe94b5ec93ab0a6ad8f8c0e404e3225b16cb2ea23a1ec080c0"
}
I added the node execution arguments to the issue. This is using the sequenced transaction pool.
Hi @nano-adhara, I see where the issue is coming from. Besu keeps caches to remember transaction exchanges with other peers, mainly because we want to avoid resending a tx multiple times to the same peer (in fact, the p2p protocol states that peers resending txs should be disconnected), and we also want to avoid reprocessing a tx that we have already seen.
Those caches are quite basic at the moment, so they could be improved to better handle scenarios like the one you are reporting. I could take a look at them and try to remove, from these caches, valid txs that are dropped from the txpool; this needs to be done with care to avoid being exposed to attacks.
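The direction described above can be sketched as follows. This is a hypothetical illustration, not a proposed Besu patch; the `TxPool` class and the `still_valid` flag are invented here to show the idea of forgetting only valid dropped transactions:

```python
# Hypothetical sketch of the fix direction: when a valid transaction is
# dropped from the pool (e.g. by the retention timer), also remove it from
# the "already seen" cache so a later resubmission can be accepted again.

class TxPool:
    def __init__(self):
        self.pool = set()   # transactions pending inclusion in a block
        self.seen = set()   # hashes of transactions already processed

    def add(self, tx_hash):
        if tx_hash in self.seen:
            return False
        self.seen.add(tx_hash)
        self.pool.add(tx_hash)
        return True

    def evict(self, tx_hash, still_valid=True):
        self.pool.discard(tx_hash)
        if still_valid:
            # only valid txs are forgotten; invalid ones stay cached so
            # peers cannot force endless revalidation (the attack concern
            # mentioned above)
            self.seen.discard(tx_hash)

pool = TxPool()
pool.add("tx_a")
pool.evict("tx_a")                # valid tx dropped by the timer
assert pool.add("tx_a") is True   # resubmission is accepted again
```

The `still_valid` distinction is the delicate part: forgetting invalid transactions as well would let a peer replay them indefinitely.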
One last question about your network: is it common for a valid tx to stay in the pool for so long that it could be evicted by the timer?
Hello @fab-10,
To answer your last question: it doesn't happen that often, about once a month, but the real problem is that whenever it happens, the only solution is to restart the Besu clients.
I'll include Fernando @chookly314 and Coenie @coeniebeyers in the topic to move forward in the conversation.