hashgraph / hedera-services

Crypto, token, consensus, file, and smart contract services for the Hedera public ledger

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

JRS tests are failing with branching after restart

imalygin opened this issue · comments

Description

There are several tests failing relatively consistently with branching after restart
Here are the examples
Crypto-Update-Setting-1.5k-15m - data directory
Crypto-Update-Jar-1.5k-25m - data directory

@alittley :

It can be explained with the following sequence of events:
1. node3 creates an event X, which is gossiped out and written to PCES by at least 1 node in the network, but NOT by node3
2. the network is restarted without a freeze
3. node3 comes back online well before other nodes, and thus cannot gossip for the full duration of the OBSERVING status, as is otherwise intended
4. node3 transitions to CHECKING, having not yet received event X
5. node3 creates event X' and gossips it out
6. the network (including node3 itself) correctly determines that node3 branched

Steps to reproduce

Run Crypto-Update-Setting-1.5k-15m or Crypto-Update-Jar-1.5k-25m

Additional context

No response

Hedera network

other

Version

v0.51

Operating system

None

commented

My original description (duplicated in the body of the ticket) may not be correct. Another test failure, where the branching still occurs shortly after startup, but it isn't the first node to reach CHECKING that starts branching. By the time the node branches, all nodes in the network had already been gossiping for some time, so I'm not sure why the branching node didn't hear about its previous self event.

commented

Another addition:

I don't think nodes not freezing before an update is actually playing a role. I found the pre-update logs for these tests (which I didn't previously know existed), and nodes are freezing before restarting.

Instead, I now suspect that it's just post-freeze signature events that are causing the branching.