PriFi non-restart bug located
lbarman opened this issue · comments
After a client goes into Airplane mode, everything is stuck.
Reason: Relay/trustees believe the protocol is still running
E : (relay.(*PriFiLibRelayInstance).checkIfRoundHasEndedAfterTimeOut_Phase1: 41) - MAX_NUMBER_OF_CONSECUTIVE_FAILED_ROUNDS ( 3 ) reached, killing protocol.
3 : (relay.(*PriFiLibRelayInstance).checkIfRoundHasEndedAfterTimeOut_Phase1: 44) - Stopping experiment, if any.
E : ( services.(*ServiceState).NetworkErrorHappened: 109) - NetworkErrorHappened, dumping trace
goroutine 677 [running]:
runtime/debug.Stack(0x1, 0xc42020cc50, 0x5422c8)
/usr/local/go/src/runtime/debug/stack.go:24 +0xa7
runtime/debug.PrintStack()
/usr/local/go/src/runtime/debug/stack.go:16 +0x22
github.com/dedis/prifi/sda/services.(*ServiceState).NetworkErrorHappened(0xc4200a5540, 0x0)
/home/prifi/go/src/github.com/dedis/prifi/sda/services/prifi.go:110 +0x74
github.com/dedis/prifi/sda/services.(*ServiceState).handleTimeout(0xc4200a5540, 0xc4202c98a0, 0x1, 0x1, 0xc3c628, 0x0, 0x0)
/home/prifi/go/src/github.com/dedis/prifi/sda/services/prifi.go:101 +0x34
github.com/dedis/prifi/sda/services.(*ServiceState).(github.com/dedis/prifi/sda/services.handleTimeout)-fm(0xc4202c98a0, 0x1, 0x1, 0xc3c628, 0x0, 0x0)
/home/prifi/go/src/github.com/dedis/prifi/sda/services/config.go:90 +0x66
github.com/dedis/prifi/sda/protocols.(*PriFiSDAProtocol).handleTimeout(0xc420276000, 0xc4204a60c0, 0x1, 0x1, 0xc3c628, 0x0, 0x0)
/home/prifi/go/src/github.com/dedis/prifi/sda/protocols/protocol.go:158 +0x250
github.com/dedis/prifi/sda/protocols.(*PriFiSDAProtocol).(github.com/dedis/prifi/sda/protocols.handleTimeout)-fm(0xc4204a60c0, 0x1, 0x1, 0xc3c628, 0x0, 0x0)
/home/prifi/go/src/github.com/dedis/prifi/sda/protocols/prifi.go:118 +0x66
github.com/dedis/prifi/prifi-lib/relay.(*PriFiLibRelayInstance).checkIfRoundHasEndedAfterTimeOut_Phase1(0xc42017c880, 0xc400000222)
/home/prifi/go/src/github.com/dedis/prifi/prifi-lib/relay/timeouts.go:46 +0x589
created by github.com/dedis/prifi/prifi-lib/relay.(*PriFiLibRelayInstance).downstreamPhase1_openRoundAndSendData
/home/prifi/go/src/github.com/dedis/prifi/prifi-lib/relay/relay.go:602 +0x760
E : ( services.(*ServiceState).NetworkErrorHappened: 121) - A network error occurred with node <nil> , warning other clients.
1 : ( services.(*ServiceState).StopPriFiCommunicateProtocol: 175) - Stopping PriFi protocol
I : ( services.(*ServiceState).connectToTrustees: 199) - connectToTrustees still alive, Protocol running true
I : ( services.(*ServiceState).connectToTrustees: 199) - connectToTrustees still alive, Protocol running true
Update: it's a concurrency issue.
Thread 1 is still processing a message, timeout happens on Thread 2, which tries to call "Stop()" on thread 1
1 @ 0x545ee8 0x545cf0 0x542834 0x819c3a 0x81cd5a 0x817f0d 0x81976f 0x819634 0x81caf6 0x8026a0 0x805e26 0x7f9c59 0x459871
# 0x545ee7 runtime/pprof.writeRuntimeProfile+0x97 /usr/local/go/src/runtime/pprof/pprof.go:679
# 0x545cef runtime/pprof.writeGoroutine+0x9f /usr/local/go/src/runtime/pprof/pprof.go:641
# 0x542833 runtime/pprof.(*Profile).WriteTo+0x3e3 /usr/local/go/src/runtime/pprof/pprof.go:310
# 0x819c39 github.com/dedis/prifi/sda/services.(*ServiceState).StopPriFiCommunicateProtocol+0xb9 /home/prifi/go/src/github.com/dedis/prifi/sda/services/prifi.go:181
# 0x81cd59 github.com/dedis/prifi/sda/services.(*ServiceState).StopPriFiCommunicateProtocol-fm+0x29 /home/prifi/go/src/github.com/dedis/prifi/sda/services/service.go:149
# 0x817f0c github.com/dedis/prifi/sda/services.(*churnHandler).handleUnknownDisconnection+0xbc /home/prifi/go/src/github.com/dedis/prifi/sda/services/churn.go:257
# 0x81976e github.com/dedis/prifi/sda/services.(*ServiceState).NetworkErrorHappened+0x11e /home/prifi/go/src/github.com/dedis/prifi/sda/services/prifi.go:127
# 0x819633 github.com/dedis/prifi/sda/services.(*ServiceState).handleTimeout+0x33 /home/prifi/go/src/github.com/dedis/prifi/sda/services/prifi.go:106
# 0x81caf5 github.com/dedis/prifi/sda/services.(*ServiceState).(github.com/dedis/prifi/sda/services.handleTimeout)-fm+0x65 /home/prifi/go/src/github.com/dedis/prifi/sda/services/config.go:90
# 0x80269f github.com/dedis/prifi/sda/protocols.(*PriFiSDAProtocol).handleTimeout+0x24f /home/prifi/go/src/github.com/dedis/prifi/sda/protocols/protocol.go:158
# 0x805e25 github.com/dedis/prifi/sda/protocols.(*PriFiSDAProtocol).(github.com/dedis/prifi/sda/protocols.handleTimeout)-fm+0x65 /home/prifi/go/src/github.com/dedis/prifi/sda/protocols/prifi.go:118
# 0x7f9c58 github.com/dedis/prifi/prifi-lib/relay.(*PriFiLibRelayInstance).checkIfRoundHasEndedAfterTimeOut_Phase1+0x588 /home/prifi/go/src/github.com/dedis/prifi/prifi-lib/relay/timeouts.go:46
I : ( services.(*ServiceState).IsPriFiProtocolRunning: 43) - IsPriFiProtocolRunning not nil, has stopped false
1 : ( services.(*ServiceState).StopPriFiCommunicateProtocol: 187) - PriFi Running, continuing stop attempt...
1 : ( services.(*ServiceState).StopPriFiCommunicateProtocol: 191) - PriFi not nil, calling stop...
but gets put on hold in
p.relayState.processingLock.Lock()
defer p.relayState.processingLock.Unlock()
We need to find another way to stop the protocol without having it handle a message
Bug fixed. I had to rewrite the history (my mistake - this should not happen).
If git pull
fails and tells you to merge, don't do it. Instead, do:
git fetch --all
git reset --hard origin/prifi-android
(careful: this does not touches your untracked file, but resets all modified files)
I am working on the client side.
The client doesn't reconnect after recovering the connectivity.