Race condition in capnp
hspaay opened this issue
Problem: Running go test -race on my modules sometimes fails with a race condition in go-capnproto2's transport.
To reproduce:
The offending code was extracted and put into a simpler repo here: github.com/hiveot/racetest.
This repo has 2 tests, 'race' and 'state'. They each show a different problem, although it is possible that the problem in 'race' is a result of the simplification.
Set up the tests:
git clone github.com/hiveot/racetest
cd racetest
make capnp
Race Package
To reproduce, run this a few times. It usually shows up within 3 attempts:
go test -race pkg/race/Race_test.go
Output of 'race':
henk@msi:~/dev/hiveot/racetest$ go test -race pkg/race/Race_test.go
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x6eb3ba]
goroutine 148 [running]:
capnproto.org/go/capnp/v3/rpc/transport.(*ctxReader).leakyRead.func1()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/rpc/transport/transport.go:348 +0x5a
created by capnproto.org/go/capnp/v3/rpc/transport.(*ctxReader).leakyRead
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/rpc/transport/transport.go:347 +0x16c
FAIL command-line-arguments 3.027s
FAIL
State Package
The 'state' package intermittently shows a race. Usually it shows up within 10 attempts.
go test -race pkg/state/State_test.go
Output of 'state':
henk@msi:~/dev/hiveot/racetest$ go test -race pkg/state/State_test.go
==================
WARNING: DATA RACE
Read at 0x00c00015e7c0 by goroutine 37:
capnproto.org/go/capnp/v3.(*Segment).slice()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/segment.go:49 +0x5a
capnproto.org/go/capnp/v3.(*Segment).readUint64()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/segment.go:65 +0x51
capnproto.org/go/capnp/v3.(*Segment).readRawPointer()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/segment.go:69 +0x50
capnproto.org/go/capnp/v3.(*Segment).resolveFarPointer()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/segment.go:235 +0x4f
capnproto.org/go/capnp/v3.(*Segment).readPtr()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/segment.go:116 +0x75
capnproto.org/go/capnp/v3.Struct.Ptr()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/struct.go:109 +0x119
capnproto.org/go/capnp/v3.Transform()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/answer.go:548 +0x2ed
capnproto.org/go/capnp/v3.resolution.ptr()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/answer.go:574 +0xfc
capnproto.org/go/capnp/v3.resolution.client()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/answer.go:589 +0x96
capnproto.org/go/capnp/v3.(*Answer).PipelineRecv()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/answer.go:375 +0x3f4
capnproto.org/go/capnp/v3.(*Answer).PipelineRecv-fm()
<autogenerated>:1 +0xc7
capnproto.org/go/capnp/v3/server.queueCaller.PipelineRecv()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/server/answer.go:162 +0x2c2
capnproto.org/go/capnp/v3/server.(*answerQueue).PipelineRecv()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/server/answer.go:136 +0xc9
capnproto.org/go/capnp/v3/rpc.(*Conn).handleCall()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/rpc/rpc.go:795 +0x19b2
capnproto.org/go/capnp/v3/rpc.(*Conn).receive()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/rpc/rpc.go:491 +0x40e
capnproto.org/go/capnp/v3/rpc.(*Conn).receive-fm()
<autogenerated>:1 +0x39
capnproto.org/go/capnp/v3/rpc.(*Conn).backgroundTask.func1()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/rpc/rpc.go:191 +0x92
golang.org/x/sync/errgroup.(*Group).Go.func1()
/home/henk/go/pkg/mod/golang.org/x/sync@v0.0.0-20201020160332-67f06af15bc9/errgroup/errgroup.go:57 +0x91
Previous write at 0x00c00015e7c0 by goroutine 28:
capnproto.org/go/capnp/v3.alloc()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/message.go:351 +0x1d1
capnproto.org/go/capnp/v3.NewCompositeList()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/list.go:62 +0xe4
capnproto.org/go/capnp/v3/std/capnp/rpc.NewCapDescriptor_List()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/std/capnp/rpc/rpc.capnp.go:2443 +0xce
capnproto.org/go/capnp/v3/std/capnp/rpc.Payload.NewCapTable()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/std/capnp/rpc/rpc.capnp.go:2215 +0x84
capnproto.org/go/capnp/v3/rpc.(*Conn).fillPayloadCapTable()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/rpc/export.go:183 +0x14f
capnproto.org/go/capnp/v3/rpc.(*answer).sendReturn()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/rpc/answer.go:236 +0x198
capnproto.org/go/capnp/v3/rpc.(*answer).Return()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/rpc/answer.go:201 +0x291
capnproto.org/go/capnp/v3/server.(*Server).handleCall()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/server/server.go:217 +0x287
capnproto.org/go/capnp/v3/server.(*Server).handleCalls.func2()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/server/server.go:182 +0x84
capnproto.org/go/capnp/v3/server.(*Server).handleCalls()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/server/server.go:183 +0x1b6
capnproto.org/go/capnp/v3/server.New.func1()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/server/server.go:122 +0x58
Goroutine 37 (running) created at:
golang.org/x/sync/errgroup.(*Group).Go()
/home/henk/go/pkg/mod/golang.org/x/sync@v0.0.0-20201020160332-67f06af15bc9/errgroup/errgroup.go:54 +0xee
capnproto.org/go/capnp/v3/rpc.NewConn()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/rpc/rpc.go:157 +0x7d1
racetest/pkg/caphelp.CapServe.func2()
/home/henk/dev/hiveot/racetest/pkg/caphelp/CapServe.go:53 +0x2c4
Goroutine 28 (running) created at:
capnproto.org/go/capnp/v3/server.New()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/server/server.go:122 +0x594
racetest/capnp/go/api.CapState_NewServer()
/home/henk/dev/hiveot/racetest/capnp/go/api/State.capnp.go:63 +0xc9
racetest/capnp/go/api.CapState_ServerToClient()
/home/henk/dev/hiveot/racetest/capnp/go/api/State.capnp.go:69 +0x36
racetest/pkg/state/capnpserver.StartStateCapnpServer()
/home/henk/dev/hiveot/racetest/pkg/state/capnpserver/StateCapnpServer.go:50 +0x166
command-line-arguments_test.createStateStore.func1()
/home/henk/dev/hiveot/racetest/pkg/state/State_test.go:37 +0x6f
==================
--- FAIL: TestGetSet (1.02s)
testing.go:1312: race detected during execution of test
FAIL
FAIL command-line-arguments 3.035s
FAIL
@hspaay Can you try upgrading to alpha 7? I think this was fixed.
The race test sometimes errors out with a nil pointer dereference.
In one captured example this shows up in transport.go:358 (alpha-5)
n, err := cr.Reader.Read(cr.buf[:max])
cr.Reader is nil
At the same time transport.go:130 (function NewMessage) tries to Encode the message:
if err = s.c.Encode(ctx, msg); err != nil {
s.c.wc.WriteClose is also nil.
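To make the failure mode above concrete, here is a minimal sketch of what happens when Read is called through an embedded io.Reader that is nil. It is not the library's code: ctxReaderSketch and readOnce are hypothetical names invented for this illustration, and the recover is only there so the program can print the panic instead of crashing.

```go
package main

import (
	"fmt"
	"io"
)

// ctxReaderSketch is a hypothetical stand-in for a wrapper that embeds an
// io.Reader. It is NOT the type from capnp's transport package.
type ctxReaderSketch struct {
	io.Reader
	buf []byte
}

// readOnce calls Read on the embedded reader. If the embedded reader is nil,
// the call panics; the deferred function turns that panic into an error.
func readOnce(cr *ctxReaderSketch) (n int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return cr.Reader.Read(cr.buf)
}

func main() {
	// The embedded Reader is deliberately left nil, mimicking "cr.Reader is nil".
	cr := &ctxReaderSketch{buf: make([]byte, 8)}
	_, err := readOnce(cr)
	fmt.Println(err)
}
```

In the real trace the panic happens in a goroutine without a recover, which is why the whole test binary dies instead of reporting a normal test failure.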
@lthibault speaking of race conditions, lol.
I'll upgrade to alpha-7.
Both race and state tests fail on alpha-7
Interesting! Thanks for the update.
Let's see if #318 fixes it. Should land in a few days' time.
Follow-up question: which test triggers the race condition? Could this issue be a duplicate of #301?
Yeah, I think this is probably the same as #301, but great that we apparently have the ability to semi-reproduce it now.
"but great that we apparently have the ability to semi-reproduce it now"
Yes! Many thanks for this, @hspaay!!
Glad to be of help. Just let me know when you have a potential fix. It should be easy to verify now.
@hspaay Can you try this again with the latest tagged version?
@lthibault unfortunately the race is still there using: capnproto.org/go/capnp/v3 v3.0.0-alpha.8
go test -race -failfast -p 1 ./pkg/...
? racetest/pkg/caphelp [no test files]
? racetest/pkg/kvstore [no test files]
? racetest/pkg/listener [no test files]
? racetest/pkg/logging [no test files]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x6ea77a]
goroutine 138 [running]:
capnproto.org/go/capnp/v3/rpc/transport.(*ctxReader).leakyRead.func1()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/rpc/transport/transport.go:350 +0x5a
created by capnproto.org/go/capnp/v3/rpc/transport.(*ctxReader).leakyRead
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/rpc/transport/transport.go:349 +0x16c
FAIL racetest/pkg/race 0.026s
? racetest/pkg/race/capnpclient [no test files]
? racetest/pkg/race/capnpserver [no test files]
? racetest/pkg/race/service [no test files]
==================
WARNING: DATA RACE
Write at 0x00c00019a610 by goroutine 28:
capnproto.org/go/capnp/v3.alloc()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/message.go:351 +0x1d1
capnproto.org/go/capnp/v3.NewCompositeList()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/list.go:62 +0xe4
capnproto.org/go/capnp/v3/std/capnp/rpc.NewCapDescriptor_List()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/std/capnp/rpc/rpc.capnp.go:2443 +0xce
capnproto.org/go/capnp/v3/std/capnp/rpc.Payload.NewCapTable()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/std/capnp/rpc/rpc.capnp.go:2215 +0x84
capnproto.org/go/capnp/v3/rpc.(*Conn).fillPayloadCapTable()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/rpc/export.go:183 +0x14f
capnproto.org/go/capnp/v3/rpc.(*answer).sendReturn()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/rpc/answer.go:236 +0x198
capnproto.org/go/capnp/v3/rpc.(*answer).Return()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/rpc/answer.go:201 +0x291
capnproto.org/go/capnp/v3/server.(*Server).handleCall()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/server/server.go:214 +0x267
capnproto.org/go/capnp/v3/server.(*Server).handleCalls.func2()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/server/server.go:182 +0x84
capnproto.org/go/capnp/v3/server.(*Server).handleCalls()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/server/server.go:183 +0x1b6
capnproto.org/go/capnp/v3/server.New.func1()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/server/server.go:122 +0x58
Previous read at 0x00c00019a610 by goroutine 37:
capnproto.org/go/capnp/v3.(*Segment).slice()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/segment.go:49 +0x5a
capnproto.org/go/capnp/v3.(*Segment).readUint64()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/segment.go:65 +0x51
capnproto.org/go/capnp/v3.(*Segment).readRawPointer()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/segment.go:69 +0x50
capnproto.org/go/capnp/v3.(*Segment).resolveFarPointer()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/segment.go:235 +0x4f
capnproto.org/go/capnp/v3.(*Segment).readPtr()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/segment.go:116 +0x75
capnproto.org/go/capnp/v3.Struct.Ptr()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/struct.go:109 +0x119
capnproto.org/go/capnp/v3.Transform()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/answer.go:559 +0x2ed
capnproto.org/go/capnp/v3.resolution.ptr()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/answer.go:585 +0xfc
capnproto.org/go/capnp/v3.resolution.client()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/answer.go:600 +0x96
capnproto.org/go/capnp/v3.(*Answer).PipelineRecv()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/answer.go:386 +0x3f4
capnproto.org/go/capnp/v3.(*Answer).PipelineRecv-fm()
<autogenerated>:1 +0xc7
capnproto.org/go/capnp/v3/server.queueCaller.PipelineRecv()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/server/answer.go:162 +0x2c2
capnproto.org/go/capnp/v3/server.(*answerQueue).PipelineRecv()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/server/answer.go:136 +0xc9
capnproto.org/go/capnp/v3/rpc.(*Conn).handleCall()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/rpc/rpc.go:801 +0x1932
capnproto.org/go/capnp/v3/rpc.(*Conn).receive()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/rpc/rpc.go:505 +0x40e
capnproto.org/go/capnp/v3/rpc.(*Conn).receive-fm()
<autogenerated>:1 +0x39
capnproto.org/go/capnp/v3/rpc.(*Conn).backgroundTask.func1()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/rpc/rpc.go:191 +0x92
golang.org/x/sync/errgroup.(*Group).Go.func1()
/home/henk/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75 +0x86
Goroutine 28 (running) created at:
capnproto.org/go/capnp/v3/server.New()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/server/server.go:122 +0x594
racetest/capnp/go/api.CapState_NewServer()
/home/henk/dev/hiveot/racetest/capnp/go/api/State.capnp.go:63 +0xc9
racetest/capnp/go/api.CapState_ServerToClient()
/home/henk/dev/hiveot/racetest/capnp/go/api/State.capnp.go:69 +0x36
racetest/pkg/state/capnpserver.StartStateCapnpServer()
/home/henk/dev/hiveot/racetest/pkg/state/capnpserver/StateCapnpServer.go:50 +0x166
racetest/pkg/state_test.createStateStore.func1()
/home/henk/dev/hiveot/racetest/pkg/state/State_test.go:37 +0x6f
Goroutine 37 (running) created at:
golang.org/x/sync/errgroup.(*Group).Go()
/home/henk/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:72 +0x12e
capnproto.org/go/capnp/v3/rpc.NewConn()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/rpc/rpc.go:157 +0x7d1
racetest/pkg/caphelp.CapServe.func2()
/home/henk/dev/hiveot/racetest/pkg/caphelp/CapServe.go:53 +0x2c4
==================
--- FAIL: TestGetSet (1.02s)
testing.go:1312: race detected during execution of test
FAIL
FAIL racetest/pkg/state 2.032s
? racetest/pkg/state/capnpclient [no test files]
? racetest/pkg/state/capnpserver [no test files]
? racetest/pkg/state/service/statekvstore [no test files]
FAIL
make: *** [Makefile:19: test] Error 1
But wait, there is more. go test -p 1 ./pkg/race fails in leakyRead even without -race. Maybe I'm doing something wrong, because this is clearly broken.
go test -failfast -p 1 ./pkg/race/
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x5e19b7]
goroutine 42 [running]:
capnproto.org/go/capnp/v3/rpc/transport.(*ctxReader).leakyRead.func1()
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/rpc/transport/transport.go:350 +0x37
created by capnproto.org/go/capnp/v3/rpc/transport.(*ctxReader).leakyRead
/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/rpc/transport/transport.go:349 +0xe5
FAIL racetest/pkg/race 0.009s
FAIL
Update: these problems might be caused by the way the tests are run. Each test uses the same socket path to connect. When one test finishes, its connection is dropped and the next test runs. However, if the background cleanup takes some time, the next test has already started by then, and the first thing that test does is remove the socket.
After changing the test to use a different socket the race condition no longer happens.
Testing this change with alpha.5 it also shows no race condition.
Conclusion: false alarm, and cleanup is tricky.
Thanks for following up on this!