capnproto / go-capnp

Cap'n Proto library and code generator for Go

Home Page: https://capnproto.org

Race condition in capnp

hspaay opened this issue · comments

commented

Problem: Running go test -race on my modules sometimes fails with a race condition in go-capnproto2's transport.

To reproduce:

The offending code was extracted and put into a simpler repo here: github.com/hiveot/racetest.
This repo has two tests, 'race' and 'state'. Each shows a different problem, although the problem in 'race' may be an artifact of the simplification.

Set up the tests:

git clone github.com/hiveot/racetest
cd racetest
make capnp

Race Package

To reproduce, run this a few times. It usually shows up within 3 attempts:

go test -race pkg/race/Race_test.go

Output of 'race':

henk@msi:~/dev/hiveot/racetest$ go test -race pkg/race/Race_test.go 
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x6eb3ba]

goroutine 148 [running]:
capnproto.org/go/capnp/v3/rpc/transport.(*ctxReader).leakyRead.func1()
	/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/rpc/transport/transport.go:348 +0x5a
created by capnproto.org/go/capnp/v3/rpc/transport.(*ctxReader).leakyRead
	/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/rpc/transport/transport.go:347 +0x16c
FAIL	command-line-arguments	3.027s
FAIL

State Package

The 'state' package intermittently shows a race. Usually it shows up within 10 attempts.

go test -race pkg/state/State_test.go

Output of 'state':

henk@msi:~/dev/hiveot/racetest$ go test -race pkg/state/State_test.go 
==================
WARNING: DATA RACE
Read at 0x00c00015e7c0 by goroutine 37:
  capnproto.org/go/capnp/v3.(*Segment).slice()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/segment.go:49 +0x5a
  capnproto.org/go/capnp/v3.(*Segment).readUint64()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/segment.go:65 +0x51
  capnproto.org/go/capnp/v3.(*Segment).readRawPointer()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/segment.go:69 +0x50
  capnproto.org/go/capnp/v3.(*Segment).resolveFarPointer()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/segment.go:235 +0x4f
  capnproto.org/go/capnp/v3.(*Segment).readPtr()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/segment.go:116 +0x75
  capnproto.org/go/capnp/v3.Struct.Ptr()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/struct.go:109 +0x119
  capnproto.org/go/capnp/v3.Transform()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/answer.go:548 +0x2ed
  capnproto.org/go/capnp/v3.resolution.ptr()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/answer.go:574 +0xfc
  capnproto.org/go/capnp/v3.resolution.client()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/answer.go:589 +0x96
  capnproto.org/go/capnp/v3.(*Answer).PipelineRecv()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/answer.go:375 +0x3f4
  capnproto.org/go/capnp/v3.(*Answer).PipelineRecv-fm()
      <autogenerated>:1 +0xc7
  capnproto.org/go/capnp/v3/server.queueCaller.PipelineRecv()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/server/answer.go:162 +0x2c2
  capnproto.org/go/capnp/v3/server.(*answerQueue).PipelineRecv()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/server/answer.go:136 +0xc9
  capnproto.org/go/capnp/v3/rpc.(*Conn).handleCall()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/rpc/rpc.go:795 +0x19b2
  capnproto.org/go/capnp/v3/rpc.(*Conn).receive()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/rpc/rpc.go:491 +0x40e
  capnproto.org/go/capnp/v3/rpc.(*Conn).receive-fm()
      <autogenerated>:1 +0x39
  capnproto.org/go/capnp/v3/rpc.(*Conn).backgroundTask.func1()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/rpc/rpc.go:191 +0x92
  golang.org/x/sync/errgroup.(*Group).Go.func1()
      /home/henk/go/pkg/mod/golang.org/x/sync@v0.0.0-20201020160332-67f06af15bc9/errgroup/errgroup.go:57 +0x91

Previous write at 0x00c00015e7c0 by goroutine 28:
  capnproto.org/go/capnp/v3.alloc()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/message.go:351 +0x1d1
  capnproto.org/go/capnp/v3.NewCompositeList()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/list.go:62 +0xe4
  capnproto.org/go/capnp/v3/std/capnp/rpc.NewCapDescriptor_List()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/std/capnp/rpc/rpc.capnp.go:2443 +0xce
  capnproto.org/go/capnp/v3/std/capnp/rpc.Payload.NewCapTable()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/std/capnp/rpc/rpc.capnp.go:2215 +0x84
  capnproto.org/go/capnp/v3/rpc.(*Conn).fillPayloadCapTable()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/rpc/export.go:183 +0x14f
  capnproto.org/go/capnp/v3/rpc.(*answer).sendReturn()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/rpc/answer.go:236 +0x198
  capnproto.org/go/capnp/v3/rpc.(*answer).Return()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/rpc/answer.go:201 +0x291
  capnproto.org/go/capnp/v3/server.(*Server).handleCall()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/server/server.go:217 +0x287
  capnproto.org/go/capnp/v3/server.(*Server).handleCalls.func2()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/server/server.go:182 +0x84
  capnproto.org/go/capnp/v3/server.(*Server).handleCalls()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/server/server.go:183 +0x1b6
  capnproto.org/go/capnp/v3/server.New.func1()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/server/server.go:122 +0x58

Goroutine 37 (running) created at:
  golang.org/x/sync/errgroup.(*Group).Go()
      /home/henk/go/pkg/mod/golang.org/x/sync@v0.0.0-20201020160332-67f06af15bc9/errgroup/errgroup.go:54 +0xee
  capnproto.org/go/capnp/v3/rpc.NewConn()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/rpc/rpc.go:157 +0x7d1
  racetest/pkg/caphelp.CapServe.func2()
      /home/henk/dev/hiveot/racetest/pkg/caphelp/CapServe.go:53 +0x2c4

Goroutine 28 (running) created at:
  capnproto.org/go/capnp/v3/server.New()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.5/server/server.go:122 +0x594
  racetest/capnp/go/api.CapState_NewServer()
      /home/henk/dev/hiveot/racetest/capnp/go/api/State.capnp.go:63 +0xc9
  racetest/capnp/go/api.CapState_ServerToClient()
      /home/henk/dev/hiveot/racetest/capnp/go/api/State.capnp.go:69 +0x36
  racetest/pkg/state/capnpserver.StartStateCapnpServer()
      /home/henk/dev/hiveot/racetest/pkg/state/capnpserver/StateCapnpServer.go:50 +0x166
  command-line-arguments_test.createStateStore.func1()
      /home/henk/dev/hiveot/racetest/pkg/state/State_test.go:37 +0x6f
==================
--- FAIL: TestGetSet (1.02s)
    testing.go:1312: race detected during execution of test
FAIL
FAIL	command-line-arguments	3.035s
FAIL

@hspaay Can you try upgrading to alpha 7? I think this was fixed.

commented

The race test sometimes panics with a nil pointer dereference.

In one captured example the panic occurs at transport.go:358 (alpha-5):

  n, err := cr.Reader.Read(cr.buf[:max])

Here cr.Reader is nil.

At the same time, transport.go:130 (function NewMessage) tries to encode the message:

  if err = s.c.Encode(ctx, msg); err != nil {

Here s.c.wc.WriteClose is also nil.

commented

@lthibault speaking of race conditions, lol.
I'll upgrade to alpha-7.

commented

Both race and state tests fail on alpha-7

Interesting! Thanks for the update.

Let's see if #318 fixes it. It should land in a few days' time.

Follow-up question: which test triggers the race condition? Could this issue be a duplicate of #301?

Yeah, I think this is probably the same as #301, but great that we apparently have the ability to semi-reproduce it now.

"but great that we apparently have the ability to semi-reproduce it now"

Yes! Many thanks for this @hspaay !!

commented

Glad to be of help. Just let me know when you have a potential fix. It should be easy to verify now.

@hspaay Can you try this again with the latest tagged version?

commented

@lthibault unfortunately the race is still there using: capnproto.org/go/capnp/v3 v3.0.0-alpha.8

go test -race -failfast -p 1  ./pkg/...
?   	racetest/pkg/caphelp	[no test files]
?   	racetest/pkg/kvstore	[no test files]
?   	racetest/pkg/listener	[no test files]
?   	racetest/pkg/logging	[no test files]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x6ea77a]

goroutine 138 [running]:
capnproto.org/go/capnp/v3/rpc/transport.(*ctxReader).leakyRead.func1()
	/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/rpc/transport/transport.go:350 +0x5a
created by capnproto.org/go/capnp/v3/rpc/transport.(*ctxReader).leakyRead
	/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/rpc/transport/transport.go:349 +0x16c
FAIL	racetest/pkg/race	0.026s
?   	racetest/pkg/race/capnpclient	[no test files]
?   	racetest/pkg/race/capnpserver	[no test files]
?   	racetest/pkg/race/service	[no test files]
==================
WARNING: DATA RACE
Write at 0x00c00019a610 by goroutine 28:
  capnproto.org/go/capnp/v3.alloc()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/message.go:351 +0x1d1
  capnproto.org/go/capnp/v3.NewCompositeList()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/list.go:62 +0xe4
  capnproto.org/go/capnp/v3/std/capnp/rpc.NewCapDescriptor_List()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/std/capnp/rpc/rpc.capnp.go:2443 +0xce
  capnproto.org/go/capnp/v3/std/capnp/rpc.Payload.NewCapTable()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/std/capnp/rpc/rpc.capnp.go:2215 +0x84
  capnproto.org/go/capnp/v3/rpc.(*Conn).fillPayloadCapTable()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/rpc/export.go:183 +0x14f
  capnproto.org/go/capnp/v3/rpc.(*answer).sendReturn()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/rpc/answer.go:236 +0x198
  capnproto.org/go/capnp/v3/rpc.(*answer).Return()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/rpc/answer.go:201 +0x291
  capnproto.org/go/capnp/v3/server.(*Server).handleCall()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/server/server.go:214 +0x267
  capnproto.org/go/capnp/v3/server.(*Server).handleCalls.func2()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/server/server.go:182 +0x84
  capnproto.org/go/capnp/v3/server.(*Server).handleCalls()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/server/server.go:183 +0x1b6
  capnproto.org/go/capnp/v3/server.New.func1()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/server/server.go:122 +0x58

Previous read at 0x00c00019a610 by goroutine 37:
  capnproto.org/go/capnp/v3.(*Segment).slice()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/segment.go:49 +0x5a
  capnproto.org/go/capnp/v3.(*Segment).readUint64()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/segment.go:65 +0x51
  capnproto.org/go/capnp/v3.(*Segment).readRawPointer()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/segment.go:69 +0x50
  capnproto.org/go/capnp/v3.(*Segment).resolveFarPointer()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/segment.go:235 +0x4f
  capnproto.org/go/capnp/v3.(*Segment).readPtr()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/segment.go:116 +0x75
  capnproto.org/go/capnp/v3.Struct.Ptr()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/struct.go:109 +0x119
  capnproto.org/go/capnp/v3.Transform()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/answer.go:559 +0x2ed
  capnproto.org/go/capnp/v3.resolution.ptr()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/answer.go:585 +0xfc
  capnproto.org/go/capnp/v3.resolution.client()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/answer.go:600 +0x96
  capnproto.org/go/capnp/v3.(*Answer).PipelineRecv()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/answer.go:386 +0x3f4
  capnproto.org/go/capnp/v3.(*Answer).PipelineRecv-fm()
      <autogenerated>:1 +0xc7
  capnproto.org/go/capnp/v3/server.queueCaller.PipelineRecv()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/server/answer.go:162 +0x2c2
  capnproto.org/go/capnp/v3/server.(*answerQueue).PipelineRecv()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/server/answer.go:136 +0xc9
  capnproto.org/go/capnp/v3/rpc.(*Conn).handleCall()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/rpc/rpc.go:801 +0x1932
  capnproto.org/go/capnp/v3/rpc.(*Conn).receive()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/rpc/rpc.go:505 +0x40e
  capnproto.org/go/capnp/v3/rpc.(*Conn).receive-fm()
      <autogenerated>:1 +0x39
  capnproto.org/go/capnp/v3/rpc.(*Conn).backgroundTask.func1()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/rpc/rpc.go:191 +0x92
  golang.org/x/sync/errgroup.(*Group).Go.func1()
      /home/henk/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75 +0x86

Goroutine 28 (running) created at:
  capnproto.org/go/capnp/v3/server.New()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/server/server.go:122 +0x594
  racetest/capnp/go/api.CapState_NewServer()
      /home/henk/dev/hiveot/racetest/capnp/go/api/State.capnp.go:63 +0xc9
  racetest/capnp/go/api.CapState_ServerToClient()
      /home/henk/dev/hiveot/racetest/capnp/go/api/State.capnp.go:69 +0x36
  racetest/pkg/state/capnpserver.StartStateCapnpServer()
      /home/henk/dev/hiveot/racetest/pkg/state/capnpserver/StateCapnpServer.go:50 +0x166
  racetest/pkg/state_test.createStateStore.func1()
      /home/henk/dev/hiveot/racetest/pkg/state/State_test.go:37 +0x6f

Goroutine 37 (running) created at:
  golang.org/x/sync/errgroup.(*Group).Go()
      /home/henk/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:72 +0x12e
  capnproto.org/go/capnp/v3/rpc.NewConn()
      /home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/rpc/rpc.go:157 +0x7d1
  racetest/pkg/caphelp.CapServe.func2()
      /home/henk/dev/hiveot/racetest/pkg/caphelp/CapServe.go:53 +0x2c4
==================
--- FAIL: TestGetSet (1.02s)
    testing.go:1312: race detected during execution of test
FAIL
FAIL	racetest/pkg/state	2.032s
?   	racetest/pkg/state/capnpclient	[no test files]
?   	racetest/pkg/state/capnpserver	[no test files]
?   	racetest/pkg/state/service/statekvstore	[no test files]
FAIL
make: *** [Makefile:19: test] Error 1

commented

But wait, there is more. go test -p 1 ./pkg/race fails in leakyRead even without -race. Maybe I'm doing something wrong, because this is clearly broken.

go test -failfast -p 1 ./pkg/race/
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x5e19b7]

goroutine 42 [running]:
capnproto.org/go/capnp/v3/rpc/transport.(*ctxReader).leakyRead.func1()
	/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/rpc/transport/transport.go:350 +0x37
created by capnproto.org/go/capnp/v3/rpc/transport.(*ctxReader).leakyRead
	/home/henk/go/pkg/mod/capnproto.org/go/capnp/v3@v3.0.0-alpha.8/rpc/transport/transport.go:349 +0xe5
FAIL	racetest/pkg/race	0.009s
FAIL

commented

Update: these problems might be caused by the way the tests are run. Each test uses the same socket path to connect. When one test finishes, its connection is dropped and the next test runs. However, if the background cleanup takes some time, the next test has already started, and the first thing it does is remove the socket.
After changing the tests so that each one uses a different socket, the race condition no longer occurs.
With this change, alpha.5 shows no race condition either.

Conclusion: false alarm, and cleanup is tricky.
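
For anyone hitting the same thing, here is a minimal sketch of the "different socket per test" idea. It is not the actual change made in racetest. The helper names newSocketPath and listen and the file name state.socket are hypothetical, chosen only for illustration. It assumes the tests talk over Unix domain sockets and shows that two listeners with separate paths can coexist, so a slow cleanup of the first cannot be disturbed by the second.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"path/filepath"
)

// newSocketPath (hypothetical helper) creates a fresh temporary directory and
// returns a socket path inside it, plus a cleanup function that removes the
// directory. Because every caller gets its own directory, no caller can
// remove a socket file that another one is still using.
func newSocketPath(name string) (string, func(), error) {
	dir, err := os.MkdirTemp("", "racetest-")
	if err != nil {
		return "", nil, err
	}
	return filepath.Join(dir, name), func() { os.RemoveAll(dir) }, nil
}

// listen (hypothetical helper) opens a Unix domain socket listener at path.
func listen(path string) (net.Listener, error) {
	return net.Listen("unix", path)
}

func main() {
	// First "test": open a listener and leave it open, simulating
	// background cleanup that has not finished yet.
	pathA, cleanupA, err := newSocketPath("state.socket")
	if err != nil {
		panic(err)
	}
	defer cleanupA()
	lnA, err := listen(pathA)
	if err != nil {
		panic(err)
	}
	defer lnA.Close()

	// Second "test": starts while the first listener still exists, but
	// uses its own path, so nothing belonging to the first is removed.
	pathB, cleanupB, err := newSocketPath("state.socket")
	if err != nil {
		panic(err)
	}
	defer cleanupB()
	lnB, err := listen(pathB)
	if err != nil {
		panic(err)
	}
	defer lnB.Close()

	fmt.Println("distinct socket paths:", pathA != pathB)
}
```

Inside a real Go test, t.TempDir() from the standard testing package gives the same per-test directory and removes it automatically when the test ends, so the manual cleanup function would not be needed there.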

Thanks for following up on this!