handleGRPC can panic
simitt opened this issue · comments
(*mux).handleGRPC(..)
occasionally panics with following error:
http: panic serving <ip:port>: send on closed channel
goroutine 3386993 [running]:
net/http.(*conn).serve.func1()
net/http/server.go:1825 +0xbf
panic({0x557013b82be0, 0x557013f70d50})
runtime/panic.go:844 +0x258
github.com/elastic/gmux.(*mux).handleGRPC(0x0?, 0x60?, {0x557013f99500?, 0xc0021f0cc0?}, 0xc003e135c0, {0x557013db38c0?, 0xc001fe1201?})
github.com/elastic/gmux@v0.2.0/mux.go:300 +0x7d
github.com/elastic/gmux.(*mux).handleH2(0x20?, 0xc0000b8800?, {0x557013f98e20?, 0xc0007a0380?}, {0x557013f7a528, 0xc001fe1200})
github.com/elastic/gmux@v0.2.0/mux.go:138 +0x519
github.com/elastic/gmux.ConfigureServer.func1(0xc002a3ab60, 0xc0007a0380, {0x557013f7a528?, 0xc001fe1200?})
github.com/elastic/gmux@v0.2.0/mux.go:79 +0x51
net/http.(*conn).serve(0xc0027acb40, {0x557013f92370, 0xc004642180})
net/http/server.go:1874 +0x1293
created by net/http.(*Server).Serve
net/http/server.go:3071 +0x4db
I have not yet found how to reproduce it.
TLDR: In the rare occasion during shutdown where a new gprc connection comes in after grpcServer shutdown and before httpServer shutdown, the user program panics.
Managed to reliably reproduce the issue.
2023/01/10 11:03:43 http: panic serving 127.0.0.1:35366: send on closed channel
goroutine 28987 [running]:
net/http.(*conn).serve.func1()
/home/carson/sdk/go1.19.3/src/net/http/server.go:1850 +0xbf
panic({0x8e3d20, 0xa3beb0})
/home/carson/sdk/go1.19.3/src/runtime/panic.go:890 +0x262
github.com/elastic/gmux.(*mux).handleGRPC(...)
/home/carson/projects/gmux/mux.go:300
github.com/elastic/gmux.(*mux).withGRPCInsecure.func1({0xa409a0, 0xc00166a380}, 0xc001b28300)
/home/carson/projects/gmux/mux.go:110 +0x568
net/http.HandlerFunc.ServeHTTP(0x0?, {0xa409a0?, 0xc00166a380?}, 0x46518e?)
/home/carson/sdk/go1.19.3/src/net/http/server.go:2109 +0x2f
net/http.serverHandler.ServeHTTP({0xc00265a7b0?}, {0xa409a0, 0xc00166a380}, 0xc001b28300)
/home/carson/sdk/go1.19.3/src/net/http/server.go:2947 +0x30c
net/http.(*conn).serve(0xc0015b01e0, {0xa410e0, 0xc0001184e0})
/home/carson/sdk/go1.19.3/src/net/http/server.go:1991 +0x607
created by net/http.(*Server).Serve
/home/carson/sdk/go1.19.3/src/net/http/server.go:3102 +0x4db
This only happens in the rare occasion where user program is shutting down and a new grpc connection comes in between grpcServer shutdown and httpServer shutdown. This is caused by httpServer and grpcServer closing order in code using gmux. There is no bug inside gmux mux.go
implementation.
The program panics when the servers are closed in this order in user code:
grpcServer.GracefulStop()
httpServer.Shutdown(context.Background())
This is because grpcServer.GracefulStop()
closes the grpc listener channel mux.grpcConns
while httpServer is still happily accepting grpc connections. When connections are accepted and tried to be placed into a closed channel mux.grpcConns
, the program panics.
The solution is to reverse the closing order in user code, i.e.
httpServer.Shutdown(context.Background())
grpcServer.GracefulStop()
such that the gmux configured httpServer stops accepting connections first.
This is because grpcServer.GracefulStop() closes the grpc listener channel mux.grpcConns while httpServer is still happily accepting grpc connections. When connections are accepted and tried to be placed into a closed channel mux.grpcConns, the program panics.
@carsonip How is mux.grpcConns
being closed? I don't see anything closing it in mux.go.
This is because grpcServer.GracefulStop() closes the grpc listener channel mux.grpcConns while httpServer is still happily accepting grpc connections. When connections are accepted and tried to be placed into a closed channel mux.grpcConns, the program panics.
@carsonip How is
mux.grpcConns
being closed? I don't see anything closing it in mux.go.
@axw It is not closed in mux.go
. In the case of apm-server, it is closed in https://github.com/elastic/apm-server/blob/main/internal/beater/server.go#L222
@carsonip I don't mean closing the server, but the channel. You said:
This is because grpcServer.GracefulStop() closes the grpc listener channel mux.grpcConns
How does closing the server close the channel?
How does closing the server close the channel?
@axw Good question. It isn't immediately obvious.
In apm-server, on startup, we start the grpc server with
s.grpcServer.Serve(s.httpServer.grpcListener)
which calls grpc
package server.go
func (s *Server) Serve(lis net.Listener) error {
...
ls := &listenSocket{Listener: lis}
s.lis[ls] = true
...
}
In apm-server, on shutdown,
s.grpcServer.GracefulStop()
which calls grpc
package server.go
func (s *Server) GracefulStop() {
...
for lis := range s.lis {
lis.Close()
}
...
}
which in turn calls chanListener.Close
because lis
is a gmux/conn.go
chanListener
which implements net.Listener
,
func (l *chanListener) Close() error {
l.closeOnce.Do(func() {
close(l.conns)
})
return nil
}
Thanks @carsonip! Makes sense now.
I think it would still be good to fix in gmux, as it feels like a bit of a footgun. Not urgent, but can we leave this open?
Thanks @carsonip! Makes sense now.
I think it would still be good to fix in gmux, as it feels like a bit of a footgun. Not urgent, but can we leave this open?
Yes, let's leave this open. Would like to know how we can fix this properly, otherwise this will be a point to document in readme.
People usually say "only writers should close the channel" but in this case gmux has no idea whether the channel is closed / should be closed due to reasons including but not limited to shutdown. We may have to add something to signal shutdown to gmux so that it stops sending stuff into the channel.
Would like to know how we can fix this properly, otherwise this will be a point to document in readme.
Could we just never close the chanListener.conns
channel? Instead we could have a chanListener.closed
channel which is closed by chanListener.Close
, and is only ever received from. chanListener.Accept
would select on both <-conns
and <-closed
. handleGRPC may need to do the same, not sure now.