grpc / grpc-go

The Go language implementation of gRPC. HTTP/2 based RPC

Home Page:https://grpc.io

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Flaky test: TestClientSendsAGoAway

arvindbr8 opened this issue · comments

This is the first time I'm seeing it:
https://github.com/grpc/grpc-go/actions/runs/8787418013/job/24112571687?pr=7156

Might be worth to check it out.

--- FAIL: Test (47.61s)
    --- FAIL: Test/ClientSendsAGoAway (10.00s)
        tlogger.go:116: INFO clientconn.go:305 [core] [Channel #472]Channel created  (t=+184.125µs)
        tlogger.go:116: INFO clientconn.go:1680 [core] [Channel #472]original dial target is: "127.0.0.1:34251"  (t=+212.779µs)
        tlogger.go:116: INFO clientconn.go:1685 [core] [Channel #472]dial target "127.0.0.1:34251" parse failed: parse "127.0.0.1:34251": first path segment in URL cannot contain colon  (t=+246.482µs)
        tlogger.go:116: INFO clientconn.go:1706 [core] [Channel #472]fallback to scheme "passthrough"  (t=+274.044µs)
        tlogger.go:116: INFO clientconn.go:1714 [core] [Channel #472]parsed dial target is: passthrough:///127.0.0.1:34251  (t=+299.482µs)
        tlogger.go:116: INFO clientconn.go:1837 [core] [Channel #472]Channel authority set to "127.0.0.1:34251"  (t=+325.33µs)
        tlogger.go:116: INFO resolver_wrapper.go:197 [core] [Channel #472]Resolver state updated: {
              "Addresses": [
                {
                  "Addr": "127.0.0.1:34251",
                  "ServerName": "",
                  "Attributes": null,
                  "BalancerAttributes": null,
                  "Metadata": null
                }
              ],
              "Endpoints": [
                {
                  "Addresses": [
                    {
                      "Addr": "127.0.0.1:34251",
                      "ServerName": "",
                      "Attributes": null,
                      "BalancerAttributes": null,
                      "Metadata": null
                    }
                  ],
                  "Attributes": null
                }
              ],
              "ServiceConfig": null,
              "Attributes": null
            } (resolver returned new addresses)  (t=+464.651µs)
        tlogger.go:116: INFO balancer_wrapper.go:103 [core] [Channel #472]Channel switches to new LB policy "pick_first"  (t=+513.843µs)
        tlogger.go:116: INFO clientconn.go:846 [core] [Channel #472 SubChannel #473]Subchannel created  (t=+569.808µs)
        tlogger.go:116: INFO clientconn.go:535 [core] [Channel #472]Channel Connectivity change to CONNECTING  (t=+882.825µs)
        tlogger.go:116: INFO clientconn.go:305 [core] [Channel #472]Channel exiting idle mode  (t=+914.[194](https://github.com/grpc/grpc-go/actions/runs/8787418013/job/24112571687?pr=7156#step:4:195)µs)
        tlogger.go:116: INFO clientconn.go:1207 [core] [Channel #472 SubChannel #473]Subchannel Connectivity change to CONNECTING  (t=+971.722µs)
        tlogger.go:116: INFO clientconn.go:1322 [core] [Channel #472 SubChannel #473]Subchannel picks a new address "127.0.0.1:34251" to connect  (t=+998.061µs)
        tlogger.go:116: INFO clientconn.go:535 [core] [Channel #472]Channel Connectivity change to SHUTDOWN  (t=+1.652745ms)
        tlogger.go:116: INFO resolver_wrapper.go:100 [core] [Channel #472]Closing the name resolver  (t=+1.674466ms)
        tlogger.go:116: INFO balancer_wrapper.go:135 [core] [Channel #472]ccBalancerWrapper: closing  (t=+1.694022ms)
        tlogger.go:116: INFO clientconn.go:1[207](https://github.com/grpc/grpc-go/actions/runs/8787418013/job/24112571687?pr=7156#step:4:208) [core] [Channel #472 SubChannel #473]Subchannel Connectivity change to SHUTDOWN  (t=+1.721133ms)
        tlogger.go:116: INFO clientconn.go:1553 [core] [Channel #472 SubChannel #473]Subchannel deleted  (t=+1.773004ms)
        tlogger.go:116: INFO clientconn.go:305 [core] [Channel #472]Channel deleted  (t=+1.793723ms)
        goaway_test.go:826: Context timed out
FAIL

@arvindbr8 This seems to be working for me. I ran the following commands to cross check and all of'em worked. Let me know in case i am missing something

go test -coverprofile=coverage.out -coverpkg=./... ./...
go test ./...
go test ./... -race

@arvindbr8 I assigned it back to you since @aranjans needs more clarification. Let me know if its ok

Sure.

We discussed this offline:

It's usually hard to reproduce these flakes that occur in GitHub CI on local. This is because, usually since our laptops have more juice than the free GitHub VMs that we use. Traditionally, we have an internal mechanism (discussed in chat with @aranjans) which would allow us to run the test in the order of 10-100K runs to check for flake rate. Once we have the flake rate, we can decide the priority of this issue.

Regardless imo, this is worth taking a look now since this was checked very recently by @aranjans and he has not so stale context on the test that he added.

@aranjans -- please let me know if you want to sync up again regarding the mechanism mentioned above.

Closing this issue as we are not able to repro this.

Here is a repro of the race condition: https://github.com/grpc/grpc-go/compare/master...arjan-bal:repro-race-cond?expand=1

We should probably wait for the client's Channel to be READY before closing it.