golang / go

The Go programming language

Home Page: https://go.dev

net/http: HTTP/2 response body Close method sometimes returns spurious context cancelation error (1.17.3 regression)

cespare opened this issue

What version of Go are you using (go version)?

Go 1.17.3

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

linux/amd64

What did you do?

  • Use an http.Client with (any) Timeout set
  • Make an HTTP/2 request
  • Read the body
  • Close the body

Demo code:

package main

import (
	"io"
	"log"
	"net/http"
	"time"
)

func main() {
	client := &http.Client{Timeout: time.Minute}
	for i := 0; i < 20; i++ {
		resp, err := client.Get("https://google.com")
		if err != nil {
			log.Fatal(err)
		}
		if _, err := io.Copy(io.Discard, resp.Body); err != nil {
			log.Fatal(err)
		}
		if err := resp.Body.Close(); err != nil {
			log.Fatalf("[i=%d] body close error: %s", i, err)
		}
	}
}

(I'm pinging google.com as an easy way to get an HTTP/2 request going.)

What did you expect to see?

No error, at least not on the resp.Body.Close() call.

What did you see instead?

For example:

2021/11/04 15:49:08 [i=0] body close error: context canceled

Sometimes it takes a few iterations to hit an error.

This is HTTP/2-specific: with GODEBUG=http2client=0 it does not reproduce.

This does not occur in Go 1.17.2 but it started happening with 1.17.3.

I realize it's weird to be checking the resp.Body.Close error in the first place. This came up in the context of twirp; I'm going to send a PR to them to stop checking the error. Even so, I don't think we should get this error in the client. It's especially confusing because we get the error immediately (so it's not related to the 1 minute timeout specifically).

I bisected on master to 7109323, which is CL 353870. The original x/net/http2 CL is CL 353390 and it was backported for Go 1.17.3 as CL 357683.

/cc @neild @bradfitz @nightlyone @fraenkel

resp.Body.Close wants to wait until the request stream has been finalized. (This ensures that, among other things, if we immediately send another request, the previous request no longer counts against the connection's concurrency limit.)

Finalizing the stream (in cleanupWriteRequest) can write to the network (sending a stream reset). Network writes can take an arbitrarily long amount of time, so resp.Body.Close waits until the cleanup happens or until the request is canceled, whichever happens first.
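
The wait described above can be modeled with a select over two channels. This is a simplified sketch, not the x/net/http2 code: the stream type, its cleanupDone channel, and the helper functions are all hypothetical names chosen for illustration.

```go
package main

import (
	"context"
	"fmt"
)

// stream is a hypothetical stand-in for the HTTP/2 client stream state.
type stream struct {
	ctx         context.Context // request context
	cleanupDone chan struct{}   // closed once stream cleanup has finished
}

// closeBody returns once cleanup is done, or gives up with the context's
// error if the request is canceled first.
func (s *stream) closeBody() error {
	select {
	case <-s.cleanupDone:
		return nil
	case <-s.ctx.Done():
		return s.ctx.Err()
	}
}

// closeAfterCancel cancels the request context while cleanup is still
// pending and reports what closeBody returns.
func closeAfterCancel() error {
	ctx, cancel := context.WithCancel(context.Background())
	s := &stream{ctx: ctx, cleanupDone: make(chan struct{})}
	cancel()
	return s.closeBody()
}

// closeAfterCleanup finishes cleanup while the context is still live.
func closeAfterCleanup() error {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	s := &stream{ctx: ctx, cleanupDone: make(chan struct{})}
	close(s.cleanupDone)
	return s.closeBody()
}

func main() {
	fmt.Println("canceled first:", closeAfterCancel())
	fmt.Println("cleanup first: ", closeAfterCleanup())
}
```

In this model, a context that is already canceled when closeBody is called makes the function return the context error without waiting for cleanup.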

An http.Client with a non-zero timeout wraps the response body in a *http.cancelTimerBody. The cancelTimerBody cancels the request context after reading any error, including io.EOF, from the response body.

So if you read the full response, the request context is always canceled before resp.Body.Close is called.
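
To make that sequence concrete, here is a simplified model of such a wrapper. It is an assumption-laden sketch rather than the net/http source: cancelOnErrBody and ctxErrAfterFullRead are hypothetical names, and the body is an in-memory reader instead of a network stream.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"strings"
)

// cancelOnErrBody is a hypothetical model of a body wrapper that cancels
// the request context on any read error.
type cancelOnErrBody struct {
	rc   io.ReadCloser
	stop context.CancelFunc // cancels the request context
}

// Read forwards to the wrapped body and cancels the context on any error,
// including io.EOF.
func (b *cancelOnErrBody) Read(p []byte) (int, error) {
	n, err := b.rc.Read(p)
	if err != nil {
		b.stop()
	}
	return n, err
}

// Close closes the wrapped body and cancels the context.
func (b *cancelOnErrBody) Close() error {
	err := b.rc.Close()
	b.stop()
	return err
}

// ctxErrAfterFullRead drains a wrapped body and returns the context's error
// as it stands just before Close would be called.
func ctxErrAfterFullRead() error {
	ctx, cancel := context.WithCancel(context.Background())
	body := &cancelOnErrBody{
		rc:   io.NopCloser(strings.NewReader("hello")),
		stop: cancel,
	}
	if _, err := io.Copy(io.Discard, body); err != nil {
		return err
	}
	return ctx.Err()
}

func main() {
	fmt.Println("context state before Close:", ctxErrAfterFullRead())
}
```

Draining the body hits io.EOF, which triggers stop, so in this model the context is already canceled by the time the caller reaches Close.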

I'm not immediately certain what the right fix is: resp.Body.Close waits for the stream cleanup because we have tests which expect that all stream state is cleaned up after the response body is closed. This seems like a reasonable assumption, and a good property to preserve. But we don't want to block indefinitely, so there needs to be some bound on how long resp.Body.Close will take. The context lifetime seems like the perfect bound, but we apparently can't use it.

Naive suggestion from someone without context:

If cancelTimerBody.Close would return context.Canceled and it had previously called stop, then could it know that the latter caused the former and return nil as a special case?
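
A rough sketch of what this suggestion could look like, under stated assumptions: timerBody and ctxBody are hypothetical types, ctxBody.Close simply returns the context error to imitate the behavior reported in this issue, and none of this is the actual net/http code.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"strings"
)

// ctxBody is a hypothetical inner body whose Close reports the context
// error, standing in for the HTTP/2 body behavior described in this issue.
type ctxBody struct {
	io.Reader
	ctx context.Context
}

func (b ctxBody) Close() error { return b.ctx.Err() }

// timerBody remembers that it canceled the context itself and hides the
// resulting context.Canceled from Close.
type timerBody struct {
	rc      io.ReadCloser
	stop    context.CancelFunc
	stopped bool // set once Read has called stop
}

func (b *timerBody) Read(p []byte) (int, error) {
	n, err := b.rc.Read(p)
	if err != nil {
		b.stop()
		b.stopped = true
	}
	return n, err
}

func (b *timerBody) Close() error {
	err := b.rc.Close()
	b.stop()
	if b.stopped && errors.Is(err, context.Canceled) {
		return nil // the cancelation came from our own Read, not the caller
	}
	return err
}

// closeAfterFullRead drains the body, then closes it.
func closeAfterFullRead() error {
	ctx, cancel := context.WithCancel(context.Background())
	body := &timerBody{rc: ctxBody{Reader: strings.NewReader("hello"), ctx: ctx}, stop: cancel}
	if _, err := io.Copy(io.Discard, body); err != nil {
		return err
	}
	return body.Close()
}

// closeAfterCallerCancel cancels from the caller's side without reading.
func closeAfterCallerCancel() error {
	ctx, cancel := context.WithCancel(context.Background())
	body := &timerBody{rc: ctxBody{Reader: strings.NewReader("hello"), ctx: ctx}, stop: cancel}
	cancel()
	return body.Close()
}

func main() {
	fmt.Println("after full read:    ", closeAfterFullRead())
	fmt.Println("after caller cancel:", closeAfterCallerCancel())
}
```

The special case only changes the value Close returns; it does not change how long Close waits before returning.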

That fixes the problem where we return context.Canceled instead of nil, but it doesn't fix the problem where we want to wait for the request to be cleaned up before returning from Close.

@neild I'm not sure I'm following you. Altering whether the function returns nil or context.Canceled doesn't change how long the function waits before returning.

Or are you saying that the problem I'm noticing is a symptom of a deeper problem, which is that resp.Body.Close doesn't wait for stream cleanup as intended if there's a timeout set?

Yes, that's right: resp.Body.Close is supposed to be waiting for stream cleanup, but it's returning immediately when the body has been completely read.

This just hit us too; sorry for the +1 post. We reverted to the previous Go version because it's not feasible to change every instance where we check such errors (and not clear that's the right thing to do anyway). This feels like a pretty serious regression.

Change https://golang.org/cl/362354 mentions this issue: http2: avoid spurious context cancelation error from Response.Body.Close

Change https://golang.org/cl/361919 mentions this issue: net/http: do not cancel request context on response body read

We also saw this error in our logs when we upgraded to 1.17.3. We downgraded the log to warn level after seeing this issue to remove the noise.

This issue is serious for me: I have a case where the entire body is lost before it can be processed.
It happens when using httputil.DumpResponse to log the response body before returning it.
For example: https://github.com/nexmoinc/gosrvlib/blob/main/pkg/httpclient/client.go#L95 but with the body param set to true.
For now I have worked around it by adding a context timeout.

@gopherbot Please open backport issues for 1.16 and 1.17

Backport issue(s) opened: #49558 (for 1.16), #49559 (for 1.17).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

Change https://golang.org/cl/368084 mentions this issue: [release-branch.go1.16] net/http: do not cancel request context on response body read

Change https://golang.org/cl/368085 mentions this issue: [release-branch.go1.17] net/http: do not cancel request context on response body read