How to handle Response bodies?

Question

sethmlarson opened this issue 4 years ago · comments

Seth Michael Larson commented 4 years ago

Some time ago @njsmith and I discussed how best to implement the "streaming" / "preloading" of Response data and what that interface might look like.

Existing Implementations

urllib3 has preload_content=False on the HTTPResponse object to signal that the data should not be loaded into memory; instead the caller calls .read() until it returns empty.
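As a rough illustration of the "read until empty" pattern, here is a minimal sketch. It assumes only a file-like object with a .read(amt) method; io.BytesIO stands in for a response opened with preload_content=False, and the helper name read_until_empty is made up for this example.

```python
import io
from typing import BinaryIO, Iterator


def read_until_empty(body: BinaryIO, chunk_size: int = 8192) -> Iterator[bytes]:
    """Yield chunks from a file-like body until read() returns b""."""
    while True:
        chunk = body.read(chunk_size)
        if not chunk:  # an empty bytes object signals the end of the body
            break
        yield chunk


# io.BytesIO is a stand-in for an unread response body (assumption).
body = io.BytesIO(b"x" * 20000)
sizes = [len(chunk) for chunk in read_until_empty(body)]
print(sizes)
```

With a real urllib3 response the connection would also need to be released back to the pool once the body is drained.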

Requests has stream=True on the .request() method, which controls whether the body is read up front or left unread on the underlying urllib3.HTTPResponse object. It also provides functions like .iter_content() with optional decoding of the raw binary data into text. Requests also provides a .raw object which, if I remember correctly, doesn't do any decompression of the raw data stream.

Other ideas include having different methods on the Response itself that trigger streaming versus load into memory (.text() / .stream_text()). This way is tough because we then have to worry about GC and connection issues if the user is using the library to only check the status code, in an interactive console, etc.

Another idea is to have a separate method for streaming, but that would move away from our "one function entrypoint", maybe for no real benefit.

Ideas for Hip

How to configure streaming?

Definitely going to support both streaming and preloading of the body onto the Response.

Users expect the body to be preloaded by default, as is the case in Requests and urllib3. This makes calling things like .text and .json easy and also means the connection is returned to the pool by default, with no GC / pooling issues.

Using typing.Literal[True] and typing.Literal[False] along with overload, we should be able to achieve proper type-hinting and IDE support for stream=True/False, so that a type checker knows whether a given response will be streamable or not. Requests already uses a stream=True parameter, so it'd be familiar to users. We're both leaning towards this as the solution.
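A minimal sketch of what the overloads might look like. The function name request and the class names PreloadedResponse and StreamingResponse are placeholders I'm assuming for illustration, not a settled Hip API, and no network I/O happens here.

```python
from typing import Literal, Union, overload


class PreloadedResponse:
    """Hypothetical response whose body is already in memory."""


class StreamingResponse:
    """Hypothetical response whose body has not been read yet."""


@overload
def request(method: str, url: str, *, stream: Literal[False] = ...) -> PreloadedResponse: ...
@overload
def request(method: str, url: str, *, stream: Literal[True]) -> StreamingResponse: ...
def request(
    method: str, url: str, *, stream: bool = False
) -> Union[PreloadedResponse, StreamingResponse]:
    # Only the choice of return type is illustrated; nothing is sent.
    return StreamingResponse() if stream else PreloadedResponse()


default_resp = request("GET", "https://example.com")
streamed_resp = request("GET", "https://example.com", stream=True)
```

A type checker should infer PreloadedResponse for the first call and StreamingResponse for the second. Passing a plain bool variable would need a third overload, which is left out here.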

Calling .text and .json on a Response doesn't make sense in a streaming context, so @njsmith and I thought it'd be a good idea to have two response types, one for preloaded mode and one for streaming mode. Then we can avoid the awkward state on HTTP responses.
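To make the two-type idea concrete, here's a rough sketch. Every class, method, and attribute name below is an assumption for illustration; the point is only that .text and .json exist on one type and not on the other.

```python
import io
import json
from typing import Any, BinaryIO, Iterator


class PreloadedResponse:
    """Hypothetical: the whole body is in memory, so .text/.json() are safe."""

    def __init__(self, status_code: int, body: bytes, encoding: str = "utf-8") -> None:
        self.status_code = status_code
        self.body = body
        self.encoding = encoding

    @property
    def text(self) -> str:
        return self.body.decode(self.encoding)

    def json(self) -> Any:
        return json.loads(self.text)


class StreamingResponse:
    """Hypothetical: the body is unread, so only iteration and close exist."""

    def __init__(self, status_code: int, raw: BinaryIO) -> None:
        self.status_code = status_code
        self._raw = raw

    def iter_bytes(self, chunk_size: int = 8192) -> Iterator[bytes]:
        while True:
            chunk = self._raw.read(chunk_size)
            if not chunk:
                return
            yield chunk

    def close(self) -> None:
        self._raw.close()

    def __enter__(self) -> "StreamingResponse":
        return self

    def __exit__(self, *exc_info: object) -> None:
        # Closing on exit keeps cleanup explicit rather than relying on GC.
        self.close()


preloaded = PreloadedResponse(200, b'{"ok": true}')
with StreamingResponse(200, io.BytesIO(b"abcdef")) as streamed:
    chunks = list(streamed.iter_bytes(chunk_size=4))
```

Since StreamingResponse has no .text at all, the "did the body already get consumed?" state never has to be tracked on a single class.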

What happens when Response body is larger than memory?

Current behavior for HTTP clients is to load everything into memory by default.

Maybe we could set a reasonable limit and error out if more than that is loaded into memory?
Maybe detect memory size and error out on that threshold by default?
The stream=True case doesn't have to worry about memory; users are able to make their own decisions about memory.
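A sketch of the "set a limit and error out" option, assuming a file-like body. The names read_limited and BodyTooLargeError are invented here, and what a sensible default limit would be is left open.

```python
import io
from typing import BinaryIO


class BodyTooLargeError(Exception):
    """Hypothetical error raised when a body exceeds the preload limit."""


def read_limited(body: BinaryIO, max_bytes: int, chunk_size: int = 8192) -> bytes:
    """Read the whole body, raising once more than max_bytes has arrived."""
    buffer = bytearray()
    while True:
        # Ask for at most one byte past the limit so an oversized body is
        # detected without buffering much more than max_bytes.
        chunk = body.read(min(chunk_size, max_bytes + 1 - len(buffer)))
        if not chunk:
            return bytes(buffer)
        buffer += chunk
        if len(buffer) > max_bytes:
            raise BodyTooLargeError(f"body exceeds {max_bytes} bytes")


small = read_limited(io.BytesIO(b"0123456789"), max_bytes=10)

try:
    read_limited(io.BytesIO(b"0123456789X"), max_bytes=10)
    too_large = False
except BodyTooLargeError:
    too_large = True
```

A Content-Length header could let the client fail before reading anything, but the header can be absent or wrong, so counting bytes as they arrive would still be needed.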

How to access a text stream / file-like object?

I've wanted to add a mode to get back a file-like object for interfaces that only support that (think csv.reader(), etc) as this has been a pain point for playing with HTTP response data.
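For the csv.reader() case, one possible shape is wrapping the binary body in io.TextIOWrapper. This is only a sketch: io.BytesIO stands in for a streamed response body, the helper name iter_csv_rows is made up, and the encoding is assumed to be known up front.

```python
import csv
import io
from typing import BinaryIO, Iterator, List


def iter_csv_rows(body: BinaryIO, encoding: str = "utf-8") -> Iterator[List[str]]:
    """Decode a binary file-like body lazily and parse it as CSV."""
    # newline="" lets the csv module handle line endings itself.
    text_stream = io.TextIOWrapper(body, encoding=encoding, newline="")
    yield from csv.reader(text_stream)


# io.BytesIO is a stand-in for an unread response body (assumption).
body = io.BytesIO(b"name,qty\r\nwidget,3\r\n")
rows = list(iter_csv_rows(body))
print(rows)
```

Note that the wrapper closes the underlying body when it is garbage collected or closed, which matters for when the connection gets released.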

Requests also has .iter_content() with decoding support; what would this interface look like for Hip? Requests' chunk sizes are also always tied to the number of bytes read, rather than the amount of text that is decoded. This struck me as confusing, but maybe it's still the right thing to do (limiting the actual data read from the socket, not the size of the chunks coming out of the stream).
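To show the bytes-versus-text mismatch, here's a small sketch using the standard library's incremental decoder. The helper name iter_text is hypothetical and io.BytesIO stands in for the socket; chunk_size counts bytes read, so the decoded pieces can come out with differing lengths.

```python
import codecs
import io
from typing import BinaryIO, Iterator


def iter_text(body: BinaryIO, chunk_size: int, encoding: str = "utf-8") -> Iterator[str]:
    """Read chunk_size bytes at a time and yield whatever text decodes."""
    decoder = codecs.getincrementaldecoder(encoding)(errors="replace")
    while True:
        data = body.read(chunk_size)
        if not data:
            break
        # The decoder buffers incomplete multi-byte sequences internally.
        text = decoder.decode(data)
        if text:
            yield text
    # Flush anything still buffered at the end of the body.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


# "é" is two bytes in UTF-8, so a 2-byte read can split it.
body = io.BytesIO("héllo".encode("utf-8"))
chunks = list(iter_text(body, chunk_size=2))
print(chunks)
```

Each read is exactly 2 bytes, but the text chunks are 1, 2, and 2 characters long, which is the confusing part mentioned above.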