python-trio / hip

A new Python HTTP client for everybody

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Discussion: What do we call the async versions of hip.get, hip.post, etc.?

njsmith opened this issue · comments

So everyone's used to being able to do:

import requests
response = requests.get(...)

So I'm assuming we will also want to support:

import hip
response = hip.get(...)

But what if you want to do an async get? If hip.get is already claimed for the sync version, then we can't use that for async gets.

There are three options I've thought about:

Option 1: Don't offer top-level async helpers at all. If you want to make async requests, you have to make an AsyncSession.

This is what I originally thought we'd want to do, because most users generally want to re-use connections, and especially b/c we want to support HTTP/2 in async mode, which requires background tasks and a shared connection pool. So pushing them towards sessions is the way to go, and we'd treat the no-session mode as a kind of simplified legacy interface for sync users only.

But... based on the discussion in #125, it sounds like we're going to be able to support HTTP/2 and shared connection pools even if users aren't explicitly using a session. So that makes the top-level hip.get-style helpers way more useful than before, and makes them equally useful in both sync and async mode. So maybe we want them after all!

Option 2: We could follow the same pattern as Python uses for e.g. __iter__ vs __aiter__, and have:

response = hip.get(...)  # sync
response = await hip.aget(...)  # async

This would work. There are some downsides though: it makes the async version feel a little more awkward and "second class". It means we either need to either rename the session methods too (await async_session.aget(...)), or else have an inconsistency (await hip.aget(...) / await async_session.get(...)). And if someone accidentally writes hip.get inside synchronous code, then it'll seem to work but cause nasty performance problems, as it blocks the whole main loop.

Also, making users pick between these two modes on a call-by-call basis just feels a bit wrong: generally either your whole program is sync, or else your whole program is async; it's not going to magically change between two lines in the same file.

Option 3: Use different namespaces. E.g.:

# Sync
import hip
response = hip.get(...)

# Async
import ahip
response = ahip.get(...)

This seems pretty attractive. We could even remove the class name munging code, so instead of hip.Session and hip.AsyncSession, we'd just have hip.Session and async_hip.Session – make the two APIs completely identical except for the top-level namespace and the async/await annotations. And it has the nice property that if you're using async, you import the async namespace at the top of your file, and then you can't accidentally use the sync version; and the async version gets to use all the nice unqualified names like get.

It also makes life somewhat simpler for code that wants to build on top of hip, while using unasync to support both sync+async modes. You just do if ASYNC_MODE: import async_hip as hip; else import sync_hip as hip, and then the rest of your file uses hip and it just works in both modes. With "option 2" above you can also make this work, but you need a more sophisticated version of unasync that can munge agetget, etc.

If we go down this road then we'll need to do some major bikeshedding to pick the names – hip vs async_hip? sync_hip versus async_hip? hip.sync versus hip.async_? (Unfortunately you can't have a module named async because it's a keyword :-(.) hip.s versus hip.a? This is kind of a second-order issue, because if separate namespaces are a bad idea, then we don't need to even think about names. But it's also kind of important, because my main hesitation with this approach is that all the "async hip" names look kind of ugly.

I guess we could even take this a step further and distribute hip and ahip as two separate packages in PyPI. Not sure if that's better or worse that bundling them together.

What's to prevent making

import hip
# Async
async hip.get()
# Sync
hip.get()

work, by detecting with sniffio (or likewise) whether the user is running async or not? (Not sure it's the best idea, just asking whether it's possible 😉)

Is it possible to return an awaitable from a sync function? If so then that's feasible.

I bet we could do typing.Callable[[...], typing.Union[SyncResponse, typing.Awaitable[AsyncResponse]] and that'd work nicely?

But won't mypy then force you to do an explicit cast or isinstance check every time you want to use the returned object?

(Automagically switching between sync/async also makes me uncomfortable on vague grounds like "explicit is better than implicit". But I think the type issue is a more practical problem.)

That's my worry too, I'll experiment with how mypy handles this case.

Can confirm that mypy doesn't like the type annotation. Can't use overloads either because the return types are different.

Maybe put the async names in hip and the sync names in hip.sync? And if you invoke the async names without an async event loop running, raise an error that points the user at hip.sync. There are plenty of other HTTP libraries that are sync-first; maybe there's room for hip as one that's prominently async-first.

hip.sync is the same number of characters as requests, too...

Maybe put the async names in hip and the sync names in hip.sync? And if you invoke the async names without an async event loop running, raise an error that points the user at hip.sync. There are plenty of other HTTP libraries that are sync-first; maybe there's room for hip as one that's prominently async-first.

Hmm. It could work, but I'm a bit worried that it would send the message that we consider sync support to be an afterthought or second-class citizen. Of course we know that's not true, but I've noticed that people can be very sensitive to these kinds of signals, especially in that first 30 seconds looking at a project when they haven't decided yet whether to invest a whole 5 minutes in reading about it. And we want to position ourselves as the successor to urllib3/requests/etc., so it's important to communicate we care about those use cases.


Maybe we should make it sync_hip and async_hip, with a strong explicit convention that you do

import sync_hip as hip

or

import async_hip as hip

? It's slightly weird, but at least it's just a single small dollop of weirdness that you can confront once and deal with, and it lets us use plain hip when we want to talk about both versions at once. (Which will be a lot of the time, since all the API details are duplicated.)

Hmm, I feel an aversion to having our top-level namespace be anything besides the package name. Currently I'm feeling hip.SyncSession() / hip.AsyncSession(), hip.sync_request() / hip.async_request()?

hip.a.request() and hip.s.request() being symmetrical also works for me and having all non-I/O related functions/members be imported at the top-level hip. I'm kinda liking this one more and more due to how out of the way it is.

Yeah, the different top-level namespaces is definitely "weird", but it does centralize all the weirdness in a single line at the top of the file... if we make people type hip.sync_get(...) all the time then it's weird everywhere :-/

Here's another idea: maybe we could get rid of the top-level get/post/etc., if we made sessions really easy to use? So like the minimal example would be:

# Sync
import hip
client = hip.Client()
response = client.get(...)

# Async
import hip
client = hip.AsyncClient()

async def main():
    response = await client.get(...)

trio.run(main)

The part that's super-weird-magical about this is that async operations always have to be done within a specific event loop. So in this version, AsyncClient would need to wait until the first time it's used before actually initializing anything (e.g. background workers)... and it would need some mechanism to catch cases like:

# Run it once
trio.run(main)
# Run it a second time
trio.run(main)

In this case, two calls to trio.run are using the same AsyncClient object, but they can't share sockets or any IO-related state, because of the separate event loops. So we'd need some strategy to deal with this.

But of course we need something similar anyway for top-level operations like hip.get; this just makes it more obvious.

Edit: If we add from . import a, s into the hip namespace then users will only need to call hip.s.Session() or hip.a.Session() once in their code unless they're creating sessions in multiple places, I feel like that case is rare? Usually I've only seen sessions created in one place in a codebase.

I feel like users have come to expect something like a session-less .request() method for toying around, I can tell you for certain I've never written requests.Session() within an interactive console.

So I'm now convinced that the two namespaces idea is the way forward due to removing the confusion between a sync / async painted object and a shared object between the sync and async namespaces. Here's the two that imo are the best contenders for names:

hip / ahip

We can reserve ahip if we go this route to auto-install hip / potentially in the future split the project into it's sync and async components?

Pros

  • Ships a namespace that's the same as the package name
  • When using a REPL, import hip gives you the right version that you want.
  • Short and to the point, can

Cons

  • ahip isn't really "anything", it's not immediately obvious that the a means async.
  • Favors "sync" over "async" because it's the same as package name.

Aesthetics

### Sync

import hip

session = hip.Session(
    headers={"authorization": "Bearer abc"},
    ca_certs="/path/to/bundle.pem",
    proxies={
        "http": "https://localhost:1234",
        "https": "https://localhost:1234"
    }
)

def main():
    try:
        resp = session.request(
            "GET", "https://example.com",
            retries=hip.Retries(
                total=10,
                backoff_factor=0.2,
                max_backoff=3.0
            )
        )
        resp.raise_for_status()
    except hip.HTTPError:
        print("Error!")
        raise

    for chunk in resp.stream_text():
        print(chunk)

main()

### Async

import trio
import ahip

session = ahip.Session(
    headers={"authorization": "Bearer abc"},
    ca_certs="/path/to/bundle.pem",
    proxies={
        "http": "https://localhost:1234",
        "https": "https://localhost:1234"
    }
)

async def main():
    try:
        resp = await session.request(
            "GET", "https://example.com",
            retries=ahip.Retries(
                total=10,
                backoff_factor=0.2,
                max_backoff=3.0
            )
        )
        resp.raise_for_status()
    except ahip.HTTPError:
        print("Error!")
        raise

    async for chunk in resp.stream_text():
        print(chunk)

trio.run(main)

### REPL

>>> import hip
>>>
>>> resp = hip.get("https://example.com")
<Response [200]>
>>> resp.headers
{"date": "Thu, 16 Jan 2020 14:26:07 GMT", ...}
>>> resp.text
"<!doctype html>\n<html>..."

hip / async_hip

Has very similar pros and cons as above. More verbose definitely but can hopefully be overcome with renaming imports (which is more common now, thanks data science community!).
If we were to use this method I'd suggest we recommend users do the following within documentation: import async_hip as hip

### Sync

import hip

session = hip.Session(
    headers={"authorization": "Bearer abc"},
    ca_certs="/path/to/bundle.pem"
)

def main():
    try:
        resp = session.request(
            "GET", "https://example.com",
            retries=hip.Retries(
                total=10,
                backoff_factor=0.2,
                max_backoff=3.0
            ),
            proxies={
                "http": "https://localhost:1234",
                "https": "https://localhost:1234"
            }
        )
        resp.raise_for_status()
    except hip.HTTPError:
        print("Error!")
        raise

    for chunk in resp.stream_text():
        print(chunk)

main()

### Async

import trio
import async_hip as hip

session = hip.Session(
    headers={"authorization": "Bearer abc"},
    ca_certs="/path/to/bundle.pem"
)

async def main():
    try:
        resp = await session.request(
            "GET", "https://example.com",
            retries=hip.Retries(
                total=10,
                backoff_factor=0.2,
                max_backoff=3.0
            ),
            proxies={
                "http": "https://localhost:1234",
                "https": "https://localhost:1234"
            }
        )
        resp.raise_for_status()
    except hip.HTTPError:
        print("Error!")
        raise

    async for chunk in resp.stream_text():
        print(chunk)

trio.run(main)

### REPL

>>> import hip
>>>
>>> resp = hip.get("https://example.com")
<Response [200]>
>>> resp.headers
{"date": "Thu, 16 Jan 2020 14:26:07 GMT", ...}
>>> resp.text
"<!doctype html>\n<html>..."

How about having the async functions in the top-level hip namespace with the sync api mirrored in hip.sync?

If a user wanted to use the sync interface they could then simply do:

import hip.sync as hip

The reason I think the sync functions should be default namespace is they're the ones you want when using a console, probably what most users will type first when experimenting.

At least in ipython that's much less of a concern as you can await things in the repl with the

%autoawait trio

"magic" command.
e.g.

In [10]: async with httpx.Client(backend=TrioBackend()) as client:
    ...:     response = await client.get("http://127.0.0.1:8000")

Just Works.

Discussed the IPython point with @njsmith and we agreed that the IPython case users probably still want the sync functions. There isn't much to gain from using async I/O within a REPL when the typical use-case is playing around and experimenting versus performance.

I use the repl a lot to play around, but when I've got some working code I then copy & paste it into a script.

If you play around with the sync version you can no longer copy & paste without having to then annotate the code with async/await in all the right places. At least for me, async-ifying sync code is a bit of a whack-a-mole process until I've got the asyncs and awaits all correct. So I think there is actually a lot of value in an async repl!

But it's a totally valid position to not want to demote the sync version to a mere sub-package and

import ahip as hip

works just as well as

import hip.sync as hip

I'll definitely keep the async repl use-case in mind, so you are in favor of hip / ahip? That's where I'm leaning as well.

I think the nice thing about ahip is that it doesn't require renaming at import time – obviously you can if you want, but typing one extra character isn't a big deal.

And that means that when people are like, pasting code in chat to ask for help, we can see that this code has a bug (missing await):

response = ahip.get("https://github.com")

In the possible world where both imports get renamed to hip, the code becomes

response = hip.get("https://github.com")

...and we have no idea whether there's a bug or not.

Other cases where this kind of clarity is helpful: pasting scripts from stackoverflow, tutorial text targeting users who aren't so clear on what async/await are, etc.

The reason I think the sync functions should be default namespace is they're the ones you want when using a console, probably what most users will type first when experimenting.

And realistically, most Python projects will be sync for the foreseeable future. No matter how sexy our new async libraries are, for most projects sync works just fine :-)

Another consideration here is unasync support. Say a library like Apache Libcloud wants to adopt unasync and use hip as an HTTP client, I guess "ahip" in _async folders becomes "hip" in _sync folders, but where is this transformation encoded?

What about something like

# For sync

from hip.sync import hip

session = hip.Session(
    headers={"authorization": "Bearer abc"},
    ca_certs="/path/to/bundle.pem"
)

def main():
    try:
        resp = session.request(
            "GET", "https://example.com",
            retries=hip.Retries(
                total=10,
                backoff_factor=0.2,
                max_backoff=3.0
            ),
            proxies={
                "http": "https://localhost:1234",
                "https": "https://localhost:1234"
            }
        )
        resp.raise_for_status()
    except hip.HTTPError:
        print("Error!")
        raise

    for chunk in resp.stream_text():
        print(chunk)

main()

# For async
from hip.async import hip
import trio

session = hip.Session(
    headers={"authorization": "Bearer abc"},
    ca_certs="/path/to/bundle.pem"
)

async def main():
    try:
        resp = await session.request(
            "GET", "https://example.com",
            retries=hip.Retries(
                total=10,
                backoff_factor=0.2,
                max_backoff=3.0
            ),
            proxies={
                "http": "https://localhost:1234",
                "https": "https://localhost:1234"
            }
        )
        resp.raise_for_status()
    except hip.HTTPError:
        print("Error!")
        raise

    async for chunk in resp.stream_text():
        print(chunk)

trio.run(main)

And both sync and async versions have similar APIs

Say a library like Apache Libcloud wants to adopt unasync and use hip as an HTTP client, I guess "ahip" in _async folders becomes "hip" in _sync folders, but where is this transformation encoded?

I guess the two options are:

  • bake it into unasync's default replacements list, because hip is just that special :-)
  • provide some convenient interface to let projects define custom replacements, e.g. inside setup.py

If we wanted to get fancy, I guess we could do some light AST analysis, and only enable the ahiphip replacement for files that contain an import ahip

from hip.async import hip

Annoyingly, this is actually impossible: since async is a syntax keyword, Python refuses to let you use it as the name for a module.

It sounds like we've pretty much converged on the hip/ahip approach... I wish ahip was prettier, but everything else has much more impactful downsides, so let's keep it simple.

I claimed the name on PyPI: https://pypi.org/project/ahip/
(And the placeholder code is trivial, but I went ahead and pushed it to here: https://github.com/python-trio/ahip)

Closing this as decided, thanks everyone!