openai / openai-python

The official Python library for the OpenAI API

Home Page: https://pypi.org/project/openai/


Async requests

MiWeiss opened this issue · comments

Exposing async interfaces would allow using this library in a much more modern, performant, and scalable way.

It would be great if the maintainers could say whether they plan to add async methods in the future (i.e., allow for non-blocking API usage). Even stating explicitly that this won't be added would help, as it would allow third parties to release their own fork or wrapper without the risk of it becoming obsolete moments later :-)

Nice ignore, ggwp

Hi @MiWeiss! Thanks for the issue.

Adding async support is on the roadmap, though we aren't committing to a timeline for when it'll be released. While I generally agree that an async interface would be good, can you tell me a little more about the performance improvement you'd expect to see if we added it? That'll help give us a sense about how to prioritize it.

Hi @hallacy

Thanks for your answer.

I am not sure how to interpret your question (e.g., are you asking about the technical reasons or about use cases?), so please excuse me if my answer misses its target...

Quick motivation:
Async requests make it very easy to perform other tasks while waiting for the response to a request to OpenAI. If the "other tasks" are also I/O- or network-bound (e.g., more requests to OpenAI 😄), I'm likely also running them asynchronously, so that the waiting time is combined (i.e., the total time I wait is roughly the time the slowest task takes). This is naturally much faster than doing all the waiting sequentially.
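To illustrate (a self-contained sketch; fetch_completion is a hypothetical stand-in for an async API call):

import asyncio

async def fetch_completion(prompt: str) -> str:
    # Hypothetical async API call; asyncio.sleep stands in for network latency.
    await asyncio.sleep(1.0)
    return f"completion for {prompt!r}"

async def main():
    # All three "requests" wait concurrently: total runtime is roughly the
    # slowest one (~1s), not the sum (~3s).
    results = await asyncio.gather(
        fetch_completion("prompt A"),
        fetch_completion("prompt B"),
        fetch_completion("prompt C"),
    )
    print(results)

asyncio.run(main())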

And yes, much of that could be done using threads, but threading has various disadvantages compared to asyncio (especially for I/O-bound operations). See e.g. this great comment.

See also these FastAPI docs, which provide a detailed yet simple and intuitive motivation for using async requests.

Does this answer your question?

Also, IMHO
It may be nontrivial to change the library so that both async and synchronous requests are supported, both from an implementation perspective (session handling, API design, etc.) and regarding documentation (every snippet would need an async and a sync variant). It might be easier to offer async-openai as a standalone library. That's just my two cents, and I am happy to be proven wrong, though.

It does! Thank you for the writeup. That comment you linked to was particularly helpful.

I think I agree with that opinion about non-triviality. I can't commit to a timeline, but I'll make a point of bringing this up to the team soon.

This is badly needed! Having to make concurrent requests using threading is not a good fit for modern Python. And it's honestly better to start on it early, because the whole library will need to be upgraded. Or, of course, a new library could be made for it. The Node.js OpenAI library is naturally async, which is a big advantage. In the meantime, asyncifying function calls with a thread pool works well for me: https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.run_in_executor.
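Roughly, that workaround looks like this (a sketch only; acomplete is just an illustrative wrapper name):

import asyncio
from functools import partial

import openai  # the sync-only library, wrapped via a thread pool

async def acomplete(prompt: str):
    # Push the blocking call onto the default executor so the event loop stays free.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(
        None,
        partial(openai.Completion.create, prompt=prompt, engine="text-ada-001"),
    )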

Happy to commit to this. Based on previous experience with Redis and Django, a standalone package isn't needed; this package would only need to add aiohttp as a dependency. Elasticsearch keeps its async dependencies optional, but in my opinion that isn't necessary for this repository. Should be done tomorrow. (In the meantime, you can use asgiref's sync_to_async.)

#146 resolves this issue. Please go test it, thanks. Usage (note the a prefix, as used in the CPython standard library, Django, and other libraries):

import asyncio

import openai

openai.api_key = "sk-..."

async def main():
    await openai.Completion.acreate(prompt="This is a test", engine="text-ada-001")

asyncio.run(main())

In the meantime, you can use asgiref (notice the lack of the a prefix):

import asyncio

import openai
from asgiref.sync import sync_to_async

openai.api_key = "sk-..."

async def main():
    # sync_to_async runs the blocking call in a worker thread,
    # so the event loop is not blocked while waiting
    await sync_to_async(openai.Completion.create)(prompt="Test is a test", engine="text-ada-001")

asyncio.run(main())

In the meantime, you can also use this lightweight client I wrote (it uses httpx):
https://pypi.org/project/openai-async/

Something like:

pip install openai-async

and then:

import openai_async

# inside an async function / event loop:
response = await openai_async.complete(
    "<API KEY>",
    timeout=2,
    payload={
        "model": "text-davinci-003",
        "prompt": "Correct this sentence: Me like you.",
        "temperature": 0.7,
    },
)
print(response.json()["choices"][0]["text"].strip())
# prints: "I like you."

@itayzit @MiWeiss any luck with this on a high-concurrency implementation? I'm trying this but not getting the rates I'm hoping for.

@danbf Recommend you also manually control the aiohttp session:

import openai
from aiohttp import ClientSession

openai.aiosession.set(ClientSession())
# At the end of your program, close the http session
await openai.aiosession.get().close()

@danbf also note that OpenAI has rate limits in place: see here
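
For anyone hitting them, a simple retry-with-backoff sketch (this assumes the openai.error.RateLimitError exception from the library and the acreate method from #146):

import asyncio

import openai

async def acreate_with_backoff(max_retries=5, **kwargs):
    # Retry with exponential backoff when the API reports a rate limit.
    for attempt in range(max_retries):
        try:
            return await openai.Completion.acreate(**kwargs)
        except openai.error.RateLimitError:
            await asyncio.sleep(2 ** attempt)
    raise RuntimeError("still rate limited after retries")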

Thanks @Andrew-Chen-Wang and @MiWeiss, going to use that code hint, and yup, was aware of the OpenAI rate limit.

@Andrew-Chen-Wang @MiWeiss any ideas what to set TCPConnector(limit=XXX) to in order to maximize throughput?

You could try setting it to 0 for no limit. But honestly, I very much doubt you can query OpenAI with much more than 100 connections without hitting one of their quota limits.
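
For example, mirroring the session snippet above (in aiohttp, limit=0 means no per-session cap; the default is 100):

import openai
from aiohttp import ClientSession, TCPConnector

# limit=0 lifts aiohttp's default cap of 100 concurrent connections;
# OpenAI's server-side quotas will likely be the real bottleneck.
openai.aiosession.set(ClientSession(connector=TCPConnector(limit=0)))

# ... make your requests ...

# At the end of your program, close the http session
await openai.aiosession.get().close()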

It can also be done using asyncer:

import openai
from asyncer import asyncify

openai.api_key = settings.OPENAI_API_KEY  # settings object assumed, e.g. Django settings

# inside an async function:
response = await asyncify(openai.ChatCompletion.create)(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful help desk assistant."},
        {"role": "user", "content": "Which is the capital of Ecuador?"},
        {
            "role": "assistant",
            "content": "The capital of Ecuador is Quito.",
        },
        {"role": "user", "content": "Responde lo mismo pero en español"},
    ],
)
print("response", response)

@Andrew-Chen-Wang -- would there be any disadvantage to using your PR over asyncer, or vice versa?

I would go with whatever openai-python has (i.e., my PR), since the repo sees constant improvements, it's the official one, and all the examples online use this package. Performance differences are negligible here, since latency is the main bottleneck. It also seems easier to import once (openai) rather than twice (with asyncify).
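
For completeness, concurrent requests with the native acreate look something like this (a sketch; the prompts and engine are illustrative):

import asyncio

import openai

openai.api_key = "sk-..."

async def main():
    prompts = ["First test", "Second test", "Third test"]
    # asyncio.gather fires all requests concurrently on one event loop,
    # with no thread pool (asyncify/sync_to_async) involved.
    responses = await asyncio.gather(
        *(openai.Completion.acreate(prompt=p, engine="text-ada-001") for p in prompts)
    )
    for r in responses:
        print(r["choices"][0]["text"])

asyncio.run(main())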