async `generate_content` is very slow
ascillitoe opened this issue · comments
The async performance of the new SDK still seems to be much worse than the old SDK (with transport='grpc_asyncio').
We can do 1000 text classifications in ~5s with the old client, but this consistently takes over 30s with the new client.
Is this a known issue? Or are there certain settings that must be configured with the new client? (e.g. we found the transport option was very important with the previous client).
Environment details
- Programming language: Python
- OS: Ubuntu 22.04.5 LTS
- Language runtime version: 3.10.12
- Package version: 1.7.0
Steps to reproduce
- Run N basic text completions async with the new google.genai.client.aio.models.generate_content
- Compare with the old google.generativeai generate_content_async
Compare the runtimes before and after the call to generate_content. LLM response speeds are stochastic, since every provider changes how many GPUs are being used, etc.
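For anyone who wants to reproduce the comparison, here is a minimal sketch of a timing harness. It assumes the workload is N independent calls launched together with `asyncio.gather`. `time_concurrent_calls` and `fake_generate` are hypothetical names used for illustration only; `fake_generate` just sleeps, and would need to be swapped for the real call of whichever SDK is being measured.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def time_concurrent_calls(
    call: Callable[[int], Awaitable[object]], n: int
) -> float:
    """Launch call(0..n-1) concurrently and return the wall-clock seconds taken."""
    start = time.perf_counter()
    await asyncio.gather(*(call(i) for i in range(n)))
    return time.perf_counter() - start


async def fake_generate(i: int) -> str:
    # Hypothetical stand-in for one request. Replace the body with the real
    # call under test, e.g. the new SDK's client.aio.models.generate_content
    # or the old SDK's generate_content_async.
    await asyncio.sleep(0.01)
    return f"label-{i}"


if __name__ == "__main__":
    elapsed = asyncio.run(time_concurrent_calls(fake_generate, 1000))
    print(f"1000 calls took {elapsed:.2f}s")
```

Because response latency is stochastic, a single run says little. Running the harness several times per client gives a fairer comparison.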
Note that google.genai uses REST transport. In general, gRPC is faster than REST. We'll calibrate the runtime performance and see how to improve. Thanks for raising this.
Hi @yinghsienwu, thanks for the response. Are there no plans to add support for gRPC AsyncIO like the old SDK? This seems like a pretty big regression?
Same problem here. It's an extremely slow interface. We're getting 100% cpu usage with only a handful of concurrent requests, which is practically unusable for our purposes.
I compared the Vertex SDK (google-cloud-aiplatform) with (#1) the default grpc_asyncio transport and (#2) the rest_asyncio transport against (#3) the google-genai SDK v1.9 (REST, httpx) and (#4) an aiohttp prototype, sending 100, 500, and 1000 async generateContent requests.
- In the Vertex SDK, the rest_asyncio and grpc_asyncio transports perform similarly (within the std dev), so gRPC is unlikely to be the key to better runtime performance.
- Currently the google-genai SDK (httpx) runtime is ~6x the grpc_asyncio runtime when sending 1000 async requests (the same as the observation above, #557 (comment)).
- If we want to improve runtime performance, using aiohttp in the google-genai SDK AsyncClient implementation may achieve performance similar to the Vertex SDK's rest_asyncio.
We'll try to put it into our roadmap.
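A rough sketch of how a repeated-trial comparison like this could be scripted is below. All names (`stub_request`, `one_trial`, `benchmark`, `within_std_dev`) are hypothetical, the stub only sleeps instead of sending a request, and the "within the std dev" check is one assumed interpretation (difference of means no larger than the larger standard deviation), not necessarily the criterion used for the numbers above.

```python
import asyncio
import statistics
import time


async def stub_request() -> None:
    # Hypothetical stand-in for one async generateContent request.
    await asyncio.sleep(0.005)


async def one_trial(n: int) -> float:
    """Send n stub requests concurrently and return elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(stub_request() for _ in range(n)))
    return time.perf_counter() - start


async def benchmark(n: int, trials: int = 5) -> tuple[float, float]:
    """Return (mean, sample std dev) of the runtime over several trials."""
    samples = [await one_trial(n) for _ in range(trials)]
    return statistics.mean(samples), statistics.stdev(samples)


def within_std_dev(a: tuple[float, float], b: tuple[float, float]) -> bool:
    # Assumed criterion: the means differ by no more than the larger std dev.
    return abs(a[0] - b[0]) <= max(a[1], b[1])


if __name__ == "__main__":
    for n in (100, 500, 1000):
        mean, std = asyncio.run(benchmark(n))
        print(f"n={n}: {mean:.3f}s +/- {std:.3f}s")
```

To compare transports, `stub_request` would be replaced once per client or transport, and the resulting `(mean, std)` pairs passed to `within_std_dev`.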
Thanks for confirming @yinghsienwu! Do you think this situation might be improved by the time the old sdk reaches end of life on Aug 31st?
I think it is likely to be available in Q2 2025. I'll attach a PR here.
Such a poor decision.
Now, how do I force it to use httpx? I can't uninstall aiohttp.