async `generate_content` is very slow

Question

ascillitoe opened this issue 8 months ago · comments

The async performance of the new SDK still seems to be much worse than the old SDK (with transport='grpc_asyncio').

We can do 1000 text classifications in ~5s with the old client, but this consistently takes over 30s with the new client.

Is this a known issue? Or are there certain settings that must be configured with the new client? (For example, we found the transport option was very important with the previous client.)

Environment details

Programming language: Python
OS: Ubuntu 22.04.5 LTS
Language runtime version: 3.10.12
Package version: 1.7.0

Steps to reproduce

1. Run N basic text completions asynchronously with the new google.genai.client.aio.models.generate_content.
2. Compare with the old google.generativeai generate_content_async.
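A minimal sketch of how these steps could be scripted. `fake_generate_content` is a hypothetical placeholder that only sleeps, so the harness runs without credentials; the SDK calls named in its comments show where the real coroutine would go, and the prompts and batch size are illustrative assumptions.

```python
import asyncio
import time


async def fake_generate_content(prompt: str) -> str:
    # Hypothetical stand-in for one SDK call. To reproduce the comparison,
    # replace the body with the real coroutine, for example:
    #   new SDK: await client.aio.models.generate_content(model=..., contents=prompt)
    #   old SDK: await model.generate_content_async(prompt)
    await asyncio.sleep(0.01)
    return f"label for {prompt}"


async def run_batch(call, prompts):
    # Start every request concurrently and time the whole batch.
    start = time.perf_counter()
    results = await asyncio.gather(*(call(p) for p in prompts))
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    prompts = [f"text {i}" for i in range(1000)]
    results, elapsed = asyncio.run(run_batch(fake_generate_content, prompts))
    print(f"{len(results)} requests in {elapsed:.2f}s")
```

Running the same `run_batch` against both clients, with identical prompts, keeps the comparison limited to the client implementation.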

Amy Wu commented 5 months ago

#948
#962

Andrew Stelmach · Answer 1 · Tue Mar 25 2025 03:14:11 GMT+0800 (China Standard Time)

Compare timestamps taken immediately before and after the call to generate_content. LLM response times are stochastic, since every provider varies how many GPUs are in use, among other factors.

Amy Wu · Answer 2 · Thu Mar 27 2025 01:07:15 GMT+0800 (China Standard Time)

Note that google.genai uses REST transport. In general, gRPC is faster than REST. We'll calibrate the runtime performance and see how to improve. Thanks for raising this.

Ashley Scillitoe · Answer 3 · Fri Mar 28 2025 07:15:44 GMT+0800 (China Standard Time)

Hi @yinghsienwu, thanks for the response. Are there no plans to add support for gRPC AsyncIO like the old SDK? This seems like a pretty big regression?

hugbubby · Answer 4 · Tue Apr 15 2025 06:21:01 GMT+0800 (China Standard Time)

Same problem here. It's an extremely slow interface. We're getting 100% CPU usage with only a handful of concurrent requests, which is practically unusable for our purposes.
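One way to check whether the client itself is CPU-bound, rather than waiting on the network, is to compare process CPU time with wall-clock time over a batch. The sketch below assumes a hypothetical `fake_request` coroutine in place of the real SDK call; a CPU-to-wall ratio close to 1.0 would suggest the event loop is saturated by client-side work.

```python
import asyncio
import time


async def fake_request(i: int) -> int:
    # Hypothetical stand-in for one SDK call; replace with the real coroutine.
    await asyncio.sleep(0.01)
    return i


async def measure(call, n: int):
    # perf_counter tracks elapsed wall-clock time; process_time counts only
    # CPU time used by this process, so time spent waiting on I/O is excluded.
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    await asyncio.gather(*(call(i) for i in range(n)))
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


if __name__ == "__main__":
    wall, cpu = asyncio.run(measure(fake_request, 100))
    print(f"wall={wall:.3f}s cpu={cpu:.3f}s ratio={cpu / wall:.2f}")
```

Note that `time.process_time` sums CPU time across all threads in the process, so the ratio can exceed 1.0 if a library uses worker threads.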

Amy Wu · Answer 5 · Wed Apr 16 2025 01:32:01 GMT+0800 (China Standard Time)

I compared four setups for async generateContent requests (100, 500, and 1000 async requests): (1) Vertex SDK (google-cloud-aiplatform) with the default grpc_asyncio transport, (2) Vertex SDK with the rest_asyncio transport, (3) google-genai SDK v1.9 (REST, httpx), and (4) an aiohttp prototype.

In the Vertex SDK, the rest_asyncio and grpc_asyncio transports perform similarly (within the standard deviation). gRPC should not be the key to better runtime performance.
Currently, the google-genai SDK (httpx) runtime is ~6x the grpc_asyncio runtime when sending 1000 async requests, which matches the observation above (#557 (comment)).
If we want to improve runtime performance, using aiohttp in the google-genai SDK's AsyncClient implementation may achieve performance similar to the Vertex SDK's rest_asyncio.
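For readers who want to run a similar comparison, here is a rough sketch of a sweep over batch sizes that reports the mean and standard deviation per size. It is not the benchmark used above: `fake_request` is a hypothetical placeholder for the client call under test, and the number of trials is an assumption.

```python
import asyncio
import statistics
import time


async def fake_request(i: int) -> int:
    # Hypothetical stand-in; swap in the client call being benchmarked.
    await asyncio.sleep(0.005)
    return i


async def time_batch(call, n: int) -> float:
    # Wall-clock time for n concurrent requests.
    start = time.perf_counter()
    await asyncio.gather(*(call(i) for i in range(n)))
    return time.perf_counter() - start


async def sweep(call, sizes=(100, 500, 1000), trials=3):
    # Repeat each batch size several times so runs can be compared
    # against their spread. stdev needs at least two trials.
    summary = {}
    for n in sizes:
        runs = [await time_batch(call, n) for _ in range(trials)]
        summary[n] = (statistics.mean(runs), statistics.stdev(runs))
    return summary


if __name__ == "__main__":
    for n, (mean, sd) in asyncio.run(sweep(fake_request)).items():
        print(f"{n} requests: {mean:.3f}s ± {sd:.3f}s")
```

Two clients can then be called "similar" only when the difference in their means is small relative to the reported standard deviations.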

We'll try to put it into our roadmap.

Ashley Scillitoe · Answer 6 · Fri May 02 2025 23:47:40 GMT+0800 (China Standard Time)

Thanks for confirming @yinghsienwu! Do you think this situation might be improved by the time the old SDK reaches end of life on Aug 31st?

Amy Wu · Answer 7 · Sat May 03 2025 04:59:42 GMT+0800 (China Standard Time)

I think it's likely to be available in Q2 2025. I'll attach a PR here.

Marcelo Trylesinski · Answer 8 · Tue Aug 05 2025 18:51:16 GMT+0800 (China Standard Time)

Such a poor decision.

Now, how do I force it to use httpx? I can't uninstall aiohttp.

Victor Prins · Answer 9 · Wed Aug 06 2025 17:20:42 GMT+0800 (China Standard Time)