googleapis / python-genai

Google Gen AI Python SDK provides an interface for developers to integrate Google's generative models into their Python applications.

Home Page: https://googleapis.github.io/python-genai/

Repository from Github https://github.com/googleapis/python-genai

async `generate_content` is very slow

ascillitoe opened this issue

The async performance of the new SDK still seems to be much worse than the old SDK (with transport='grpc_asyncio').

We can do 1000 text classifications in ~5s with the old client, but this consistently takes over 30s with the new client.

Is this a known issue? Or are there certain settings that must be configured with the new client? (For example, we found the transport option was very important with the previous client.)

Environment details

  • Programming language: Python
  • OS: Ubuntu 22.04.5 LTS
  • Language runtime version: 3.10.12
  • Package version: 1.7.0

Steps to reproduce

  1. Run N basic text completions async with the new google.genai.client.aio.models.generate_content
  2. Compare with the old google.generativeai generate_content_async

Compare the runtimes before and after the call to generate_content. LLM response times are stochastic, since every provider changes how many GPUs are being used, among other factors.
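The reproduction steps above can be sketched as a small timing harness. This is a minimal sketch, not the reporter's actual benchmark: `time_concurrent`, `fake_generate`, and `run_benchmark` are hypothetical names, and `fake_generate` is a stand-in that only sleeps. To measure a real client, replace its body with the async call under test (for the new SDK, the `client.aio.models.generate_content` call mentioned above).

```python
import asyncio
import time


async def time_concurrent(make_call, n):
    """Run n coroutines concurrently and return (elapsed_seconds, results)."""
    start = time.perf_counter()
    # Schedule all n calls at once and wait for every one to finish.
    results = await asyncio.gather(*(make_call(i) for i in range(n)))
    elapsed = time.perf_counter() - start
    return elapsed, results


async def fake_generate(i):
    # Hypothetical stand-in for a model call (assumption: not a real SDK call).
    # Swap in the async client call you want to measure.
    await asyncio.sleep(0.01)
    return f"label-{i}"


def run_benchmark(n=100):
    """Time n concurrent stand-in calls from synchronous code."""
    return asyncio.run(time_concurrent(fake_generate, n))
```

Running the same harness against both clients with the same N, and repeating it a few times, helps separate client overhead from the run-to-run variation in model latency.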

Note that google.genai uses REST transport. In general, gRPC is faster than REST. We'll calibrate the runtime performance and see how to improve. Thanks for raising this.

Hi @yinghsienwu, thanks for the response. Are there no plans to add support for gRPC AsyncIO like the old SDK? This seems like a pretty big regression?

Same problem here. It's an extremely slow interface. We're getting 100% cpu usage with only a handful of concurrent requests, which is practically unusable for our purposes.

I compared Vertex SDK (google-cloud-aiplatform) (#1 default grpc_asyncio, and #2 rest_asyncio transport) with #3 google-genai SDK v1.9 (rest, httpx) and #4 aiohttp prototype for async generateContent requests (100, 500, 1000 async requests).

  1. Vertex SDK, rest_asyncio and grpc_asyncio transport perform similarly (within the std dev). gRPC should not be the key to better runtime performance.
  2. Currently the google-genai SDK (httpx) runtime is ~6x the grpc_asyncio runtime when sending 1000 async requests (the same as the observation above, #557 (comment)).
  3. If we want to improve runtime performance, using aiohttp in the google-genai SDK AsyncClient implementation may achieve performance similar to Vertex SDK's rest_asyncio.

We'll try to put it into our roadmap.
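One way to read "within the std dev" in the comparison above is to repeat each benchmark several times and compare the mean runtimes against their spread. The sketch below is an assumption about how such a check could be done, not the method used for the numbers in this thread; `summarize` and `similar_within_std` are hypothetical helper names.

```python
import statistics


def summarize(runtimes):
    """Return (mean, sample standard deviation) of runtimes in seconds."""
    return statistics.mean(runtimes), statistics.stdev(runtimes)


def similar_within_std(a, b):
    """True if the two means differ by no more than the larger std dev.

    This threshold is an illustrative choice, not a formal statistical test.
    """
    mean_a, std_a = summarize(a)
    mean_b, std_b = summarize(b)
    return abs(mean_a - mean_b) <= max(std_a, std_b)
```

Each argument is a list of at least two wall-clock runtimes for one transport at a fixed request count, since `statistics.stdev` needs two or more samples.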

Thanks for confirming @yinghsienwu! Do you think this situation might be improved by the time the old sdk reaches end of life on Aug 31st?

I think it is likely to be available in Q2 2025. I'll attach a PR here.

Such a poor decision.

Now, how do I force it to use httpx? I can't uninstall aiohttp.

See also #1206 and #1074.

You can force the usage of httpx by instantiating the Client as follows:

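from httpx import AsyncHTTPTransport
from google import genai
from google.genai import types

# Passing an httpx transport in async_client_args makes the async client use httpx.
genai_client = genai.Client(
    http_options=types.HttpOptions(
        async_client_args={"transport": AsyncHTTPTransport()}
    ),
)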