[Bug] Undeterministic Batch Formation
sunggg opened this issue · comments
The default run for serve/tests/test_engine.py
first adds 4 requests and then start the engine.
I expected this would form single prefill batch with 4 requests.
However, it shows non-deterministic behavior, sometimes it forms two prefill batches of 2 requests each, sometimes it forms two prefill batches of 1/3 requests each.
Debug log does not provide any useful information.
@elvin-n will take a look.
I think this is inherent to async request add / batch creation in staging engine. Not sure if we can make it deterministic @yelite
I think this is inherent to async request add / batch creation in staging engine. Not sure if we can make it deterministic @yelite
But that test adds requests one by one in a for loop, and the engine push requests to queues before they reach to the worker batch. I don't see anything obvious in that code path which would create undeterministic behavior.
By "async" I meant the request arrives to the worker via AddRequestsCommand
and I think that's done asynchronously with worker.step()
?
By "async" I meant the request arrives to the worker via
AddRequestsCommand
and I think that's done asynchronously withworker.step()
?
Yes that's async and very likely to be the cause of undeterministic batch here. I don't have a good way to fix this without impacting the performance.
In #193 I plan to remove the sync engine, but add some flags to the staging engine to run things in a more synchronous way, so we can have more deterministic behavior in unit tests if it's necessary.
There are three use cases:
- Explicit creation of the SynchronousInferenceEngine and usage its method add/step directly. It will have completely determined behaviour and I have not seen that
serve/tests/test_engine.py
process different number of requests than 4. - Usage of AsyncEngineConnector. It will create async point and even for SynchronousInferenceEngine behaviour cannot be deterministic
- Usage of StagingInferenceEngine in any mode adds one more point of async for submitting of requests in the second process
2 and 3 async points were added by design, helps in real situation and cannot be removed. The only predictable flow can be with Sync engine and explicit call of add/step.