OpenNMT / CTranslate2

Fast inference engine for Transformer models

Home Page: https://opennmt.net/CTranslate2



Asynchronous execution: High latency when retrieving results

mvidela31 opened this issue · comments

Hi everyone and thanks for this amazing work!

I tried asynchronous execution to speed up generation, following the documentation example:

async_results = []
for batch in batch_generator(): # For-loop 1
    async_results.extend(generator.generate_batch(batch, asynchronous=True))

for async_result in async_results: # For-loop 2
    print(async_result.result())  # This method blocks until the result is available.

First I ran generator.generate_batch(batch, asynchronous=True) on a dataset of 1,000 samples with a batch size of 128 and device_index=[0, 1, 2, 3] (4 x NVIDIA Tesla T4). For-loop 1 finishes quickly and for-loop 2 completes almost immediately (~3 s). However, when I ran the same code on a dataset of 100,000 samples, for-loop 1 completed in ~20 min (which is fine for me), but for-loop 2 took ~5 min just to retrieve the results of the first 1,000 samples (the same samples as in the first run).

I think this performance difference (3 s vs. 5 min in async_result.result() on the same 1,000 samples) could be related to the limited queue size mentioned in the documentation. Is there a way to speed up the retrieval of the asynchronous results (for-loop 2) so that the processing speed of the first run is recovered?
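One way to avoid accumulating all futures before reading any of them is to interleave submission and retrieval with a small in-flight window. This is a minimal sketch of that pattern; it uses concurrent.futures as a stand-in for CTranslate2's async results (which also expose a blocking .result()), and fake_generate_batch / MAX_IN_FLIGHT are hypothetical names for illustration:

```python
import concurrent.futures
import time

# Stand-in for the translation backend: a thread pool whose futures
# expose .result(), similar to CTranslate2's asynchronous results.
executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def fake_generate_batch(batch):
    time.sleep(0.01)            # simulate per-batch GPU work
    return [x * 2 for x in batch]

def batch_generator():
    data = list(range(100))
    for i in range(0, len(data), 10):
        yield data[i:i + 10]

# Keep at most MAX_IN_FLIGHT batches pending; consume the oldest result
# as soon as the window is full, instead of submitting everything first.
MAX_IN_FLIGHT = 8
pending = []
results = []
for batch in batch_generator():
    pending.append(executor.submit(fake_generate_batch, batch))
    if len(pending) >= MAX_IN_FLIGHT:
        results.extend(pending.pop(0).result())  # oldest batch first

for future in pending:          # drain the remaining in-flight batches
    results.extend(future.result())

executor.shutdown()
print(len(results))
```

Because retrieval overlaps with ongoing computation, the consumer never faces a long backlog of results whose batches have not started yet.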

How do you measure the time for the results of the first 1,000 samples? Normally, when 100,000 samples are passed through for-loop 1, the first 1,000 samples have to finish before their slots in the queue are freed.
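The backpressure described above can be illustrated with a toy bounded queue: once the queue is full, submission blocks until a worker frees a slot, so by the time the submission loop returns, the earliest batches have already been processed. The queue size and timings here are arbitrary assumptions, not CTranslate2's actual internals:

```python
import queue
import threading
import time

# Toy model of a bounded work queue (maxsize=4 is an assumption).
work = queue.Queue(maxsize=4)
done = {}

def worker():
    # Consume batch ids until the None sentinel arrives.
    while True:
        item = work.get()
        if item is None:
            break
        time.sleep(0.01)        # simulate translating one batch
        done[item] = item * 2   # store the finished "result"
        work.task_done()

t = threading.Thread(target=worker)
t.start()

for batch_id in range(16):
    # Blocks whenever 4 batches are already queued, so early batches
    # finish while later ones are still being submitted.
    work.put(batch_id)

work.put(None)                  # signal the worker to stop
t.join()
print(len(done))
```

This is why the timing of for-loop 1 already includes most of the computation for the early batches: their results only become retrievable after the queue has cycled them through.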