mlcommons / inference

Reference implementations of MLPerf™ inference benchmarks

Home Page: https://mlcommons.org/en/groups/inference


Llama2 LoadGen server mode: TPS not reported properly

nvzhihanj opened this issue · comments

In the v4.0 submission, we found in the server log that "result_token_throughput" is not reported properly; most of the values are on the order of 1e-09 (@pgmpablo157321 feel free to check this link and this link). There is another metric, "result_token_throughput_with_loadgen_overhead", but I am not sure whether it shows the right value.
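To see what is actually being emitted, here is a minimal sketch that scans a LoadGen detail log for the keys mentioned above. It assumes the standard `:::MLLOG`-prefixed JSON-line format of the detail log; the file path and the exact key set are taken from this issue and may differ across LoadGen versions.

```python
import json

# Hypothetical path; point this at your run's output directory.
DETAIL_LOG = "mlperf_log_detail.txt"

# Keys discussed in this issue; the set present in a given
# LoadGen version may differ.
KEYS = {
    "result_token_throughput",
    "result_token_throughput_with_loadgen_overhead",
    "result_completed_token_per_second",
}

with open(DETAIL_LOG) as f:
    for line in f:
        # Detail-log lines carry a ":::MLLOG" prefix followed by a JSON object.
        if not line.startswith(":::MLLOG"):
            continue
        entry = json.loads(line[len(":::MLLOG"):])
        if entry.get("key") in KEYS:
            print(entry["key"], entry.get("value"))
```

Running this against the v4.0 server logs should make the anomalous 1e-09-scale values easy to spot.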

Since we are shifting from QPS to TPS, we need to fix this and report "result_completed_token_per_second" correctly.
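For reference, a sketch of the intended semantics of a completed-token throughput metric, i.e. total output tokens across completed queries divided by the wall-clock run duration. This is illustrative only; the actual computation lives in the LoadGen C++ source, and the function name here is hypothetical.

```python
def completed_tokens_per_second(token_counts, run_duration_s):
    """Total output tokens across all completed queries divided by
    the wall-clock duration of the run in seconds (illustrative)."""
    return sum(token_counts) / run_duration_s

# Example: 10,000 completed queries averaging 300 output tokens over a
# 600 s run should report 5,000 tokens/s, nowhere near the 1e-09 scale
# seen in the logs.
print(completed_tokens_per_second([300] * 10_000, 600.0))  # -> 5000.0
```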

It seems like result_token_per_second is in the summary.txt; not sure why it is not in the details.txt.