Llama2 LoadGen server mode: TPS not reported properly
nvzhihanj opened this issue · comments
Zhihan Jiang commented
In the v4.0 submission, we found in the server log that "result_token_throughput" is not reported properly: most of the values are at the e-09 scale (@pgmpablo157321 feel free to check this link and this link). There is another metric, "result_token_throughput_with_loadgen_overhead", but I am not sure whether it shows the right value.
Since we are shifting from QPS to TPS, we need to fix and report the "result_completed_token_per_second".
Zhihan Jiang commented
It seems that result_token_per_second is in summary.txt. Not sure why it's not in details.txt.
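
For anyone who wants to check their own logs, here is a rough sketch of how the throughput keys could be pulled out of a detail log and flagged when they look implausibly small. It assumes each log record is a line starting with `:::MLLOG` followed by a JSON object with "key" and "value" fields. The helper names (`extract_metrics`, `suspicious`), the 1e-3 floor, and the sample values are made up for illustration and are not taken from the actual submission logs.

```python
import json

MLLOG_PREFIX = ":::MLLOG"  # assumed record prefix in the detail log


def extract_metrics(lines, keys):
    """Return {key: value} for the requested keys found in the log lines."""
    found = {}
    for line in lines:
        line = line.strip()
        if not line.startswith(MLLOG_PREFIX):
            continue
        try:
            record = json.loads(line[len(MLLOG_PREFIX):])
        except json.JSONDecodeError:
            continue  # skip malformed records instead of failing
        if isinstance(record, dict) and record.get("key") in keys:
            found[record["key"]] = record.get("value")
    return found


def suspicious(found, floor=1e-3):
    """Return the numeric metrics below `floor` (hypothetical threshold)."""
    return {k: v for k, v in found.items()
            if isinstance(v, (int, float)) and v < floor}


# Illustrative lines only; the values are placeholders, not real results.
sample = [
    ':::MLLOG {"key": "result_token_throughput", "value": 3.2e-09}',
    ':::MLLOG {"key": "result_token_throughput_with_loadgen_overhead", "value": 1234.5}',
    "some unrelated line",
]
wanted = {
    "result_token_throughput",
    "result_token_throughput_with_loadgen_overhead",
    "result_completed_token_per_second",
}
found = extract_metrics(sample, wanted)
missing = wanted - found.keys()
print("found:", found)
print("missing:", sorted(missing))
print("suspicious:", suspicious(found))
```

Running this over the real details.txt (for example via `open(path)` as `lines`) would show both which of the keys are absent and which are present but at the e-09 scale.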