mlcommons / inference

Reference implementations of MLPerf™ inference benchmarks

Home Page: https://mlcommons.org/en/groups/inference

Understanding the Benchmarking Scenarios for DLRMv2

ZhanqiuHu opened this issue · comments

Hi!
I've been delving into the DLRMv2 benchmark, and I want to confirm my understanding of the scenarios. For the Server scenario, my understanding is that it runs the following command:

./run_local.sh pytorch dlrm multihot-criteo cpu --scenario Server --max-ind-range=40000000 --samples-to-aggregate-quantile-file=./tools/dist_quantile.txt --max-batchsize=2048

Each query contains one sample, and each sample consists of B user-item pairs. The number of user-item pairs, B, is variable and is drawn from the distribution defined by dist_quantile.txt, ranging roughly from 100 to 700. Is it correct to say that the reported throughput in queries per second (QPS) reflects the number of samples processed per second, and that the throughput in terms of user-item pairs processed per second is therefore greater than QPS (roughly B * QPS), because each sample contains multiple user-item pairs?
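To make the arithmetic concrete, here is a rough sketch of how I picture the sampling and the QPS-to-pairs conversion. This is my own illustration, not the reference harness code; the quantile-file layout, the interpolation scheme, and the example QPS value are all assumptions on my part:

```python
# Hypothetical sketch: draw a per-sample pair count B from quantiles and
# relate reported QPS (samples/s) to user-item pairs per second.
import os
import numpy as np


def load_quantiles(path):
    # Assumption: one quantile value per line, covering the distribution of
    # user-item pairs per sample.
    return np.loadtxt(path)


def draw_pairs_per_sample(quantiles, rng):
    # Inverse-transform sampling: pick a uniform percentile and interpolate
    # between the stored quantile values to get a pair count B.
    u = rng.uniform(0.0, 100.0)
    percentiles = np.linspace(0.0, 100.0, len(quantiles))
    return int(np.interp(u, percentiles, quantiles))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    path = "./tools/dist_quantile.txt"
    if os.path.exists(path):
        quantiles = load_quantiles(path)
    else:
        # Stand-in quantiles spanning roughly 100-700 pairs, used only when
        # the real file is not available.
        quantiles = np.linspace(100, 700, 101)

    draws = [draw_pairs_per_sample(quantiles, rng) for _ in range(10_000)]
    mean_b = np.mean(draws)
    qps = 1000.0  # example reported Server-scenario throughput (samples/s)
    print(f"mean pairs per sample B ~ {mean_b:.0f}")
    print(f"estimated user-item pairs per second ~ {mean_b * qps:.0f}")
```

If this matches how the harness actually uses --samples-to-aggregate-quantile-file, then pairs/second should be roughly mean(B) * QPS, which is the relationship I'd like to confirm.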

Also, for the Offline scenario, it seems that the number of user-item pairs per sample is also drawn from dist_quantile.txt. I was wondering how many samples are used per query in that scenario.

Thanks!