mlcommons / inference

Reference implementations of MLPerf™ inference benchmarks

Home Page: https://mlcommons.org/en/groups/inference


Reintroducing min_query_count for SingleStream (64) and Server (662)

psyhtest opened this issue · comments

The minimum query count (min_query_count) was removed from mlperf.conf a while ago. I believe the thinking was that submitters could choose how many samples to process. As long as the minimum run duration (min_duration) constraint of 10 minutes was met, early stopping would take care of estimating the 90th percentile for SingleStream, and the 99th percentile for MultiStream and Server.

However, for SingleStream early stopping still requires at least 64 samples to estimate the 90th percentile. Similarly, for Server (and MultiStream) early stopping requires at least 662 queries to estimate the 99th percentile. So trying to process fewer than 64 samples for SingleStream or 662 queries for Server will result in INVALID runs.
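For reference, the 64 and 662 figures can be reproduced with a one-sided binomial tail test at 99% confidence, requiring that even a perfect run could discard at least one query. This is a sketch of the statistics behind the minimums, not the actual LoadGen early-stopping implementation; the `min_queries` helper and its parameters are hypothetical names for illustration.

```python
from math import comb

def min_queries(percentile, confidence=0.99, tolerance=1):
    """Smallest n for which early stopping can succeed: observing at
    most `tolerance` over-latency results out of n must be improbable
    (probability < 1 - confidence) if the true under-latency rate
    were only `percentile`."""
    delta = 1 - confidence
    n = 1
    while True:
        # One-sided binomial tail: P(at most `tolerance` over-latency
        # queries in n) assuming each query misses with rate 1 - percentile.
        tail = sum(comb(n, i) * (1 - percentile) ** i * percentile ** (n - i)
                   for i in range(tolerance + 1))
        if tail < delta:
            return n
        n += 1

print(min_queries(0.90))  # 64  (SingleStream, 90th percentile)
print(min_queries(0.99))  # 662 (Server/MultiStream, 99th percentile)
```

Under these assumptions the minimums come out to exactly 64 and 662, matching the numbers above.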

Perhaps it would be better to reintroduce these constraints to match the one for MultiStream:

*.MultiStream.min_query_count = 662
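Concretely, the reintroduced constraints might look like the following hypothetical mlperf.conf entries, mirroring the existing MultiStream line (values taken from the early-stopping minimums discussed above):

```
*.SingleStream.min_query_count = 64
*.Server.min_query_count = 662
```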

Yes, it would be good to make LoadGen generate at least that many queries, right?