mlcommons / inference

Reference implementations of MLPerf™ inference benchmarks

Home Page: https://mlcommons.org/en/groups/inference

RetinaNet TEST05 is not passing for a Singlestream run

arjunsuresh opened this issue · comments

We are trying to reproduce a QAIC submission, but for RetinaNet the SingleStream run never passes TEST05. The performance numbers are consistent across runs, so reruns or even longer runs do not help.

TEST05 log

================================================
MLPerf Results Summary
================================================
SUT name : KILT_SERVER
Scenario : SingleStream
Mode     : PerformanceOnly
90th percentile latency (ns) : 19641101
Result is : VALID
  Min duration satisfied : Yes
  Min queries satisfied : Yes
  Early stopping satisfied: Yes
Early Stopping Result:
 * Processed at least 64 queries (56356).
 * Would discard 5469 highest latency queries.
 * Early stopping 90th percentile estimate: 19666047
 * Early stopping 99th percentile estimate: 21004798

Actual performance run

================================================
MLPerf Results Summary
================================================
SUT name : KILT_SERVER
Scenario : SingleStream
Mode     : PerformanceOnly
90th percentile latency (ns) : 18503321
Result is : VALID
  Min duration satisfied : Yes
  Min queries satisfied : Yes
  Early stopping satisfied: Yes
Early Stopping Result:
 * Processed at least 64 queries (35227).
 * Would discard 3390 highest latency queries.
 * Early stopping 90th percentile estimate: 18526238
 * Early stopping 99th percentile estimate: 20238324
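For reference, the relative gap between the two 90th-percentile latencies above is easy to compute. This is just an illustrative sketch of the arithmetic, not the actual TEST05 check (the submission checker applies its own tolerance to these numbers):

```python
# 90th percentile latencies (ns) taken from the two summaries above
test05_p90 = 19641101   # TEST05 run
perf_p90 = 18503321     # actual performance run

# Relative difference of the TEST05 run against the performance run, in percent
delta_pct = (test05_p90 - perf_p90) / perf_p90 * 100
print(f"TEST05 is {delta_pct:.2f}% slower")  # about 6.15%
```

A gap of this size between seeds is consistent with the observation below that per-input performance varies, since TEST05 changes which samples LoadGen draws.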

@arjunsuresh Why is the number of samples different between the two runs? (10-11 minutes vs 16-17 minutes?) Maybe the system you are running on (AWS?) has insufficient cooling, so longer runs become slower?

@arjunsuresh Please provide full summary logs including settings like performance_sample_count.

Thank you @psyhtest for replying. The number of samples differs because on a repeated run we automatically make the run longer. But we tried matching the durations of the TEST05 and performance runs - no difference. We even ran each for 20 minutes - still no difference. However, performance_sample_count = 128 worked. I think we should revisit TEST05 for RetinaNet, considering the variable performance across different inputs.
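A sketch of how the higher count can be set via a LoadGen user.conf override; the key name follows the `performance_sample_count_override` convention used by recent LoadGen config parsing, so treat it as an assumption and check against your LoadGen version:

```
# user.conf (sketch; key name assumed from LoadGen config conventions)
retinanet.SingleStream.performance_sample_count_override = 128
```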

We recommend using a high performance_sample_count for RetinaNet. We will discuss this issue further for v4.1. Closing for now.