Memory Efficiency differs between reportseff and seff for multi-node jobs
panchyni opened this issue · comments
I've just started using reportseff on our university cluster specifically for the reasons mentioned in your blog post (be good to your scheduler), and overall it's great. However, I've noticed that the memory efficiency reported by reportseff and seff differs, specifically for multi-node jobs. For example:
reportseff:
JobID State Elapsed TimeEff CPUEff MemEff
65638294 COMPLETED 07:36:03 4.5% 98.6% 182.8%
seff:
Job ID: 65638294
State: COMPLETED (exit code 0)
Nodes: 2
Cores per node: 8
CPU Utilized: 4-23:56:22
CPU Efficiency: 98.62% of 5-01:36:48 core-walltime
Job Wall-clock time: 07:36:03
Memory Utilized: 7.70 GB
Memory Efficiency: 24.06% of 32.00 GB
I'm relatively new to both reportseff and seff, so I am not sure if this is an underlying issue with reporting on the Slurm side, but any insight into the origin of the difference would be useful for interpreting the results. Thanks!
Honestly I don't have much test data with multiple nodes. Can you provide the output of reportseff with the --debug
option? That gives the raw sacct output that I can use for testing.
I've pasted the reportseff output for that specific job with the --debug option. Let me know if there is anything else I can do to help out.
|16|07:36:03|65638294|65638294||2|32G|COMPLETED|6-23:59:00|4-23:56:21
|1|07:36:03|65638294.batch|65638294.batch|1147220K|1||COMPLETED||07:30:20
|16|07:36:03|65638294.extern|65638294.extern|0|2||COMPLETED||00:00.001
|15|00:00:11|65638294.0|65638294.0|0|1||COMPLETED||00:11.830
|15|00:02:15|65638294.1|65638294.1|4455540K|1||COMPLETED||31:09.458
|15|00:00:10|65638294.2|65638294.2|0|1||COMPLETED||00:00:04
|15|00:00:08|65638294.3|65638294.3|0|1||COMPLETED||00:09.602
|15|00:00:07|65638294.4|65638294.4|0|1||COMPLETED||00:56.827
|15|00:00:06|65638294.5|65638294.5|0|1||COMPLETED||00:03.512
|15|00:00:08|65638294.6|65638294.6|0|1||COMPLETED||00:08.520
|15|00:00:13|65638294.7|65638294.7|0|1||COMPLETED||01:02.013
|15|00:00:02|65638294.8|65638294.8|0|1||COMPLETED||00:03.639
|15|00:00:06|65638294.9|65638294.9|0|1||COMPLETED||00:08.683
|15|00:00:08|65638294.10|65638294.10|0|1||COMPLETED||00:57.438
|15|00:00:06|65638294.11|65638294.11|0|1||COMPLETED||00:03.642
|15|00:00:09|65638294.12|65638294.12|0|1||COMPLETED||00:10.271
|15|00:01:24|65638294.13|65638294.13|4149700K|1||COMPLETED||17:18.067
|15|00:00:01|65638294.14|65638294.14|0|1||COMPLETED||00:03.302
|15|00:00:10|65638294.15|65638294.15|0|1||COMPLETED||00:14.615
|15|00:06:45|65638294.16|65638294.16|4748052K|1||COMPLETED||01:36:40
|15|00:00:10|65638294.17|65638294.17|0|1||COMPLETED||00:03.864
|15|00:00:09|65638294.18|65638294.18|0|1||COMPLETED||00:48.987
|15|01:32:53|65638294.19|65638294.19|7734356K|1||COMPLETED||23:09:33
|15|00:00:01|65638294.20|65638294.20|0|1||COMPLETED||00:03.520
|15|00:00:07|65638294.21|65638294.21|0|1||COMPLETED||00:50.015
|15|00:55:17|65638294.22|65638294.22|8074500K|1||COMPLETED||13:45:29
|15|00:00:13|65638294.23|65638294.23|0|1||COMPLETED||00:04.413
|15|00:00:12|65638294.24|65638294.24|0|1||COMPLETED||00:49.100
|15|00:57:41|65638294.25|65638294.25|7883152K|1||COMPLETED||14:20:36
|15|00:00:01|65638294.26|65638294.26|0|1||COMPLETED||00:03.953
|15|00:00:05|65638294.27|65638294.27|0|1||COMPLETED||00:47.223
|15|01:00:17|65638294.28|65638294.28|7715752K|1||COMPLETED||14:59:40
|15|00:00:06|65638294.29|65638294.29|0|1||COMPLETED||00:04.341
|15|00:00:07|65638294.30|65638294.30|0|1||COMPLETED||00:50.416
|15|01:22:31|65638294.31|65638294.31|7663264K|1||COMPLETED||20:33:59
|15|00:00:05|65638294.32|65638294.32|0|1||COMPLETED||00:04.199
|15|00:00:08|65638294.33|65638294.33|0|1||COMPLETED||00:50.009
|15|01:32:23|65638294.34|65638294.34|7764884K|1||COMPLETED||23:01:52
|15|00:00:06|65638294.35|65638294.35|0|1||COMPLETED||00:04.527
JobID State Elapsed TimeEff CPUEff MemEff
65638294 COMPLETED 07:36:03 4.5% 98.6% 182.8%
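For what it's worth, both percentages can be reproduced from the MaxRSS column in the debug output above. This is only a sketch of one plausible explanation, not confirmed reportseff or seff internals: it assumes the MaxRSS values are in KiB, that the 32G ReqMem is the total allocation, that seff uses the single largest per-step MaxRSS, and that summing MaxRSS across all steps is what yields the 182.8% figure.

```python
# Non-zero MaxRSS values (KiB) copied from the job steps in the --debug dump.
maxrss_kib = [
    1147220, 4455540, 4149700, 4748052, 7734356,
    8074500, 7883152, 7715752, 7663264, 7764884,
]

# ReqMem from the top-level job record: 32G, converted to KiB.
req_mem_kib = 32 * 1024 * 1024

# Peak of any single step, as seff appears to report it (~7.70 GB / 32 GB).
max_based = max(maxrss_kib) / req_mem_kib * 100

# Summing every step's MaxRSS reproduces the 182.8% figure, which suggests
# the steps' peaks are being added even though they may not overlap in time.
sum_based = sum(maxrss_kib) / req_mem_kib * 100

print(f"max-based MemEff: {max_based:.2f}%")  # matches seff's 24.06%
print(f"sum-based MemEff: {sum_based:.2f}%")  # matches reportseff's 182.8%
```

If that arithmetic is right, the two tools agree on the raw sacct data and only disagree on how per-step MaxRSS values are aggregated.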
Great thank you!