Memory Efficiency differs between reportseff and seff for multi-node jobs
panchyni opened this issue · comments
I've just started using reportseff on our university cluster specifically for the reasons mentioned in your blog post (be good to your scheduler), and overall it's great. However, I've noticed that the memory efficiency reported by reportseff and seff differs, specifically for multi-node jobs. For example:
reportseff:
JobID State Elapsed TimeEff CPUEff MemEff
65638294 COMPLETED 07:36:03 4.5% 98.6% 182.8%
seff:
Job ID: 65638294
State: COMPLETED (exit code 0)
Nodes: 2
Cores per node: 8
CPU Utilized: 4-23:56:22
CPU Efficiency: 98.62% of 5-01:36:48 core-walltime
Job Wall-clock time: 07:36:03
Memory Utilized: 7.70 GB
Memory Efficiency: 24.06% of 32.00 GB
I'm relatively new to both reportseff and seff, so I am not sure if this is an underlying issue with reporting on the Slurm side, but any insight into the origin of the difference would be useful for interpreting the results. Thanks!
Honestly I don't have much test data with multiple nodes. Can you provide the output of reportseff with the --debug
option? That gives the raw sacct output that I can use for testing.
I've pasted the reportseff output for that specific job with the --debug option. Let me know if there is anything else I can do to help out.
|16|07:36:03|65638294|65638294||2|32G|COMPLETED|6-23:59:00|4-23:56:21
|1|07:36:03|65638294.batch|65638294.batch|1147220K|1||COMPLETED||07:30:20
|16|07:36:03|65638294.extern|65638294.extern|0|2||COMPLETED||00:00.001
|15|00:00:11|65638294.0|65638294.0|0|1||COMPLETED||00:11.830
|15|00:02:15|65638294.1|65638294.1|4455540K|1||COMPLETED||31:09.458
|15|00:00:10|65638294.2|65638294.2|0|1||COMPLETED||00:00:04
|15|00:00:08|65638294.3|65638294.3|0|1||COMPLETED||00:09.602
|15|00:00:07|65638294.4|65638294.4|0|1||COMPLETED||00:56.827
|15|00:00:06|65638294.5|65638294.5|0|1||COMPLETED||00:03.512
|15|00:00:08|65638294.6|65638294.6|0|1||COMPLETED||00:08.520
|15|00:00:13|65638294.7|65638294.7|0|1||COMPLETED||01:02.013
|15|00:00:02|65638294.8|65638294.8|0|1||COMPLETED||00:03.639
|15|00:00:06|65638294.9|65638294.9|0|1||COMPLETED||00:08.683
|15|00:00:08|65638294.10|65638294.10|0|1||COMPLETED||00:57.438
|15|00:00:06|65638294.11|65638294.11|0|1||COMPLETED||00:03.642
|15|00:00:09|65638294.12|65638294.12|0|1||COMPLETED||00:10.271
|15|00:01:24|65638294.13|65638294.13|4149700K|1||COMPLETED||17:18.067
|15|00:00:01|65638294.14|65638294.14|0|1||COMPLETED||00:03.302
|15|00:00:10|65638294.15|65638294.15|0|1||COMPLETED||00:14.615
|15|00:06:45|65638294.16|65638294.16|4748052K|1||COMPLETED||01:36:40
|15|00:00:10|65638294.17|65638294.17|0|1||COMPLETED||00:03.864
|15|00:00:09|65638294.18|65638294.18|0|1||COMPLETED||00:48.987
|15|01:32:53|65638294.19|65638294.19|7734356K|1||COMPLETED||23:09:33
|15|00:00:01|65638294.20|65638294.20|0|1||COMPLETED||00:03.520
|15|00:00:07|65638294.21|65638294.21|0|1||COMPLETED||00:50.015
|15|00:55:17|65638294.22|65638294.22|8074500K|1||COMPLETED||13:45:29
|15|00:00:13|65638294.23|65638294.23|0|1||COMPLETED||00:04.413
|15|00:00:12|65638294.24|65638294.24|0|1||COMPLETED||00:49.100
|15|00:57:41|65638294.25|65638294.25|7883152K|1||COMPLETED||14:20:36
|15|00:00:01|65638294.26|65638294.26|0|1||COMPLETED||00:03.953
|15|00:00:05|65638294.27|65638294.27|0|1||COMPLETED||00:47.223
|15|01:00:17|65638294.28|65638294.28|7715752K|1||COMPLETED||14:59:40
|15|00:00:06|65638294.29|65638294.29|0|1||COMPLETED||00:04.341
|15|00:00:07|65638294.30|65638294.30|0|1||COMPLETED||00:50.416
|15|01:22:31|65638294.31|65638294.31|7663264K|1||COMPLETED||20:33:59
|15|00:00:05|65638294.32|65638294.32|0|1||COMPLETED||00:04.199
|15|00:00:08|65638294.33|65638294.33|0|1||COMPLETED||00:50.009
|15|01:32:23|65638294.34|65638294.34|7764884K|1||COMPLETED||23:01:52
|15|00:00:06|65638294.35|65638294.35|0|1||COMPLETED||00:04.527
JobID State Elapsed TimeEff CPUEff MemEff
65638294 COMPLETED 07:36:03 4.5% 98.6% 182.8%
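For what it's worth, both percentages can be reproduced from the MaxRSS column in the debug output above. This is only a sketch of one plausible explanation, not confirmed reportseff or seff internals: it assumes the MaxRSS values are in KiB, that the 32G ReqMem is the total allocation, that seff uses the single largest per-step MaxRSS, and that summing MaxRSS across all steps is what yields the 182.8% figure.

```python
# Non-zero MaxRSS values (KiB) copied from the job steps in the --debug dump.
maxrss_kib = [
    1147220, 4455540, 4149700, 4748052, 7734356,
    8074500, 7883152, 7715752, 7663264, 7764884,
]

# ReqMem from the top-level job record: 32G, converted to KiB.
req_mem_kib = 32 * 1024 * 1024

# Peak of any single step, as seff appears to report it (~7.70 GB / 32 GB).
max_based = max(maxrss_kib) / req_mem_kib * 100

# Summing every step's MaxRSS reproduces the 182.8% figure, which suggests
# the steps' peaks are being added even though they may not overlap in time.
sum_based = sum(maxrss_kib) / req_mem_kib * 100

print(f"max-based MemEff: {max_based:.2f}%")  # matches seff's 24.06%
print(f"sum-based MemEff: {sum_based:.2f}%")  # matches reportseff's 182.8%
```

If that arithmetic is right, the two tools agree on the raw sacct data and only disagree on how per-step MaxRSS values are aggregated.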
Great thank you!