troycomi / reportseff

Tabular seff

Memory Efficiency differs between reportseff and seff for multi-node jobs

panchyni opened this issue · comments

I've just started using reportseff on our university cluster, specifically for the reasons mentioned in your blog post (be good to your scheduler), and overall it's great. However, I've noticed that the memory efficiency reported by reportseff and seff differs for multi-node jobs. For example:

reportseff:
65638294 COMPLETED 07:36:03 4.5% 98.6% 182.8%

seff:
Job ID: 65638294
State: COMPLETED (exit code 0)
Nodes: 2
Cores per node: 8
CPU Utilized: 4-23:56:22
CPU Efficiency: 98.62% of 5-01:36:48 core-walltime
Job Wall-clock time: 07:36:03
Memory Utilized: 7.70 GB
Memory Efficiency: 24.06% of 32.00 GB

I'm relatively new to both reportseff and seff, so I'm not sure whether this is an underlying issue with reporting on the SLURM side, but any insight into the origin of the difference would be useful for interpreting the results. Thanks!
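For reference, the percentages in the seff output can be reproduced by hand from the values it prints. The following is a minimal sketch in Python, not code from seff or reportseff; the helper name `slurm_seconds` is hypothetical, and the only inputs are the numbers quoted in the seff output above.

```python
def slurm_seconds(text: str) -> float:
    """Convert a Slurm duration ('4-23:56:22', '07:36:03', '00:11.830') to seconds."""
    days, _, clock = text.rpartition("-")      # days is '' when there is no day part
    parts = [float(p) for p in clock.split(":")]
    while len(parts) < 3:                      # pad MM:SS up to HH:MM:SS
        parts.insert(0, 0.0)
    hours, minutes, seconds = parts
    return int(days or 0) * 86400 + hours * 3600 + minutes * 60 + seconds


# Values copied from the seff output for job 65638294.
cores = 2 * 8                                  # Nodes * Cores per node
cpu_used = slurm_seconds("4-23:56:22")         # CPU Utilized
wall = slurm_seconds("07:36:03")               # Job Wall-clock time

core_wall = cores * wall                       # core-walltime, 5-01:36:48 in seff
cpu_eff = 100 * cpu_used / core_wall
mem_eff = 100 * 7.70 / 32.00                   # Memory Utilized / requested memory

print(f"CPU efficiency:    {cpu_eff:.2f}%")
print(f"Memory efficiency: {mem_eff:.2f}%")
```

Both results agree with seff (98.62% and 24.06%), so the 182.8% reported by reportseff must come from a different value for the memory used, not from a different requested amount.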

Honestly I don't have much test data with multiple nodes. Can you provide the output of reportseff with the --debug option? That gives the raw sacct output that I can use for testing.

I've pasted the reportseff output for that specific job with the --debug option. Let me know if there is anything else I can do to help out.


|16|07:36:03|65638294|65638294||2|32G|COMPLETED|6-23:59:00|4-23:56:21
|1|07:36:03|65638294.batch|65638294.batch|1147220K|1||COMPLETED||07:30:20
|16|07:36:03|65638294.extern|65638294.extern|0|2||COMPLETED||00:00.001
|15|00:00:11|65638294.0|65638294.0|0|1||COMPLETED||00:11.830
|15|00:02:15|65638294.1|65638294.1|4455540K|1||COMPLETED||31:09.458
|15|00:00:10|65638294.2|65638294.2|0|1||COMPLETED||00:00:04
|15|00:00:08|65638294.3|65638294.3|0|1||COMPLETED||00:09.602
|15|00:00:07|65638294.4|65638294.4|0|1||COMPLETED||00:56.827
|15|00:00:06|65638294.5|65638294.5|0|1||COMPLETED||00:03.512
|15|00:00:08|65638294.6|65638294.6|0|1||COMPLETED||00:08.520
|15|00:00:13|65638294.7|65638294.7|0|1||COMPLETED||01:02.013
|15|00:00:02|65638294.8|65638294.8|0|1||COMPLETED||00:03.639
|15|00:00:06|65638294.9|65638294.9|0|1||COMPLETED||00:08.683
|15|00:00:08|65638294.10|65638294.10|0|1||COMPLETED||00:57.438
|15|00:00:06|65638294.11|65638294.11|0|1||COMPLETED||00:03.642
|15|00:00:09|65638294.12|65638294.12|0|1||COMPLETED||00:10.271
|15|00:01:24|65638294.13|65638294.13|4149700K|1||COMPLETED||17:18.067
|15|00:00:01|65638294.14|65638294.14|0|1||COMPLETED||00:03.302
|15|00:00:10|65638294.15|65638294.15|0|1||COMPLETED||00:14.615
|15|00:06:45|65638294.16|65638294.16|4748052K|1||COMPLETED||01:36:40
|15|00:00:10|65638294.17|65638294.17|0|1||COMPLETED||00:03.864
|15|00:00:09|65638294.18|65638294.18|0|1||COMPLETED||00:48.987
|15|01:32:53|65638294.19|65638294.19|7734356K|1||COMPLETED||23:09:33
|15|00:00:01|65638294.20|65638294.20|0|1||COMPLETED||00:03.520
|15|00:00:07|65638294.21|65638294.21|0|1||COMPLETED||00:50.015
|15|00:55:17|65638294.22|65638294.22|8074500K|1||COMPLETED||13:45:29
|15|00:00:13|65638294.23|65638294.23|0|1||COMPLETED||00:04.413
|15|00:00:12|65638294.24|65638294.24|0|1||COMPLETED||00:49.100
|15|00:57:41|65638294.25|65638294.25|7883152K|1||COMPLETED||14:20:36
|15|00:00:01|65638294.26|65638294.26|0|1||COMPLETED||00:03.953
|15|00:00:05|65638294.27|65638294.27|0|1||COMPLETED||00:47.223
|15|01:00:17|65638294.28|65638294.28|7715752K|1||COMPLETED||14:59:40
|15|00:00:06|65638294.29|65638294.29|0|1||COMPLETED||00:04.341
|15|00:00:07|65638294.30|65638294.30|0|1||COMPLETED||00:50.416
|15|01:22:31|65638294.31|65638294.31|7663264K|1||COMPLETED||20:33:59
|15|00:00:05|65638294.32|65638294.32|0|1||COMPLETED||00:04.199
|15|00:00:08|65638294.33|65638294.33|0|1||COMPLETED||00:50.009
|15|01:32:23|65638294.34|65638294.34|7764884K|1||COMPLETED||23:01:52
|15|00:00:06|65638294.35|65638294.35|0|1||COMPLETED||00:04.527

   JobID      State    Elapsed  TimeEff  CPUEff  MemEff
65638294  COMPLETED   07:36:03     4.5%   98.6%  182.8%
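The debug rows make it possible to check where the two memory figures could come from. The sketch below is an illustration only, not reportseff's or seff's actual code. It assumes, based on the values alone since the output has no header row, that the sixth `|`-separated field is MaxRSS and the eighth is the requested memory. Steps with a MaxRSS of 0 are left out because they change neither the sum nor the maximum.

```python
# Rows copied from the --debug output above (job row plus steps with nonzero MaxRSS).
RAW = """\
|16|07:36:03|65638294|65638294||2|32G|COMPLETED|6-23:59:00|4-23:56:21
|1|07:36:03|65638294.batch|65638294.batch|1147220K|1||COMPLETED||07:30:20
|15|00:02:15|65638294.1|65638294.1|4455540K|1||COMPLETED||31:09.458
|15|00:01:24|65638294.13|65638294.13|4149700K|1||COMPLETED||17:18.067
|15|00:06:45|65638294.16|65638294.16|4748052K|1||COMPLETED||01:36:40
|15|01:32:53|65638294.19|65638294.19|7734356K|1||COMPLETED||23:09:33
|15|00:55:17|65638294.22|65638294.22|8074500K|1||COMPLETED||13:45:29
|15|00:57:41|65638294.25|65638294.25|7883152K|1||COMPLETED||14:20:36
|15|01:00:17|65638294.28|65638294.28|7715752K|1||COMPLETED||14:59:40
|15|01:22:31|65638294.31|65638294.31|7663264K|1||COMPLETED||20:33:59
|15|01:32:23|65638294.34|65638294.34|7764884K|1||COMPLETED||23:01:52
"""

MAXRSS, REQMEM = 5, 7                 # assumed column positions (see note above)
UNITS = {"K": 1, "M": 1024, "G": 1024**2}


def to_kib(field: str) -> int:
    """Convert a size such as '32G' or '1147220K' to KiB; empty or '0' gives 0."""
    if not field or field == "0":
        return 0
    return int(float(field[:-1]) * UNITS[field[-1]])


rows = [line.split("|") for line in RAW.strip().splitlines()]
requested = to_kib(rows[0][REQMEM])                   # 32G on the job row
rss = [to_kib(row[MAXRSS]) for row in rows[1:]]       # one MaxRSS per step

summed, peak = sum(rss), max(rss)
sum_eff = 100 * summed / requested
max_eff = 100 * peak / requested

print(f"sum of MaxRSS: {sum_eff:.1f}% of requested memory")
print(f"max of MaxRSS: {max_eff:.2f}% of requested memory")
```

Adding up MaxRSS over all steps gives 182.8%, the reportseff figure, while taking only the largest step (8074500K, about 7.70 GB) gives 24.06%, the seff figure. That is consistent with the two tools aggregating per-step memory differently for this job, though the arithmetic alone does not confirm how either tool is implemented.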

Great thank you!