Wrong memory efficiency when using "srun"
angel-devicente opened this issue
Hello,
probably related to #37, but a bit different, so I thought I'd open a new issue.
When I run:
srun -n 8 stress -m 1 -t 52 --vm-keep --vm-bytes 1800M
I use 8 CPUs and almost 16GB. reportseff gets the CPU efficiency right, but the memory efficiency is way off (it basically reports that I used only 1800M).
$ seff 131042
######################## JOB EFFICIENCY REPORT ########################
# Job ID: 131042
# State: COMPLETED (exit code 0)
# Cores: 8
# CPU Utilized: 00:06:58
# CPU Efficiency: 98.58% of 00:07:04 core-walltime
# Wall-clock time: 00:00:53
# Memory Utilized: 14.86 GB (estimated maximum)
#######################################################################
$ reportseff --debug 131042
^|^8^|^00:00:53^|^131042^|^131042^|^^|^1^|^16000M^|^COMPLETED^|^00:01:00^|^06:57.815
^|^8^|^00:00:53^|^131042.batch^|^131042.batch^|^20264K^|^1^|^^|^COMPLETED^|^^|^00:00.034
^|^8^|^00:00:53^|^131042.extern^|^131042.extern^|^1052K^|^1^|^^|^COMPLETED^|^^|^00:00.001
^|^8^|^00:00:53^|^131042.0^|^131042.0^|^1947276K^|^1^|^^|^COMPLETED^|^^|^06:57.779
JobID State Elapsed TimeEff CPUEff MemEff
131042 COMPLETED 00:00:53 88.3% 98.3% 11.9%
Can you run seff -d 131042 to get the raw data? It seems the memory reported by sacct should be scaled by ntasks, as shown here
$ seff -d 131042
Slurm data: JobID ArrayJobID User Group State Clustername Ncpus Nnodes Ntasks Reqmem PerNode Cput Walltime Mem ExitStatus
Slurm data: 131042 xxx xxx COMPLETED xxxx 8 1 8 16384000 0 418 53 15578208 0
######################## JOB EFFICIENCY REPORT ########################
# Job ID: 131042
# Cluster: xxx
# User/Group: xxx/xxx
# State: COMPLETED (exit code 0)
# Cores: 8
# CPU Utilized: 00:06:58
# CPU Efficiency: 98.58% of 00:07:04 core-walltime
# Wall-clock time: 00:00:53
# Memory Utilized: 14.86 GB (estimated maximum)
#######################################################################
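To make the discrepancy concrete, here is a minimal Python sketch of the arithmetic, using only the numbers from the outputs above. The helper names parse_kib and mem_efficiency are hypothetical, and this is not reportseff's actual code or the fix that was later released. It assumes the K and M suffixes are binary units (1M = 1024K), which is consistent with seff reporting Reqmem as 16384000 for 16000M.

```python
def parse_kib(value: str) -> int:
    """Convert a sacct-style size such as '1947276K' or '16000M' to KiB.

    Assumption: suffixes are binary multiples (1M = 1024K).
    """
    units = {"K": 1, "M": 1024, "G": 1024**2}
    return int(float(value[:-1]) * units[value[-1]])


def mem_efficiency(max_rss: str, ntasks: int, req_mem: str) -> float:
    """Memory efficiency in percent, scaling the per-task MaxRSS by ntasks."""
    used_kib = parse_kib(max_rss) * ntasks
    return 100 * used_kib / parse_kib(req_mem)


# MaxRSS of step 131042.0 and the requested memory, as printed by
# `reportseff --debug`; Ntasks = 8 comes from `seff -d`.
unscaled = mem_efficiency("1947276K", 1, "16000M")
scaled = mem_efficiency("1947276K", 8, "16000M")

print(f"unscaled: {unscaled:.1f}%")  # matches the 11.9% reported by reportseff
print(f"scaled:   {scaled:.1f}%")
```

With the scaling applied, the used memory is 1947276K x 8 = 15578208K, which is exactly the Mem value in the seff -d raw data and corresponds to the 14.86 GB that seff reports.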
Thank you for providing this information and opening the issue. I don't do many multi-task jobs so their test coverage is lighter than it should be. I should have time to fix this in a week or so.
Great, thanks. If you need to run any tests, please let me know.
And, BTW, many thanks for developing this tool!
Should be addressed with version 2.7.6. Please reopen if you notice any problems.
Awesome. I'll give it a try as soon as I can. Thanks.