troycomi / reportseff

Tabular seff

Wrong memory efficiency when using "srun"

angel-devicente opened this issue

Hello,

Probably related to #37, but a bit different, so I thought I'd open a new issue.

When I run:
srun -n 8 stress -m 1 -t 52 --vm-keep --vm-bytes 1800M
I use 8 CPUs and almost 16 GB. reportseff gets the CPU efficiency right, but the memory efficiency is way off (it basically reports that I only used 1800M).

$ seff 131042
######################## JOB EFFICIENCY REPORT ########################
# Job ID:          131042
# State:           COMPLETED (exit code 0)
# Cores:           8
# CPU Utilized:    00:06:58
# CPU Efficiency:  98.58% of 00:07:04 core-walltime
# Wall-clock time: 00:00:53
# Memory Utilized: 14.86 GB (estimated maximum)
#######################################################################

$ reportseff --debug 131042
^|^8^|^00:00:53^|^131042^|^131042^|^^|^1^|^16000M^|^COMPLETED^|^00:01:00^|^06:57.815
^|^8^|^00:00:53^|^131042.batch^|^131042.batch^|^20264K^|^1^|^^|^COMPLETED^|^^|^00:00.034
^|^8^|^00:00:53^|^131042.extern^|^131042.extern^|^1052K^|^1^|^^|^COMPLETED^|^^|^00:00.001
^|^8^|^00:00:53^|^131042.0^|^131042.0^|^1947276K^|^1^|^^|^COMPLETED^|^^|^06:57.779

   JobID    State       Elapsed  TimeEff   CPUEff   MemEff 
  131042  COMPLETED    00:00:53   88.3%    98.3%    11.9% 
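
The 11.9% figure can be reproduced by hand from the debug rows above. This is a minimal sketch, not reportseff's actual code: it assumes that fields 5 and 7 of each `^|^`-delimited row hold MaxRSS and ReqMem (inferred from the values shown, not from the tool's source), and `to_kib` is a hypothetical helper.

```python
# Debug rows copied from the reportseff --debug output above.
DEBUG_LINES = """\
^|^8^|^00:00:53^|^131042^|^131042^|^^|^1^|^16000M^|^COMPLETED^|^00:01:00^|^06:57.815
^|^8^|^00:00:53^|^131042.batch^|^131042.batch^|^20264K^|^1^|^^|^COMPLETED^|^^|^00:00.034
^|^8^|^00:00:53^|^131042.extern^|^131042.extern^|^1052K^|^1^|^^|^COMPLETED^|^^|^00:00.001
^|^8^|^00:00:53^|^131042.0^|^131042.0^|^1947276K^|^1^|^^|^COMPLETED^|^^|^06:57.779
""".splitlines()

# Multipliers to convert a unit suffix to KiB.
UNITS = {"K": 1, "M": 1024, "G": 1024**2}


def to_kib(value: str) -> int:
    """Convert a memory string such as '16000M' to KiB (hypothetical helper)."""
    return int(float(value[:-1]) * UNITS[value[-1]])


rows = [line.split("^|^") for line in DEBUG_LINES]

# Assumption: index 7 is ReqMem (set only on the job row),
# index 5 is MaxRSS (set only on the step rows).
req_mem = to_kib(rows[0][7])
max_rss = max(to_kib(row[5]) for row in rows if row[5])

# Largest single MaxRSS divided by the request, with no per-task scaling.
mem_eff = 100 * max_rss / req_mem
print(f"{mem_eff:.1f}%")  # 11.9%
```

The result matches the MemEff column, which suggests the reported value is the MaxRSS of one task in step 131042.0 rather than the sum over all 8 tasks.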

Can you run seff -d 131042 to get the raw data? It seems the memory reported by sacct should be scaled by ntasks as shown here

$ seff -d 131042
Slurm data: JobID ArrayJobID User Group State Clustername Ncpus Nnodes Ntasks Reqmem PerNode Cput Walltime Mem ExitStatus
Slurm data: 131042  xxx xxx COMPLETED xxxx 8 1 8 16384000 0 418 53 15578208 0

######################## JOB EFFICIENCY REPORT ########################
# Job ID:          131042
# Cluster:         xxx
# User/Group:      xxx/xxx
# State:           COMPLETED (exit code 0)
# Cores:           8
# CPU Utilized:    00:06:58
# CPU Efficiency:  98.58% of 00:07:04 core-walltime
# Wall-clock time: 00:00:53
# Memory Utilized: 14.86 GB (estimated maximum)
#######################################################################
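
The raw `Slurm data:` lines make the scaling easy to check. The sketch below is only an arithmetic illustration under stated assumptions, not seff's or reportseff's implementation: it pairs the header with the values, then compares seff's `Mem` with the 1947276K MaxRSS that sacct reported for step 131042.0, multiplied by `Ntasks`.

```python
# Header and values copied from the two "Slurm data:" lines above.
HEADER = ("JobID ArrayJobID User Group State Clustername Ncpus Nnodes "
          "Ntasks Reqmem PerNode Cput Walltime Mem ExitStatus")
VALUES = "131042  xxx xxx COMPLETED xxxx 8 1 8 16384000 0 418 53 15578208 0"

# split(" ") rather than split() keeps the empty ArrayJobID field,
# so the 15 header names line up with the 15 values.
record = dict(zip(HEADER.split(" "), VALUES.split(" ")))

ntasks = int(record["Ntasks"])
mem_kib = int(record["Mem"])
req_kib = int(record["Reqmem"])

# MaxRSS of step 131042.0 from the reportseff --debug output, in KiB.
step_max_rss_kib = 1947276

# seff's total equals the per-task maximum times the number of tasks.
assert step_max_rss_kib * ntasks == mem_kib

scaled_eff = 100 * mem_kib / req_kib
print(f"{mem_kib / 1024**2:.2f} GB, {scaled_eff:.1f}%")  # 14.86 GB, 95.1%
```

With the scaling applied, the memory efficiency for this job comes out at roughly 95% instead of 11.9%, consistent with the 14.86 GB that seff reports.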

Thank you for providing this information and opening the issue. I don't do many multi-task jobs so their test coverage is lighter than it should be. I should have time to fix this in a week or so.

Great, thanks. If you need to run any tests, please let me know.
And, BTW, many thanks for developing this tool!

Should be addressed with version 2.7.6. Please reopen if you notice any problems.

Awesome. I'll give it a try as soon as I can. Thanks.