troycomi / reportseff

Tabular seff



reportseff 2.7.4: python exception: ValueError when parsing `CANCELLED by <ID>` with --format '+jobname'

tbroch-rv opened this issue

The failure looks like this:

$ reportseff -u some_user_name --format "+jobname" --extra-args '-j 29919940'
Error processing entry: {'AdminComment': '', 'AllocCPUS': '1', 'Elapsed': '00:00:00', 'JobID': '29919940', 'JobIDRaw': '29919940', 'JobName': 'some_jobname . ', 'MaxRSS': ' grep hot', 'NNodes': '', 'REQMEM': '1', 'State': '8000M', 'Timelimit': 'CANCELLED by 1977600432', 'TotalCPU': 'Partition_Limit'}
Traceback (most recent call last):
  File "/<install_path>/bin/reportseff", line 8, in <module>
    sys.exit(main())
  File "/<install_path>/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/<install_path>/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/<install_path>/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/<install_path>/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/<install_path>/lib/python3.10/site-packages/reportseff/console.py", line 125, in main
    output, entries = get_jobs(args)
  File "/<install_path>/lib/python3.10/site-packages/reportseff/console.py", line 185, in get_jobs
    raise error
  File "/<install_path>/lib/python3.10/site-packages/reportseff/console.py", line 182, in get_jobs
    job_collection.process_entry(entry, add_job=add_jobs)
  File "/<install_path>/lib/python3.10/site-packages/reportseff/job_collection.py", line 223, in process_entry
    self.jobs[job_id].update(entry)
  File "/<install_path>/lib/python3.10/site-packages/reportseff/job.py", line 111, in update
    self._update_main_job(entry)
  File "/<install_path>/lib/python3.10/site-packages/reportseff/job.py", line 143, in _update_main_job
    requested = _parse_slurm_timedelta(entry["Timelimit"])
  File "/<install_path>/lib/python3.10/site-packages/reportseff/job.py", line 318, in _parse_slurm_timedelta
    raise ValueError(f'Failed to parse time "{delta}"')
ValueError: Failed to parse time "CANCELLED by 1977600432"

$ sacct -j 29919940 -o State%100
                                                                                               State 
---------------------------------------------------------------------------------------------------- 
                                                                             CANCELLED by 1977600432

The failure depends on the presence of --format "+jobname"; running with a different --format works as intended:

$ reportseff -u some_user_name --format "+Elapsed" --extra-args '-j 29919940'
     JobID    State       Elapsed  TimeEff   CPUEff   MemEff   Elapsed  
  29919940  CANCELLED    00:00:00    ---      ---      0.0%    00:00:00

$ reportseff --version
reportseff, version 2.7.4

Thanks for the quick response. Below is the raw output with --debug included:

$ reportseff --debug -u some_user_name --format "+jobname" --extra-args '-j 29919940'
|1|00:00:00|29919940|29919940|some_jobname . | grep hot||1|12000M|CANCELLED by 1977600432|Partition_Limit|00:00:00

Error processing entry: {'AdminComment': '', 'AllocCPUS': '1', 'Elapsed': '00:00:00', 'JobID': '29919940', 'JobIDRaw': '29919940', 'JobName': 'some_jobname . ', 'MaxRSS': ' grep hot', 'NNodes': '', 'REQMEM': '1', 'State': '12000M', 'Timelimit': 'CANCELLED by 1977600432', 'TotalCPU': 'Partition_Limit'}
Traceback (most recent call last):
  File "/<install_path>/bin/reportseff", line 8, in <module>
    sys.exit(main())
  File "/<install_path>/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/<install_path>/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/<install_path>/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/<install_path>/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/<install_path>/lib/python3.10/site-packages/reportseff/console.py", line 125, in main
    output, entries = get_jobs(args)
  File "/<install_path>/lib/python3.10/site-packages/reportseff/console.py", line 185, in get_jobs
    raise error
  File "/<install_path>/lib/python3.10/site-packages/reportseff/console.py", line 182, in get_jobs
    job_collection.process_entry(entry, add_job=add_jobs)
  File "/<install_path>/lib/python3.10/site-packages/reportseff/job_collection.py", line 223, in process_entry
    self.jobs[job_id].update(entry)
  File "/<install_path>/lib/python3.10/site-packages/reportseff/job.py", line 111, in update
    self._update_main_job(entry)
  File "/<install_path>/lib/python3.10/site-packages/reportseff/job.py", line 143, in _update_main_job
    requested = _parse_slurm_timedelta(entry["Timelimit"])
  File "/<install_path>/lib/python3.10/site-packages/reportseff/job.py", line 318, in _parse_slurm_timedelta
    raise ValueError(f'Failed to parse time "{delta}"')
ValueError: Failed to parse time "CANCELLED by 1977600432"

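Ah, your job name contains a pipe, which is the same character sacct uses as its field delimiter. The name is split into two fields, so every field after JobName shifts by one position: MaxRSS gets " grep hot", State gets the memory request (12000M), and Timelimit gets the state string "CANCELLED by 1977600432", which the time parser then rejects.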
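A minimal Python sketch of this failure mode, assuming (as a reconstruction, not reportseff's actual code) that each sacct line is split on `|` and paired positionally with the requested field names. The field order is taken from the "Error processing entry" dict above, and `naive_parse` is a hypothetical helper; running it on the --debug line reproduces that dict.

```python
# Field names in the order shown in the "Error processing entry" dict above.
FIELDS = [
    "AdminComment", "AllocCPUS", "Elapsed", "JobID", "JobIDRaw", "JobName",
    "MaxRSS", "NNodes", "REQMEM", "State", "Timelimit", "TotalCPU",
]

# The --debug line from the 2.7.4 run; the job name itself contains " | ".
RAW = (
    "|1|00:00:00|29919940|29919940|some_jobname . | grep hot||1|12000M"
    "|CANCELLED by 1977600432|Partition_Limit|00:00:00"
)


def naive_parse(line, fields=FIELDS, delimiter="|"):
    """Hypothetical helper: split on the delimiter and pair values by position.

    zip() stops at the shorter sequence, so the surplus trailing value is
    silently dropped and every field after JobName is shifted by one.
    """
    return dict(zip(fields, line.split(delimiter)))


entry = naive_parse(RAW)
print(len(RAW.split("|")))        # 13 (one more value than the 12 fields)
print(repr(entry["JobName"]))     # 'some_jobname . '
print(repr(entry["Timelimit"]))   # 'CANCELLED by 1977600432'
```

Handing entry["Timelimit"] to a Slurm time-string parser is what produces the ValueError in the traceback.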

I could replace the delimiter with something else, maybe `||` or `##`, but that could eventually run into a similar problem (although those are less common than a single pipe).
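A sketch of the multi-character delimiter idea, using `^|^`, which is the separator visible in the v2.7.5 --debug output at the end of this thread (sacct's --delimiter option sets the separator used with -p/-P; exactly how reportseff passes it is not shown here). `parse_line` is a hypothetical helper, and the field-count check is an added safeguard, assumed rather than taken from reportseff, so that a value containing `^|^` fails loudly instead of silently shifting fields.

```python
FIELDS = [
    "AdminComment", "AllocCPUS", "Elapsed", "JobID", "JobIDRaw", "JobName",
    "MaxRSS", "NNodes", "REQMEM", "State", "Timelimit", "TotalCPU",
]
DELIMITER = "^|^"

# The --debug line from the 2.7.5 run: a plain "|" inside the name is now harmless.
RAW = (
    "^|^1^|^00:00:00^|^29919940^|^29919940^|^some_job_name . | grep hot"
    "^|^^|^1^|^12000M^|^CANCELLED by 1977600432^|^Partition_Limit^|^00:00:00"
)


def parse_line(line, fields=FIELDS, delimiter=DELIMITER):
    """Hypothetical helper: split on a multi-character delimiter.

    Refuses to guess when the value count does not match the field count,
    e.g. when some field happens to contain the delimiter itself.
    """
    values = line.split(delimiter)
    if len(values) != len(fields):
        raise ValueError(
            f"expected {len(fields)} fields, got {len(values)}: {line!r}"
        )
    return dict(zip(fields, values))


entry = parse_line(RAW)
print(entry["JobName"])  # some_job_name . | grep hot
print(entry["State"])    # CANCELLED by 1977600432
```

A job name containing `^|^` would still trip the check, which is the residual risk mentioned above; it is simply much less likely than a single pipe.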

Alternatively, I could use more complex parsing logic that checks the job name for additional pipes.
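One way such parsing logic could look, assuming JobName is the only field that can contain the delimiter: split on `|`, count the surplus values, and glue that many extra pieces back into JobName. `parse_with_rejoin` and its `free_text` parameter are hypothetical names for illustration, not part of reportseff.

```python
FIELDS = [
    "AdminComment", "AllocCPUS", "Elapsed", "JobID", "JobIDRaw", "JobName",
    "MaxRSS", "NNodes", "REQMEM", "State", "Timelimit", "TotalCPU",
]

# The original 2.7.4 --debug line, split with a plain "|" delimiter.
RAW = (
    "|1|00:00:00|29919940|29919940|some_jobname . | grep hot||1|12000M"
    "|CANCELLED by 1977600432|Partition_Limit|00:00:00"
)


def parse_with_rejoin(line, fields=FIELDS, delimiter="|", free_text="JobName"):
    """Hypothetical helper: absorb surplus values into one free-text field.

    Assumes only `free_text` may contain the delimiter.
    """
    values = line.split(delimiter)
    extra = len(values) - len(fields)
    if extra < 0:
        raise ValueError(f"expected {len(fields)} fields, got {len(values)}")
    idx = fields.index(free_text)
    # Re-insert the delimiter between the pieces that belong to the free-text field.
    merged = delimiter.join(values[idx:idx + extra + 1])
    values = values[:idx] + [merged] + values[idx + extra + 1:]
    return dict(zip(fields, values))


entry = parse_with_rejoin(RAW)
print(entry["JobName"])  # some_jobname . | grep hot
print(entry["State"])    # CANCELLED by 1977600432
```

The catch is the assumption itself: AdminComment, for example, is also a free-text field, and once two fields can contain pipes the split position becomes ambiguous, which is part of why a rarer delimiter is the simpler fix.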

These are partly notes for my future self. I'm moving this month and likely won't have time to address this for several weeks. If someone else wants to submit a PR, I can review it in the meantime. The offending line is here, and unit tests should be easy to add.

OK, #32 should address this. Can you try v2.7.5 and close the issue if it's fixed?

lgtm ... thanks for the fix!

$ reportseff --debug -u some_user_name --format "+jobname" --extra-args '-j 29919940'
^|^1^|^00:00:00^|^29919940^|^29919940^|^some_job_name . | grep hot^|^^|^1^|^12000M^|^CANCELLED by 1977600432^|^Partition_Limit^|^00:00:00

     JobID    State       Elapsed  TimeEff   CPUEff   MemEff               JobName             
  29919940  CANCELLED    00:00:00    ---      ---      0.0%    some_job_name . | grep hot