reportseff 2.7.4: python exception: ValueError when parsing `CANCELLED by <ID>` with --format '+jobname'
tbroch-rv opened this issue
The failure looks like this:
$ reportseff -u some_user_name --format "+jobname" --extra-args '-j 29919940'
Error processing entry: {'AdminComment': '', 'AllocCPUS': '1', 'Elapsed': '00:00:00', 'JobID': '29919940', 'JobIDRaw': '29919940', 'JobName': 'some_jobname . ', 'MaxRSS': ' grep hot', 'NNodes': '', 'REQMEM': '1', 'State': '8000M', 'Timelimit': 'CANCELLED by 1977600432', 'TotalCPU': 'Partition_Limit'}
Traceback (most recent call last):
File "/<install_path>/bin/reportseff", line 8, in <module>
sys.exit(main())
File "/<install_path>/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/<install_path>/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/<install_path>/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/<install_path>/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/<install_path>/lib/python3.10/site-packages/reportseff/console.py", line 125, in main
output, entries = get_jobs(args)
File "/<install_path>/lib/python3.10/site-packages/reportseff/console.py", line 185, in get_jobs
raise error
File "/<install_path>/lib/python3.10/site-packages/reportseff/console.py", line 182, in get_jobs
job_collection.process_entry(entry, add_job=add_jobs)
File "/<install_path>/lib/python3.10/site-packages/reportseff/job_collection.py", line 223, in process_entry
self.jobs[job_id].update(entry)
File "/<install_path>/lib/python3.10/site-packages/reportseff/job.py", line 111, in update
self._update_main_job(entry)
File "/<install_path>/lib/python3.10/site-packages/reportseff/job.py", line 143, in _update_main_job
requested = _parse_slurm_timedelta(entry["Timelimit"])
File "/<install_path>/lib/python3.10/site-packages/reportseff/job.py", line 318, in _parse_slurm_timedelta
raise ValueError(f'Failed to parse time "{delta}"')
ValueError: Failed to parse time "CANCELLED by 1977600432"
$ sacct -j 29919940 -o State%100
State
----------------------------------------------------------------------------------------------------
CANCELLED by 1977600432
The failure depends on the presence of --format "+jobname"; running with a different --format works as intended:
$ reportseff -u some_user_name --format "+Elapsed" --extra-args '-j 29919940'
JobID State Elapsed TimeEff CPUEff MemEff Elapsed
29919940 CANCELLED 00:00:00 --- --- 0.0% 00:00:00
$ reportseff --version
reportseff, version 2.7.4
Thanks for the quick response. Below is the raw output with --debug included:
$ reportseff --debug -u some_user_name --format "+jobname" --extra-args '-j 29919940'
|1|00:00:00|29919940|29919940|some_jobname . | grep hot||1|12000M|CANCELLED by 1977600432|Partition_Limit|00:00:00
Error processing entry: {'AdminComment': '', 'AllocCPUS': '1', 'Elapsed': '00:00:00', 'JobID': '29919940', 'JobIDRaw': '29919940', 'JobName': 'some_jobname . ', 'MaxRSS': ' grep hot', 'NNodes': '', 'REQMEM': '1', 'State': '12000M', 'Timelimit': 'CANCELLED by 1977600432', 'TotalCPU': 'Partition_Limit'}
Traceback (most recent call last):
File "/<install_path>/bin/reportseff", line 8, in <module>
sys.exit(main())
File "/<install_path>/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/<install_path>/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/<install_path>/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/<install_path>/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/<install_path>/lib/python3.10/site-packages/reportseff/console.py", line 125, in main
output, entries = get_jobs(args)
File "/<install_path>/lib/python3.10/site-packages/reportseff/console.py", line 185, in get_jobs
raise error
File "/<install_path>/lib/python3.10/site-packages/reportseff/console.py", line 182, in get_jobs
job_collection.process_entry(entry, add_job=add_jobs)
File "/<install_path>/lib/python3.10/site-packages/reportseff/job_collection.py", line 223, in process_entry
self.jobs[job_id].update(entry)
File "/<install_path>/lib/python3.10/site-packages/reportseff/job.py", line 111, in update
self._update_main_job(entry)
File "/<install_path>/lib/python3.10/site-packages/reportseff/job.py", line 143, in _update_main_job
requested = _parse_slurm_timedelta(entry["Timelimit"])
File "/<install_path>/lib/python3.10/site-packages/reportseff/job.py", line 318, in _parse_slurm_timedelta
raise ValueError(f'Failed to parse time "{delta}"')
ValueError: Failed to parse time "CANCELLED by 1977600432"
Ah, your job name has a pipe, which is the same character used as the delimiter in sacct.
I could replace the delimiter with something else, maybe || or ##, but I could eventually run into a similar problem (albeit those are less common than a simple pipe).
Alternatively I could use more complex parsing logic to check for additional pipes in the jobname.
These are partially notes for me in the future. I'm moving this month and likely won't have time to address this for several weeks. If someone else wants to submit a PR I can review in the meantime. The offending line is here and unit tests should be easy to add.
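To make the field shift concrete, here is a minimal sketch (not reportseff's actual code) that reproduces the misparse from the --debug line above and tries the "check for additional pipes in the jobname" idea. The column order is an assumption taken from the key order of the "Error processing entry" dict, and `naive_parse` and `rejoin_parse` are hypothetical helpers.

```python
# Column order is an assumption: alphabetical, as the keys appear in the
# "Error processing entry" dict above.
COLUMNS = [
    "AdminComment", "AllocCPUS", "Elapsed", "JobID", "JobIDRaw", "JobName",
    "MaxRSS", "NNodes", "REQMEM", "State", "Timelimit", "TotalCPU",
]

# Raw line copied from the --debug output above.
RAW = (
    "|1|00:00:00|29919940|29919940|some_jobname . | grep hot"
    "||1|12000M|CANCELLED by 1977600432|Partition_Limit|00:00:00"
)


def naive_parse(line):
    """Split on every pipe; zip() silently drops the surplus field."""
    return dict(zip(COLUMNS, line.split("|")))


def rejoin_parse(line, text_column="JobName"):
    """Hypothetical helper: fold surplus fields back into one text column."""
    fields = line.split("|")
    extra = len(fields) - len(COLUMNS)
    if extra > 0:
        start = COLUMNS.index(text_column)
        end = start + extra + 1
        fields[start:end] = ["|".join(fields[start:end])]
    return dict(zip(COLUMNS, fields))


shifted = naive_parse(RAW)
fixed = rejoin_parse(RAW)
print(shifted["Timelimit"])  # the State value lands in Timelimit
print(fixed["JobName"], "/", fixed["State"], "/", fixed["Timelimit"])
```

The rejoin approach only works if a single known column can contain pipes, which is why a rarer delimiter may be the simpler fix.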
lgtm ... thanks for the fix!
$ reportseff --debug -u some_user_name --format "+jobname" --extra-args '-j 29919940'
^|^1^|^00:00:00^|^29919940^|^29919940^|^some_job_name . | grep hot^|^^|^1^|^12000M^|^CANCELLED by 1977600432^|^Partition_Limit^|^00:00:00
JobID State Elapsed TimeEff CPUEff MemEff JobName
29919940 CANCELLED 00:00:00 --- --- 0.0% some_job_name . | grep hot
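The debug line above suggests the fields are now separated by `^|^` rather than a bare pipe. As a rough sketch (again not the project's actual code, and assuming the same alphabetical column order as in the earlier error dict), splitting on that multi-character delimiter keeps the pipe inside the job name intact:

```python
# Delimiter as it appears in the --debug output above.
DELIMITER = "^|^"

# Column order is an assumption, inferred from the earlier error dict.
COLUMNS = [
    "AdminComment", "AllocCPUS", "Elapsed", "JobID", "JobIDRaw", "JobName",
    "MaxRSS", "NNodes", "REQMEM", "State", "Timelimit", "TotalCPU",
]

# Raw line copied from the --debug output above.
RAW = (
    "^|^1^|^00:00:00^|^29919940^|^29919940^|^some_job_name . | grep hot"
    "^|^^|^1^|^12000M^|^CANCELLED by 1977600432^|^Partition_Limit^|^00:00:00"
)

fields = RAW.split(DELIMITER)
entry = dict(zip(COLUMNS, fields))

print(len(fields) == len(COLUMNS))
print(entry["JobName"])
print(entry["State"], "/", entry["Timelimit"])
```

A job name that itself contains `^|^` would still break this, but that is far less likely than a lone pipe.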