ffevotte / slurm.el

Emacs extension to interact with the SLURM jobs scheduler

Array jobs

FabianGrammes opened this issue

Nice Emacs extension! It would be great if it were also possible to cancel Slurm array jobs (e.g. JOBID 118805_1), which currently cause problems with the slurm-job-id function.

cheers, F

Hi,

I didn't know about slurm array jobs, thanks for mentioning them. After looking at the documentation a bit, I understand that in the case of a job array, 118805 is a job id, and 118805_[1-42] are job+task ids.
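
To make the distinction concrete, here is a minimal sketch in Python (not slurm.el's actual Emacs Lisp code) of how an identifier could be split into its job and task parts. The names `JobRef` and `parse_job_ref` are hypothetical, and the sketch only handles a single task id such as 118805_1, not a range such as 118805_[1-42].

```python
from typing import NamedTuple, Optional


class JobRef(NamedTuple):
    job_id: int              # the array (or standalone) job id, e.g. 118805
    task_id: Optional[int]   # the array task index, or None for a standard job


def parse_job_ref(text: str) -> JobRef:
    """Split a JOBID such as '118805_1' into its job and task parts."""
    # partition() returns an empty separator when there is no '_' in the text.
    job, sep, task = text.strip().partition("_")
    return JobRef(int(job), int(task) if sep else None)


print(parse_job_ref("118805_1"))  # JobRef(job_id=118805, task_id=1)
print(parse_job_ref("119139"))    # JobRef(job_id=119139, task_id=None)
```

A standard job simply has no task part, so code that only needs the job id can ignore the second field.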

When looking at the details of a job (slurm-job-details) or updating them (slurm-job-update), I guess the only sensible thing to do is to work on an individual job+task. However, when cancelling a task (slurm-job-cancel) belonging to a job array, what do you think would be the most useful? Cancel the individual task or the whole array? Maybe choose between the two actions using a command prefix (e.g. d could cancel the individual task, while C-u d would cancel the whole array)? Or maybe something else I haven't thought about?

I don't have any experience using job arrays; what do you think would make the most sense as a user?
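
The prefix idea could be sketched like this in Python (an illustration only, not slurm.el's code). The function `scancel_argv` and its `whole_array` flag are hypothetical names standing in for the plain and prefixed commands. It assumes, as I understand Slurm's job array documentation, that scancel accepts either a job+task id or a bare array job id.

```python
from typing import List


def scancel_argv(job_ref: str, whole_array: bool = False) -> List[str]:
    """Build the scancel command line for the job at point."""
    # '118805_1' has base id '118805'; a standard job id has no '_' and is unchanged.
    base_id = job_ref.partition("_")[0]
    target = base_id if whole_array else job_ref
    return ["scancel", target]


print(scancel_argv("118805_1"))                    # ['scancel', '118805_1']
print(scancel_argv("118805_1", whole_array=True))  # ['scancel', '118805']
```

For a standard job both variants give the same command, so the prefix would only matter when the job at point belongs to an array.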

Good morning and thanks for the quick reply!

I would say that when an array job gets cancelled, it is normally because something is wrong with the script, so calling C-u d on one individual array task should cancel the whole array.

cheers, F

Hi,

I'm sorry, but the version of slurm I'm using (2.5.7) doesn't support array jobs, which makes it difficult for me to test new features related to this.

Could you please send me a (possibly anonymized) sample output of squeue -o '%.9i %9P %37j %8u %2t %.4M %.5D %.4Q %40R' showing array jobs as well as standard jobs?
This should help me develop and somewhat test array-related features (but I fear you'll have to fool-proof it anyway in the end).

Thanks in advance.

Hi, here is the output for two array jobs and one standard job. Thanks a lot for your effort!
I should really learn some Lisp coding :)

squeue -o '%.9i %9P %37j %8u %2t %.4M %.5D %.4Q %40R'
JOBID     PARTITION NAME                                  USER     ST TIME        NODES PRIO NODELIST(REASON)                        
126155_34 part1     tblast                                fab      R  8:10:03     1     1083 m600                                   
126155_33 part2     tblast                                fab      R  8:20:08     1     1083 m620-3                                   
119139    part1     blastx                                fab      R  3-07:06:53  1     1040 m600   
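
For reference, output like the above could be read into records along these lines. This is a rough Python sketch, not slurm.el's parser: `parse_squeue` and the `ARRAY_TASK` key are hypothetical, the sample is abridged to narrower columns, and splitting on whitespace assumes no field other than the last contains spaces (a job name with spaces would break it).

```python
SAMPLE = """\
JOBID     PARTITION NAME   USER ST TIME        NODES PRIO NODELIST(REASON)
126155_34 part1     tblast fab  R  8:10:03     1     1083 m600
126155_33 part2     tblast fab  R  8:20:08     1     1083 m620-3
119139    part1     blastx fab  R  3-07:06:53  1     1040 m600
"""


def parse_squeue(output: str) -> list:
    """Turn squeue's column output into one dict per job, keyed by header."""
    lines = output.strip().splitlines()
    header = lines[0].split()
    jobs = []
    for line in lines[1:]:
        # Limit the number of splits so the last column keeps any inner spaces.
        fields = line.strip().split(None, len(header) - 1)
        job = dict(zip(header, fields))
        # An underscore in the JOBID marks a task belonging to a job array.
        job["ARRAY_TASK"] = "_" in job["JOBID"]
        jobs.append(job)
    return jobs


for job in parse_squeue(SAMPLE):
    print(job["JOBID"], job["ARRAY_TASK"])
```

A fixed-width parser driven by the format string would be more robust, but it depends on details the thread does not show.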

cheers, F

Whoops, I did not mean to close the issue.

It works, thanks a bunch! squeue is a bit slower than in the master branch, but that's OK.

It doesn't allow cancelling a whole array for now. I have another commit almost ready for this.

Thanks for the feedback. Yes, it's possible that squeue is slower now than before: I changed the way external processes are called to handle things a bit more cleanly (no need to escape arguments, ...). A side effect of this is that processes are now called asynchronously: you can continue working while emacs is waiting for squeue to terminate. Although I did not expect much performance loss, this might well be the explanation for the slowness you noticed.
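
As an illustration of the two points above (arguments passed as a list so nothing needs escaping, and a call that does not block), here is a small Python analogue. It is not slurm.el's Emacs Lisp code: `start_process` is a hypothetical helper, and a Python one-liner stands in for squeue so that the sketch runs anywhere.

```python
import subprocess
import sys


def start_process(argv):
    """Start a command without blocking and return the running process handle."""
    # argv is a list, so no shell is involved and no argument needs escaping.
    return subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)


# Stand-in for squeue: echo one awkward argument back unchanged.
proc = start_process(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", "it's a 'test'"]
)
# ... the caller is free to do other work here while the process runs ...
output, _ = proc.communicate()  # collect the result once it is needed
print(output.strip())
```

The quoted argument reaches the child process verbatim, which is the benefit of avoiding a shell command line.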

Out of curiosity, are you using slurm.el locally (i.e. slurm runs on the same machine as emacs), or remotely (via TRAMP)? In the few tests I performed, I noticed a noticeable slowdown for remote usage, but nothing worth noting for local processes.

I use slurm.el remotely via TRAMP.

cheers, F

Typing C-u d should now cancel the whole array if the job at point belongs to one.
Also, the performance should be back to that of the master branch.

Could you please test it? If everything works correctly, I will integrate the branch into master.

Just tested, works great! The speed is much better now (when using TRAMP), and cancelling whole array jobs also works.

Thanks a bunch!