lesteve / test-binder

Some feedback and outline of possible next steps

willirath opened this issue

I've been playing with this over at willirath/dask_jobqueue_workshop_materials#6 and can give some feedback.

  • Thanks for kicking this off, @lesteve! We're just a few steps away from running no-requirements training on dask jobqueue now.

  • This works on Pangeo binder! As Pangeo's binder (currently) offers more resources to the user, we can do meaningful computations with a SLURM cluster running on the same VM as the notebook server.

  • Make the whole slurm.conf part of the repo and explicitly COPY it to the Docker image. This way, it might be easier to have a SLURM admin chime in and help.

  • In a final setting, this would lead to a look-and-feel similar to the https://examples.dask.org binder (labview plugin and jupyter_server_plugin).

  • Could use help from somebody experienced with administering a SLURM setup:

    • Jobs don't seem to stop (or are very slow to stop) when I kill the scheduler.
    • After canceling jobs, new jobs don't start; FrontEndDown is given as the reason.
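
To make the two SLURM problems above easier to report to an admin, something like the following could collect job id, state, and pending reason from inside the notebook. This is only a rough sketch, not something taken from the repo: it assumes `squeue` is on the PATH in the image, and the helper names `parse_squeue` and `current_jobs` are made up for illustration. The `-h` flag and the `%i`, `%T`, `%r` format fields (job id, state, reason) are standard `squeue` options.

```python
import shutil
import subprocess

# Pipe-separated job id, state, and reason; -h suppresses the header line.
SQUEUE_CMD = ["squeue", "-h", "-o", "%i|%T|%r"]


def parse_squeue(text):
    """Turn 'id|state|reason' lines into a list of dicts (hypothetical helper)."""
    jobs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split at most twice so a reason containing '|' stays in one piece.
        job_id, state, reason = line.split("|", 2)
        jobs.append({"id": job_id, "state": state, "reason": reason})
    return jobs


def current_jobs():
    """Query squeue if it is installed; return an empty list otherwise."""
    if shutil.which("squeue") is None:
        return []
    result = subprocess.run(SQUEUE_CMD, capture_output=True, text=True, check=True)
    return parse_squeue(result.stdout)


if __name__ == "__main__":
    for job in current_jobs():
        print(job["id"], job["state"], job["reason"])
```

Running this once right after killing the scheduler and once after submitting new jobs should show whether old jobs linger and which reason the new ones report.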