stackhpc / ansible-slurm-appliance

A Slurm-based HPC workload management environment, driven by Ansible.

Autoscale - unsolved issues

sjpb opened this issue

This issue collects notes on things that are not (yet) implemented in #151:

  • Might need to set Slurm's TreeWidth automatically based on cluster size
  • Cope with other failure modes
  • Cope with failures not in sync with the prolog (run the remediation Python as a systemd service?)
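
For the TreeWidth item, a minimal sketch of one possible approach, not what #151 does. It assumes the guidance in Slurm's elastic computing guide that, for cloud nodes, TreeWidth should be at least the maximum node count and may not exceed 65533. The function names and the `max_nodes` parameter are hypothetical and exist only for illustration.

```python
# Upper bound on TreeWidth, as assumed from the Slurm elastic computing guide.
SLURM_MAX_TREEWIDTH = 65533


def treewidth_for(max_nodes: int) -> int:
    """Return a TreeWidth covering max_nodes, capped at the assumed Slurm limit.

    max_nodes is the largest number of nodes the cluster may scale to
    (hypothetical input; the appliance would have to derive it from inventory).
    """
    if max_nodes < 1:
        raise ValueError("max_nodes must be at least 1")
    return min(max_nodes, SLURM_MAX_TREEWIDTH)


def slurm_conf_line(max_nodes: int) -> str:
    """Render the corresponding slurm.conf line."""
    return f"TreeWidth={treewidth_for(max_nodes)}"


if __name__ == "__main__":
    print(slurm_conf_line(8))
```

In an Ansible-driven setup the same calculation could presumably be done in a template from the inventory size; the Python form is used here only to make the rule explicit.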