galaxyproject / ansible-slurm

Ansible role for installing and managing the Slurm Workload Manager

generating slurm.conf values

vphan13 opened this issue

I'm a bit of an Ansible noob here, but when generating the slurm.conf file, instead of hard-coded values:
slurm_nodes:
  - name: "{{ headnode }}"
    CoresPerSocket: "6"
    CPUs: "12"
    Gres: "gpu:p620:1"
    NodeAddr: "{{ headnode }}"
    RealMemory: "31846"
    Sockets: "1"
    ThreadsPerCore: "2"
    Feature: "gpu,intel,ht"
    State: "UNKNOWN"

is it possible to get the values from ansible_facts? Something along the lines of:

slurm_nodes:
  - name: "{{ headnode }}"
    CoresPerSocket: "{{ ansible_facts['ansible_processor_cores'] }}"
    CPUs: "{{ ansible_facts['ansible_processor_vcpu'] }}"
    Gres: "gpu:p620:1"
    NodeAddr: "{{ headnode }}"
    RealMemory: {{ ansible_facts['ansible_memory_mb.real.total'] }}
    Sockets: "1"
    ThreadsPerCore: "{{ ansible_facts['ansible_processor_threads_per_core'] }}"
    Feature: "gpu,intel,ht"
    State: "UNKNOWN"

Yes, it's possible to use ansible_facts to populate the values of your slurm.conf file dynamically. The template you've written is close, but there are a few issues to be aware of:

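Ensure Ansible Facts are Collected: Before you can use ansible_facts, the facts have to be gathered. By default Ansible gathers facts about each managed host at the start of a play, but this behavior can be turned off. If you're not seeing the facts you expect, check the gather_facts setting in your playbook. Also note that ansible_facts always refers to the host currently being configured; to read another node's facts, go through hostvars['<node>'], which only works once facts have been gathered for that node.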

Variable Existence: Not every fact exists on every system, and the key names matter. Inside ansible_facts the keys carry no ansible_ prefix, so the vCPU count is ansible_facts['processor_vcpus'] (or the top-level variable ansible_processor_vcpus), not ansible_facts['ansible_processor_vcpu']. It's good practice to supply a fallback for the case where a fact is missing. You can do this with the default filter, like so: {{ ansible_facts['processor_vcpus'] | default(1) }}.

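Value Types and Quoting: Be careful about the types of values that ansible_facts provides. For instance, the total memory is a number, not a string, and it lives in a nested dictionary, so it is addressed one level at a time as ansible_facts['memory_mb']['real']['total'] rather than with a single dotted key. In your example this value is unquoted; in YAML a value that begins with {{ must be quoted, otherwise the parser reads it as the start of a mapping and fails. Quote the expression, and add the string filter if you want the conversion to be explicit, like so: "{{ ansible_facts['memory_mb']['real']['total'] | string }}".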

Here's your example with the modifications:

slurm_nodes:
  - name: "{{ headnode }}"
    CoresPerSocket: "{{ ansible_facts['processor_cores'] | default(1) | string }}"
    CPUs: "{{ ansible_facts['processor_vcpus'] | default(1) | string }}"
    Gres: "gpu:p620:1"
    NodeAddr: "{{ headnode }}"
    RealMemory: "{{ ansible_facts['memory_mb']['real']['total'] | default(1024) | string }}"
    Sockets: "1"
    ThreadsPerCore: "{{ ansible_facts['processor_threads_per_core'] | default(1) | string }}"
    Feature: "gpu,intel,ht"
    State: "UNKNOWN"

Replace the default(1) and default(1024) with the actual default values you want for your use case.
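
Before wiring facts into slurm_nodes, it can help to see what a host actually reports. The following is only a minimal sketch of a check playbook, not part of the role: the host pattern "all" is a placeholder, and memtotal_mb is used here as the flat fact that, on Linux, should carry the same value as memory_mb.real.total.

```yaml
---
# Sketch: print the facts that map onto slurm.conf node parameters.
# "all" is a placeholder host pattern; narrow it to your compute nodes.
- name: Inspect hardware facts used for slurm_nodes
  hosts: all
  gather_facts: true
  tasks:
    - name: Show CPU and memory facts for this host
      ansible.builtin.debug:
        msg:
          CPUs: "{{ ansible_facts['processor_vcpus'] | default('missing') }}"
          Sockets: "{{ ansible_facts['processor_count'] | default('missing') }}"
          CoresPerSocket: "{{ ansible_facts['processor_cores'] | default('missing') }}"
          ThreadsPerCore: "{{ ansible_facts['processor_threads_per_core'] | default('missing') }}"
          # Assumption: on Linux, memtotal_mb matches memory_mb.real.total.
          RealMemory: "{{ ansible_facts['memtotal_mb'] | default('missing') }}"
```

Any field that prints "missing" is one that needs a sensible default (or a hard-coded value) in slurm_nodes.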

Thanks for the detailed reply. Your example didn't work for me, but I'm pretty sure I have set gather_facts to true somewhere. I think I have enough info to figure it out. I will post what worked for me in case others have the same question.

Here is a working configuration that queries the values for every node using ansible facts. For large clusters, this is still a manual process since we'd still need to create this stanza for every node in the cluster. It would be nice to be able to loop through the host members of a group to generate the slurm.conf values.

- name: "nodec"
  NodeAddr: "nodec"
  CPUs: "{{ hostvars['nodec']['ansible_facts']['processor_vcpus'] }}"
  RealMemory: "{{ hostvars['nodec']['ansible_memory_mb']['real']['total'] }}"
  Sockets: "{{ hostvars['nodec']['ansible_processor_count'] }}"
  CoresPerSocket: "{{ hostvars['nodec']['ansible_processor_cores'] }}"
  ThreadsPerCore: "2"
  State: "UNKNOWN"

edit: Never mind, I saw the example in the README. I believe the following should work

NodeAddr: "node[1-10][a-d]"
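
For clusters where the nodes differ in hardware and a single hostlist line can't describe them all, the idea of looping over the members of a group could be sketched roughly as below. This is an untested sketch under several assumptions: the inventory group name slurm_compute is hypothetical, the role is assumed to be installed as galaxyproject.slurm, slurm_nodes is assumed not to be defined elsewhere (otherwise the loop appends to the existing list), and facts must have been gathered for every host in the group, which is why the play targets all of them.

```yaml
---
# Sketch: build slurm_nodes from the gathered facts of every host in a group.
# "slurm_compute" is a hypothetical inventory group name.
- name: Configure Slurm with node definitions derived from facts
  hosts: slurm_compute
  gather_facts: true
  pre_tasks:
    - name: Append one node stanza per group member
      ansible.builtin.set_fact:
        slurm_nodes: "{{ slurm_nodes | default([]) + [node] }}"
      vars:
        # Re-evaluated on each loop iteration, with item set to a hostname.
        node:
          name: "{{ item }}"
          NodeAddr: "{{ item }}"
          CPUs: "{{ hostvars[item]['ansible_facts']['processor_vcpus'] }}"
          RealMemory: "{{ hostvars[item]['ansible_facts']['memory_mb']['real']['total'] }}"
          Sockets: "{{ hostvars[item]['ansible_facts']['processor_count'] }}"
          CoresPerSocket: "{{ hostvars[item]['ansible_facts']['processor_cores'] }}"
          ThreadsPerCore: "{{ hostvars[item]['ansible_facts']['processor_threads_per_core'] }}"
          State: "UNKNOWN"
      loop: "{{ groups['slurm_compute'] }}"
  roles:
    # Assumption: the role name as installed from Ansible Galaxy.
    - role: galaxyproject.slurm
```

If the head node is not a member of the group, it would need to be added to the play's hosts (and to the loop, if it should also appear as a compute node) so that it receives the same slurm_nodes list.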