ruiruitang / beegfs-shared-slurm-on-centos7.2

A cluster of CentOS 7.2 VMs managed by SLURM and sharing storage with BeeGFS

This ARM template is inspired by Christian Smith's template:

All in one cluster (BeeGFS & SLURM) on CentOS 7.2

Deploys on the same set of VMs:

  • BeeGFS cluster with metadata and storage nodes
  • Slurm as Job Scheduler

Click here to deploy:

Deploy to Azure

Deployment steps:

  1. Fill in the mandatory parameters.

  2. Select an existing resource group or enter the name of a new resource group to create.

  3. Select the resource group location.

  4. Accept the terms and agreements.

  5. Click Create.

Architecture

Logical Architecture

[Diagram: logical architecture]

The VM called storage0 is:

  • the BeeGFS metadata server and management host
  • the Slurm master
  • the NFS server, which exports the shared storage /share/home and /share/data

The VMs called storage[1-n] are:

  • BeeGFS storage servers
  • [Optional] some of them may also be BeeGFS metadata servers (based on the template parameters)
  • Slurm compute nodes

Deployed in Azure

[Diagram: deployment in Azure]

BeeGFS

The BeeGFS storage is mounted on /share/scratch on every node.
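To confirm that a node actually has the scratch file system mounted, a small check like the one below may help. It is only a sketch: the function name `is_mounted` is hypothetical, and it simply looks for the mount point in the kernel's mount table rather than querying BeeGFS itself.

```shell
# Sketch: succeed if a given path appears as a mount point in a mount table.
# The function name and the optional second argument are illustrative only.
is_mounted() {
    target="$1"                     # mount point to look for, e.g. /share/scratch
    mounts_file="${2:-/proc/mounts}" # mount table to read (defaults to the live one)

    # Field 2 of each /proc/mounts line is the mount point.
    awk -v t="$target" '$2 == t { found = 1 } END { exit !found }' "$mounts_file"
}

# Example use on any node:
#   if is_mounted /share/scratch; then
#       echo "scratch is mounted"
#   else
#       echo "scratch is NOT mounted" >&2
#   fi
#
# For more detail, the BeeGFS tools can be queried on a node where they are
# installed, for example:
#   beegfs-df
#   beegfs-ctl --listnodes --nodetype=storage
```

Running the check over SSH on each storage[1-n] node is one way to verify the whole cluster after deployment.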

SLURM

By default, each compute node has 1 core available to Slurm.

You should edit the slurm.conf file so that it matches the real number of CPUs:

NodeName=storage[1-number_of_nodes] Procs=16
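The edit can also be scripted. The following is a minimal sketch, assuming GNU sed (as shipped with CentOS) and a compute-node line that starts with `NodeName=storage[` and already carries a `Procs=` value; the function name `set_node_procs` and the slurm.conf path in the example are assumptions, not part of the template.

```shell
# Sketch: rewrite the Procs= value on the compute-node line of a slurm.conf.
# set_node_procs is a hypothetical helper name.
set_node_procs() {
    conf_file="$1"   # path to slurm.conf (the location depends on the install)
    procs="$2"       # CPUs per compute node

    # Keep a backup, then touch only lines that start with NodeName=storage[
    cp -p "$conf_file" "$conf_file.bak"
    sed -i -E "/^NodeName=storage\[/ s/Procs=[0-9]+/Procs=${procs}/" "$conf_file"
}

# Example, run on the master (the path and the use of storage1 as a
# representative node are assumptions):
#   set_node_procs /etc/slurm/slurm.conf "$(ssh storage1 nproc)"
```

The backup copy makes it easy to diff or roll back the change before restarting the daemon.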

Then restart the slurm daemon:

systemctl restart slurmctld

And bring the nodes back online with scontrol:

scontrol: update NodeName=storage1 State=RESUME
scontrol: update NodeName=storage2 State=RESUME
scontrol: exit
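With many nodes, typing each update at the interactive prompt gets tedious. The same `update` command can be passed to scontrol on the command line, so a loop is one option. This sketch assumes the compute nodes are named storage1 to storage<n>, following the NodeName pattern in slurm.conf; `resume_nodes` and the `SCONTROL` override are hypothetical names introduced here.

```shell
# Sketch: resume compute nodes storage1..storage<last> non-interactively.
# Set SCONTROL=echo to preview the commands instead of running them.
resume_nodes() {
    last="$1"                          # index of the last compute node
    cmd="${SCONTROL:-scontrol}"        # command to run (scontrol by default)

    i=1
    while [ "$i" -le "$last" ]; do
        "$cmd" update "NodeName=storage${i}" State=RESUME
        i=$((i + 1))
    done
}

# Example: preview, then apply, for a 4-node cluster
#   SCONTROL=echo resume_nodes 4
#   resume_nodes 4
```

Previewing first is a cheap way to catch a wrong node count before changing any node state.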

Then check the node states with:

sinfo -N -l
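For a scripted check, sinfo can print one "name state" pair per node, which is easy to filter. The sketch below assumes the compact state codes that sinfo prints for `%t` (such as idle, alloc, down, drain); the function name `list_unavailable_nodes` is hypothetical.

```shell
# Sketch: print the names of nodes whose state starts with down, drain,
# drng or fail. Input is read from stdin as "<node> <state>" lines.
list_unavailable_nodes() {
    awk '$2 ~ /^(down|drain|drng|fail)/ { print $1 }'
}

# Example: -N gives one line per node, -h drops the header,
# -o "%N %t" prints node name and compact state.
#   sinfo -N -h -o "%N %t" | list_unavailable_nodes
```

An empty result suggests that no node is down or drained; any name printed is a candidate for another `State=RESUME` update.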

Accessing the cluster

Simply SSH to the master node using the IP address.

# ssh [user]@[public_ip_address]

You can log into the first metadata node using the admin user and password specified.

Still to do

  • check that all packages installed by the install_pkgs_slurm function in deployazure.sh are actually required
  • let the user choose how many data disks to attach per VM
  • use VMSS instead of individual VMs
  • use Ganglia for monitoring
  • enable MPI if RDMA instances are used, and use the HPC images of CentOS
