waliens / alan-cluster

Documentation and guidelines for the Alan GPU cluster at the University of Liège.

Alan

The documentation assumes you have access to the private network of the university.

General actions

Cluster-wide datasets

At the moment we provide the following cluster-wide, read-only datasets which are accessible at /data/datasets:

admin@alan-master:~ $ ll /data/datasets

If you would like to propose a new cluster-wide dataset, feel free to submit a proposal.

User account setup

If you do not have an account, please submit this form to request access to the GPU cluster.

SSH keys

Once you have received your account details by e-mail, we strongly recommend authenticating to Alan with SSH keys. A key pair can be generated on your local machine:

you@local:~ $ ssh-keygen -t rsa -b 4096
Generating public/private rsa key pair.
Enter file in which to save the key (/home/you/.ssh/id_rsa): /home/you/.ssh/id_rsa.alan
Enter passphrase (empty for no passphrase): ************
SHA256:b0uJjgkigIbzdli+EiuZ88hvq6REvGThht8EF9SVC+o you@local
The key's randomart image is:
+---[RSA 4096]----+
|   .o. ...       |
|     .o .        |
| .. .. . .       |
|* .o.   .        |
|*O .o   S        |
|B+o*E    o .     |
|.**o=   . =      |
|X+o+ o + o .     |
|o*=+o o . .      |
+----[SHA256]-----+

At this point your key pair (private and public key) should be present in /home/you/.ssh:

you@local:~ $ ll .ssh
-rw-r--r-- 1 you you  60 Jan  7 21:53 config
-rw------- 1 you you  1.7K Apr 29  2018 id_rsa
-rw------- 1 you you  3.4K Apr  9 12:39 id_rsa.alan
-rw-r--r-- 1 you you  737 Apr  9 12:39 id_rsa.alan.pub
-rw-r--r-- 1 you you  393 Apr 29  2018 id_rsa.pub

Finally, copy the public key to Alan. Given the private key path, ssh-copy-id installs the matching .pub file on the cluster.

you@local:~ $ ssh-copy-id -i .ssh/id_rsa.alan you@alan.calc.priv

You should now be able to log in to the cluster using your Alan identity file.

you@local:~ $ ssh -i .ssh/id_rsa.alan you@alan.calc.priv

To avoid typing the -i flag every time you log in, add the following to .ssh/config. You can then connect with ssh you@alan.

Host alan
  HostName alan.calc.priv
  IdentityFile ~/.ssh/id_rsa.alan
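
The entry above can also be added from a script. The following is a minimal sketch, not a required step: the User line, the username "you", and the SSH_CONFIG_FILE override variable are assumptions introduced for illustration. It appends the entry only once and tightens the file permissions.

```shell
#!/usr/bin/env bash
# Sketch: add a "Host alan" entry to the SSH client configuration, once.
# "you" is a placeholder for your cluster username (assumption).
set -eu

# SSH_CONFIG_FILE is a hypothetical override, handy for trying this out safely.
cfg="${SSH_CONFIG_FILE:-$HOME/.ssh/config}"
mkdir -p "$(dirname "$cfg")"
touch "$cfg"

# Append the entry only if no "Host alan" line exists yet.
if ! grep -q '^Host alan$' "$cfg"; then
    cat >> "$cfg" <<'EOF'

Host alan
  HostName alan.calc.priv
  User you
  IdentityFile ~/.ssh/id_rsa.alan
EOF
fi

# OpenSSH rejects a user config file that is writable by other users.
chmod 600 "$cfg"
```

With a User line in place, ssh alan is enough to log in, and the alias also works as a host name for scp and rsync.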

Transferring datasets

This section shows how to transfer your datasets to the GPU cluster. It is good practice to centralize your datasets in a common folder:

you@alan-master:~ $ mkdir datasets
you@alan-master:~ $ cd datasets

Next, the transfer is initiated using scp from the machine storing the data (e.g., your desktop computer) to the cluster:

you@local:~ $ scp -r my_amazing_dataset you@alan.calc.priv:~/datasets/

Alternatively, one can rely on rsync:

you@local:~ $ rsync -r -v --progress -e ssh my_amazing_dataset you@alan.calc.priv:~/datasets/
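
After a large transfer it can be worth checking that the files arrived intact. The sketch below is one possible approach and not part of the cluster tooling: dataset_manifest is a hypothetical helper that prints a SHA-256 checksum for every file in a directory, with paths relative to that directory.

```shell
# Sketch: print "checksum  ./relative/path" for every file under a directory.
# dataset_manifest is a hypothetical helper name, not an existing tool.
dataset_manifest() {
    # Run in a subshell so the caller's working directory is unchanged.
    (cd "$1" && find . -type f -exec sha256sum {} + | sort -k 2)
}

# Example usage on your local machine (uncomment to run):
#   dataset_manifest my_amazing_dataset > my_amazing_dataset.sha256
# Then verify on the cluster, from inside the copied folder:
#   ssh you@alan.calc.priv 'cd ~/datasets/my_amazing_dataset && sha256sum -c --quiet' < my_amazing_dataset.sha256
```

Because the paths in the manifest are relative, the same file can be checked against any copy of the dataset, as long as sha256sum -c is run from the root of that copy.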

Preparing an Anaconda environment

Recommended. The Miniconda installer below sets up a Python 3 environment by default.

you@alan-master:~ $ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
you@alan-master:~ $ sh Miniconda3-latest-Linux-x86_64.sh

Preparing your Deep Learning environment

PyTorch

TODO

TensorFlow

TODO

Cluster usage

The CECI cluster documentation features a thorough Slurm guide.

Useful Slurm commands

  • sbatch: submit a job to the cluster (to reserve GPUs, pass --gres=gpu:N_GPUS)
  • scancel: cancel queued or running jobs
  • srun: launch a job step
  • squeue: display the jobs currently in the queue and their associated metadata
  • sacct: display accounting data for jobs (including finished and cancelled jobs)
  • sinfo: display information about the cluster and its nodes
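
The commands above can be combined as in the following sketch, which writes a minimal job script and submits it when Slurm is available. The file name hello_gpu.sbatch, the job name, the log file pattern, and the five-minute time limit are illustrative assumptions, not cluster requirements; Alan may need further options that are not shown here, such as a partition.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal GPU job script, then submit it if Slurm is present.
# File name, job name, and time limit are hypothetical placeholders.
set -eu

cat > hello_gpu.sbatch <<'EOF'
#!/usr/bin/env bash
#SBATCH --job-name=hello-gpu
#SBATCH --output=hello-gpu-%j.log
#SBATCH --gres=gpu:1
#SBATCH --time=00:05:00

# %j in the log name above expands to the job ID.
echo "Job ${SLURM_JOB_ID:-unknown} running on $(hostname)"
echo "Visible GPUs: ${CUDA_VISIBLE_DEVICES:-none}"
EOF

if command -v sbatch >/dev/null 2>&1; then
    sbatch hello_gpu.sbatch       # submit the job to the queue
    squeue -u "$(id -un)"         # list your queued and running jobs
else
    echo "sbatch not found: run this on the cluster login node"
fi
```

Once the job is submitted, scancel JOB_ID cancels it and sacct -j JOB_ID shows its accounting data after it has finished.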

Tutorials

  1. Hello world

License:BSD 3-Clause "New" or "Revised" License