pegasus-notes

Notes for connecting to GW's high performance computing cluster, Pegasus.

Consult this Getting Started guide for more info.

Getting Started

Generating SSH Keys

Generate a new SSH public / private key pair by following this guide.

The public key can be shared with anyone, without concern. The private key should never be shared.
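
If you don't yet have a key pair, a standard generation command looks like this (the email is just a comment label for the key; any identifier works):

ssh-keygen -t rsa -b 4096 -C "your_email@example.com"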

Verify you see some files ("id_rsa" and "id_rsa.pub"):

ll ~/.ssh

Print the contents of the public key:

cat ~/.ssh/id_rsa.pub

Access Request Form

Fill out this access request form, with info like the following:

  • PI: Michael Rossetti
  • Research Group: "Data Science Research Group" (rossettigrp)
  • Clusters: "Pegasus" only should be fine for now

Paste your public key contents, or upload the public key file directly.

Secure Network

You must be connected to GWireless on campus, or remotely through the GW VPN.

For either option, you will need a GW email account. Researchers and assistants at other universities can use this form to request a GW account.

GWireless

You should be able to log in successfully when connected to GWireless on campus.

VPN

To download the VPN, here are some notes from the GW IT site:

Palo Alto GlobalProtect 6.0.7 for macOS

Download PaloAltoGlobalProtect-6.0.7-Mac.pkg (76 MB). Palo Alto GlobalProtect allows remote access to GW resources through an encrypted connection to GW. Portal Address: gwvpn.gwu.edu

Compatible with macOS 11 and higher. Note: During installation, also install System Extensions.

After you have downloaded and installed the VPN, you may need to grant it access via the Security and Privacy settings.

To use the VPN, launch the "GlobalProtect" program, and enter the portal address (gwvpn.gwu.edu). Then sign in with your GW Microsoft account ("example@gwu.edu").

Logging In

Log in using the SSH credentials you submitted via the access request form:

ssh <username>@pegasus.arc.gwu.edu

# or, to specify the private key explicitly:
# ssh <username>@pegasus.arc.gwu.edu -i ~/.ssh/id_rsa
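
Optionally, to avoid typing the full hostname each time, you can add an entry like this to the ~/.ssh/config file on your local machine (a sketch; adjust the username and key path to match yours):

# ~/.ssh/config (on your local machine)
Host pegasus
    HostName pegasus.arc.gwu.edu
    User <username>
    IdentityFile ~/.ssh/id_rsa

Then you can connect with just: ssh pegasus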

If you see a "Permission Denied" error, email support. They may respond with something like: "You will need to use a one-time multifactor code to log in. Please use one of these codes when prompted..." and provide you with some codes. Try to log in again, and supply one of the codes. It should work!

Every time you log in, it will prompt you for a 2FA code, so store those OTP codes somewhere safe, and set up multifactor auth as soon as possible (see section below).

Multifactor Auth

After using one of the OTP codes to log in for the first time, run google-authenticator to configure multifactor auth via your Authenticator app. Answer "y" to all the questions, and scan the QR code with your Authenticator app. In the future, you will use a code generated by the Authenticator app instead of the OTP / scratch codes.
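
Run it on the server:

google-authenticator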

Navigating the Filesystem

Your home folder is /SEAS/home/<username>.
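
For example, after logging in you can confirm your location and look around:

pwd
#> /SEAS/home/<username>

ll ~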

Be aware of your SSH keys and SSH config:

ll ~/.ssh
#> authorized_keys
#> cluster
#> cluster.pub
#> config
#> id_ecdsa
#> id_ecdsa.pub
#> known_hosts

Python Environment Setup

There are many installable "modules" available, including a module for Python 3.10. However, we want project-specific virtual environments, so let's use the miniconda module:

# alternatively: module load python3/3.10.11

module load miniconda/miniconda3

You will need to run this every time you log in to the server.
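
If you'd rather not run it manually each time, one option (an assumption, not an official recommendation from the docs) is to append the command to your ~/.bashrc so it runs on every login:

echo 'module load miniconda/miniconda3' >> ~/.bashrc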

Once loaded, we should have access to the conda command line tool.

Listing environments:

conda info --envs

Creating and activating an environment:

conda create -n my-first-env python=3.10

# conda activate my-first-env
# NOTE: the "conda activate" command may require some bashrc setup; they want us to use "source activate" instead:

source activate my-first-env

Verify the environment is setup properly:

python --version
pip --version

python -i # enter an interactive Python shell, to test things out
# exit() # type this to quit the shell
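
When you're finished working in an environment, you can deactivate it (mirroring the source-based activation above):

source deactivate
# or, if conda is initialized in your shell: conda deactivate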

NOTE: "the python virtual environment can be built in your home directory (/SEAS/home/), or group directory (/SEAS/groups/) to share with others in your group, or on the lustre (/lustre/groups/), and similar to any packages you need."

Version Control (Git)

We need to clone repositories from GitHub, using git, which appears to be pre-installed on the server:

which git
#> /usr/bin/git

git --version
#> git version 2.39.3
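
If you plan to commit from the server, it may also help to configure your git identity (standard first-time git setup; substitute your own name and email):

git config --global user.name "Your Name"
git config --global user.email "your_email@example.com"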

Attempting to clone a repo from GitHub:

git clone git@github.com:s2t2/pegasus-notes.git

You may run into permissions issues the first time, in which case you'll need to configure an SSH connection from the server to GitHub (see section below).

Generating SSH Keys for GitHub

Generating a new SSH key (run this on the server):

ssh-keygen -t ed25519 -C "your_email@example.com"

This creates a new key pair ("id_ed25519" and "id_ed25519.pub"). Upload the resulting public key via your GitHub account's SSH settings.

cat ~/.ssh/id_ed25519.pub
#> copy then paste in GitHub

Also set up the ssh agent, and add the new key:

eval "$(ssh-agent -s)"
#> Agent pid 123456


ssh-add ~/.ssh/id_ed25519
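
You can verify the connection from the server to GitHub before cloning:

ssh -T git@github.com
#> Hi <username>! You've successfully authenticated, but GitHub does not provide shell access.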

Afterwards, when you try to clone again, it should work.

Running Python Applications

We can use this example Python application, a game of tic tac toe. Verify you are able to set up the app, install its requirements, and play a game:

Repo setup:

git clone git@github.com:s2t2/tic-tac-toe-py.git
cd tic-tac-toe-py/

conda create -n tictactoe-env python=3.8
source activate tictactoe-env

pip install -r requirements.txt

NOTE: environment creation and package installation can take a long... long... long time :-/

Usage:

python -m app.game

X_STRATEGY="COMPUTER-HARD" O_STRATEGY="COMPUTER-EASY" GAME_COUNT=100 python -m app.jobs.play_games

Downloading Files

Use scp to upload or download files to/from the server. Note: you may be asked for your multifactor code whenever you initiate a file transfer.

scp <username>@pegasus.arc.gwu.edu:/SEAS/home/<username>/projects/tic-tac-toe-py/data/games/x_minimax_vs_o_random_100.csv ~/Downloads
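
Uploading works the same way, with the source and destination arguments reversed (a sketch, with a hypothetical local file):

scp ~/Downloads/example.csv <username>@pegasus.arc.gwu.edu:/SEAS/home/<username>/projects/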

Scheduling Jobs

NOTE: "please do not run jobs against the NFS shares (/SEAS/home/) and groups (/SEAS/groups) but use /lustre instead, i.e. files should be read from and written to /lustre/groups/ directory."

NOTE: "we use slurm for job scheduling which can run into interactive move (salloc) or batch mode (sbatch). The batch mode requires a shell script as a wrapper to call your python code."

Example shell script:

#!/bin/bash
# request one node on the "nano" partition for up to ten minutes,
# writing standard output and errors to the named files:
#SBATCH --time 00:10:00
#SBATCH -p nano
#SBATCH -o temperature.out
#SBATCH -e temperature.err
#SBATCH -N 1

# load conda, then run the script, sorting its output numerically:
. ~/miniconda3/etc/profile.d/conda.sh
python3 ~/temperature.py | sort -n
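
To submit the batch script (assuming it is saved as a file like "temperature.sh"), standard slurm commands would look like this (a sketch, not yet verified on this cluster):

sbatch temperature.sh
#> Submitted batch job 123456

squeue -u <username> # check the status of your pending and running jobs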

TBD - verify this when we need to schedule some jobs
