Para_sweep_CC

This is a demo of how to run parallel hyperparameter training on Compute Canada. Here, I use META (the meta-farm package) for parallel training.

META setting for Compute Canada

Log in to Cedar and cd to the project folder.
Load the meta-farm module:
```
$ module load meta-farm
```
Choose a name for the farm directory, e.g. para, and create it with the command:
```
$ farm_init.run  para
```
Then cd para and run ls; you will see the files config.h, job_script.sh, resubmit_script.sh, single_case.sh, and table.dat.

job_script.sh holds the Slurm (sbatch) settings, such as your account information and how many resources you want to request. For example, you could edit it this way:

#!/bin/bash
# Here you should provide the sbatch arguments to be used in all jobs in this serial farm
# It has to contain the runtime switch (either -t or --time):
#SBATCH -t 0-00:10 # 10-minute time limit for each job
#SBATCH --mem=4G # 4G memory for each node
# You have to replace the placeholders below with your own account name and email:
#SBATCH --account=$your_supervisor_name
#SBATCH --mail-user=$your_email_to_be_notified
#SBATCH --mail-type=ALL
source ../$ENV/bin/activate # activate the virtual environment for your code
# Don't change this line:
task.run

single_case.sh does not need to be changed in this case.
table.dat lists your jobs: each line is one command to run. For this demo, table.dat can be generated or modified with the following command:
```
for ((i=1;i<=2;i++)); do echo python $your_folder_path/test01.py; done > table.dat
```
The loop writes one command per line into table.dat. In this demo there are only two lines, so I only use two parallel training runs. Commands such as cp and mkdir can also be written here.
If you get a permission denied error, you can run chmod 777 on the file to give the system the right to run it.
Submit the jobs with submit.run <number of cycles you want to run, e.g. 10/20/30>, clean up with clean.run, and terminate with kill.run.
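If each case should receive different arguments rather than the same command, table.dat can also be generated from Python. The following is a minimal sketch under stated assumptions: the `--lr` and `--batch_size` flags are hypothetical (the demo does not show which arguments test01.py accepts), and `your_folder_path` is a placeholder to replace with your own path.

```python
from itertools import product
from pathlib import Path

# Hypothetical values: the script path and the --lr / --batch_size flags are
# placeholders; test01.py must parse whatever arguments you put here.
SCRIPT = "your_folder_path/test01.py"
LEARNING_RATES = [0.01, 0.001]
BATCH_SIZES = [32, 64]


def build_table_lines(script, learning_rates, batch_sizes):
    """Return one shell command per hyperparameter combination."""
    return [
        f"python {script} --lr {lr} --batch_size {bs}"
        for lr, bs in product(learning_rates, batch_sizes)
    ]


def write_table(lines, path="table.dat"):
    """Write the commands to the given file, one case per line."""
    Path(path).write_text("\n".join(lines) + "\n")


lines = build_table_lines(SCRIPT, LEARNING_RATES, BATCH_SIZES)
```

Calling `write_table(lines)` inside the farm directory would then produce a table.dat with four lines, one per combination.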

Useful link:

Sweep from Wandb

In your virtual environment, run:
```
pip install wandb
wandb login
```
At this step you need to paste your authentication code (your wandb API key); just follow the prompts.
Define your sweep configuration in sweep_configuration in test01.py.
Don't forget to call wandb.log in your code.
You will see the results in real time on your wandb web page.
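The real test01.py is not shown here, so the following is only a sketch of what a sweep script might look like. The parameter names (`lr`, `batch_size`), their ranges, the project name `para_sweep_demo`, and the placeholder loss are all assumptions for illustration; only `sweep_configuration` and `wandb.log` come from the steps above.

```python
# Sketch of what test01.py might contain; the real file is not shown here.
sweep_configuration = {
    "method": "random",  # search strategy: "grid", "random", or "bayes"
    "metric": {"name": "loss", "goal": "minimize"},
    "parameters": {
        # Hypothetical hyperparameters and ranges.
        "lr": {"min": 0.0001, "max": 0.1},
        "batch_size": {"values": [16, 32, 64]},
    },
}


def train():
    """One trial: read the hyperparameters chosen by the sweep and log a metric."""
    import wandb  # imported here so the config above can be inspected without wandb

    run = wandb.init()
    lr = wandb.config.lr
    batch_size = wandb.config.batch_size
    for epoch in range(5):
        # Placeholder "loss"; replace with your real training step.
        loss = 1.0 / (1 + epoch) + lr / batch_size
        wandb.log({"epoch": epoch, "loss": loss})
    run.finish()


def run_sweep(project="para_sweep_demo", count=5):
    """Create a sweep and run `count` trials from this process."""
    import wandb

    sweep_id = wandb.sweep(sweep=sweep_configuration, project=project)
    wandb.agent(sweep_id, function=train, count=count)
```

Calling `run_sweep()` at the end of the file would start the trials. Note that every process calling `wandb.sweep` creates its own sweep, so cases launched in parallel from table.dat would show up as separate sweeps unless they share one sweep ID.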

