young-geng / brc_cql_example

Example Project for Running SAC on BRC with Singularity Container

This is an example project showcasing a practical workflow on the BRC Savio cluster. For demonstration, we use my SAC implementation and provide instructions for building the container and running it on the cluster.

Project Structure

  • brc_cql_example
    • CQL: Python SAC implementation directly taken from this repo
      • environment.yml: anaconda environment file listing all the dependencies
      • ...
    • base_container.def: Singularity definition file for the base container, with all the dependencies installed but without the code
    • code_container.def: Singularity definition file for the code container, which copies the code into the base container
    • launch_sweep.sh: script for launching a hyperparameter sweep job with Slurm on BRC

Instructions

Here we provide step-by-step instructions for building the container and running it on BRC. To reproduce these steps, you will need a machine running Linux.

Install Singularity Container

The first step is to install Singularity locally. Please follow the instructions here. I recommend a version no earlier than 3.7.
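As a quick sanity check on the version recommendation, the following Python sketch parses the text printed by `singularity --version` and compares it against 3.7. It assumes the output contains a dotted version number such as `singularity version 3.7.0`, and that the install uses Singularity 3.x numbering (Apptainer releases are numbered differently and would need a separate check). The function names are hypothetical helpers, not part of this repo.

```python
import re
import subprocess

# Minimum (major, minor) version recommended in this README.
MIN_VERSION = (3, 7)


def parse_version(version_output):
    """Extract (major, minor) from text such as 'singularity version 3.7.0'."""
    match = re.search(r"(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1)), int(match.group(2))


def is_recent_enough(version_output, minimum=MIN_VERSION):
    """Return True if the reported version is at least `minimum`."""
    return parse_version(version_output) >= minimum


def check_local_install():
    """Run `singularity --version` locally and check the reported version.

    Assumes the `singularity` executable is on PATH.
    """
    result = subprocess.run(
        ["singularity", "--version"],
        capture_output=True,
        text=True,
        check=True,
    )
    return is_recent_enough(result.stdout)
```

Calling `check_local_install()` on the build machine returns `True` when the installed version meets the recommendation.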

Build the Base Container

The base container packages all the dependencies for this project. It is built on top of a public Singularity image with Anaconda and MuJoCo pre-installed. Run the following command to build the base container. You only need to do this once at the beginning, unless you change the required Python packages. For detailed information about the build process, see base_container.def.

singularity build --fakeroot base_img.sif base_container.def

Build the Code Container

Run the following command to build the code container that packages our research project. For detailed information on how the container is built, see code_container.def.

singularity build --fakeroot code_img.sif code_container.def

Copy the Container and Slurm Job Script to BRC

First, ssh into the BRC DTN node and create the project directory.

cd /global/scratch/users/<YOUR BRC USER NAME>
mkdir brc_cql_example

Then use scp to copy the job script and container to BRC.

scp ./code_img.sif ./launch_sweep.sh \
    <YOUR BRC USER NAME>@dtn.brc.berkeley.edu:/global/scratch/users/<YOUR BRC USER NAME>/brc_cql_example/

Launch the Hyperparameter Sweep Job

We provide an example job script that launches a hyperparameter sweep of 12 configurations on 4 GPUs. The job script uses a combination of a Slurm array job and GNU parallel to evenly distribute the hyperparameter configurations across 4 array tasks, with each array task running 3 configurations in parallel. To launch the hyperparameter sweep, run the following command on a BRC login node:

sbatch launch_sweep.sh
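To make the 12-configurations-over-4-tasks split concrete, here is a minimal Python sketch of how an array task could select its 3 configurations. It is not the contents of launch_sweep.sh, which does this with Slurm and GNU parallel. The hyperparameter names and values are hypothetical, and the sketch assumes the array task indices run from 0 to 3, read from the `SLURM_ARRAY_TASK_ID` environment variable that Slurm sets for array jobs.

```python
import itertools
import os

# Hypothetical grid for illustration only: 3 x 2 x 2 = 12 configurations.
GRID = {
    "seed": [0, 1, 2],
    "policy_lr": [1e-4, 3e-4],
    "discount": [0.98, 0.99],
}
NUM_ARRAY_TASKS = 4


def all_configs(grid):
    """Enumerate every combination of grid values as a list of dicts."""
    keys = list(grid)
    value_lists = [grid[key] for key in keys]
    return [dict(zip(keys, values)) for values in itertools.product(*value_lists)]


def configs_for_task(task_id, configs, num_tasks=NUM_ARRAY_TASKS):
    """Return the contiguous chunk of configurations owned by one array task."""
    if len(configs) % num_tasks != 0:
        raise ValueError("configurations do not divide evenly across tasks")
    per_task = len(configs) // num_tasks
    start = task_id * per_task
    return configs[start:start + per_task]


if __name__ == "__main__":
    # Defaults to task 0 when run outside of a Slurm array job.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    for config in configs_for_task(task_id, all_configs(GRID)):
        print(config)
```

Each array task ends up with a disjoint chunk of 3 configurations, and the 4 chunks together cover all 12.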
