
Math_RLHF

Repository Setup Guide

This guide walks you through setting up this repository: cloning it, copying model checkpoints into specific folders, navigating to the correct location, and running the appropriate Bash command.

0. Important Prerequisites

a. Ensure that the installed transformers version is 4.30.2 before proceeding.

b. If the model is of the llama class, include the keyword "llama" in its file path.
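
A quick way to check the first prerequisite from the shell (a minimal sketch; the pip line is only needed if the printed version differs from 4.30.2):

# Print the installed transformers version; it should be 4.30.2
python -c "import transformers; print(transformers.__version__)"

# If the version differs, pin it explicitly
pip install transformers==4.30.2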

1. Clone the Repository

To get started, clone this repository to your local machine using the following command:

git clone https://github.com/sarahpannn/Math_RLHF.git

2. Copy Over Model Checkpoints

Generator Model

Original file path: /mnt/shared_home/span/lets-reinforce-step-by-step/training/ONLY_MATH_SFT/four_epochs

New file path: /ONLY_MATH_SFT/four_epochs

Outcome-supervised Reward Model (ORM)

Original file path: /mnt/shared_home/span/lets-reinforce-step-by-step/training/model/llama-3b-ORM/hf_directory

New file path: /llama-3b-ORM/hf_directory

Process-supervised Reward Model (PRM)

Original file path: /mnt/shared_home/span/lets-reinforce-step-by-step/training/model/deberta-v3-large-800k-3
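For reference, a minimal sketch of the copy step for the generator and ORM checkpoints, assuming the original shared filesystem is mounted and you have write access to the destination directories (the PRM destination is not listed above, so it is omitted here):

# Generator model checkpoint
mkdir -p /ONLY_MATH_SFT
cp -r /mnt/shared_home/span/lets-reinforce-step-by-step/training/ONLY_MATH_SFT/four_epochs /ONLY_MATH_SFT/four_epochs

# Outcome-supervised reward model (ORM) checkpoint; note the "llama" keyword kept in the path
mkdir -p /llama-3b-ORM
cp -r /mnt/shared_home/span/lets-reinforce-step-by-step/training/model/llama-3b-ORM/hf_directory /llama-3b-ORM/hf_directory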

3. CD into the right location

From the repository root, change into the step-3 RLHF fine-tuning directory:

cd DeepSpeed-Chat/training/step3_rlhf_finetuning

4. Run bash command

For RLHF with the ORM, run:

bash training_scripts/single_node/ORM/ORM_dump.sh

For RLHF with the PRM using the avg delivery method, run:

bash training_scripts/single_node/PRM/PRM_avg_dump.sh

For RLHF with the PRM using the product delivery method, run:

bash training_scripts/single_node/PRM/PRM_prod_dump.sh

For RLHF with the PRM using the fine-grained delivery method, run:

bash training_scripts/single_node/PRM/real_prm.sh

5. Adjust batch size

I'm not sure how much RAM the A100s will have, but if it is more than 24 GB (like Inanna), increasing per_device_batch_size should maximize GPU utilization.
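
A quick way to check available GPU memory and find where the batch size is set (a sketch; the exact flag name inside the launch scripts may differ):

# Show total GPU memory on each card
nvidia-smi --query-gpu=memory.total --format=csv

# Locate the per-device batch size setting in the ORM launch script
grep -n "batch_size" training_scripts/single_node/ORM/ORM_dump.sh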
