ise-uiuc / magicoder

Magicoder: Source Code Is All You Need

Home Page: https://arxiv.org/abs/2312.02120


Confusion about the training code

jaywongs opened this issue

First of all, thank you for your amazing work!
I'm attempting to replicate the training process, and I have a question about the train.py file. In your paper, you mention using two A100-80G GPUs, but I couldn't find any mention of multiprocessing or distributed training in the code. Did you use DeepSpeed for training? If not, could you provide guidance on modifying the code to make it compatible with a multi-GPU setup?
Thanks once again!

Hi, those options are passed on the shell command line, which we have not documented yet. Roughly, here is how training is invoked:

accelerate launch src/magicoder/train.py \
	--model_key $MODEL_KEY \
	--model_name_or_path $MODEL_KEY \
	--use_flash_attention True \
	--datafile_paths $DATASET_PATH \
	--output_dir $OUTPUT_DIR \
	--bf16 True \
	--num_train_epochs 2 \
	--per_device_train_batch_size 2 \
	--gradient_accumulation_steps 128 \
	--group_by_length False \
	--ddp_find_unused_parameters False \
	--optim adafactor \
	--max_grad_norm -1 \
	--warmup_steps $WARMUP_STEP \
	--learning_rate 5e-5 \
	--lr_scheduler_type linear

We will provide clearer documentation later.
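
In the meantime, note that the multi-GPU part comes from accelerate itself rather than from anything inside train.py: accelerate launch spawns one process per GPU and sets up distributed data parallelism. Here is a minimal sketch of a two-GPU setup (these are standard accelerate options; whether the run above was configured interactively or via inline flags is not documented here):

# One-time interactive setup; the answers are saved to
# ~/.cache/huggingface/accelerate/default_config.yaml and reused by
# subsequent `accelerate launch` calls.
accelerate config

# Alternatively, pass the distributed options inline, keeping the
# training flags (--model_key, --datafile_paths, etc.) exactly as in
# the command above. DeepSpeed is not required here, although
# accelerate can also drive it via --use_deepspeed.
accelerate launch --multi_gpu --num_processes 2 --mixed_precision bf16 \
	src/magicoder/train.py --model_key $MODEL_KEY --output_dir $OUTPUT_DIR

With per_device_train_batch_size 2, gradient_accumulation_steps 128, and two processes, the effective batch size works out to 2 x 128 x 2 = 512.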

Thank you for your reply; it worked! Looking forward to the documentation~

Hey, thanks for the answer! Looking forward to the full scripts.
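
Until the official scripts land, a minimal wrapper that reproduces the command above might look like the following. Every variable value below is an illustrative placeholder, not the authors' actual configuration: the model key assumes the DeepSeek-Coder base model mentioned in the paper, and the dataset path, output directory, and warmup steps are hypothetical.

#!/usr/bin/env bash
set -euo pipefail

# Illustrative placeholders only -- substitute your own values.
MODEL_KEY="deepseek-ai/deepseek-coder-6.7b-base"  # assumed base model
DATASET_PATH="data/oss_instruct.jsonl"            # hypothetical path
OUTPUT_DIR="outputs/magicoder-ds"                 # hypothetical path
WARMUP_STEP=15                                    # hypothetical value

accelerate launch src/magicoder/train.py \
	--model_key $MODEL_KEY \
	--model_name_or_path $MODEL_KEY \
	--use_flash_attention True \
	--datafile_paths $DATASET_PATH \
	--output_dir $OUTPUT_DIR \
	--bf16 True \
	--num_train_epochs 2 \
	--per_device_train_batch_size 2 \
	--gradient_accumulation_steps 128 \
	--group_by_length False \
	--ddp_find_unused_parameters False \
	--optim adafactor \
	--max_grad_norm -1 \
	--warmup_steps $WARMUP_STEP \
	--learning_rate 5e-5 \
	--lr_scheduler_type linear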

Hey, thanks for the answer. Is the clearer documentation done yet?