# LLM-jp Model-WG working directory

https://github.com/llm-jp/modelwg

Direct commits to the main branch are prohibited unless absolutely necessary.
## Megatron to Hugging Face Llama2 Model Converter

We use the Megatron to Hugging Face Llama2 converter implemented by Fujii-san:
https://github.com/rioyokotalab/Megatron-Llama2
### Install

```shell
$ ./install_Megatron_Llama2.sh
```
### Conversion

- Input Megatron-LM checkpoint path: `/data/checkpoints_7b/model_name/`
  - Required files:
    - `iter_NNNNNNN/`
    - `latest_checkpointed_iteration.txt`
- Output Hugging Face model path: `/model/7B_HF/llm-jp-7b-model-name/`
- Hugging Face tokenizer model path: `/model/llm-jp-tokenizer/hf/ver2.2/tokenizer_model/`

Example:

```shell
$ ./convert_megatron_to_hf_llama.sh /data/checkpoints_7b/model_name/ /model/7B_HF/llm-jp-7b-model-name/ /model/llm-jp-tokenizer/hf/ver2.2/tokenizer_model/
```
### Upload to HF

Before uploading, you need to log in to Hugging Face via `huggingface-cli login` with an access token that has write permission. You can create a new access token in your Hugging Face account settings.

```shell
$ huggingface-cli login
Token: [Your Write Token Here]
Add token as git credential? (Y/n) Y
Token is valid (permission: write).
Your token has been saved in your configured git credential helpers (store).
Your token has been saved to /home/username/.cache/huggingface/token
Login successful
```
After logging in to Hugging Face, run the upload script as below:

```shell
$ source Megatron-Llama2/venv/bin/activate
$ python upload_to_hf.py /model/7B_HF/llm-jp-7b-63500step.code10K_en20K_ja30K_ver2.2/ llm-jp/7b-v1.0.1-63500step.code10K_en20K_ja30K_ver2.2 main
```
If the `base_model` field in the model card within the model's README.md points to a local model path, the `base_model` line will be removed from README.md and a TODO note will be added to the top of the model page, like below:

**TODO: Add base_model description to model card section in Hugging Face Hub**

In this case, you need to edit README.md on the published model page to add the `base_model` line back and remove the TODO line.
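The behavior above can be sketched roughly as follows. This is a simplified illustration, not the actual `upload_to_hf.py`; the function name and the exact matching logic are assumptions.

```python
import re

TODO_LINE = "**TODO: Add base_model description to model card section in Hugging Face Hub**"

def strip_local_base_model(readme_text: str) -> str:
    """If the model card's base_model points to a local filesystem path,
    drop that line and prepend a TODO note (illustrative sketch only)."""
    kept = []
    found_local = False
    for line in readme_text.splitlines():
        m = re.match(r"^base_model:\s*(\S+)", line)
        if m and m.group(1).startswith("/"):  # local path, not a Hub repo id
            found_local = True
            continue  # remove the base_model line
        kept.append(line)
    if found_local:
        kept.insert(0, TODO_LINE)
    return "\n".join(kept)
```

A Hub repo id such as `llm-jp/llm-jp-13b-v1.0` is left untouched; only absolute local paths trigger the removal.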
## Megatron-DeepSpeed to Hugging Face GPT2 Converter

### 175B version

#### Install

```shell
$ ./install_convert2hf-175b.sh
```
#### Conversion

- Input Megatron-LM checkpoint path: `/model/175B/global_step21000/`
- Output Hugging Face model path: `/model/175B_HF/llm-jp-175b-21k/`
- Hugging Face tokenizer model path: `/model/llm-jp-tokenizer/hf/ver2.2/code20K_en40K_ja60K.ver2.2_hf_fast.b4/`

Example:

```shell
$ ./convert_mds-13b_to_hf_gpt2.sh /model/175B/global_step21000/ /model/175B_HF/llm-jp-175b-21k/ /model/llm-jp-tokenizer/hf/ver2.2/code20K_en40K_ja60K.ver2.2_hf_fast.b4/
```
### 13B version

#### Install

```shell
$ ./install_convert2hf-13b.sh
```
#### Conversion

- Input Megatron-LM checkpoint path: `/model/13B/ds_gpt_v101_fattn_nfs_0825-gpt_1.3B_fold00_gpu96_node12_lr2.0e-4_gbs1536_mbs4_nwk8_zero1_pp1/global_step8654/`
- Output Hugging Face model path: `/model/13B_HF/llm-jp-13b-v1.0/`
- Hugging Face tokenizer model path: `/model/llm-jp-tokenizer/hf/ver2.1/code10k_en20k_ja30k.ver2.1_hf_fast/`

Example:

```shell
$ ./convert_mds-13b_to_hf_gpt2.sh /model/13B/ds_gpt_v101_fattn_nfs_0825-gpt_1.3B_fold00_gpu96_node12_lr2.0e-4_gbs1536_mbs4_nwk8_zero1_pp1/global_step8654/ /model/13B_HF/llm-jp-13b-v1.0/ /model/llm-jp-tokenizer/hf/ver2.1/code10k_en20k_ja30k.ver2.1_hf_fast/
```
## Supervised Fine-tuning with llm-jp-sft

https://github.com/llm-jp/llm-jp-sft/

Usage in the MDX environment:
https://github.com/llm-jp/llm-jp-sft/blob/main/mdx/README.md

### Install

```shell
$ ./install_llm-jp-sft.sh
```

### Enabling venv

```shell
$ cd llm-jp-sft/
$ source venv/bin/activate
```
### Single-GPU LoRA SFT

Note: due to the design of the `SFTTrainer` class used in the SFT script, run information is stored in a project titled "huggingface" within the wandb account of the user who executed it. To store the run in a different project, either run `wandb init` to configure the project settings, or set the environment variables as below:

```shell
export WANDB_ENTITY=llm-jp
export WANDB_PROJECT=project_name
```
For Llama models:

```shell
$ CUDA_VISIBLE_DEVICES=0 mdx/train_peft_single_gpu.sh /model/7B_HF/model_name/ /model/7B_HF/model_name/ dataset/ mdx/dataset_jaster.sh 5 /model/7B_HF/model_name-jaster-lora-all 2 32 --peft_target_model llama-all
$ CUDA_VISIBLE_DEVICES=0 mdx/train_peft_single_gpu.sh /model/7B_HF/model_name/ /model/7B_HF/model_name/ dataset/ mdx/dataset_gpt4_self_inst_ja.sh 5 /model/7B_HF/model_name-self-inst-lora-all 2 32 --peft_target_model llama-all
```
For GPT-2 models:

```shell
$ CUDA_VISIBLE_DEVICES=0 mdx/train_peft_single_gpu.sh llm-jp/llm-jp-13b-v1.0 llm-jp/llm-jp-13b-v1.0 dataset/ mdx/dataset_jaster.sh 5 results/llm-jp/llm-jp-13b-v1.0-jaster-lora 1 64
$ CUDA_VISIBLE_DEVICES=0 mdx/train_peft_single_gpu.sh llm-jp/llm-jp-13b-v1.0 llm-jp/llm-jp-13b-v1.0 dataset/ mdx/dataset_gpt4_self_inst_ja.sh 5 results/llm-jp/llm-jp-13b-v1.0-self-inst-lora 1 64
```
### Multi-GPU Full-parameter SFT

```shell
$ mdx/train_full_single_node.sh configs/accelerate_config_zero3.yaml /model/7B_HF/model_name/ /model/7B_HF/model_name/ dataset/ mdx/dataset_jaster_ja.sh 3 /model/7B_HF/model_name-jaster-full 2 16
$ mdx/train_full_single_node.sh configs/accelerate_config_zero3.yaml /model/7B_HF/model_name/ /model/7B_HF/model_name/ dataset/ mdx/dataset_gpt4_self_inst_ja.sh 3 /model/7B_HF/model_name-self-inst-full 2 16
```
## llm-jp-eval

https://github.com/llm-jp/llm-jp-eval

### Install

```shell
$ ./install_llm-jp-eval.sh
```

### Enabling venv

```shell
$ cd llm-jp-eval/
$ source venv/bin/activate
```
### Single-GPU Evaluation

```shell
$ CUDA_VISIBLE_DEVICES=0 python scripts/evaluate_llm.py model.pretrained_model_name_or_path=/model/7B_HF/model_name/ tokenizer.pretrained_model_name_or_path=/model/7B_HF/model_name/ target_dataset=all wandb.run_name=model_name
$ CUDA_VISIBLE_DEVICES=0 python scripts/evaluate_llm.py model.pretrained_model_name_or_path=/model/7B_HF/model_name/ tokenizer.pretrained_model_name_or_path=/model/7B_HF/model_name/ dataset_dir=dataset/tuning/dev target_dataset=gpt4_self_inst_ja wandb.run_name=model_name
```
## Pre-training Settings

### Run Monitoring

#### Set up

##### Install

```shell
$ ./install_run_sitter.sh
```
##### Initialize

Set `SLACK_WEBHOOK_URL` in `./run_sitter_url.txt`. The format of `SLACK_WEBHOOK_URL` is like `https://hooks.slack.com/services/foo/bar`.
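The monitor uses this URL to post notifications to Slack. For reference, a minimal sketch of such a post using the standard Slack incoming-webhook payload format (the helper below is illustrative, not part of run_sitter):

```python
import json
from urllib import request

def build_slack_payload(text: str) -> bytes:
    """Build the JSON body expected by a Slack incoming webhook."""
    return json.dumps({"text": text}).encode("utf-8")

def notify(webhook_url: str, text: str) -> None:
    """POST a message to the webhook read from run_sitter_url.txt."""
    req = request.Request(
        webhook_url,
        data=build_slack_payload(text),
        headers={"Content-Type": "application/json"},
    )
    request.urlopen(req)  # network call; needs a valid SLACK_WEBHOOK_URL
```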
```shell
$ sudo su llmjp0
$ nano run_sitter_url.txt
> [save SLACK_WEBHOOK_URL]
$ cd run_sitter/
$ source venv/bin/activate
$ wandb init
Which team should we use?
(2) llm-jp
Which project should we use?
[select any project]
$ cd ..
$ exit
```
#### Launch Monitoring Process

Before launching `./run_sitter.sh`, get the `Run path` value from the wandb Run Overview page. The format of `Run path` is like `llm-jp/megatron-lm-13B-2023-1225/o2uz07wk`.
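A `Run path` is simply `entity/project/run_id` joined by slashes; the sketch below splits it into its parts (an illustrative helper, not part of run_sitter). With the `wandb` package installed, the same string can be passed to `wandb.Api().run(run_path)` to fetch the run.

```python
def parse_run_path(run_path: str) -> dict:
    """Split a wandb Run path ("entity/project/run_id") into its parts."""
    entity, project, run_id = run_path.split("/")
    return {"entity": entity, "project": project, "run_id": run_id}
```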
```shell
$ sudo su llmjp0
$ nohup ./run_sitter.sh RUN_PATH &
```
## llm-jp-llama-2

### 13b-llm-jp-v2_CC_50k

```shell
$ sudo su llmjp0
$ tmux ls
$ tmux a -t SESSION_NAME
$ cd /model/llmjp0/Megatron-LM
$ bash scripts/mdx/llm-jp-llama-2-13b/13b-llm-jp-v2_CC_50k.sh >> 13b-llm-jp-v2_CC_50k_log 2>&1 &
$ tail -f 13b-llm-jp-v2_CC_50k_log
```