huggingface / nanotron

Minimalistic large language model 3D-parallelism training

Continued Pretraining on Llama7b.

wiseyy opened this issue · comments

I want to do continued pretraining on my custom dataset, using the Llama-7B weights in the Hugging Face format. How do I initialize the model with those weights? I don't think there is a function for that yet.

Hey, you have to convert it to the Nanotron checkpoint format!

Start by randomly initializing a Llama model, then save the checkpoint with dp=2, tp=2, pp=2 and look at how Nanotron splits it. Then reformat the Hugging Face checkpoint to match that layout; a rough comparison sketch is below.
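Here is a minimal sketch (not an official converter) for that inspection step: it lists the tensor names and shapes in the randomly initialized Nanotron checkpoint shards next to those of the Hugging Face Llama weights, so you can work out the mapping yourself. The checkpoint path, the HF model id, and the assumption that the shards are `.safetensors` files are placeholders; adjust them to what Nanotron actually writes on your machine.

```python
# Sketch: compare a randomly initialized Nanotron checkpoint (saved with
# dp=2, tp=2, pp=2) against the Hugging Face Llama-7B weights to see how
# each HF tensor should be split across the Nanotron shards.
from pathlib import Path

import torch
from safetensors.torch import load_file
from transformers import AutoModelForCausalLM

NANOTRON_CKPT_DIR = Path("checkpoints/random_llama")  # assumed path to the dp=2/tp=2/pp=2 checkpoint
HF_MODEL_PATH = "path/to/llama-7b-hf"                 # assumed local path or HF repo id

# 1) Inspect how Nanotron sharded the randomly initialized model.
for shard_path in sorted(NANOTRON_CKPT_DIR.rglob("*.safetensors")):
    shard = load_file(str(shard_path))
    print(f"--- {shard_path.relative_to(NANOTRON_CKPT_DIR)} ---")
    for name, tensor in shard.items():
        print(f"{name:80s} {tuple(tensor.shape)}")

# 2) Inspect the Hugging Face checkpoint you want to continue pretraining from.
hf_model = AutoModelForCausalLM.from_pretrained(HF_MODEL_PATH, torch_dtype=torch.float32)
print("--- Hugging Face state dict ---")
for name, tensor in hf_model.state_dict().items():
    print(f"{name:80s} {tuple(tensor.shape)}")

# 3) With both listings side by side, write the mapping: split (and transpose
#    where needed) each HF tensor so its pieces match the shapes Nanotron
#    expects, then save them back into the same directory structure.
```

Once the shapes line up shard for shard, you can overwrite the random tensors with the reshaped HF ones and resume training from that checkpoint.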