Continued Pretraining on Llama 7b.
wiseyy opened this issue
In continuation of #78 (comment):
I converted the weights as you mentioned, but unfortunately I cannot get the same sane outputs from the converted Llama weights that I get when using the HF API, and I am trying to figure out why. The conversion is straightforward except for nanotron's gate_up and qkv weights, since their layout is not documented. I assume that concatenating the HF weights along dim 0, in the order (gate, up) and (q, k, v), should give the same behaviour for the nanotron weights.
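To make the assumed layout concrete, here is a minimal sketch of the fusion I am doing (illustrated with NumPy for brevity; the real weights are torch tensors, but the concatenation is identical). The function names are mine, not nanotron's:

```python
import numpy as np

def fuse_qkv(q_proj, k_proj, v_proj):
    # HF stores q/k/v as separate [out_features, hidden] matrices.
    # My assumption: nanotron's fused qkv weight is their row-wise
    # (dim 0) concatenation in the order q, k, v.
    return np.concatenate([q_proj, k_proj, v_proj], axis=0)

def fuse_gate_up(gate_proj, up_proj):
    # Likewise, I assume gate_up is [gate; up] along dim 0.
    return np.concatenate([gate_proj, up_proj], axis=0)

hidden = 8
q = np.random.randn(hidden, hidden)
k = np.random.randn(hidden, hidden)
v = np.random.randn(hidden, hidden)
print(fuse_qkv(q, k, v).shape)  # (24, 8)
```

If the fused layout or the ordering differs from this assumption, the model would still run but produce garbage, which matches the symptoms.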
The sources of error I can think of (assuming there is no bug in run_generate.py) are:
- The order of the q, k, v matrices in the nanotron format.
- Whether nanotron stores the transpose of the qkv matrices.
- A difference in the rotary embeddings compared to the HF API.
Could you please help me out?
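To make the rotary-embedding bullet concrete: two RoPE conventions are common, and mixing them silently corrupts attention. A minimal NumPy sketch of the two rotations (HF's Llama code uses the half-split `rotate_half` form and permutes the q/k weights accordingly at conversion time; whether nanotron uses the interleaved form is my open question, not a confirmed fact):

```python
import numpy as np

def rotate_half(x):
    # HF-style: rotate the first half of the dims against the second half.
    half = x.shape[-1] // 2
    return np.concatenate([-x[..., half:], x[..., :half]], axis=-1)

def rotate_interleaved(x):
    # Original-LLaMA-style: rotate adjacent (even, odd) pairs of dims.
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = -x_odd
    out[..., 1::2] = x_even
    return out

x = np.arange(8.0)
print(rotate_half(x))         # [-4. -5. -6. -7.  0.  1.  2.  3.]
print(rotate_interleaved(x))  # [-1.  0. -3.  2. -5.  4. -7.  6.]
```

The two rotations give different results on the same vector, so weights converted for one convention produce subtly wrong attention under the other unless the q/k rows are re-permuted.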
Update:
The outputs now look somewhat sane, but they are still far from acceptable.
Here, for example, the model starts to speak coherently but then degenerates into gibberish. This leads me to believe that the weight mapping is correct and that the error is somewhere in the generation code.
I want to point out that the sampler arguments are not being passed to the sampler in the decode_text function in generation/decode.py.
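For reference, here is a generic, hypothetical sketch (not nanotron's actual code) of the kind of sampler that GenerationArgs are meant to parameterize; if temperature/top_k never reach it, the sampler falls back to its defaults:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=0, rng=None):
    # Hypothetical sampler: temperature scaling followed by top-k filtering.
    # The point is only that these knobs must actually be forwarded to the
    # sampler; otherwise the defaults above are silently used.
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-8)
    if top_k > 0:
        cutoff = np.sort(logits)[-top_k]          # k-th largest logit
        logits = np.where(logits < cutoff, -np.inf, logits)
    probs = np.exp(logits - logits.max())          # stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

print(sample_next_token([0.1, 2.0, 0.3], top_k=1))  # 1 (greedy when top_k=1)
```

With top_k=1 the sampler is deterministic (argmax), which is a quick way to check whether the arguments are actually being honored.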
(Screenshot 2024-02-24 at 1:29:43 AM: nanotron generation output)
The outputs above were generated using decode_tokenized(), which does pass those arguments. The GenerationArgs were as follows:
The output that HF API generates for the same weights and input tokens is as follows:
The quality is a lot better than the text generated by nanotron.
Also, when I try to prompt the 7b-chat version with a system prompt and user input (the default way), the nanotron output breaks down altogether.
This is HF->
This is nanotron->
- Can you suggest reasonable values for GenerationArguments that can be used to reproduce similar-quality text generation?
- Is the generation code doing what it is supposed to do?
@NouamaneTazi do we have a conversion script from transformers to nanotron checkpoints?
Any updates? @xrsrke
@wiseyy I'm facing a similar challenge. Any way we can join forces on this and try to make it work? :)
Glad to know I'm not alone :)
I already took the easier route and used Megatron-LLM and Meditron. The training throughput, however, is roughly 2/3 of what nanotron provides. Also, you have to convert the weights back to the HF format after training and run inference with HF/vLLM.
I hope that helps you.