epfml / landmark-attention

Landmark Attention: Random-Access Infinite Context Length for Transformers

Home Page: https://arxiv.org/abs/2305.16300

Question about Training vs LLaMA Fine Tuning

eugenepentland opened this issue · comments

What is the difference between "Training" and "LLaMA fine-tuning"? I was able to get the repo set up and was running the "Training" section using PG19 with just one training page and one evaluation page as a quick test. Am I running the correct thing?

I'm not sure whether I need to run the training before being able to do the LLaMA fine-tuning. After running for 2 hours on an RTX 8000 (48 GB of VRAM), this is the progress so far with all of the provided settings. If my math is correct, it will take roughly 16 hours to complete.

Compiling model ...
228/200 [train] loss=2.680 [val] loss=8.843, pp=6925.13, acc=0.112956 [time per itr] 4010.14ms [lr] 0.00003
457/400 [train] loss=0.331 [val] loss=8.885, pp=7225.27, acc=0.106801 [time per itr] 3848.56ms [lr] 0.00005
685/600 [train] loss=2.082 [val] loss=8.234, pp=3768.09, acc=0.094640 [time per itr] 3847.03ms [lr] 0.00010
914/800 [train] loss=0.053 [val] loss=8.778, pp=6487.26, acc=0.084274 [time per itr] 3845.98ms [lr] 0.00015
1142/1000 [train] loss=0.418 [val] loss=8.470, pp=4770.87, acc=0.082754 [time per itr] 3847.87ms [lr] 0.00022
1371/1200 [train] loss=0.046 [val] loss=9.254, pp=10444.46, acc=0.070409 [time per itr] 3848.61ms [lr] 0.00031
1600/1400 [train] loss=0.034 [val] loss=9.761, pp=17348.07, acc=0.070140 [time per itr] 3846.69ms [lr] 0.00041
1828/1600 [train] loss=2.333 [val] loss=10.768, pp=47481.72, acc=0.054616 [time per itr] 3844.40ms [lr] 0.00052
2057/1800 [train] loss=0.367 [val] loss=12.007, pp=163920.19, acc=0.050720 [time per itr] 3844.43ms [lr] 0.00063
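
A rough sanity check of the ~16-hour estimate above (the total iteration count here is an assumption; substitute whatever your training config actually specifies):

```python
# Back-of-the-envelope runtime estimate from the log above.
seconds_per_iter = 3.85    # ~3845-3850 ms per iteration in the log
total_iters = 15_000       # ASSUMPTION: replace with the iteration count
                           # from your own training config

estimated_hours = total_iters * seconds_per_iter / 3600
print(f"Estimated total training time: {estimated_hours:.1f} h")  # ~16.0 h
```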

I also had an issue where it said my CUDA device didn't support bfloat16, so I had to change it to torch.float16 to get it running. I'm running CUDA 12.1; not sure if that's just too new of a version.
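
For anyone hitting the same thing, a minimal sketch of one way to fall back (the variable names are illustrative, not the repo's actual config keys):

```python
import torch

# bfloat16 needs hardware support (Ampere or newer on NVIDIA GPUs), so a
# Turing card like the Quadro RTX 8000 falls back to float16 regardless of
# the CUDA toolkit version.
if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
    dtype = torch.bfloat16
else:
    dtype = torch.float16  # the fallback used in this issue

print(f"autocast dtype: {dtype}")

# Illustrative usage; how this plugs into the training loop depends on the
# repo's script.
with torch.autocast(device_type="cuda", dtype=dtype):
    pass  # forward/backward pass would go here
```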

I think that's an AMD-specific issue, IIRC.

These are two separate things; you don't need both: (1) the training from scratch on the Gutenberg books (PG19), and (2) the fine-tuning of LLaMA (to add the landmark tokens to an existing model).
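
For context on (2), a hypothetical sketch of what adding a landmark token to an existing checkpoint involves at the tokenizer/embedding level (the token string "<landmark>" and the model path are placeholders, not necessarily what the repo uses):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/llama-7b"  # placeholder path
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# Register a landmark special token and grow the embedding matrix to match,
# so the new token gets a trainable embedding during fine-tuning.
num_added = tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<landmark>"]}
)
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))

# Fine-tuning then runs on data with the landmark token inserted at block
# boundaries, following the repo's instructions.
```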

I guess we can close this, since in the meantime you were able to run LLaMA as you described in the other issues?