karpathy / llm.c

LLM training in simple, raw C/CUDA


Pretraining (with CPUs)

bitmarkcc opened this issue · comments

I'm new to deep learning, but I have some experience with training boosted decision trees.

Is this just for fine-tuning, or for pretraining as well? When I look inside train_gpt2.c, I see the first thing it does is load weights from a bin file (gpt2_124M.bin). Where did this bin file come from? Is it an official file released by OpenAI? I would like to be able to start from scratch.

I would like to first see how pretraining works, even if it's just on a small dataset, and it doesn't need to use GPUs. I would like to start with CPUs first, and maybe add CPU-only nodes that can each work on a part of the training, as in the sketch below.
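Concretely, what I have in mind for the "parts" idea is plain data parallelism: each node runs forward/backward on its own shard of the batch, then the gradients get averaged before the optimizer step so all nodes apply the same update. A rough sketch of just the averaging step (all names here are mine, nothing from llm.c):

```c
#include <stddef.h>

// Hypothetical helper: average gradient buffers gathered from num_nodes
// workers. Each worker computed grads over its own shard of the global
// batch; after averaging, every node applies the same optimizer update,
// so the weights stay in sync across nodes.
void average_gradients(float **node_grads, float *avg_grads,
                       size_t num_params, int num_nodes) {
    for (size_t i = 0; i < num_params; i++) {
        float sum = 0.0f;
        for (int n = 0; n < num_nodes; n++) {
            sum += node_grads[n][i];
        }
        avg_grads[i] = sum / (float)num_nodes;
    }
}
```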

I see that train_gpt2.cu has a gpt2_build_from_random() for training from scratch. I can attempt to copy that into train_gpt2.c, but I'm not sure how easy it will be. Are there any forks doing this?
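My understanding is that building from random just means allocating the parameter tensors and filling them with small random values instead of reading a checkpoint. A minimal CPU-side sketch of that kind of init, using a Box-Muller normal draw with GPT-2's usual 0.02 standard deviation (helper names are mine, not the repo's; check gpt2_build_from_random() for the real details):

```c
#include <math.h>
#include <stdlib.h>

// Hypothetical helper: one sample from N(0, 1) via Box-Muller.
static float randn(void) {
    float u1 = (rand() + 1.0f) / ((float)RAND_MAX + 2.0f); // in (0,1), avoids log(0)
    float u2 = (rand() + 1.0f) / ((float)RAND_MAX + 2.0f);
    return sqrtf(-2.0f * logf(u1)) * cosf(6.28318530f * u2); // 2*pi
}

// Fill a parameter tensor with N(0, 0.02^2) values, GPT-2's usual init scale.
void init_params_random(float *params, size_t n) {
    for (size_t i = 0; i < n; i++) {
        params[i] = 0.02f * randn();
    }
}
```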

What I would like to see is platform-independent code (no reliance on Nvidia or AMD). People who have those devices (or ASICs) could still use optimized code, but there should always be a fallback to the platform-independent path.
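One common way to structure that fallback in C is compile-time dispatch: a portable reference kernel always exists, and an accelerated version is substituted only when the build opts in. A sketch of the pattern (illustrative only, not how llm.c is currently organized; USE_CUDA and the function names are made up):

```c
#if defined(USE_CUDA)
// Hypothetical vendor-optimized kernel, provided by a separate .cu file.
void matmul_forward_cuda(float *out, const float *inp, const float *weight,
                         int B, int C, int OC);
#endif

// Portable reference kernel: plain C, runs anywhere.
void matmul_forward_cpu(float *out, const float *inp, const float *weight,
                        int B, int C, int OC) {
    for (int b = 0; b < B; b++) {
        for (int o = 0; o < OC; o++) {
            float val = 0.0f;
            for (int c = 0; c < C; c++) {
                val += inp[b * C + c] * weight[o * C + c];
            }
            out[b * OC + o] = val;
        }
    }
}

// Dispatch: take the optimized path only when the build enables it,
// otherwise fall back to the portable code.
void matmul_forward(float *out, const float *inp, const float *weight,
                    int B, int C, int OC) {
#if defined(USE_CUDA)
    matmul_forward_cuda(out, inp, weight, B, C, OC);
#else
    matmul_forward_cpu(out, inp, weight, B, C, OC);
#endif
}
```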

Edit: I think this will do mostly what I want, though I still need to add a way to pass the model and training parameters on the command line: bitmarkcc@bdff450
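For the command-line part, a plain argv loop is probably enough in C and avoids extra dependencies. Something along these lines (the flag names and defaults are made up for illustration):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv) {
    // defaults, overridable from the command line
    int num_layers = 12;
    int num_heads = 12;
    int channels = 768;
    float learning_rate = 1e-4f;

    // parse "--flag value" pairs
    for (int i = 1; i < argc; i += 2) {
        if (i + 1 >= argc) {
            fprintf(stderr, "missing value for %s\n", argv[i]);
            return 1;
        }
        if (strcmp(argv[i], "--layers") == 0) num_layers = atoi(argv[i + 1]);
        else if (strcmp(argv[i], "--heads") == 0) num_heads = atoi(argv[i + 1]);
        else if (strcmp(argv[i], "--channels") == 0) channels = atoi(argv[i + 1]);
        else if (strcmp(argv[i], "--lr") == 0) learning_rate = (float)atof(argv[i + 1]);
        else {
            fprintf(stderr, "unknown flag %s\n", argv[i]);
            return 1;
        }
    }

    printf("layers=%d heads=%d channels=%d lr=%g\n",
           num_layers, num_heads, channels, learning_rate);
    return 0;
}
```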

Hey @bitmarkcc! Did you follow the README?

You should first run the Python code; it generates all the necessary bin/state files before you run the C/CUDA code.
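For context on what those bin files are: the checkpoint is just a fixed-size integer header (magic number, version, model config) followed by the raw float32 parameter tensors, so the C code has everything it needs to rebuild the model in memory. A sketch of peeking at the header (the exact field order is my assumption and may differ between versions; the authoritative layout is the writer in train_gpt2.py and the loader in train_gpt2.c):

```c
#include <stdio.h>

int main(void) {
    FILE *f = fopen("gpt2_124M.bin", "rb");
    if (!f) {
        fprintf(stderr, "checkpoint not found; run the Python script first\n");
        return 1;
    }

    // Assumed layout: a block of ints up front (magic, version, config),
    // then the raw float32 parameters. Field positions below are a guess.
    int header[256];
    if (fread(header, sizeof(int), 256, f) != 256) { fclose(f); return 1; }
    printf("magic=%d version=%d\n", header[0], header[1]);
    printf("max_seq_len=%d vocab=%d layers=%d heads=%d channels=%d\n",
           header[2], header[3], header[4], header[5], header[6]);
    fclose(f);
    return 0;
}
```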

If something is not clearly explained in the README either open up a PR fixing it or reply back here, happy to help.

Yeah, so according to the README these can be generated with train_gpt2.py, which references the official implementations of GPT-2 from OpenAI and HuggingFace. So the bin files were generated from that Python script? And if you run the C program, does it reproduce the same bin files?

In any case, I am still wondering whether my implementation of pretraining in CPU mode is correct (bitmarkcc@bdff450). I want to make more changes and can open a pull request later on.

Edit: I think it now actually randomizes the parameters (second commit): bitmarkcc@7581695

commented

Anybody know where to get a nice SHA?