SafeAILab / EAGLE

Official Implementation of EAGLE-1 and EAGLE-2

Home Page: https://arxiv.org/pdf/2406.16858


train configuration

je1lee opened this issue

train_config = {
    "lr": args.lr,
    "bs": args.bs,
    "gradient_accumulation_steps": args.gradient_accumulation_steps,
    "datapath": f"{args.tmpdir}",
    "is_warmup": True,
    "num_epochs": 200,
    "num_warmup_steps": 2000,
    "total_steps": 800000,
    "p_w": 0.1,
    "v_w": 1.0,
    "head_w": 0.1,
    "num_workers": 2,
    "embeding": True,
    "act": "No",
    "data_noise": True,
    "noise": "uniform",
    "mean": 0.0,
    "std": 0.2,
    "residual": "true,norm",
    "max_len": 2048,
    "config_path": args.configpath,
    "b1": 0.9,
    "b2": 0.95,
    "grad_clip": 0.5,
}
I'm trying to retrain the autoregression head with your training code.

Is this train_config the one used for every autoregression head on the Hugging Face Hub? The number of epochs seems too high to me. If this is not the exact config used to train https://huggingface.co/yuhuili/EAGLE-llama2-chat-70B, could you share the train_config that was used for yuhuili/EAGLE-llama2-chat-70B?

We did not stop training according to the "num_epochs" parameter, which was set arbitrarily; in practice we trained for only 20 epochs. Limited by VRAM (we did not have A100 80G GPUs), we also set "max_len" to 1200. This parameter truncates the training sequences: the larger it is, the more training data is used and the better the results. If you have sufficient resources, you can try a larger "max_len" (see the sketch after the config below). Below is our training configuration for LLaMA2-Chat 70B.

train_config = {
    "lr": 3e-5,
    "bs": 4,
    "gradient_accumulation_steps": 8,
    "datapath": f"{args.tmpdir}",
    "is_warmup": True,
    "num_epochs": 200,
    "num_warmup_steps": 2000,
    "total_steps": 800000,
    "p_w": 0.1,
    "v_w": 1.0,
    "head_w": 0.1,
    "num_workers": 2,
    "embeding": True,
    "act": "No",
    "data_noise": True,
    "noise": "uniform",
    "mean": 0.0,
    "std": 0.2,
    "residual": "true,norm",
    "max_len": 1200,
    "config_path": "config.json",
    "b1": 0.9,
    "b2": 0.95,
    "grad_clip": 0.5,
}
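
For reference, here is a rough sketch of how the numbers above fit together; it is not the repo's actual training loop, and the truncation helper below is illustrative only:

# Minimal sketch (not the actual EAGLE training loop) showing how the key
# parameters above interact; helper names here are illustrative assumptions.
import torch

bs = 4
gradient_accumulation_steps = 8
max_len = 1200

# Effective batch size per optimizer update (per process):
# bs * gradient_accumulation_steps = 4 * 8 = 32 sequences.
effective_batch_size = bs * gradient_accumulation_steps

def truncate_sample(input_ids: torch.Tensor, hidden_states: torch.Tensor):
    # "max_len" truncates each training sequence; a larger value keeps more
    # tokens per sample (more training signal) at the cost of more VRAM.
    return input_ids[:, :max_len], hidden_states[:, :max_len]

# Optimizer hyperparameters taken from the config above.
lr = 3e-5
betas = (0.9, 0.95)   # b1, b2
grad_clip = 0.5       # max gradient norm

# "num_epochs": 200 is only an upper bound in the config; per the comment
# above, training was actually stopped after 20 passes over the data.
epochs_actually_trained = 20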