OpenBMB / MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

Question about full-parameter finetuning

dydxdt opened this issue · comments

Thanks for your great work!
I have a question about the training arguments: is max_steps=10000 appropriate for full-parameter fine-tuning?

I am running full-parameter fine-tuning on my own dataset of around 240,000 samples, covering three different tasks (caption, OCR, ...). With the default training settings, the training log shows "epoch: 0.32", meaning only about 1/3 of the training data is used. I then trained with num_train_epochs=5 (same as Qwen) instead of max_steps, but the 5-epoch model performs worse than the 10000-step model on my caption test set, even though the loss looks normal. Can you give some advice for this situation? Thanks!
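
For reference, here is a quick back-of-the-envelope sketch of how max_steps relates to epochs. The batch-size numbers are assumptions (replace them with the per_device_train_batch_size, gradient_accumulation_steps, and GPU count from your own finetune script); with the assumed global batch size of 8, the result roughly matches the "epoch: 0.32" in the log above.

```python
# Sketch: relate a fixed step budget to epochs for full-parameter fine-tuning.
# All batch-size values below are assumptions -- substitute your actual config.

dataset_size = 240_000            # number of training samples
per_device_batch_size = 1         # assumed
gradient_accumulation_steps = 1   # assumed
num_gpus = 8                      # assumed

global_batch_size = per_device_batch_size * gradient_accumulation_steps * num_gpus

# Epochs covered by a fixed step budget (e.g. the default max_steps=10000).
max_steps = 10_000
epochs_covered = max_steps * global_batch_size / dataset_size
print(f"max_steps={max_steps} covers ~{epochs_covered:.2f} epochs")

# Optimizer steps needed to cover a target number of epochs (e.g. num_train_epochs=5).
target_epochs = 5
steps_needed = target_epochs * dataset_size // global_batch_size
print(f"{target_epochs} epochs needs ~{steps_needed} optimizer steps")
```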

10000 steps: [training loss curve screenshot; the red line is the relevant run, ignore the blue line]

~5 epochs: [training loss curve screenshot]

How many GPUs did you use for full-parameter fine-tuning? I tried with 2× V100 and with 4× V100, and neither worked.