OpenBMB / MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

Question about full-parameter finetuning

dydxdt opened this issue · comments

Thanks for your great work!
I have a question about the training arguments: is max_steps=10000 appropriate for full-parameter fine-tuning?

I am running full-parameter fine-tuning on my own dataset of around 240,000 samples, covering three different tasks (caption, OCR, ...). With the default training settings, the training log shows "epoch: 0.32", meaning only about 1/3 of the training data is used. I then trained with num_train_epochs=5 (same as Qwen) instead of max_steps, but the 5-epoch model performs worse than the 10000-step model on my caption test set, even though the loss looks normal. Can you give some advice for this situation? Thanks!
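
For reference, here is a quick back-of-the-envelope sketch of how max_steps relates to epochs. The batch-size numbers are assumptions (replace them with the per_device_train_batch_size, gradient_accumulation_steps, and GPU count from your own finetune script); with the assumed global batch size of 8, the result roughly matches the "epoch: 0.32" in the log above.

```python
# Sketch: relate a fixed step budget to epochs for full-parameter fine-tuning.
# All batch-size values below are assumptions -- substitute your actual config.

dataset_size = 240_000            # number of training samples
per_device_batch_size = 1         # assumed
gradient_accumulation_steps = 1   # assumed
num_gpus = 8                      # assumed

global_batch_size = per_device_batch_size * gradient_accumulation_steps * num_gpus

# Epochs covered by a fixed step budget (e.g. the default max_steps=10000).
max_steps = 10_000
epochs_covered = max_steps * global_batch_size / dataset_size
print(f"max_steps={max_steps} covers ~{epochs_covered:.2f} epochs")

# Optimizer steps needed to cover a target number of epochs (e.g. num_train_epochs=5).
target_epochs = 5
steps_needed = target_epochs * dataset_size // global_batch_size
print(f"{target_epochs} epochs needs ~{steps_needed} optimizer steps")
```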

10000 steps: [training loss curve screenshot; the red line is the relevant run, ignore the blue line]

~5 epochs: [training loss curve screenshot]

How many GPUs did you use for full-parameter fine-tuning? I tried with 2× V100 and with 4× V100, and neither worked.