how to continue model training?
phamkhactu opened this issue · comments
phamkhactu commented
Check before submitting issues
- Make sure to pull the latest code, as some issues and bugs have been fixed.
- Due to frequent dependency updates, please ensure you have followed the steps in our Wiki
- I have read the FAQ section AND searched for similar issues and did not find a similar problem or solution
- Third-party plugin issues - e.g., llama.cpp, text-generation-webui, LlamaChat, we recommend checking the corresponding project for solutions
- Model validity check - Be sure to check the model's SHA256.md. If the model is incorrect, we cannot guarantee its performance
Type of Issue
Model training and fine-tuning
Base Model
LLaMA-7B
Operating System
Linux
Describe your issue in detail
I trained the model using run_clm_pt_with_peft.py, but my machine shut down suddenly after the model had trained for some steps. Now I want to resume training from the LoRA checkpoint. I've read the README but did not find anything about this.
Many thanks for your help.
Dependencies (must be provided for code-related issues)
No response
Execution logs or screenshots
No response
Gokul NC (Sarvam.AI) commented
Hi @phamkhactu, how did you solve the problem?
phamkhactu commented
> Hi @phamkhactu, how did you solve the problem?
Hi @GokulNC-Sarvam, I use the trainer and resume from the checkpoint:
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
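For anyone landing here later, a minimal sketch of one way to obtain the value passed as `resume_from_checkpoint`. It assumes the interrupted run wrote `checkpoint-<step>` folders into its output directory, which is the Hugging Face `Trainer` default naming. `find_last_checkpoint` is a hypothetical helper written for illustration, not part of run_clm_pt_with_peft.py; `transformers.trainer_utils.get_last_checkpoint` does a similar job. Whether a given checkpoint folder holds everything needed to resume (optimizer and scheduler state as well as the LoRA weights) depends on how the training script saves it, so treat this as a starting point.

```python
import os
import re
from typing import Optional

# Trainer-style checkpoint folders are named "checkpoint-<global_step>".
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_last_checkpoint(output_dir: str) -> Optional[str]:
    """Return the path of the highest-numbered checkpoint-<step> folder, or None.

    Hypothetical helper for illustration only.
    """
    if not os.path.isdir(output_dir):
        return None
    best_step, best_path = -1, None
    for name in os.listdir(output_dir):
        match = _CHECKPOINT_RE.match(name)
        path = os.path.join(output_dir, name)
        # Skip stray files and folders that do not follow the naming scheme.
        if match and os.path.isdir(path):
            step = int(match.group(1))  # compare numerically, not as strings
            if step > best_step:
                best_step, best_path = step, path
    return best_path


# Usage (assumes `trainer` is an already-constructed transformers.Trainer and
# `output_dir` is the same directory the interrupted run was writing to):
#
#   resume_from_checkpoint = find_last_checkpoint(output_dir)
#   trainer.train(resume_from_checkpoint=resume_from_checkpoint)
#
# If no checkpoint is found the helper returns None, and
# trainer.train(resume_from_checkpoint=None) starts training from the beginning.
```

Passing `resume_from_checkpoint=True` is another option: the `Trainer` then looks for the last checkpoint in `args.output_dir` itself.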