ymcui / Chinese-LLaMA-Alpaca

Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)

Home Page: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki

How to continue model training?

phamkhactu opened this issue

Check before submitting issues

  • Make sure to pull the latest code, as some issues and bugs have been fixed.
  • Due to frequent dependency updates, please ensure you have followed the steps in our Wiki
  • I have read the FAQ section AND searched for similar issues and did not find a similar problem or solution
  • Third-party plugin issues - e.g., llama.cpp, text-generation-webui, LlamaChat, we recommend checking the corresponding project for solutions
  • Model validity check - Be sure to check the model's SHA256.md. If the model is incorrect, we cannot guarantee its performance

Type of Issue

Model training and fine-tuning

Base Model

LLaMA-7B

Operating System

Linux

Describe your issue in detail

I was training a model with run_clm_pt_with_peft.py, but my machine shut down suddenly after the model had already trained for some steps. Now I want to resume from the LoRA checkpoint and continue training. I've read the README but couldn't find anything about this.

Many thanks for your help.

Dependencies (must be provided for code-related issues)

No response

Execution logs or screenshots

No response

Hi @phamkhactu, how did you solve the problem?

Hi @GokulNC-Sarvam, I used the Trainer and resumed from the checkpoint:

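    # resume_from_checkpoint can be a path such as "<output_dir>/checkpoint-500",
    # or True to let Trainer load the latest checkpoint in training_args.output_dir
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)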
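If you pass a path rather than `True`, it has to point at one of the `checkpoint-<step>` folders that Trainer writes into `output_dir` during training (how often is controlled by `save_steps`). Below is a minimal stdlib-only sketch for locating the newest one. `find_last_checkpoint` is a hypothetical helper, not part of this repository; it imitates the idea behind `transformers.trainer_utils.get_last_checkpoint` and assumes the default `checkpoint-<step>` naming and that `output_dir` is the same directory the interrupted run used.

```python
import os
import re
import tempfile
from typing import Optional

# Hypothetical helper (assumption): Trainer names checkpoints <output_dir>/checkpoint-<global_step>.
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_last_checkpoint(output_dir: str) -> Optional[str]:
    """Return the checkpoint-<step> directory with the highest step, or None if there is none."""
    if not os.path.isdir(output_dir):
        return None
    candidates = []
    for name in os.listdir(output_dir):
        match = _CHECKPOINT_RE.match(name)
        path = os.path.join(output_dir, name)
        if match and os.path.isdir(path):
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    # Compare steps numerically so checkpoint-1000 ranks above checkpoint-900.
    return max(candidates)[1]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out:
        for step in (500, 900, 1000):
            os.makedirs(os.path.join(out, f"checkpoint-{step}"))
        last = find_last_checkpoint(out)
        print(os.path.basename(last))  # checkpoint-1000
        # With a real Trainer (not executed here):
        # trainer.train(resume_from_checkpoint=last)
```

Comparing the step as an integer rather than sorting directory names as strings is the important detail: a plain string sort would put `checkpoint-900` after `checkpoint-1000`.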