OpenBMB / MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

Help with Finetune

EmailScraper opened this issue · comments

commented

Hi, team.
I fine-tuned with 15 <image, Q&A> pairs of a single document type (12 pairs for training, 3 for testing).
After 500 epochs, both the training loss and eval_loss are nearly zero, as shown below:

{'loss': 0.0, 'grad_norm': 7.342097887885757e-06, 'learning_rate': 1e-06, 'epoch': 500.0}
{'eval_loss': 7.8905128475526e-07, 'eval_runtime': 2.179, 'eval_samples_per_second': 1.377, 'eval_steps_per_second': 0.459, 'epoch': 500.0}  

But even when I pick an image from the training set and ask the same questions, I still get outputs that differ from the training answers, like the following (a sketch of a complete sample follows these excerpts):

  • Training Data
    {
      "role": "user",
      "content": "What is the address breakdown city?"
    },
    {
      "role": "assistant",
      "content": "CALWELL"
    },
    {
      "role": "user",
      "content": "What is the address breakdown street?"
    },
    {
      "role": "assistant",
      "content": "65 MARTLEY CCT"
    },
    {
      "role": "user",
      "content": "What is the address?"
    },
    {
      "role": "assistant",
      "content": "65 MARTLEY CCT, CALWELL, ACT, 2905"
    },
  • Model Output
<User>: What is the address breakdown city?
<Assistant>: 65 MARTLEY CCT
<User>: What is the address breakdown street?
<Assistant>: 65 MARTLEY CCT, suburb CALWELL ACT 2905.
<User>: What is the address?
<Assistant>: 65 MARTLEY CCT CALWELL ACT 2905
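
For context, the conversation excerpt above is a fragment of one training sample; a complete sample would look roughly like the sketch below. This assumes the id/image/conversations layout from the repo's finetune README; the image path and the "<image>" placeholder in the first user turn are illustrative assumptions, not taken from the actual data.

import json

# Hypothetical complete training sample; only the conversation turns above are real.
sample = {
    "id": "0",
    "image": "path/to/document_page.jpg",  # placeholder path to the document image
    "conversations": [
        {"role": "user", "content": "<image>\nWhat is the address breakdown city?"},
        {"role": "assistant", "content": "CALWELL"},
        {"role": "user", "content": "What is the address breakdown street?"},
        {"role": "assistant", "content": "65 MARTLEY CCT"},
        {"role": "user", "content": "What is the address?"},
        {"role": "assistant", "content": "65 MARTLEY CCT, CALWELL, ACT, 2905"},
    ],
}

# The fine-tuning data file is a JSON list of such samples.
with open("train.json", "w") as f:
    json.dump([sample], f, indent=2)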

I used batch_size 8 and model_max_length 1024.
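
Concretely, those settings correspond to something like the sketch below. This is purely illustrative, using plain Hugging Face TrainingArguments rather than the repo's actual finetune launch script; the output directory is a placeholder.

from transformers import TrainingArguments

# Illustrative only: the repo's finetune script sets these up itself.
training_args = TrainingArguments(
    output_dir="output/output_minicpmv2_lora",  # placeholder, matching the checkpoint path below
    per_device_train_batch_size=8,              # "batch_size 8"
    num_train_epochs=500,                       # 500 epochs over the 12 training pairs
)
# model_max_length 1024 caps the tokenized sequence length, e.g. tokenizer.model_max_length = 1024.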

I loaded the model as follows:

import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

device = 'cuda'  # assumed target device
# Load the LoRA checkpoint produced by fine-tuning (path truncated).
model_path = '/home/paperspace/.../OpenBmb/MiniCPM-V/finetune/output/output_minicpmv2_lora/checkpoint-1000'
model = AutoPeftModelForCausalLM.from_pretrained(model_path, device_map="auto", trust_remote_code=True).to(dtype=torch.float16)
model = model.to(device=device)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model.eval()
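
For reference, inference with the loaded checkpoint looks roughly like this. It is a minimal sketch assuming the model.chat(...) interface shown in the repo README; the image path and question are placeholders, and sampling=False is assumed to disable stochastic sampling so decoding is deterministic.

from PIL import Image

# Placeholder inputs -- substitute an image and question from the training set.
image = Image.open('example_document.jpg').convert('RGB')
msgs = [{'role': 'user', 'content': 'What is the address breakdown city?'}]

# Assumed chat interface (as in the repo README); sampling disabled for deterministic output.
answer = model.chat(
    image=image,
    msgs=msgs,
    tokenizer=tokenizer,
    sampling=False,
)
print(answer)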

TensorBoard: [screenshot]

Any help getting a correct fine-tuning result would be appreciated.

Following this thread.

Can you give me the code?

@EmailScraper Hello, may I ask whether you trained with images? When I fine-tuned, it kept showing that it was loading the data. Have you encountered this? Thank you.

I think the problem may be that your training dataset is too small, and 500 training epochs is far too many, which causes the model to overfit severely.

commented

Hi @LDLINGLINGLING, thanks for your reply.
Btw, even when I pick an image from the training set and ask the same questions, I still get results that differ from the training answers.
How is that possible?