InternLM / InternLM

Official release of InternLM2 7B and 20B base and chat models. 200K context support.

Home page: https://internlm.intern-ai.org.cn/

[QA] Question about phase 2 long context pretraining batch size

skyshine102 opened this issue

Describe the question.

Hi InternLM team,
I was reading your great paper on InternLM2 and saw that:

  • phase 1: 4k pretraining, batch size = 4M (tokens) | 50% of data | 90% of training steps
  • phase 2: 32k pretraining, batch size = ? (tokens) <--- is this still 4M tokens? | 50% of data (?) | 9% of training steps

Could you clarify whether the phase 2 batch size, in tokens, stays constant? I can't reconcile the data quantities with the training steps :(

This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 7 days if the stale label is not removed or if there is no further response.

Hi @skyshine102, the batch_size in phase 2 remains at 4M.
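To make that concrete, here is a minimal sketch (not the actual InternLM training config) of what a constant 4M-token batch implies when the context length grows from 4k to 32k. The reading of "4M" as 4,000,000 tokens and the exact sequence lengths are assumptions.

```python
# Minimal sketch: how many sequences fit in one global batch when the
# batch size is held constant in tokens while the context length grows.
# Assumption: "4M" means 4,000,000 tokens; the exact convention may differ.
TOKENS_PER_GLOBAL_BATCH = 4_000_000

for seq_len in (4_096, 32_768):  # assumed phase 1 vs. phase 2 context lengths
    seqs = TOKENS_PER_GLOBAL_BATCH // seq_len
    print(f"seq_len={seq_len:>6}: ~{seqs} sequences per global batch")
```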

@00INDEX Thank you for the clarification. It was my misreading. My apologies.

Here is the corrected breakdown for future readers (a short arithmetic sketch follows the list).

  • phase 1: 4k pretraining, batch size = 4M (tokens) | 90% of training steps --> 90% of total data
  • phase 2: 32k pretraining, batch size = 4M (tokens) | 9% of training steps --> ~10% of total data, but with mixed lengths; around 50% of this 10% of the data is <=4k in length.
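For anyone checking the arithmetic, here is a minimal sketch of the accounting implied above. The total step count is a made-up placeholder; only the constant 4M-token batch and the 90% / 9% step split come from this thread. With a constant token-level batch size, the share of data consumed by a phase simply equals its share of steps.

```python
# Minimal sketch of the token accounting; numbers other than the 4M batch
# and the 90% / 9% step split are illustrative placeholders, not paper values.
TOKENS_PER_STEP = 4_000_000   # constant global batch size in tokens
TOTAL_STEPS = 100_000         # hypothetical total number of optimizer steps

phases = {
    "phase 1 (4k context)":  0.90,  # 90% of training steps
    "phase 2 (32k context)": 0.09,  # 9% of training steps
}

for name, step_frac in phases.items():
    steps = int(TOTAL_STEPS * step_frac)
    tokens = steps * TOKENS_PER_STEP
    # Constant batch size in tokens => data fraction == step fraction.
    print(f"{name}: {steps:,} steps, {tokens / 1e9:.1f}B tokens "
          f"({step_frac:.0%} of steps and of data)")
```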