[QA] Question about phase 2 long context pretraining batch size
skyshine102 opened this issue
Describe the question.
Hi InternLM team,
I was reading your great InternLM2 paper and saw that:
- phase 1: 4k pretraining, batch size = 4M (tokens) | 50% of data | 90% of training steps
- phase 2: 32k pretraining, batch size = ? (tokens) <--- is this still 4M tokens? | 50% of data (?) | 9% of training steps
Could you provide more details on whether the phase 2 batch size, in tokens, remains constant? I cannot reconcile the data quantities with the training steps :(
Hi @skyshine102 , the batch_size in phase 2 remains at 4M.
@00INDEX Thank you for the clarification. It was my misreading; my apologies.
Here is the correct table for future readers.
- phase 1: 4k pretraining, batch size = 4M (tokens) | 90% of training steps --> 90% of total data
- phase 2: 32k pretraining, batch size = 4M (tokens) | 9% of training steps --> ~10% of total data, but with mixed sequence lengths: around 50% of this 10% consists of sequences <=4k in length.
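The arithmetic above can be sketched as follows. Since the batch size in tokens stays constant across phases, the fraction of total data each phase sees equals its fraction of training steps. The total step count here is a hypothetical placeholder (the paper's exact number may differ); only the 4M-token batch and the 90%/9% step split come from the thread.

```python
# Sketch: with a constant token batch size, data fraction == step fraction.
TOKENS_PER_STEP = 4_000_000  # 4M-token batch, constant in phase 1 and phase 2
TOTAL_STEPS = 100_000        # hypothetical total; used only for illustration

phases = {
    "phase 1 (4k)": 0.90,   # fraction of training steps
    "phase 2 (32k)": 0.09,
}

total_tokens = TOKENS_PER_STEP * TOTAL_STEPS
for name, step_frac in phases.items():
    tokens = TOKENS_PER_STEP * int(TOTAL_STEPS * step_frac)
    print(f"{name}: {tokens / 1e9:.0f}B tokens ({tokens / total_tokens:.0%} of total)")
    # phase 1 (4k): 360B tokens (90% of total)
    # phase 2 (32k): 36B tokens (9% of total)
```

So phase 2's 9% of steps corresponds to roughly 10% of the data, which resolves the mismatch in the original question.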