kingoflolz / mesh-transformer-jax

Model parallel transformers in JAX and Haiku


sequence_length=2049 or 2048?

leejason opened this issue

Should the sequence_length be 2049 or 2048? In gpt-neo, the chunk_size used by the split_list() function is 2048, but it is 2049 in your repository. Why?

You can use this script (the input expects exactly the same format as gpt-neo): https://github.com/EleutherAI/gpt-neo/blob/master/data/create_tfrecords.py

Originally posted by @kingoflolz in #68 (comment)

vfbd commented

2049 is likely correct here. The trainer uses the first 2048 tokens of each 2049-token chunk as the "context" and the last 2048 tokens of each 2049-token chunk as the "target" so that the first token in the chunk predicts the second token in the chunk, the second predicts the third, and the 2048th predicts the 2049th. That's 2048 pairs of tokens in total.
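For illustration, here is a minimal NumPy sketch of that split (the variable names are hypothetical, not taken from the mesh-transformer-jax trainer):

```python
import numpy as np

# One 2049-token chunk; np.arange stands in for real token ids.
chunk = np.arange(2049)

context = chunk[:-1]  # first 2048 tokens (positions 0..2047)
target = chunk[1:]    # last 2048 tokens (positions 1..2048)

assert context.shape == target.shape == (2048,)
# context[i] is trained to predict target[i]: token i predicts token i+1,
# giving 2048 prediction pairs per 2049-token chunk.
```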

Also, if you look at the bottom of the gpt-neo create_tfrecords script, you can see that it's increasing whatever you set the chunk size as by 1, so the default there is also 2049.
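As a sketch of that behavior (this illustrates only the chunk_size + 1 logic described above, not the actual create_tfrecords.py code, and it assumes non-overlapping records):

```python
def split_into_records(tokens, chunk_size=2048):
    # The effective record length is chunk_size + 1, so the default
    # setting of 2048 yields 2049-token records.
    record_len = chunk_size + 1
    return [
        tokens[i : i + record_len]
        for i in range(0, len(tokens) - record_len + 1, record_len)
    ]

records = split_into_records(list(range(10_000)))
assert all(len(r) == 2049 for r in records)
```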

That's right, thanks @vfbd for the answer

That is illuminating and helpful. Thank you very much.