Question about vocabulary file
htw2012 opened this issue
H.Tongwen commented
Hello,
There is no script for generating the vocabulary file (vocab). Could you explain in detail how that file was generated?
H.Tongwen commented
As the paper mentions: we learn codes and tokenize the data using fastBPE, but we use a large vocabulary of roughly 250K tokens.
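For anyone landing here with the same question: the fastBPE README documents three commands, `learnbpe`, `applybpe`, and `getvocab`, and the last one is presumably what produces a vocabulary file from BPE-tokenized text. The authors' exact pipeline and settings are not stated in this thread, so the following is only a rough sketch of what such a step might look like, assuming the vocabulary file is a plain list of `token count` lines sorted by decreasing frequency. The function names, the sample sentences, and the alphabetical tie-breaking are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def build_vocab(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Count whitespace-separated (already BPE-tokenized) tokens.

    Returns (token, count) pairs sorted by decreasing count. Ties are
    broken alphabetically so the output is deterministic; that
    tie-breaking rule is an assumption, not fastBPE's documented behavior.
    """
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def format_vocab(vocab: List[Tuple[str, int]]) -> str:
    """Render the vocabulary as one 'token count' pair per line."""
    return "".join(f"{token} {count}\n" for token, count in vocab)


# Hypothetical BPE-tokenized sample; '@@' marks a subword continuation.
sample = ["the vocab@@ ulary file", "the codes file"]
vocab = build_vocab(sample)
print(format_vocab(vocab), end="")
```

On a real corpus the input would be the output of the BPE application step rather than a hard-coded list, and the resulting vocabulary would be far larger (the paper quoted above mentions roughly 250K tokens).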