salesforce / ctrl

Conditional Transformer Language Model for Controllable Generation

Home Page: https://arxiv.org/abs/1909.05858

Question about vocabulary file

htw2012 opened this issue

Hello,

There's no script for generating the vocabulary file vocab. Could you explain in detail how it was generated?

As the paper mentions, "We learn codes and tokenize the data using fastBPE, but we use a large vocabulary of roughly 250K tokens."
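
To make the question concrete: the fastBPE README describes a three-step workflow (learnbpe to learn merge codes, applybpe to tokenize, getvocab to count tokens). The sketch below is a toy pure-Python illustration of what those three steps compute. It is not fastBPE itself and not the authors' actual pipeline. The function names (learn_bpe, apply_bpe, get_vocab), the "</w>" end-of-word marker, and the tie-breaking rule are assumptions made for illustration, and the token format is not meant to match fastBPE's files.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules ("codes") from an iterable of text lines."""
    # Each word starts as a tuple of characters plus an end-of-word marker.
    words = Counter(tuple(w) + ("</w>",) for line in corpus for w in line.split())
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties are broken by the pair itself
        # (an arbitrary but deterministic choice for this sketch).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_words = Counter()
        for symbols, freq in words.items():
            new_words[merge_pair(symbols, best)] += freq
        words = new_words
    return merges


def apply_bpe(word, merges):
    """Tokenize one word by applying the learned merges in the order they were learned."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


def get_vocab(corpus, merges):
    """Count how often each token occurs in the tokenized corpus."""
    counts = Counter()
    for line in corpus:
        for word in line.split():
            counts.update(apply_bpe(word, merges))
    return counts


if __name__ == "__main__":
    corpus = ["ab ab ab ac"]          # toy data, purely illustrative
    codes = learn_bpe(corpus, num_merges=2)
    vocab = get_vocab(corpus, codes)
    # Print tokens with counts, most frequent first.
    for token, count in vocab.most_common():
        print(token, count)
```

In this toy setup the vocabulary file would simply be the token/count listing produced by the last step. How the roughly 250K-token vocabulary in the paper was actually produced (number of merges, training data, any filtering) is exactly what this issue asks and is not answered by the sketch.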