salesforce / ctrl

Conditional Transformer Language Model for Controllable Generation

Home Page: https://arxiv.org/abs/1909.05858


How to add a new control code to the vocabulary?

leejason opened this issue.

Is it possible to add a new control code to the vocabulary file, or is there existing code for doing so? The script currently requires the code to already be in the vocabulary:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--control_code', type=str, required=True,
                    help='control code to use for this file. must be in the vocabulary, else it will error out.')
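The "must be in the vocabulary" rule amounts to a lookup that fails when the token is missing. A minimal sketch of that check, assuming each vocabulary line has the form `<token> <count>` with the token in the first field; `load_vocab` and `check_control_code` are hypothetical helpers, not functions from the CTRL repository:

```python
# Hypothetical helpers (not from the CTRL repo). Assumes each vocabulary line
# looks like "<token> <count>", with the token in the first whitespace field.

def load_vocab(lines):
    """Map each token to its position among non-blank lines (first occurrence wins)."""
    vocab = {}
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            vocab.setdefault(fields[0], len(vocab))
    return vocab


def check_control_code(code, vocab):
    """Return the index of `code`, or raise if it is not a single vocabulary token."""
    if code not in vocab:
        raise ValueError(f"control code {code!r} is not in the vocabulary")
    return vocab[code]


# Toy vocabulary lines standing in for the real vocab file.
vocab = load_vocab(["the 100", "Wikipedia 50", "Books 40", "Reviews 30"])
print(check_control_code("Books", vocab))  # 2
```

A code such as `H04L` that is not a vocabulary token raises `ValueError` here, which mirrors the "else it will error out" behavior described in the help string.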

You can't add a control code that isn't in the vocabulary. Typically, the exact token is not needed; a synonym (or really, any unused code) should work. What did you have in mind?
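The "use a synonym" workaround can be sketched as trying the preferred name first and falling back to in-vocabulary alternatives. This assumes a token-to-index dict like the one above; `choose_control_code` and the candidate words are hypothetical and only illustrate the idea:

```python
# Hypothetical helper: pick a stand-in control code that already exists in the
# vocabulary, trying the preferred name first and then synonyms in order.

def choose_control_code(preferred, synonyms, vocab):
    candidates = [preferred, *synonyms]
    for candidate in candidates:
        if candidate in vocab:
            return candidate
    raise ValueError(f"none of {candidates} is in the vocabulary")


# Toy vocabulary (token -> index); a real one would be loaded from the vocab file.
vocab = {"the": 0, "Reviews": 1, "Legal": 2, "Science": 3}
print(choose_control_code("Patents", ["Legal", "Science"], vocab))  # Legal
```

The chosen word only needs to be a single vocabulary token. As noted above, its meaning is a convenience, not a requirement.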

Patent classification codes, of which there may be 600+. Assuming such codes are not in the vocabulary, would the CTRL model work?

No, they certainly need to be in the vocabulary at the moment. But maybe what can work is to just start using the vocabulary words at indices 0, 1, ..., 600 (the words w for which word2idx[w] is 0, 1, ..., 600) as the control codes. It's convenient for the word used as a control code to have some relevance to the domain, but that's not important to the model.
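The index-based suggestion can be sketched as a one-to-one mapping from patent codes to existing vocabulary words, assuming the control token goes at the start of each training example. `assign_control_codes`, `add_control_code`, and the list `idx2word` (the vocabulary in index order, i.e. the inverse of `word2idx`) are hypothetical names, and the patent codes are illustrative:

```python
# Hypothetical sketch: give each patent classification code a distinct existing
# vocabulary word, reusing the words at indices 0, 1, ..., N as control codes.

def assign_control_codes(patent_codes, idx2word):
    """Map each patent code (sorted, so the assignment is stable) to idx2word[i]."""
    codes = sorted(set(patent_codes))
    if len(codes) > len(idx2word):
        raise ValueError("vocabulary has fewer tokens than patent codes")
    return {code: idx2word[i] for i, code in enumerate(codes)}


def add_control_code(text, patent_code, mapping):
    """Prefix a training example with the control word for its patent code."""
    return f"{mapping[patent_code]} {text}"


# Toy stand-in for the real vocabulary list (index -> word).
idx2word = ["the", "of", "and", "to", "in"]
mapping = assign_control_codes(["H04L", "A61K", "G06F"], idx2word)
print(mapping)  # {'A61K': 'the', 'G06F': 'of', 'H04L': 'and'}
print(add_control_code("A method for encoding data.", "G06F", mapping))
```

Sorting the codes keeps the assignment reproducible, so the same code maps to the same word at training and generation time. Since the script quoted above takes one `--control_code` per file, one option is to group examples by patent code into separate files and pass each file its mapped word.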

Interesting ideas, thanks. I'll try that later.