About the BPE file
kugwzk opened this issue
Hi~ @zlccccc @SlotherCui
I noticed that there is no BPE file in this repo. Your token embedding weight has shape [49409, 512], but in CLIP the shape is [49408, 512]. Is your BPE file consistent with CLIP's?
If I missed something, please comment~ Thanks a lot!
We added a '<[mask]>' token to perform masked language modeling in the language self-supervision, which is why the token embedding has one more row than CLIP's. Please refer to:
https://github.com/Sense-GVT/DeCLIP/blob/main/prototype/model/utils/text_utils/simple_tokenizer.py#L73
https://github.com/Sense-GVT/DeCLIP/blob/main/prototype/model/text_encoder/text_transformer.py#L38
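To make the size difference concrete, here is a minimal sketch of how one extra token turns a 49408-entry vocabulary into a 49409-row embedding. It is not the DeCLIP code itself: `extend_vocab` and the placeholder `tok{i}` vocabulary are hypothetical, and giving the mask token the next free id is an assumption; the linked files show the actual implementation.

```python
CLIP_VOCAB_SIZE = 49408   # rows in CLIP's token embedding, per this thread
EMBED_DIM = 512           # embedding width, per this thread
MASK_TOKEN = "<[mask]>"   # token string as quoted in the reply above


def extend_vocab(encoder, extra_tokens):
    """Return a copy of a token -> id map with each new token given the next free id.

    Hypothetical helper for illustration; assumes ids are contiguous from 0.
    """
    extended = dict(encoder)
    for token in extra_tokens:
        if token not in extended:
            extended[token] = len(extended)
    return extended


# Placeholder vocabulary standing in for the real BPE vocabulary.
base_encoder = {f"tok{i}": i for i in range(CLIP_VOCAB_SIZE)}

encoder = extend_vocab(base_encoder, [MASK_TOKEN])

# The embedding table needs one row per token id.
embedding_shape = (len(encoder), EMBED_DIM)
print(embedding_shape)  # (49409, 512)
```

Under these assumptions the original ids are unchanged, so the first 49408 rows can still line up with CLIP's vocabulary and only the last row belongs to the mask token.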
@kugwzk You can download the BPE file here:
https://github.com/Sense-GVT/DeCLIP/blob/main/docs/dataset_prepare.md