linjieli222 / HERO

Research code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"

Home Page: https://arxiv.org/abs/2005.00200

question about vocab size

taeho-kil opened this issue · comments

In the pre-training configuration file "hero_pretrain.json", the vocab size in f_config is 50,265 (it appears to come from the RoBERTa model).

However, the pre-trained model "hero-tv-ht100.pt" has an f_config vocab size of 50,272 (I checked the dimension of model.v_encoder.f_encoder.lm_head.decoder).

When the "hero-tv-ht100.pt" model was trained, which configuration file was used?
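
For reference, this is roughly how the dimension can be inspected from the released checkpoint. This is a minimal sketch, not code from the repo: it assumes the checkpoint is a plain state dict and that the key name matches the module path above; the exact file path and key are assumptions and may need adjusting.

    import torch

    # Load the released checkpoint on CPU (path is a placeholder).
    state_dict = torch.load("hero-tv-ht100.pt", map_location="cpu")

    # The LM head decoder maps hidden states back to the vocabulary,
    # so its first dimension is the (possibly padded) vocab size.
    # Key name assumed from model.v_encoder.f_encoder.lm_head.decoder.
    w = state_dict["v_encoder.f_encoder.lm_head.decoder.weight"]
    print(w.shape[0])  # observed: 50272 instead of 50265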

@xellows1305

Thank you for your interest in our project, and sorry for the late response.

When we pre-train the model, ht_pretrain.json is used as the config file.
The vocab size change comes from

self.v_encoder.f_encoder.pad_vocab()

In pad_vocab(), we pad the word embeddings to be a multiple of 8 to fully utilize the tensor cores in our GPUs. Since 50,265 is not a multiple of 8, 7 extra rows are added, which gives the 50,272 you observed.

You can also refer to the function below, where the padding is implemented:

import torch


def pad_tensor_to_mul(tensor, dim=0, mul=8):
    """ pad tensor to multiples (8 for tensor cores) """
    t_size = list(tensor.size())
    n_pad = mul - t_size[dim] % mul
    if n_pad == mul:
        # already a multiple of `mul`; nothing to pad
        n_pad = 0
        padded_tensor = tensor
    else:
        # build a zero block of the missing size and append it along `dim`
        t_size[dim] = n_pad
        pad = torch.zeros(*t_size, dtype=tensor.dtype, device=tensor.device)
        padded_tensor = torch.cat([tensor, pad], dim=dim)
    return padded_tensor, n_pad
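
As a quick illustration (a sketch, not code from the repo), applying pad_tensor_to_mul to an embedding matrix with RoBERTa's 50,265 rows pads it up to 50,272, matching the checkpoint dimension in the question. The hidden size 768 is just an arbitrary example value.

    import torch

    # Dummy embedding matrix: 50,265 vocab entries, arbitrary hidden size.
    emb = torch.zeros(50265, 768)
    padded, n_pad = pad_tensor_to_mul(emb, dim=0, mul=8)
    print(n_pad)           # 7
    print(padded.size(0))  # 50272, the next multiple of 8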

Thanks.

Closed due to inactivity.