linjieli222 / HERO

Research code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"

Home Page: https://arxiv.org/abs/2005.00200

question about vocab size

taeho-kil opened this issue · comments

In the pre-training configuration file "hero_pretrain.json", the vocab size in f_config is 50,265 (it appears to come from the RoBERTa model).

However, the pre-trained model "hero-tv-ht100.pt" has an f_config vocab size of 50,272 (I checked the dimension of model.v_encoder.f_encoder.lm_head.decoder).

When the "hero-tv-ht100.pt" model was trained, which configuration file was used?
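
For reference, this is roughly how the dimension can be inspected from the released checkpoint. This is a minimal sketch, not code from the repo: it assumes the checkpoint is a plain state dict and that the key name matches the module path above; the exact file path and key are assumptions and may need adjusting.

    import torch

    # Load the released checkpoint on CPU (path is a placeholder).
    state_dict = torch.load("hero-tv-ht100.pt", map_location="cpu")

    # The LM head decoder maps hidden states back to the vocabulary,
    # so its first dimension is the (possibly padded) vocab size.
    # Key name assumed from model.v_encoder.f_encoder.lm_head.decoder.
    w = state_dict["v_encoder.f_encoder.lm_head.decoder.weight"]
    print(w.shape[0])  # observed: 50272 instead of 50265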

@xellows1305

Thank you for your interest in our project, and sorry for the late response.

When we pre-train the model, ht_pretrain.json is used as the config file.
The vocab size change comes from

self.v_encoder.f_encoder.pad_vocab()

In pad_vocab(), we pad the word embeddings to be a multiple of 8 to fully utilize the tensor cores in our GPUs. Since 50,265 is not a multiple of 8, 7 extra rows are added, which gives the 50,272 you observed.

You can also refer to the function below, where the padding is implemented:

import torch


def pad_tensor_to_mul(tensor, dim=0, mul=8):
    """ pad tensor to multiples (8 for tensor cores) """
    t_size = list(tensor.size())
    n_pad = mul - t_size[dim] % mul
    if n_pad == mul:
        # already a multiple of `mul`; nothing to pad
        n_pad = 0
        padded_tensor = tensor
    else:
        # build a zero block of the missing size and append it along `dim`
        t_size[dim] = n_pad
        pad = torch.zeros(*t_size, dtype=tensor.dtype, device=tensor.device)
        padded_tensor = torch.cat([tensor, pad], dim=dim)
    return padded_tensor, n_pad
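
As a quick illustration (a sketch, not code from the repo), applying pad_tensor_to_mul to an embedding matrix with RoBERTa's 50,265 rows pads it up to 50,272, matching the checkpoint dimension in the question. The hidden size 768 is just an arbitrary example value.

    import torch

    # Dummy embedding matrix: 50,265 vocab entries, arbitrary hidden size.
    emb = torch.zeros(50265, 768)
    padded, n_pad = pad_tensor_to_mul(emb, dim=0, mul=8)
    print(n_pad)           # 7
    print(padded.size(0))  # 50272, the next multiple of 8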

Thanks.

Closed due to inactivity.