mit-han-lab / lite-transformer

[ICLR 2020] Lite Transformer with Long-Short Range Attention

Home Page: https://arxiv.org/abs/2004.11886

Applying factorized embedding parameterization

asharma20 opened this issue

What is get_input_transform() in fairseq/models/transformer_multibranch_v2.py used for? I'm trying to apply factorized embedding parameterization, as in the ALBERT model, and was wondering if I could use this function for that.

Hi, thank you for your question! Sorry, but that function is deprecated and not used in the current model. Originally, it was used to generate several layers that transform the input embeddings. If you want to apply factorized embedding parameterization, you may also need to change encoder_embed_tokens and decoder_embed_tokens, which are the embedding lookup tables mapping word indices to embeddings.
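
For concreteness, here is a minimal sketch of what an ALBERT-style factorized embedding could look like. The FactorizedEmbedding name and the 128/320 sizes are illustrative, not part of this repo:

import torch.nn as nn

class FactorizedEmbedding(nn.Module):
    """ALBERT-style factorized embedding (hypothetical helper):
    a small V x E lookup table followed by an E x H projection
    up to the model dimension."""

    def __init__(self, num_embeddings, embed_dim, model_dim, padding_idx=None):
        super().__init__()
        self.embed_tokens = nn.Embedding(num_embeddings, embed_dim, padding_idx=padding_idx)
        self.project_in = nn.Linear(embed_dim, model_dim, bias=False)

    def forward(self, tokens):
        # (batch, seq) -> (batch, seq, embed_dim) -> (batch, seq, model_dim)
        return self.project_in(self.embed_tokens(tokens))

# e.g. a 10k vocabulary with a 128-dim table projected up to 320
embed = FactorizedEmbedding(10000, 128, 320, padding_idx=1)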

Thank you for the quick response. I've tried changing encoder_embed_tokens and decoder_embed_tokens to a smaller dimension (128) and then adding a Linear() layer in the encoder and decoder to project the embeddings up to the original embedding dimension (320). However, I'm now getting a shape mismatch in the decoder's output_layer(), since features has dimension 320 while self.embed_out is built from embed_tokens and has dimension 128. I'm not sure what else I need to change to resolve that. Do you have any suggestions?

return F.linear(features, self.embed_out)
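
To spell out the mismatch: F.linear(features, self.embed_out) computes features @ self.embed_out.t(), so the last dimension of features must equal the second dimension of self.embed_out. A minimal reproduction with the sizes above:

import torch
import torch.nn.functional as F

features = torch.randn(2, 5, 320)    # decoder features: (batch, seq, 320)
embed_out = torch.randn(10000, 128)  # weight tied to the 128-dim lookup table
# F.linear(features, embed_out)      # RuntimeError: 320 != 128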

By default, the translation tasks for large datasets reuse the weight of the embedding lookup table (embed_tokens) as the generator, i.e. as self.embed_out. If you apply a linear transform to the input embeddings, you also need another linear transform on the output features to change the dimension back. You can add the following code just before the return of TransformerDecoder.extract_features for that purpose.

# project the decoder features (e.g. 320-dim) back to the embedding
# dimension (e.g. 128-dim) so they match self.embed_out in output_layer()
if self.project_out_dim is not None:
    x = self.project_out_dim(x)
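
For completeness, the snippet above assumes self.project_out_dim exists; in stock fairseq it is created in TransformerDecoder.__init__ roughly as below (a sketch, where Linear is fairseq's helper and output_embed_dim is the dimension of embed_tokens, here 128):

# inside TransformerDecoder.__init__ (sketch following fairseq conventions)
self.project_out_dim = (
    Linear(embed_dim, output_embed_dim, bias=False)  # e.g. 320 -> 128
    if embed_dim != output_embed_dim
    else None
)

With that in place, extract_features returns 128-dim features and the F.linear against self.embed_out in output_layer goes through.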

Yes, that was exactly what I was missing. Thank you!