zhenlan0426 / NLP_notes


Save and Load transformer model:

    import os
    import torch
    import transformers

    # save weights, config, and vocabulary to OUTPUT_DIR
    torch.save(model_to_save.state_dict(), os.path.join(OUTPUT_DIR, "pytorch_model.bin"))
    model_to_save.config.to_json_file(os.path.join(OUTPUT_DIR, "config.json"))
    tokenizer.save_vocabulary(OUTPUT_DIR)

    # reload tokenizer and model from the saved files
    tokenizer = transformers.DistilBertTokenizer.from_pretrained(OUTPUT_DIR)
    model = transformers.DistilBertModel.from_pretrained(OUTPUT_DIR)
  1. GPT-2: labels are shifted inside the model, so you can simply set lm_labels = input_ids (see the minimal GPT2LMHeadModel sketch after the example below).

     Example for GPT2DoubleHeadsModel:
     '''Shapes (N = batch size, C = number of choices, L = sequence length):
        input_ids    (N, C, L)
        mc_token_ids (N, C)    index of the last token to use for classification
        mc_labels    (N,)
        lm_labels    (N, C, L)'''
     import torch
     from transformers import GPT2Tokenizer, GPT2DoubleHeadsModel
     tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
     model = GPT2DoubleHeadsModel.from_pretrained('gpt2')
     
     # Add a [CLS] to the vocabulary (we should train it also!)
     tokenizer.add_special_tokens({'cls_token': '[CLS]'})
     model.resize_token_embeddings(len(tokenizer))  # Update the model embeddings with the new vocabulary size
     print(tokenizer.cls_token_id, len(tokenizer))  # the newly added token is the last token of the vocabulary
     
     choices = ["Hello, my dog is cute [CLS]", "Hello, my cat is cute [CLS]"]
     encoded_choices = [tokenizer.encode(s) for s in choices]
     cls_token_location = [tokens.index(tokenizer.cls_token_id) for tokens in encoded_choices]
     input_ids = torch.tensor(encoded_choices).unsqueeze(0)  # Batch size: 1, number of choices: 2
     mc_token_ids = torch.tensor([cls_token_location])  # where cls is to be used for classification
     outputs = model(input_ids, mc_token_ids=mc_token_ids)
     lm_prediction_scores, mc_prediction_scores = outputs[:2]
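
     The shifted-label point above can also be seen with a plain GPT2LMHeadModel. A minimal sketch, assuming a transformers version where the language-modelling targets are passed as labels (older releases and GPT2DoubleHeadsModel call the argument lm_labels):

     import torch
     from transformers import GPT2Tokenizer, GPT2LMHeadModel

     tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
     model = GPT2LMHeadModel.from_pretrained('gpt2')

     # Pass the inputs themselves as labels; the model shifts them internally,
     # so no manual right-shift of the targets is needed.
     input_ids = torch.tensor([tokenizer.encode("Hello, my dog is cute")])
     outputs = model(input_ids, labels=input_ids)
     loss, lm_logits = outputs[0], outputs[1]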
    
  2. Transformer XL:

     Transformer-XL introduces a recurrence between attention segments so that the model can capture dependencies longer than a single segment. It also introduces relative position embeddings.
     mem_len should be set to a smaller number during training.
     same_length defaults to True, as in the original paper; it may be worthwhile to try False on your data. A minimal sketch of the recurrence is shown below.
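
     A minimal sketch of the segment-level recurrence, assuming the transformers TransfoXLModel API (output packaging varies slightly by version, but the model returns updated mems that are fed back in with the next segment):

     import torch
     from transformers import TransfoXLTokenizer, TransfoXLModel

     tokenizer = TransfoXLTokenizer.from_pretrained('transfo-xl-wt103')
     model = TransfoXLModel.from_pretrained('transfo-xl-wt103')

     text = "Hello , my dog is cute . Hello , my cat is cute ."
     input_ids = torch.tensor([tokenizer.encode(text)])
     seg1, seg2 = input_ids[:, :6], input_ids[:, 6:]

     out1 = model(seg1)                 # first segment, no memory yet
     hidden1, mems = out1[0], out1[1]

     # Reusing mems lets attention in the second segment look back into the
     # first one; this is the recurrence described above.
     out2 = model(seg2, mems=mems)
     hidden2, mems = out2[0], out2[1]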
    
