abhimishra91 / transformers-tutorials

GitHub repo with tutorials for fine-tuning transformers on different NLP tasks

Training based on Teacher forcing technique

omidvaramin opened this issue

Hi,
Thank you for your code.
I have a question about how the model is being trained.
The paper says T5 is trained with the teacher forcing technique: at each time step of decoding, the decoder input should come from the ground-truth data, not from the previously generated token. In your code, however, it looks like the model generates the entire output by itself through the following line:
outputs = model(input_ids = ids, attention_mask = mask, decoder_input_ids=y_ids, lm_labels=lm_labels)
loss = outputs[0]
Is my assumption correct that you do not use the teacher forcing technique? Thanks.
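
For reference, the distinction in question can be illustrated with a minimal sketch. It uses a hypothetical toy decoder step (`toy_next_token`), not T5 or the repository's code, and the helper names `shift_right`, `teacher_forced_predictions` and `free_running_predictions` are made up for illustration. The toy step looks only at the previous token, whereas a real decoder attends to the whole prefix.

```python
from typing import Callable, List


def shift_right(target: List[int], start_id: int) -> List[int]:
    """Teacher-forcing decoder inputs: a start token, then every gold token but the last."""
    return [start_id] + target[:-1]


def toy_next_token(prev: int) -> int:
    """Hypothetical decoder step: predicts prev + 1, except for a deliberate mistake after token 2."""
    return 9 if prev == 2 else prev + 1


def teacher_forced_predictions(
    target: List[int], start_id: int, step: Callable[[int], int]
) -> List[int]:
    # Every step is conditioned on the ground-truth previous token.
    return [step(prev) for prev in shift_right(target, start_id)]


def free_running_predictions(
    length: int, start_id: int, step: Callable[[int], int]
) -> List[int]:
    # Every step is conditioned on the model's own previous prediction.
    preds: List[int] = []
    prev = start_id
    for _ in range(length):
        prev = step(prev)
        preds.append(prev)
    return preds


target = [1, 2, 3, 4]
tf = teacher_forced_predictions(target, start_id=0, step=toy_next_token)
fr = free_running_predictions(len(target), start_id=0, step=toy_next_token)
print(tf)  # the mistake stays confined to one position
print(fr)  # the mistake is fed back in and affects later positions
```

Under this reading, passing ground-truth tokens as `decoder_input_ids` would correspond to the teacher-forced case. Whether that applies to the line quoted above depends on how `y_ids` and `lm_labels` are built, which is not shown here.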