Training based on Teacher forcing technique

Question

Training based on Teacher forcing technique

omidvaramin opened this issue 2 years ago · comments

Hi,
Thank you for your code,
I have a question regarding the way the model is being trained,
In the paper it is mentioned T5 is being trained based on the teacher forcing technique which for each time stamp in the decoding part the input should be from the ground truth data not the previously generated token, but in your code your model will generate the entire output by itself trough the following line:
outputs = model(input_ids = ids, attention_mask = mask, decoder_input_ids=y_ids, lm_labels=lm_labels)
loss = outputs[0]
Is my assumption correct that you do not use teacher forcing technique? thanks