nshepperd / gpt-2

Code for the paper "Language Models are Unsupervised Multitask Learners"


Why is the training label sliced like this?

SchenbergZY opened this issue · comments

In the code in train.py, I found this loss function:

        loss = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(
                labels=context[:, 1:], logits=output['logits'][:, :-1]))

But why does it use the slice [:, 1:] for the labels and [:, :-1] for the logits? Why aren't the two slices the same?
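
A minimal sketch of what that shift does, not taken from the repo and using a made-up toy context and vocabulary size, assuming the standard next-token-prediction setup: the logit at position t is scored against the token at position t+1, which is why labels and logits are sliced from opposite ends.

    import tensorflow as tf

    context = tf.constant([[11, 22, 33, 44, 55]])   # toy token ids, shape [batch=1, seq=5]
    logits = tf.random.normal([1, 5, 100])          # pretend model output, shape [batch, seq, vocab=100]

    # logits[:, :-1] are the model's predictions at positions 0..3;
    # context[:, 1:] are the tokens that actually follow, at positions 1..4.
    labels = context[:, 1:]                          # [[22, 33, 44, 55]]
    preds = logits[:, :-1]                           # predictions aligned with those labels

    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=preds))

So the slices are deliberately different: dropping the last logit and the first token lines the two tensors up so each prediction is compared with the next token in the sequence.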