karpathy / minGPT

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

play_math AdditionDataset.__getitem__ return value?

SpeedCoder5 opened this issue

In the case of train_dataset[0], where self.permutation_array[0] contains 4717, why does __getitem__ return x, y as
(tensor([4, 7, 1, 7, 0, 6]), tensor([-100, -100, -100, 0, 6, 4]))
and not
(tensor([4, 7, 1, 7]), tensor([-100, -100, -100, -100, 0, 6, 4]))
or
(tensor([4, 7, 1, 7, -100, -100, -100]), tensor([-100, -100, -100, -100, 0, 6, 4]))

This question is not about the implementation of the function; it is about how the return value is used with minGPT. Is minGPT only trying to predict the last digit, i.e. '4', and not the last 3 digits '064'? Why are the last 3 digits not entirely excluded from x? Why does y not include a masked-out entry for the first digit?
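
For reference, here is a minimal sketch of how the pair above lines up, written with plain Python lists instead of tensors. It assumes two-digit operands and that index 4717 encodes 47 + 17 = 64, rendered as the digit string 4717064; the helper name make_example is hypothetical and only meant to reproduce the tuple quoted in this post. The sketch treats x as the sequence without its last digit and y as the same sequence shifted left by one, so y[i] is the digit that follows x[0..i]. The value -100 is the default ignore_index of torch.nn.functional.cross_entropy, so those positions contribute nothing to the loss.

```python
IGNORE = -100  # default ignore_index of torch.nn.functional.cross_entropy

def make_example(idx, ndigit=2):
    """Hypothetical helper: rebuild the (x, y) pair for one addition problem."""
    nd = 10 ** ndigit
    a, b = idx // nd, idx % nd          # 4717 -> a=47, b=17
    c = a + b                           # 64
    # Zero-padded digits: "47" + "17" + "064" -> "4717064"
    render = f"{a:0{ndigit}d}{b:0{ndigit}d}{c:0{ndigit + 1}d}"
    dix = [int(s) for s in render]
    x = dix[:-1]                        # input: everything except the last digit
    y = dix[1:]                         # target: the same digits shifted left by one
    n_masked = ndigit * 2 - 1           # targets that are still operand digits
    y[:n_masked] = [IGNORE] * n_masked
    return x, y

x, y = make_example(4717)
print(x)
print(y)

# Show which positions are actually scored and what each one has seen so far.
for i, target in enumerate(y):
    if target != IGNORE:
        print(f"position {i}: sees {x[:i + 1]} -> target {target}")
```

Read this way, all three digits of the sum are scored, one per position: position 3 sees 4, 7, 1, 7 and is scored on 0, position 4 additionally sees the 0 and is scored on 6, and position 5 additionally sees the 6 and is scored on 4. Under this reading the 0 and 6 appear in x because each later digit is predicted from the ones before it, and y has no entry for the first digit because nothing precedes that digit to predict it from.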