karpathy / minGPT

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

TPU/GPU training: KeyError 'pos_emb'

tech509201941 opened this issue

Hi,

I am currently testing the char notebook.
Everything works fine when training on the CPU, but if I execute the same code on a GPU/TPU, the following error occurs:

Exception has occurred: KeyError 'pos_emb'

If I simply remove the problematic code line:

no_decay.add('pos_emb')

With that line removed, GPU/TPU training kind of works, but the loss gets stuck: practically no improvement is made during training (or the loss even gets worse). This differs from CPU training with the same code base, where the loss visibly oscillates.

Can anyone explain to me how it is possible to solve this KeyError without corrupting the no_decay set?
Thanks a lot! :)
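
One way to narrow this down is to compare the name 'pos_emb' with the parameter names the model actually reports on the GPU/TPU run. The sketch below is only a diagnostic under an assumption the post does not confirm: that the parameter still exists but is registered under a different key, for example with a wrapper prefix (torch.nn.DataParallel exposes the wrapped model's parameters as "module.<name>"). The helper resolve_param_names and the example names are hypothetical, not part of minGPT.

```python
def resolve_param_names(param_names, wanted):
    """Map each wanted short name to the actual key(s) found in param_names.

    A key matches if it equals the short name or ends with "." + short name,
    so "module.pos_emb" matches "pos_emb".
    """
    resolved = {}
    for short in wanted:
        matches = [n for n in param_names
                   if n == short or n.endswith("." + short)]
        resolved[short] = matches
    return resolved


# Hypothetical names, as they might look if the model were wrapped.
# In a real run they would come from:
#   [pn for pn, _ in model.named_parameters()]
names = ["module.pos_emb", "module.tok_emb.weight", "module.head.weight"]

resolved = resolve_param_names(names, ["pos_emb"])

# Use the keys that really exist instead of the hard-coded short name.
no_decay = {n for matches in resolved.values() for n in matches}

# Short names with no match at all: the parameter is genuinely absent.
missing = [short for short, matches in resolved.items() if not matches]

print("no_decay:", sorted(no_decay))
print("missing:", missing)
```

If 'pos_emb' shows up under missing, the parameter is not registered on the model at all in that run and the prefix assumption does not apply. If it shows up with a prefix, adding the resolved key to no_decay keeps the position embedding excluded from weight decay instead of dropping it from the set.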