model self-attention hardcoded to 4 heads
SpeedCoder5 opened this issue
SpeedCoder5 commented
The self-attention block is hard-coded to 4 heads. I suggest using `n_heads` from the config instead.
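
For illustration, a minimal sketch of the suggested change, not the repo's actual code: it assumes a config object exposing `n_embd` and `n_heads` attributes, and PyTorch 2.x for `F.scaled_dot_product_attention`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Multi-head causal self-attention with the head count read from config
    instead of being hard-coded to 4."""

    def __init__(self, config):
        super().__init__()
        # The embedding width must divide evenly across the heads.
        assert config.n_embd % config.n_heads == 0
        self.n_heads = config.n_heads  # was: self.n_heads = 4
        self.qkv = nn.Linear(config.n_embd, 3 * config.n_embd)
        self.proj = nn.Linear(config.n_embd, config.n_embd)

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # Reshape each projection to (B, n_heads, T, head_dim).
        q = q.view(B, T, self.n_heads, C // self.n_heads).transpose(1, 2)
        k = k.view(B, T, self.n_heads, C // self.n_heads).transpose(1, 2)
        v = v.view(B, T, self.n_heads, C // self.n_heads).transpose(1, 2)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        # Merge heads back into a (B, T, C) tensor and project out.
        return self.proj(y.transpose(1, 2).contiguous().view(B, T, C))

# Example: any config-like object with n_embd and n_heads works.
# from types import SimpleNamespace
# attn = CausalSelfAttention(SimpleNamespace(n_embd=128, n_heads=8))
# out = attn(torch.randn(2, 16, 128))  # -> shape (2, 16, 128)
```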
SpeedCoder5 commented
submitted PR #72