karpathy / minGPT

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

model self-attention hardcoded to 4 heads

SpeedCoder5 opened this issue · comments

The self-attention block is hard-coded to 4 heads. Suggest using n_heads from the config instead.
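A minimal sketch of what the suggested fix could look like, assuming a minGPT-style config object exposing `n_embd`, `n_head`, and `block_size`; this is illustrative only and not the actual patch submitted in the PR below:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Multi-head causal self-attention where the head count comes from the
    config rather than being hard-coded (sketch; attribute names follow
    minGPT-style conventions: n_embd, n_head, block_size)."""

    def __init__(self, config):
        super().__init__()
        assert config.n_embd % config.n_head == 0
        # key, query, value projections for all heads
        self.key = nn.Linear(config.n_embd, config.n_embd)
        self.query = nn.Linear(config.n_embd, config.n_embd)
        self.value = nn.Linear(config.n_embd, config.n_embd)
        self.proj = nn.Linear(config.n_embd, config.n_embd)
        # causal mask so each position only attends to earlier positions
        self.register_buffer(
            "mask",
            torch.tril(torch.ones(config.block_size, config.block_size))
                 .view(1, 1, config.block_size, config.block_size),
        )
        self.n_head = config.n_head  # taken from the config, not hard-coded to 4

    def forward(self, x):
        B, T, C = x.size()
        # split the embedding into n_head heads of size C // n_head each
        k = self.key(x).view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        q = self.query(x).view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = self.value(x).view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # scaled dot-product attention with the causal mask applied
        att = (q @ k.transpose(-2, -1)) * (1.0 / math.sqrt(k.size(-1)))
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        y = att @ v                                   # (B, n_head, T, C // n_head)
        y = y.transpose(1, 2).contiguous().view(B, T, C)  # re-merge the heads
        return self.proj(y)
```

Reading the head count from the config keeps the module consistent with the rest of the model's hyperparameters, so changing the number of heads in one place propagates everywhere.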

submitted PR #72