model self-attention hardcoded to 4 heads
SpeedCoder5 opened this issue
SpeedCoder5 commented
The self-attention block is hard-coded to 4 heads. I suggest using `n_heads` from the config instead.
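
For illustration, a minimal sketch of the suggested change, not the repo's actual code: it assumes a config object exposing `n_embd` and `n_heads` attributes, and PyTorch 2.x for `F.scaled_dot_product_attention`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Multi-head causal self-attention with the head count read from config
    instead of being hard-coded to 4."""

    def __init__(self, config):
        super().__init__()
        # The embedding width must divide evenly across the heads.
        assert config.n_embd % config.n_heads == 0
        self.n_heads = config.n_heads  # was: self.n_heads = 4
        self.qkv = nn.Linear(config.n_embd, 3 * config.n_embd)
        self.proj = nn.Linear(config.n_embd, config.n_embd)

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=2)
        # Reshape each projection to (B, n_heads, T, head_dim).
        q = q.view(B, T, self.n_heads, C // self.n_heads).transpose(1, 2)
        k = k.view(B, T, self.n_heads, C // self.n_heads).transpose(1, 2)
        v = v.view(B, T, self.n_heads, C // self.n_heads).transpose(1, 2)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        # Merge heads back into a (B, T, C) tensor and project out.
        return self.proj(y.transpose(1, 2).contiguous().view(B, T, C))

# Example: any config-like object with n_embd and n_heads works.
# from types import SimpleNamespace
# attn = CausalSelfAttention(SimpleNamespace(n_embd=128, n_heads=8))
# out = attn(torch.randn(2, 16, 128))  # -> shape (2, 16, 128)
```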
SpeedCoder5 commented
submitted PR #72