Reproducing a Trainable LLaMa3 in NumPy
For a better reading experience, see the Feishu document: https://aw8o2u3n233.feishu.cn/wiki/Pc7swzMMZiYnP5krcrzcHkQUn1a?from=from_copylink
Original assignment material: https://github.com/naklecha/llama3-from-scratch
NumPy implementation of llama3 (inference only, not trainable): https://github.com/likejazz/llama3.np
Baby llama: https://github.com/DLLXW/baby-llama2-chinese
Atom7b: https://github.com/LlamaFamily/Llama-Chinese
RMSNorm paper: https://arxiv.org/abs/1910.07467
SwiGLU (llama3's FFN) paper: https://arxiv.org/abs/2002.05202
Attention paper: https://arxiv.org/abs/1706.03762
Blog post introducing attention: https://spaces.ac.cn/archives/4765
Introduction to self-attention: https://armanasq.github.io/nlp/self-attention/
RoPE paper (RoFormer): https://arxiv.org/abs/2104.09864
RoPE explained by its original author: https://spaces.ac.cn/archives/8265
RoPE introduction and implementation: https://blog.eleuther.ai/rotary-embeddings/
Differences arising from the two RoPE implementations: huggingface/transformers#25199
On weight tying: https://spaces.ac.cn/archives/9698
Deep Learning from Scratch ❸ (『ゼロから作る Deep Learning ❸』, O'Reilly Japan, 2020): https://github.com/oreilly-japan/deep-learning-from-scratch-3
A stack implementation by a reader of the above book: https://github.com/laksjdjf/dezero-diffusion/blob/main/modules/unet.py
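As a companion to the RMSNorm paper linked above, here is a minimal NumPy sketch of the forward pass; variable names and the `eps` default are illustrative assumptions, not taken from any of the linked repos.

```python
import numpy as np

def rms_norm(x, weight, eps=1e-5):
    # Normalize by the root-mean-square over the last axis, then scale
    # by a learned weight. Unlike LayerNorm, there is no mean-centering.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

# Tiny usage example: batch of 2, hidden size 8.
x = np.random.randn(2, 8)
y = rms_norm(x, np.ones(8))
```

The lack of mean subtraction is exactly what makes RMSNorm cheaper than LayerNorm while performing comparably, which is the claim of the linked paper.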
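The SwiGLU paper linked above describes the gated feed-forward block llama3 uses. A minimal NumPy sketch under my own naming assumptions (gate/up/down projections, no bias, hidden size chosen arbitrarily):

```python
import numpy as np

def silu(x):
    # SiLU / swish activation: x * sigmoid(x).
    return x * (1.0 / (1.0 + np.exp(-x)))

def swiglu_ffn(x, w_gate, w_up, w_down):
    # SwiGLU feed-forward: a SiLU-activated gate branch multiplied
    # elementwise with a linear "up" branch, projected back down.
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

# Tiny usage example: model dim 8, FFN hidden dim 16.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))
w_gate = rng.standard_normal((8, 16)) * 0.1
w_up = rng.standard_normal((8, 16)) * 0.1
w_down = rng.standard_normal((16, 8)) * 0.1
out = swiglu_ffn(x, w_gate, w_up, w_down)
```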
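For the attention links above, a minimal single-head causal attention sketch in NumPy (shapes and the causal-masking style are my own assumptions; real llama3 adds multiple heads, grouped-query KV, and RoPE on q/k):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

def causal_attention(q, k, v):
    # Scaled dot-product attention with a causal mask:
    # position i may only attend to positions <= i.
    seq_len, d_k = q.shape
    scores = q @ k.T / np.sqrt(d_k)
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    return softmax(scores) @ v

# Tiny usage example: sequence length 4, head dim 8.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8)) for _ in range(3))
out = causal_attention(q, k, v)
```

Note that position 0 can only attend to itself, so its output is exactly `v[0]`; this is a quick sanity check for any causal-mask implementation.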
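The huggingface/transformers#25199 link above concerns the two rotary-embedding layouts (interleaved pairs vs. "rotate-half"). A minimal NumPy sketch of the rotate-half variant, under my own shape assumptions (single head, `head_dim` even):

```python
import numpy as np

def rope(x, base=10000.0):
    # Rotary position embedding, "rotate-half" layout: dimension i is
    # paired with dimension i + head_dim // 2 and rotated by a
    # position-dependent angle. x has shape (seq_len, head_dim).
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)          # per-pair frequencies
    angles = np.outer(np.arange(seq_len), freqs)       # (seq_len, half)
    cos = np.concatenate([np.cos(angles)] * 2, axis=-1)
    sin = np.concatenate([np.sin(angles)] * 2, axis=-1)
    rotated = np.concatenate([-x[:, half:], x[:, :half]], axis=-1)
    return x * cos + rotated * sin

# Tiny usage example: sequence length 4, head dim 8.
x = np.random.randn(4, 8)
out = rope(x)
```

Since RoPE is a pure rotation, it preserves each position's vector norm, and position 0 (angle zero) is left unchanged; the interleaved variant differs only in which dimensions are paired, which is why checkpoints trained with one layout cannot be loaded with the other without permuting weights.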