llama2.c-to-ncnn

A converter for llama2.c legacy models to ncnn models. Currently, it has only been tested on the 7B and 13B models.

Compiling

Build ncnn first. Then set NCNN_DIR to the path of your ncnn source tree; if it is unset, the build will search for ncnn in the parent directory.
The ncnn source tree must contain the code for the LinearInt8 layer, from Tencent/ncnn#5007.

git clone --depth=1 https://github.com/lrw04/llama2.c-to-ncnn
cd llama2.c-to-ncnn
mkdir build
cd build
cmake ..
make -j$(nproc)

You will get two binaries in the build directory: convert, which converts models from the llama2.c legacy format to the ncnn format, and inference, an example of how to use the resulting ncnn models.

Converting Meta's weights

Use convert.py:

python convert.py --outfile <output file> <model directory>

Converting weights into ncnn's format

./convert <llama2.c model file> <ncnn model name>

# example
./convert stories15M.bin stories15M.ncnn

You will get stories15M.ncnn.bin, stories15M.ncnn.param, and stories15M.ncnn.desc. A model consisting of these three files is referred to by their common prefix, e.g. 7b.ncnn.
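For reference, convert reads checkpoints in llama2.c's legacy format, which starts with a seven-field int32 header (the Config struct from llama2.c's run.c) followed by the float32 weights. Below is a minimal stdlib-only sketch of parsing that header; the helper name and the demo values are illustrative, not part of this repository:

```python
import struct
import tempfile

def read_llama2c_header(path):
    """Parse the seven-int32 Config header of a llama2.c legacy checkpoint."""
    with open(path, "rb") as f:
        fields = struct.unpack("<7i", f.read(28))
    names = ("dim", "hidden_dim", "n_layers", "n_heads",
             "n_kv_heads", "vocab_size", "seq_len")
    return dict(zip(names, fields))

# Demo with a synthetic header (hypothetical values, not a real model file):
with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as f:
    f.write(struct.pack("<7i", 288, 768, 6, 6, 6, 32000, 256))
    demo_path = f.name

cfg = read_llama2c_header(demo_path)
print(cfg["dim"], cfg["vocab_size"])  # 288 32000
```

In the real format, the weight tensors follow the header directly as raw little-endian float32 data.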

Get tokenizer model

Please retrieve it from https://github.com/karpathy/llama2.c, where it is named tokenizer.bin. Preferably, obtain it from an older commit, so that the output contains proper newlines instead of <0x0A>.
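As read by llama2.c, tokenizer.bin stores an int32 maximum token length followed by one record per vocabulary entry: a float32 score, an int32 byte length, and the token's UTF-8 bytes (the vocabulary size itself comes from the model config, not the file). A hedged stdlib-only sketch of that layout, with a tiny made-up two-token vocabulary and a helper that maps byte tokens like <0x0A> back to raw characters:

```python
import io
import re
import struct

def read_tokenizer(data, vocab_size):
    """Parse llama2.c's tokenizer.bin layout: an int32 max token length,
    then vocab_size records of (float32 score, int32 length, UTF-8 bytes)."""
    buf = io.BytesIO(data)
    (max_token_len,) = struct.unpack("<i", buf.read(4))
    vocab, scores = [], []
    for _ in range(vocab_size):
        score, length = struct.unpack("<fi", buf.read(8))
        vocab.append(buf.read(length).decode("utf-8"))
        scores.append(score)
    return vocab, scores, max_token_len

def render(token):
    # Newer tokenizer.bin files spell raw bytes as "<0xHH>"; map them back.
    m = re.fullmatch(r"<0x([0-9A-Fa-f]{2})>", token)
    return chr(int(m.group(1), 16)) if m else token

# Synthetic two-token vocabulary for illustration only:
blob = struct.pack("<i", 6)
for score, tok in [(0.0, "<0x0A>"), (-1.0, "hi")]:
    blob += struct.pack("<fi", score, len(tok)) + tok.encode("utf-8")

vocab, scores, _ = read_tokenizer(blob, 2)
print(repr(render(vocab[0])))  # '\n'
```

With a tokenizer.bin from an older llama2.c commit, the newline tokens are already stored as raw bytes and no such mapping is needed.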

Complete text using the resulting model

./inference <MODEL> <PROMPT> <N-TOKENS>

# example
./inference stories15M.ncnn "Tell Something" 64

Example outputs

Iwasawa theory is an isomorphism of the following four categories: (i) Category of $k$-rational points of $\mathcal{G}$ and (ii) Category of $k$-rational points of $\mathcal{G}/\mathcal{K}_H$, where $\mathcal{K}_H$ denotes the relative kernel of the Hilbert modular abelian scheme $\mathcal{H}$, (iii)

Typical nonsense generated by a small LLM

Whoo. Finally, finally, finally, FINALLY the first episode of the second season of “Sweet Enemy” is out.<0x0A>This episode is two hours long and covers 4 chapters of the manga.<0x0A>I must say, I’ve never seen a dramatic adaptation of a manga that covers so much manga volume in just one episode. But I’m definitely not complaining.<0x0A>Please enjoy the episode. I can’t wait for the next one!<0x0A>P.S. Next time, we’ll be adding Subtitles, Captions and Chinese to English.<0x0A>P.S.S. We’re also considering adding a new audio stream for the “original” audio, so that you can switch between Japanese and English audio.

All of the above are generated with the 7B model.

TODO

  • KV cache
  • Chat completion support
  • int8 quantization
  • Reduce memory usage during conversion


License: MIT
