Preemo-Inc / onefiveeight

All things 1.58 Bit

Roadmap

  1. Check whether we can pretrain from scratch with 1.58 Bit (randomly initialized) (we are here; see the ternary-quantization sketch below)
  2. Initialize 1.58 Bit from Mixtral/Mistral weights (we are here)
  3. Continued pretraining
  4. Move to ASIC
  5. AGI (in 1.58 bit, on ASIC)
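
For context on step 1: "1.58 bit" means ternary weights in {-1, 0, +1} (log2(3) ≈ 1.58 bits per weight). Below is a minimal sketch of the absmean ternary quantizer from the BitNet b1.58 paper; `quantize_ternary` is an illustrative name, not a function in this repo.

```python
# Minimal sketch of BitNet b1.58-style ternary weight quantization
# (absmean scaling, per "The Era of 1-bit LLMs"). Illustrative only;
# not this repo's implementation.
import torch

def quantize_ternary(w: torch.Tensor, eps: float = 1e-5):
    """Map full-precision weights to {-1, 0, +1} with an absmean scale."""
    scale = w.abs().mean().clamp(min=eps)   # gamma = mean(|W|)
    w_q = (w / scale).round().clamp(-1, 1)  # RoundClip(W / gamma, -1, 1)
    return w_q, scale                       # dequantize as w_q * scale

w = torch.randn(4096, 4096)
w_q, scale = quantize_ternary(w)
print(w_q.unique())  # tensor([-1., 0., 1.])
```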

Setup

python3 -m venv venv
. ./venv/bin/activate
cd hqq && pip install -e .   # editable install of the hqq fork in this repo
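
If the install worked, the package should import without errors (assuming the fork keeps the upstream hqq package name):

python -c "import hqq"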

TODO for 1Bit

HQQ -> 1.58

  • Test HQQ -> fork -> 1.58 bit (see the quantization sketch after this list)
  • Compare the performance of a trained BitNet model with a 1.58-bit HQQ-quantized pretrained model
    • Given data $D$: compare $\mathrm{BitNet}(D)$ with $\mathrm{HQQ}_{1.58}(\mathrm{Model}_{fp16/fp32})(D)$
  • 2-bit quantization of Llama via bitsandbytes
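
A minimal sketch of the first item, assuming the fork keeps upstream hqq's `HQQModelForCausalLM`/`BaseQuantizeConfig` API and accepts `nbits=1.58` (upstream hqq takes integer nbits only); the `group_size` values mirror the eval table below.

```python
# Hedged sketch: quantize a Llama-2 checkpoint with the hqq fork.
# Assumes the upstream hqq entry points (HQQModelForCausalLM,
# BaseQuantizeConfig); nbits=1.58 is assumed to be this fork's extension.
from hqq.engine.hf import HQQModelForCausalLM
from hqq.core.quantize import BaseQuantizeConfig

model_id = "TheBloke/Llama-2-7B-fp16"
quant_config = BaseQuantizeConfig(nbits=1.58, group_size=8)

model = HQQModelForCausalLM.from_pretrained(model_id)
model.quantize_model(quant_config=quant_config)
```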

Eval Results

| Model | Dataset | Quant | Group size | PPL |
|---|---|---|---|---|
| TheBloke/Llama-2-7B-fp16 | wikitext / wikitext-2-raw-v1, validation split | HQQ 1.58 | 16 | 400.46 |
| TheBloke/Llama-2-7B-fp16 | wikitext / wikitext-2-raw-v1, validation split | HQQ 1.58 | 8 | 8.69 |
| TheBloke/Llama-2-7B-fp16 | wikitext / wikitext-2-raw-v1, validation split | FP16 | - | 5.18 |
| TheBloke/Llama-2-13B-fp16 | wikitext / wikitext-2-raw-v1, validation split | HQQ 1.58 | 16 | 48.23 |
| TheBloke/Llama-2-13B-fp16 | wikitext / wikitext-2-raw-v1, validation split | HQQ 1.58 | 8 | 7.2732 |
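
For reproducibility, a hedged sketch of how such perplexity numbers are commonly computed: non-overlapping-window PPL over the wikitext-2-raw-v1 validation split with Hugging Face transformers/datasets. The repo's exact eval script is not shown here, so treat the details (context length, joining documents with blank lines) as assumptions.

```python
# Hedged sketch of a wikitext-2-raw-v1 validation-split perplexity eval.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Llama-2-7B-fp16"  # FP16 baseline row of the table
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

data = load_dataset("wikitext", "wikitext-2-raw-v1", split="validation")
ids = tok("\n\n".join(data["text"]), return_tensors="pt").input_ids

max_len = 2048  # Llama-2 context window
nlls, n_tokens = [], 0
for i in range(0, ids.size(1), max_len):  # non-overlapping windows
    chunk = ids[:, i : i + max_len].to(model.device)
    if chunk.size(1) < 2:
        break
    with torch.no_grad():
        out = model(chunk, labels=chunk)  # HF shifts labels internally
    nlls.append(out.loss * (chunk.size(1) - 1))  # mean NLL -> summed NLL
    n_tokens += chunk.size(1) - 1

ppl = torch.exp(torch.stack(nlls).sum() / n_tokens)
print(f"PPL: {ppl.item():.2f}")
```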
