Qubitium's repositories
alpaca-lora
Instruct-tune LLaMA on consumer hardware
android-app
Official ProtonVPN Android app
flash-attention
Fast and memory-efficient exact attention
flashinfer
FlashInfer: Kernel Library for LLM Serving
gemma_pytorch
The official PyTorch implementation of Google's Gemma models
lm-format-enforcer
Enforce the output format (JSON Schema, Regex etc) of a language model
sglang
SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.
auto-round
SOTA Weight-only Quantization Algorithm for LLMs
AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization, with a 2x speedup during inference
C4_200M-synthetic-dataset-for-grammatical-error-correction
This dataset contains synthetic training data for grammatical error correction. The corpus is generated by corrupting clean sentences from C4 using a tagged corruption model. The approach and the dataset are described in more detail by Stahlberg and Kumar (2021) (https://www.aclweb.org/anthology/2021.bea-1.4/)
GPTQ-for-LLaMa
4-bit quantization of LLaMA using GPTQ
GPTQ-triton
GPTQ inference Triton kernel
hyperDB
A hyper-fast local vector database for use with LLM Agents. Now accepting SAFEs at $35M cap.
llama.cpp
Port of Facebook's LLaMA model in C/C++
protonvpn-cli-ng
Linux command-line client for ProtonVPN. Written in Python.
qlora
QLoRA: Efficient Finetuning of Quantized LLMs
the-algorithm
Source code for Twitter's Recommendation Algorithm
unsloth
5X faster, 60% less memory QLoRA finetuning
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
ZeroTierOne
A Smart Ethernet Switch for Earth