Qubitium-ModelCloud's repositories
alpaca-lora
Instruct-tune LLaMA on consumer hardware
flash-attention
Fast and memory-efficient exact attention
flashinfer
FlashInfer: Kernel Library for LLM Serving
gemma_pytorch
The official PyTorch implementation of Google's Gemma models
lm-format-enforcer
Enforce the output format (JSON Schema, Regex, etc.) of a language model
sglang
SGLang is a structured generation language for large language models (LLMs). It makes interactions with models faster and more controllable.
accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support
auto-round
SOTA Weight-only Quantization Algorithm for LLMs
AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization, with a 2x speedup during inference.
BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
clod-code
rot13 version of claw code
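For readers unfamiliar with the transform the description refers to: rot13 is a letter substitution that shifts each letter 13 places, so applying it twice returns the original text. A minimal stdlib sketch (this illustrates rot13 itself, not anything about the repo's contents):

```python
import codecs

def rot13(text: str) -> str:
    # rot13 shifts each letter 13 places in the alphabet;
    # since 13 + 13 = 26, applying it twice is the identity
    return codecs.encode(text, "rot13")

scrambled = rot13("claw code")  # -> "pynj pbqr"
```

Because rot13 is its own inverse, the same function both encodes and decodes.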
evalplus
Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024
FastChat
The release repo for "Vicuna: An Open Chatbot Impressing GPT-4"
GPTQ-for-LLaMa
4 bits quantization of LLaMa using GPTQ
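To make "4-bit quantization" concrete: weights stored as float are mapped to 4-bit integers plus a shared scale. The sketch below shows only plain symmetric round-to-nearest quantization, not GPTQ's Hessian-based error-compensating algorithm; it just illustrates the storage format and round trip:

```python
def quantize_4bit(weights):
    # symmetric round-to-nearest 4-bit quantization
    # (NOT GPTQ's second-order method; illustration only)
    # signed int4 covers -8..7; use the symmetric range +/-7
    scale = max(abs(w) for w in weights) / 7
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # reconstruct approximate float weights from int4 codes
    return [v * scale for v in q]

w = [0.12, -0.5, 0.33, 0.7]
q, s = quantize_4bit(w)
w_hat = dequantize(q, s)
```

GPTQ improves on this baseline by quantizing columns one at a time and adjusting the remaining weights to compensate for the rounding error.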
GPTQ-triton
GPTQ inference Triton kernel
GPTQModel
Production-ready LLM model compression/quantization toolkit with accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
hqq
Official implementation of Half-Quadratic Quantization (HQQ)
hyperDB
A hyper-fast local vector database for use with LLM Agents. Now accepting SAFEs at $35M cap.
llama.cpp
Port of Facebook's LLaMA model in C/C++
mav
model activation visualiser
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
qlora
QLoRA: Efficient Finetuning of Quantized LLMs
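QLoRA builds on LoRA: the frozen (quantized) base weight W is augmented with a trainable low-rank update B @ A, so only the small A and B matrices are finetuned. A pure-Python sketch of the rank-1 case (illustrative shapes and values, not QLoRA's actual implementation):

```python
def matmul(A, B):
    # naive matrix multiply for small illustrative matrices
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# frozen 2x2 base weight W plus a rank-1 update B @ A
W = [[1.0, 0.0],
     [0.0, 1.0]]
B = [[0.5],
     [1.0]]          # 2x1, trainable
A = [[0.2, 0.4]]     # 1x2, trainable

delta = matmul(B, A)  # 2x2 rank-1 update
W_adapted = [[w + d for w, d in zip(wr, dr)]
             for wr, dr in zip(W, delta)]
```

With rank r much smaller than the weight dimensions, A and B hold far fewer parameters than W, which is what makes finetuning cheap; QLoRA additionally keeps W in 4-bit precision.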
QQQ
QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs.
the-algorithm
Source code for Twitter's Recommendation Algorithm
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
unsloth
5x faster, 60% less memory QLoRA finetuning
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs