Qubitium-ModelCloud (Qubitium)


User data from GitHub: https://github.com/Qubitium

Company: ModelCloud.ai

Location: Earth/Epoch 2.0

Home Page: https://modelcloud.ai

GitHub: @Qubitium

Twitter: @qubitium

Qubitium-ModelCloud's repositories

AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Language: Python · License: Apache-2.0 · Stargazers: 2 · Issues: 0

alpaca-lora

Instruct-tune LLaMA on consumer hardware

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 0 · Issues: 0

flash-attention

Fast and memory-efficient exact attention

Language: Python · License: BSD-3-Clause · Stargazers: 0 · Issues: 0

flashinfer

FlashInfer: Kernel Library for LLM Serving

Language: Cuda · License: Apache-2.0 · Stargazers: 0 · Issues: 0

gemma_pytorch

The official PyTorch implementation of Google's Gemma models

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

lm-format-enforcer

Enforce the output format (JSON Schema, Regex etc) of a language model

Language: Python · License: MIT · Stargazers: 0 · Issues: 0

sglang

SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

auto-round

SOTA Weight-only Quantization Algorithm for LLMs

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.

Language: Python · License: MIT · Stargazers: 0 · Issues: 0

BitBLAS

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Language: Python · License: MIT · Stargazers: 0 · Issues: 0

clod-code

rot13 version of claw code

Language: Grammatical Framework · Stargazers: 0 · Issues: 0

evalplus

Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

FastChat

The release repo for "Vicuna: An Open Chatbot Impressing GPT-4"

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

GPTQ-for-LLaMa

4-bit quantization of LLaMA using GPTQ

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

GPTQ-triton

GPTQ inference Triton kernel

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 0 · Issues: 0

GPTQModel

Production-ready LLM model compression/quantization toolkit with accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.

License: Apache-2.0 · Stargazers: 0 · Issues: 0

hqq

Official implementation of Half-Quadratic Quantization (HQQ)

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

hyperDB

A hyper-fast local vector database for use with LLM Agents. Now accepting SAFEs at $35M cap.

Language: Python · License: MIT · Stargazers: 0 · Issues: 0

llama.cpp

Port of Facebook's LLaMA model in C/C++

Language: C · License: MIT · Stargazers: 0 · Issues: 0

mav

Model activation visualiser

License: MIT · Stargazers: 0 · Issues: 0

pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Language: Python · License: NOASSERTION · Stargazers: 0 · Issues: 0

qlora

QLoRA: Efficient Finetuning of Quantized LLMs

Language: Jupyter Notebook · License: MIT · Stargazers: 0 · Issues: 0

QQQ

QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs.

Language: Python · Stargazers: 0 · Issues: 0

the-algorithm

Source code for Twitter's Recommendation Algorithm

Language: Scala · License: AGPL-3.0 · Stargazers: 0 · Issues: 1

transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

unsloth

5X faster, 60% less memory QLoRA finetuning

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python · License: Apache-2.0 · Stargazers: 0 · Issues: 0