justheuristic's starred repositories

llama

Inference code for LLaMA models

Language: Python | License: NOASSERTION | Stargazers: 50895 | Issues: 499 | Issues: 872

text-generation-webui

A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

Language: Python | License: AGPL-3.0 | Stargazers: 37330 | Issues: 324 | Issues: 3441

numpy

The fundamental package for scientific computing with Python.

Language: Python | License: NOASSERTION | Stargazers: 26604 | Issues: 596 | Issues: 12435

llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Language: Python | License: Apache-2.0 | Stargazers: 14329 | Issues: 106 | Issues: 912

qlora

QLoRA: Efficient Finetuning of Quantized LLMs

Language: Jupyter Notebook | License: MIT | Stargazers: 9560 | Issues: 85 | Issues: 244

al-folio

A beautiful, simple, clean, and responsive Jekyll theme for academics

Language: HTML | License: MIT | Stargazers: 9417 | Issues: 22 | Issues: 513

llama-recipes

Scripts for fine-tuning Llama2 with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization & question answering. Supporting a number of candid inference solutions such as HF TGI and vLLM for local or cloud deployment. Demo apps to showcase Llama2 for WhatsApp & Messenger.

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 7850 | Issues: 68 | Issues: 227

StableCascade

Official Code for Stable Cascade

Language: Jupyter Notebook | License: MIT | Stargazers: 6382 | Issues: 58 | Issues: 117

GitTorrent

A decentralization of GitHub using BitTorrent and Bitcoin

Language: JavaScript | License: MIT | Stargazers: 4743 | Issues: 234 | Issues: 63

AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Language: Python | License: MIT | Stargazers: 3921 | Issues: 34 | Issues: 422

alpa

Training and serving large-scale neural networks with auto parallelization.

Language: Python | License: Apache-2.0 | Stargazers: 2995 | Issues: 45 | Issues: 295

GPTQ-for-LLaMa

4-bit quantization of LLaMA using GPTQ

Language: Python | License: Apache-2.0 | Stargazers: 2931 | Issues: 42 | Issues: 216

mixtral-offloading

Run Mixtral-8x7B models in Colab or consumer desktops

Language: Python | License: MIT | Stargazers: 2262 | Issues: 30 | Issues: 25

llm-awq

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Language: Python | License: MIT | Stargazers: 1939 | Issues: 23 | Issues: 149

ytsaurus

YTsaurus is a scalable and fault-tolerant open-source big data platform.

Language: C++ | License: Apache-2.0 | Stargazers: 1777 | Issues: 30 | Issues: 289

aesara

Aesara is a Python library for defining, optimizing, and efficiently evaluating mathematical expressions involving multi-dimensional arrays.

Language: Python | License: NOASSERTION | Stargazers: 1165 | Issues: 20 | Issues: 692

FBGEMM

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

Language: C++ | License: NOASSERTION | Stargazers: 1084 | Issues: 52 | Issues: 137

AQLM

Official PyTorch repository for Extreme Compression of Large Language Models via Additive Quantization https://arxiv.org/pdf/2401.06118.pdf

Language: Python | License: Apache-2.0 | Stargazers: 872 | Issues: 18 | Issues: 51
Language: Python | License: Apache-2.0 | Stargazers: 512 | Issues: 22 | Issues: 21

hqq

Official implementation of Half-Quadratic Quantization (HQQ)

Language: Python | License: Apache-2.0 | Stargazers: 509 | Issues: 14 | Issues: 56
Language: Python | License: GPL-3.0 | Stargazers: 434 | Issues: 10 | Issues: 44

H2O

[NeurIPS'23] H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models.

knn-transformers

PyTorch + HuggingFace code for RetoMaton: "Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval" (ICML 2022), including an implementation of kNN-LM and kNN-MT

Language: Python | License: MIT | Stargazers: 265 | Issues: 4 | Issues: 11

torch_cg

Preconditioned Conjugate Gradient in PyTorch

Language: Python | License: MIT | Stargazers: 124 | Issues: 5 | Issues: 5

FastBinarySearch

Fast and vectorizable algorithms for searching in a vector of sorted floating point numbers

Language: C++ | License: MIT | Stargazers: 100 | Issues: 2 | Issues: 2

local-search-quantization

State-of-the-art method for large-scale ANN search as of Oct 2016. Presented at ECCV 16.

Language: Julia | License: MIT | Stargazers: 73 | Issues: 4 | Issues: 5

fast-hadamard-transform

Fast Hadamard transform in CUDA, with a PyTorch interface

Language: C | License: BSD-3-Clause | Stargazers: 56 | Issues: 3 | Issues: 3

huffman

Generate Huffman codes with Python

Language: Python | License: MIT | Stargazers: 20 | Issues: 4 | Issues: 3

weighted-low-rank-bert-compression

Using weighted low-rank approximation to compress BERT.

Language: Jupyter Notebook | Stargazers: 2 | Issues: 0 | Issues: 0