Michael Goin's repositories
learned_indexes
Experiments on ideas proposed in Tim Kraska's "The Case for Learned Index Structures"
MPT-Medical-Chatbot
This is a medical bot built using MPT and Sentence Transformers. The bot is powered by DeepSparse, LangChain, and Chainlit. It runs on a CPU machine with at least 16 GB of RAM.
torch_bitmask
Implementations for fast bitmask compression for weight sparsity in PyTorch
AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
clip-retrieval
Easily compute CLIP embeddings and build a CLIP retrieval system with them
flash-attention
Fast and memory-efficient exact attention
huggingface.js
Utilities to use the Hugging Face Hub API
inference
Reference implementations of MLPerf™ inference benchmarks
langchain
⚡ Building applications with LLMs through composability ⚡
llama-cpp-python
Python bindings for llama.cpp
llmperf
LLMPerf is a library for validating and benchmarking LLMs
lm-evaluation-harness
A framework for few-shot evaluation of autoregressive language models.
mteb
MTEB: Massive Text Embedding Benchmark
optimum
🏎️ Accelerate training and inference of 🤗 Transformers with easy-to-use hardware optimization tools
sparsegpt
Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
transformers
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs