Andrej's starred repositories

build-your-own-x

Master programming by recreating your favorite technologies from scratch.

langchain

🦜🔗 Build context-aware reasoning applications

Language: Jupyter Notebook | License: MIT | Stargazers: 89812 | Issues: 675 | Issues: 7269

whisper

Robust Speech Recognition via Large-Scale Weak Supervision

Language: Python | License: MIT | Stargazers: 65124 | Issues: 543 | Issues: 0

llama.cpp

LLM inference in C/C++

tesseract

Tesseract Open Source OCR Engine (main repository)

Language: C++ | License: Apache-2.0 | Stargazers: 59947 | Issues: 1686 | Issues: 2625

Open-Assistant

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.

Language: Python | License: Apache-2.0 | Stargazers: 36866 | Issues: 430 | Issues: 1641

llama_index

LlamaIndex is a data framework for your LLM applications

Language: Python | License: MIT | Stargazers: 33966 | Issues: 242 | Issues: 4596

stanford_alpaca

Code and documentation to train Stanford's Alpaca models, and generate the data.

Language: Python | License: Apache-2.0 | Stargazers: 29200 | Issues: 341 | Issues: 267

diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

Language: Python | License: Apache-2.0 | Stargazers: 24275 | Issues: 192 | Issues: 3830

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | License: Apache-2.0 | Stargazers: 23735 | Issues: 221 | Issues: 3629

llamafile

Distribute and run LLMs with a single file.

Language: C++ | License: NOASSERTION | Stargazers: 17816 | Issues: 163 | Issues: 375

mlx

MLX: An array framework for Apple silicon

unsloth

Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory

Language: Python | License: Apache-2.0 | Stargazers: 13168 | Issues: 91 | Issues: 631

litgpt

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Language: Python | License: Apache-2.0 | Stargazers: 9072 | Issues: 87 | Issues: 710

xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.

Language: Python | License: NOASSERTION | Stargazers: 8104 | Issues: 80 | Issues: 501

TinyLlama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

Language: Python | License: Apache-2.0 | Stargazers: 7406 | Issues: 110 | Issues: 150

xv6-riscv

Xv6 for RISC-V

Language: C | License: NOASSERTION | Stargazers: 6664 | Issues: 96 | Issues: 88

lit-llama

Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.

Language: Python | License: Apache-2.0 | Stargazers: 5904 | Issues: 67 | Issues: 269

gemma.cpp

Lightweight, standalone C++ inference engine for Google's Gemma models.

Language: C++ | License: Apache-2.0 | Stargazers: 5823 | Issues: 38 | Issues: 77

gpt-fast

Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.

Language: Python | License: BSD-3-Clause | Stargazers: 5392 | Issues: 63 | Issues: 96

gemma_pytorch

The official PyTorch implementation of Google's Gemma models

Language: Python | License: Apache-2.0 | Stargazers: 5191 | Issues: 38 | Issues: 37

AlphaCodium

Official implementation for the paper: "Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering"

Language: Python | License: AGPL-3.0 | Stargazers: 3315 | Issues: 51 | Issues: 17

cramming

Cramming the training of a (BERT-type) language model into limited compute.

Language: Python | License: MIT | Stargazers: 1263 | Issues: 22 | Issues: 34

hlb-CIFAR10

Train to 94% on CIFAR-10 in <6.3 seconds on a single A100. Or ~95.79% in ~110 seconds (or less!)

Language: Python | License: Apache-2.0 | Stargazers: 1203 | Issues: 20 | Issues: 3

yet-another-applied-llm-benchmark

A benchmark to evaluate language models on questions I've previously asked them to solve.

Language: Python | License: GPL-3.0 | Stargazers: 811 | Issues: 17 | Issues: 9

fine-tune-mistral

Fine-tune Mistral-7B on 3090s, A100s, H100s

Language: Python | License: MIT | Stargazers: 696 | Issues: 6 | Issues: 5

attorch

A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.

Language: Python | License: MIT | Stargazers: 424 | Issues: 8 | Issues: 2

inbox_cleaner

A Python script to help manage a Gmail inbox by filtering out promotional emails using GPT-3 or GPT-4.

llm_rules

RuLES: a benchmark for evaluating rule-following in language models

Language: Python | License: Apache-2.0 | Stargazers: 202 | Issues: 2 | Issues: 3

bpeasy

Fast bare-bones BPE for modern tokenizer training

Language: Python | License: MIT | Stargazers: 129 | Issues: 2 | Issues: 0