Bailin's starred repositories
mistral-src
Reference implementation of the Mistral AI 7B v0.1 model.
llama-recipes
Scripts for fine-tuning Llama2 with composable FSDP & PEFT methods, covering single- and multi-node GPU setups. Supports default & custom datasets for applications such as summarization & question answering, along with a number of inference solutions such as HF TGI and vLLM for local or cloud deployment. Includes demo apps showcasing Llama2 for WhatsApp & Messenger.
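For context, a minimal sketch of the kind of PEFT setup such recipes wrap, using the Hugging Face peft library. The model id and LoRA hyperparameters below are illustrative assumptions, not values taken from the repo:

```python
# Minimal LoRA fine-tuning setup sketch (illustrative; not llama-recipes' actual config).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-7b-hf"  # assumed model id; gated, requires access approval
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# LoRA trains small low-rank adapter matrices instead of all model weights.
lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update (assumed)
    lora_alpha=16,                        # scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt (assumed)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```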
open_llama
OpenLLaMA, a permissively licensed, open-source reproduction of Meta AI's LLaMA 7B trained on the RedPajama dataset.
GPU-Puzzles
Solve puzzles. Learn CUDA.
attention_with_linear_biases
Code for the ALiBi method for transformer language models (ICLR 2022)
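In brief, ALiBi drops positional embeddings and instead adds a distance-proportional penalty to attention scores. A minimal PyTorch sketch of the causal bias, with head slopes simplified to the 8-head geometric sequence from the paper:

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    """Causal ALiBi bias: score[i, j] is penalized by slope_h * (i - j) for j <= i."""
    # Per-head slopes form a geometric sequence: 1/2, 1/4, ..., 1/256 for 8 heads.
    slopes = torch.tensor([2.0 ** -(h + 1) for h in range(n_heads)])
    pos = torch.arange(seq_len)
    # (i - j) for past positions; the upper triangle is zeroed here because a
    # separate causal mask still sets future positions to -inf before softmax.
    distance = (pos[:, None] - pos[None, :]).clamp(min=0)
    return -slopes[:, None, None] * distance  # shape: (heads, seq, seq)

# Added to q @ k^T / sqrt(d) before softmax; nearer tokens are penalized less.
bias = alibi_bias(n_heads=8, seq_len=4)
```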
SGEMM_CUDA
Fast CUDA matrix multiplication from scratch
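The core idea behind such kernels is tiling: loading blocks of A and B into fast memory and reusing them across many output elements. A minimal NumPy sketch of the blocked multiplication (the repo implements this in CUDA with shared-memory tiles; the tile size here is an arbitrary assumption):

```python
import numpy as np

def blocked_matmul(A: np.ndarray, B: np.ndarray, tile: int = 32) -> np.ndarray:
    """Tiled SGEMM: C = A @ B, computed one (tile x tile) block at a time.
    On a GPU, each tile of A and B would be staged in shared memory."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, tile):
        for j in range(0, N, tile):
            for k in range(0, K, tile):
                # Accumulate one K-slab's contribution into the C tile;
                # slicing handles ragged edges when shapes aren't multiples of tile.
                C[i:i+tile, j:j+tile] += A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
    return C

A = np.random.rand(128, 64).astype(np.float32)
B = np.random.rand(64, 96).astype(np.float32)
assert np.allclose(blocked_matmul(A, B), A @ B, atol=1e-4)
```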
ModuleFormer
ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward experts. We released a collection of ModuleFormer-based Language Models (MoLM) ranging in scale from 4 billion to 8 billion parameters.
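As a generic illustration of the MoE pattern (not ModuleFormer's actual stick-breaking routing), a minimal top-2 feedforward-expert layer in PyTorch; expert count and dimensions are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Generic top-2 mixture-of-experts feedforward layer (illustrative only)."""
    def __init__(self, d_model: int = 64, n_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.router(x)                # (tokens, n_experts)
        weights, idx = logits.topk(2, dim=-1)  # route each token to its top-2 experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(2):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e       # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

y = TinyMoE()(torch.randn(10, 64))  # 10 tokens, d_model=64
```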
llm_large_context
Large Sequence Modeling with Transformers
explain-then-translate
Official repo for EMNLP 2023 paper "Explain-then-Translate: An Analysis on Improving Program Translation with Self-generated Explanations"
pytorch_linear_rnn
Implementations of various linear RNN layers using PyTorch and Triton
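The common core of these layers is a linear (gate-free) recurrence, which is what makes parallel-scan Triton kernels possible. A minimal sequential PyTorch sketch of the diagonal case; shapes and the decay parameterization are assumptions:

```python
import torch

def linear_rnn(x: torch.Tensor, decay: torch.Tensor) -> torch.Tensor:
    """Diagonal linear RNN: h_t = decay * h_{t-1} + x_t, with no nonlinearity
    inside the recurrence. Because the update is linear and associative, it can
    also be computed with a parallel scan, which Triton kernels exploit."""
    T, D = x.shape
    h = torch.zeros(D)
    out = []
    for t in range(T):
        h = decay * h + x[t]
        out.append(h)
    return torch.stack(out)  # (T, D)

x = torch.randn(16, 8)
decay = torch.sigmoid(torch.randn(8))  # per-channel decay in (0, 1)
y = linear_rnn(x, decay)
```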
Logical-and-abstract-reasoning
Evaluation on Logical Reasoning and Abstract Reasoning Challenges