Beast code in Giters

Iman Tabrizian's starred repositories

mlx

MLX: An array framework for Apple silicon

Language: C++ | License: MIT | 16253 140 492

dspy

DSPy: The framework for programming—not prompting—foundation models

Language: Python | License: MIT | 16043 134 676

flash-attention

Fast and memory-efficient exact attention

Language: Python | License: BSD-3-Clause | 13083 115 983

triton

Development repository for the Triton language and compiler

Language: C++ | License: MIT | 12327 185 1368

nn-zero-to-hero

Neural Networks: Zero to Hero

Language: Jupyter Notebook | License: MIT | 11384 282 30

the-incredible-pytorch

The Incredible PyTorch: a curated list of tutorials, papers, projects, communities and more relating to PyTorch.

License: MIT | 11301 469 23

micrograd

A tiny scalar-valued autograd engine and a neural net library on top of it with PyTorch-like API

Language: Jupyter Notebook | License: MIT | 9862 149 29

minbpe

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

Language: Python | License: MIT | 8952 82 36

TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

Language: C++ | License: Apache-2.0 | 7994 87 1739

warp

A Python framework for high performance GPU simulation and graphics

Language: Python | License: NOASSERTION | 4031 55 213

paper-qa

LLM Chain for answering questions from documents with citations

Language: Python | License: Apache-2.0 | 3821 40 141

pixi

Package management made easy

Language: Rust | License: BSD-3-Clause | 2775 21 801

makemore

An autoregressive character-level language model for making more things

Language: Python | License: MIT | 2417 33 8

libcudacxx

[ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl

Language: C++ | License: NOASSERTION | 2294 68 94

twinny

The most no-nonsense, locally or API-hosted AI code completion plugin for Visual Studio Code - like GitHub Copilot but completely free and 100% private.

Language: TypeScript | License: MIT | 2279 14 154

blog

Some notes on things I find interesting and important.

Language: JavaScript | 1955 259 11

marl

A hybrid thread / fiber task scheduler written in C++ 11

Language: C++ | License: Apache-2.0 | 1838 54 69

S-LoRA

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

Language: Python | License: Apache-2.0 | 1672 24 38

stdexec

`std::execution`, the proposed C++ framework for asynchronous and parallel programming.

Language: C++ | License: Apache-2.0 | 1473 55 535

Essentials-of-Compilation

A book about compiling Racket and Python to x86-64 assembly

Language: TeX | 1272 53 104

llama3.np

llama3.np is a pure NumPy implementation of the Llama 3 model.

Language: Python | License: MIT | 946 13 4

pytriton

PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.

Language: Python | License: Apache-2.0 | 706 18 73

rmm

RAPIDS Memory Manager

Language: C++ | License: Apache-2.0 | 461 27 390

onnxruntime-genai

Generative AI extensions for onnxruntime

Language: C++ | License: MIT | 392 45 207

extending-jax

Extending JAX with custom C++ and CUDA code

Language: Python | License: MIT | 368 10 6

multi-core-python

Enabling CPython multi-core parallelism via subinterpreters.

License: BSD-3-Clause | 244 49 82

cuda-checkpoint

CUDA checkpoint and restore utility

Language: Cuda | License: NOASSERTION | 183 22 13

multipy

torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters in a single C++ process.

Language: C++ | License: NOASSERTION | 169 15 58

extrainterpreters

Utilities for using Python's PEP 554 subinterpreters

Language: Python | License: LGPL-3.0 | 106 12 7

vscode-micromamba

A VSCode extension to generate development environments using micromamba and conda-forge package repository

Language: TypeScript | License: BSD-3-Clause | 81 6 16