akkikiki

Yoshinari Fujinuma's starred repositories

nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Language: Python · License: MIT · 35115 353 305

mamba

Mamba SSM architecture

Language: Python · License: Apache-2.0 · 11890 98 436

mistral-inference

Official inference library for Mistral models

Language: Jupyter Notebook · License: Apache-2.0 · 9328 120 129

trl

Train transformer language models with reinforcement learning.

Language: Python · License: Apache-2.0 · 8870 75 1015

DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Language: Python · License: NOASSERTION · 5762 46 75

alignment-handbook

Robust recipes to align language models with human and AI preferences

Language: Python · License: Apache-2.0 · 4267 111 124

mergekit

Tools for merging pretrained large language models.

Language: Python · License: LGPL-3.0 · 4167 47 261

Vim

[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Language: Python · License: Apache-2.0 · 2653 30 101

mamba-minimal

Simple, minimal implementation of the Mamba SSM in one file of PyTorch.

Language: Python · License: Apache-2.0 · 2463 23 24

deep_learning_curriculum

Language model alignment-focused deep learning curriculum

1174 17 1

nanotron

Minimalistic large language model 3D-parallelism training

Language: Python · License: Apache-2.0 · 995 40 66

shell-ai

LangChain powered shell command generator and runner CLI

Language: Python · License: MIT · 982 14 21

Triton-Puzzles

Puzzles for learning Triton

Language: Jupyter Notebook · License: Apache-2.0 · 885 7 9

mamba.py

A simple and efficient Mamba implementation in pure PyTorch and MLX.

Language: Python · License: MIT · 813 4 25

template

This is the repository for the Distill web framework

Language: JavaScript · License: Apache-2.0 · 777 14 97

ultravox

Language: Python · License: MIT · 699 15 17

recurrentgemma

Open-weights language model from Google DeepMind, based on Griffin.

Language: Python · License: Apache-2.0 · 578 18 7

gpt_paper_assistant

GPT-4-based personalized arXiv paper assistant bot

Language: Python · License: Apache-2.0 · 442 6 10

open_lm

A repository for research on medium-sized language models.

Language: Python · License: MIT · 423 22 60

SAELens

Training Sparse Autoencoders on Language Models

Language: HTML · License: MIT · 239 9 72

CoLT5-attention

Implementation of the conditionally routed attention in the CoLT5 architecture, in PyTorch

Language: Python · License: MIT · 218 8 7

sae

Sparse autoencoders

Language: Python · License: MIT · 204 5 1

ocr-post-correction

Language: Python · License: NOASSERTION · 130 5 8

mixture-of-attention

Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts

Language: Python · License: MIT · 100 80

train-with-fsdp

Language: Python · License: MIT · 89 4 1

function_vectors

Function Vectors in Large Language Models (ICLR 2024)

Language: Python · 88 3 6

kotomamba

Mamba training library developed by Kotoba Technologies

Language: Python · License: Apache-2.0 · 61 50

parallelizing_linear_rnns

Language: TeX · License: MIT · 40 4 1

mamba-triton

Language: Python · 39 1 1

MLLM-Judge

[ICML 2024 Oral] Official code repository for MLLM-as-a-Judge.

Language: Python · 2500