Yoshinari Fujinuma (akkikiki)


Company: AWS AI Labs

Location: New York, USA

Home Page: http://akkikiki.github.io

Twitter: @akkikiki


Yoshinari Fujinuma's starred repositories

nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Language: Python · License: MIT · Stars: 35,110 · Issues: 353 · Issues: 305

mamba

The Mamba SSM architecture (official implementation)

Language: Python · License: Apache-2.0 · Stars: 11,886 · Issues: 98 · Issues: 436

llama-recipes

Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods, covering single- and multi-node GPU setups. Supports default and custom datasets for applications such as summarization and Q&A, plus a number of inference solutions such as HF TGI and vLLM for local or cloud deployment, and demo apps showcasing Meta Llama3 for WhatsApp & Messenger.

Language: Jupyter Notebook · License: NOASSERTION · Stars: 10,706 · Issues: 88 · Issues: 296

mistral-inference

Official inference library for Mistral models

Language: Jupyter Notebook · License: Apache-2.0 · Stars: 9,326 · Issues: 120 · Issues: 129

trl

Train transformer language models with reinforcement learning.

Language: Python · License: Apache-2.0 · Stars: 8,869 · Issues: 75 · Issues: 1,013

DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Language: Python · License: NOASSERTION · Stars: 5,762 · Issues: 46 · Issues: 75

alignment-handbook

Robust recipes to align language models with human and AI preferences

Language: Python · License: Apache-2.0 · Stars: 4,265 · Issues: 111 · Issues: 124

mergekit

Tools for merging pretrained large language models.

Language: Python · License: LGPL-3.0 · Stars: 4,166 · Issues: 47 · Issues: 261
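The simplest merge strategy in the mergekit family is linear weight averaging ("model soup"-style). As a hedged illustration only — plain dicts of parameter name to float standing in for real tensor checkpoints, and not mergekit's actual API — the idea reduces to a weighted average over identically keyed parameters:

```python
# Linear ("model soup"-style) weight averaging: merged parameter = the
# weighted mean of the same parameter across source models. Illustrative
# sketch only; mergekit operates on real checkpoints and also offers
# SLERP, TIES, DARE, and other merge methods.

def linear_merge(models, weights):
    """Weighted average of identically-keyed parameter dicts."""
    total = sum(weights)
    merged = {}
    for key in models[0]:
        merged[key] = sum(w * m[key] for w, m in zip(weights, models)) / total
    return merged

m1 = {"layer.w": 1.0, "layer.b": 0.0}
m2 = {"layer.w": 3.0, "layer.b": 2.0}
print(linear_merge([m1, m2], weights=[1.0, 1.0]))
# → {'layer.w': 2.0, 'layer.b': 1.0}
```

Unequal weights bias the merge toward one parent model, which is the same knob the more elaborate merge methods refine per-parameter.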

Vim

[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

Language: Python · License: Apache-2.0 · Stars: 2,652 · Issues: 30 · Issues: 101

mamba-minimal

Simple, minimal implementation of the Mamba SSM in one file of PyTorch.

Language: Python · License: Apache-2.0 · Stars: 2,463 · Issues: 23 · Issues: 24
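The linear recurrence at the heart of the Mamba SSM entries above can be sketched with a toy scalar version. This is an illustration of the discretized state-space step, not code from any of these repositories; the function name and scalar parameters are invented for the sketch, and real Mamba uses per-channel states with input-dependent (selective) parameters:

```python
# Toy discretized state-space (SSM) recurrence:
#   h[t] = a_bar * h[t-1] + b_bar * x[t]   (state update)
#   y[t] = c * h[t]                        (readout)
# Scalar state for clarity; Mamba vectorizes this per channel and makes
# a_bar/b_bar functions of the input ("selective" scan).

def ssm_scan(x, a_bar, b_bar, c):
    """Run the linear recurrence over a 1-D input sequence."""
    h = 0.0
    ys = []
    for x_t in x:
        h = a_bar * h + b_bar * x_t
        ys.append(c * h)
    return ys

print(ssm_scan([1.0, 0.0, 0.0], a_bar=0.5, b_bar=1.0, c=2.0))
# impulse response decays geometrically: → [2.0, 1.0, 0.5]
```

Because the recurrence is linear in h, it can be evaluated as a parallel scan rather than a sequential loop, which is the trick the optimized implementations exploit.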

deep_learning_curriculum

Language model alignment-focused deep learning curriculum

nanotron

Minimalistic 3D-parallelism training library for large language models

Language: Python · License: Apache-2.0 · Stars: 995 · Issues: 40 · Issues: 66

shell-ai

LangChain-powered CLI for generating and running shell commands

Language: Python · License: MIT · Stars: 982 · Issues: 14 · Issues: 21

Triton-Puzzles

Puzzles for learning Triton

Language: Jupyter Notebook · License: Apache-2.0 · Stars: 885 · Issues: 7 · Issues: 9

mamba.py

A simple and efficient Mamba implementation in pure PyTorch and MLX.

Language: Python · License: MIT · Stars: 813 · Issues: 4 · Issues: 25

template

Template repository for the Distill web framework

Language: JavaScript · License: Apache-2.0 · Stars: 777 · Issues: 14 · Issues: 97

recurrentgemma

Open weights language model from Google DeepMind, based on Griffin.

Language: Python · License: Apache-2.0 · Stars: 577 · Issues: 18 · Issues: 7

gpt_paper_assistant

GPT-4-based personalized arXiv paper assistant bot

Language: Python · License: Apache-2.0 · Stars: 442 · Issues: 6 · Issues: 10

open_lm

A repository for research on medium-sized language models.

Language: Python · License: MIT · Stars: 418 · Issues: 22 · Issues: 60

SAELens

Training Sparse Autoencoders on Language Models

Language: HTML · License: MIT · Stars: 234 · Issues: 9 · Issues: 72

CoLT5-attention

Implementation of the conditionally routed attention from the CoLT5 architecture, in PyTorch

Language: Python · License: MIT · Stars: 218 · Issues: 8 · Issues: 7

sae

Sparse autoencoders

Language: Python · License: MIT · Stars: 202 · Issues: 5 · Issues: 1

mixture-of-attention

Personal experiments on routing tokens to different autoregressive attention modules, akin to mixture-of-experts

Language: Python · License: MIT · Stars: 100 · Issues: 8 · Issues: 0

kotomamba

Mamba training library developed by Kotoba Technologies

Language: Python · License: Apache-2.0 · Stars: 61 · Issues: 5 · Issues: 0

MLLM-Judge

[ICML 2024 Oral] Official code repository for MLLM-as-a-Judge.

Language: Python · Stars: 23 · Issues: 0 · Issues: 0