Kazuki Fujii's repositories
Megatron-LM
Ongoing research training transformer models at scale
NeMo-Aligner
Scalable toolkit for efficient model alignment
llm-recipes
Ongoing research project for continual pre-training of LLMs (dense models)
moe-recipes
Ongoing research training Mixture of Experts models.
nanotron
Minimalistic large language model 3D-parallelism training
torchtitan
A native PyTorch Library for large model training
llama3v
A SOTA vision model built on top of llama3 8B.
grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
llama-recipes
Examples and recipes for Llama 2 models
deploymentmanager-samples
Deployment Manager samples and templates.
levanter
Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax
ml-engineering
Machine Learning Engineering Open Book
multi-gpu-programming-models
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
Megatron-LM-ABCI
NVIDIA Megatron-LM fork
NeMo
NeMo: a toolkit for conversational AI
NeMo-Megatron-Launcher
NeMo Megatron launcher and tools
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.