Trainy (Trainy-ai)

Trainy's repositories

llm-atc

Fine-tuning and serving LLMs on any cloud

Language: Python · License: Apache-2.0 · 82 3 3

nodify

Profiling tools for distributed training

Language: HTML · License: NOASSERTION · 37 2 1

trainy

A simple, pure Python/PyTorch performance daemon for training workloads

Language: Python · 12 10

dynolog

Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components of the system, such as the Linux kernel, CPU, disks, Intel PT, and GPUs. Dynolog also integrates with PyTorch and can trigger traces for distributed training applications.

Language: C++ · License: MIT · 1 0 0

airoboros

Customizable implementation of the self-instruct paper.

Language: Python · License: Apache-2.0 · 0 0 0

dashboard

Language: HTML · 0 0 0

FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Language: Python · License: Apache-2.0 · 0 0 0

konduktor

Cluster and scheduler health monitoring for GPU jobs on Kubernetes (k8s)

Language: Python · License: Apache-2.0 · 0 0 0

RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable), so it combines the best of RNNs and transformers: great performance, fast inference, lower VRAM use, fast training, "infinite" ctx_len, and free sentence embeddings.

Language: Python · License: Apache-2.0 · 0 0 0

training

Reference implementations of MLPerf™ training benchmarks

Language: Python · License: Apache-2.0 · 0 0 0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python · License: Apache-2.0 · 0 0 0