Trainy's repositories
dynolog
Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from system components such as the Linux kernel, CPU, disks, Intel PT, and GPUs. Dynolog also integrates with PyTorch and can trigger traces for distributed training applications.
airoboros
Customizable implementation of the self-instruct paper.
FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
konduktor
Cluster and scheduler health monitoring for GPU jobs on Kubernetes (k8s).
RWKV-LM
RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable), so it combines the best of RNNs and transformers: great performance, fast inference, low VRAM use, fast training, "infinite" ctx_len, and free sentence embedding.
training
Reference implementations of MLPerf™ training benchmarks.
vllm
A high-throughput and memory-efficient inference and serving engine for LLMs.