Trainy (Trainy-ai)

Tools to make distributed training easy.

Home Page: trainy.ai

Trainy's repositories

llm-atc

Fine-tuning and serving LLMs on any cloud

Language: Python | License: Apache-2.0 | Stargazers: 82 | Issues: 3 | Issues: 3

nodify

Profiling tools for distributed training

Language: HTML | License: NOASSERTION | Stargazers: 37 | Issues: 2 | Issues: 1

trainy

A simple pure-Python/PyTorch performance daemon for training workloads

Language: Python | Stargazers: 12 | Issues: 1 | Issues: 0

dynolog

Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components of the system, such as the Linux kernel, CPU, disks, Intel PT, and GPUs. Dynolog also integrates with PyTorch and can trigger traces for distributed training applications.

Language: C++ | License: MIT | Stargazers: 1 | Issues: 0 | Issues: 0

airoboros

Customizable implementation of the self-instruct paper.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0
Language: HTML | Stargazers: 0 | Issues: 0 | Issues: 0

FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

konduktor

Cluster/scheduler health monitoring for GPU jobs on Kubernetes (k8s)

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be trained directly like a GPT (parallelizable), so it combines the best of RNNs and transformers: great performance, fast inference, low VRAM use, fast training, "infinite" ctx_len, and free sentence embedding.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

training

Reference implementations of MLPerf™ training benchmarks

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0