Vladislav Kruglikov's repositories
kubernetes-pytorch-distributed-data-parallel
Create a Kubernetes distributed data parallel task
llm-foundry
LLM training code for Databricks foundation models
Apache-2.0
composer
Learn the Composer tool
how-to-trace-cuda
How to trace CUDA applications and debug data movement between devices
how-to-trace-torch
How to trace PyTorch applications and debug data movement between devices