carmocca

DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

Language: C++ | License: Apache-2.0

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Language: Python | License: MIT

faster-pytorch-blog

Outlining techniques for improving the training performance of your PyTorch model without compromising its accuracy

Language: Python

ffcv

FFCV: Fast Forward Computer Vision (and other ML workloads!)

Language: Python | License: Apache-2.0

Fuser

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")

Language: C++ | License: NOASSERTION

lightning

Build and train PyTorch models and connect them to the ML lifecycle using Lightning App templates, without handling DIY infrastructure, cost management, scaling, and other headaches.

Language: Python | License: Apache-2.0

lightning-thunder

Source-to-source compiler for PyTorch. It makes PyTorch programs faster on single accelerators and in distributed settings.

Language: Python | License: Apache-2.0

litdata

Blazingly fast, distributed streaming of training data from any cloud storage for training AI models

License: Apache-2.0

lm-evaluation-harness

A framework for few-shot evaluation of autoregressive language models.

Language: Python | License: MIT

Megatron-LM

Ongoing research training transformer models at scale

Language: Python | License: NOASSERTION

neurips_llm_efficiency_challenge

NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day

Language: Python

pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Language: Python | License: NOASSERTION

stable-diffusion

A latent text-to-image diffusion model

Language: Jupyter Notebook | License: NOASSERTION

taming-transformers

Taming Transformers for High-Resolution Image Synthesis

Language: Jupyter Notebook | License: MIT

toolbox

Essential guides and programming tools in my toolbox (with focus on ML Training)

Language: Python | License: CC-BY-SA-4.0

TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper GPUs, to provide better performance with lower memory utilization in both training and inference.

Language: Python | License: Apache-2.0

Windifier

Wind [instrument] Classifier. Using CNNs.

Language: Python

xla

Enabling PyTorch on Google TPU

Language: C++ | License: NOASSERTION

Carlos Mocholí's repositories

PyLaia-examples

UVA

fp8-benchmark

nnutils

probot

PyLaia

AdventOfCode2016

algorhythmHashCode

DALI

DeepSpeed

faster-pytorch-blog

ffcv

Fuser

lightning

lightning-quick-start

lightning-thunder

litdata

lm-evaluation-harness

Megatron-LM

neurips_llm_efficiency_challenge

pytorch

stable-diffusion

taming-transformers

toolbox

TransformerEngine

Windifier

xla