Mehdi Cherti's repositories
feed_forward_vqgan_clip
Feed-forward VQGAN-CLIP model, whose goal is to eliminate the need to optimize VQGAN's latent space separately for each input prompt
vqgan_nodep
VQGAN extracted from LDM, without the dependency hell
scaling-laws-openclip
Reproducible scaling laws for contrastive language-image learning (https://arxiv.org/abs/2212.07143)
any2dataset
Turn any collection of files into a dataset
Automodel
Fine-tune any Hugging Face LLM or VLM on day-0 using PyTorch-native features for GPU-accelerated distributed training with superior performance and memory efficiency.
clip-retrieval
Easily compute CLIP embeddings and build a CLIP retrieval system with them
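As a rough illustration of the retrieval side, a minimal sketch using the project's Python ClipClient; the service URL, index name, query text, and result count are assumptions for the example, not fixed values:

from clip_retrieval.clip_client import ClipClient, Modality

# Query a running clip-retrieval backend over HTTP.
# The endpoint and index name below are illustrative placeholders.
client = ClipClient(
    url="https://knn.laion.ai/knn-service",
    indice_name="laion5B-L-14",
    modality=Modality.IMAGE,
    num_images=5,
)
results = client.query(text="a photo of a cat")
for r in results:
    print(r["url"], r["similarity"])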
denoising-diffusion-gan
Tackling the Generative Learning Trilemma with Denoising Diffusion GANs https://arxiv.org/abs/2112.07804
img2dataset
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on a single machine.
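For orientation, a minimal sketch of the documented Python entry point; the input file, output folder, and parameter values are illustrative assumptions:

from img2dataset import download

# Download and resize images listed in a plain-text file of URLs
# (one URL per line) into a local folder. Paths and sizes are placeholders.
download(
    url_list="urls.txt",
    output_folder="images",
    thread_count=64,
    image_size=256,
    output_format="webdataset",
)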
litellm
Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs)
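To show what "the OpenAI format" means in practice, a minimal sketch of the completion call; the model name and prompt are placeholders, and swapping the model string routes the same request to a different provider:

from litellm import completion

# Same request shape as the OpenAI chat API; the model string selects
# the provider. Model and prompt below are placeholders.
response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)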
ml-engineering
Machine Learning Engineering Guides and Tools
open-diffusion
Simple large-scale training of Stable Diffusion with multi-node support.
VILA
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.