Romain Beaumont's repositories
img2dataset
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine.
clip-retrieval
Easily compute CLIP embeddings and build a CLIP retrieval system with them
cc2dataset
Easily convert Common Crawl to a dataset of captions and documents: image/text, audio/text, video/text, ...
laion-prepro
Get hundreds of millions of image+url pairs from the crawling at home dataset and preprocess them
image_embeddings
Using EfficientNet to provide embeddings for retrieval
embedding-reader
Efficiently read embeddings in streaming fashion from any filesystem
gpu-tester
gpu-tester detects broken and slow GPUs in a cluster
any2dataset
Turn any collection of files into a dataset
python-template
Simple Python template
audio2dataset
Easily turn large sets of audio URLs into an audio dataset.
slurm-tracking-bot
Simple Slurm tracking bot to check usage
rom1504.github.io
Personal website
accelerate
🚀 A simple way to train and use PyTorch models with multi-GPU, TPU, mixed-precision
distributed-shuffle
A simple implementation of distributed shuffle, intended for learning
k-diffusion
Karras et al. (2022) diffusion models for PyTorch
video2numpy
Optimized library for large-scale extraction of frames and audio from video.
rom1504.fr
My site
EnMicroMsg.db-Password-Cracker
Crack the password of EnMicroMsg.db with a brute-force attack.
prismarine-web-client
mineflayer, running in your browser
v-diffusion-pytorch
v objective diffusion inference code for PyTorch.
wechat-dump
Dump WeChat messages from Android