Aarush Katta's repositories
watermark-detection
A repository containing datasets and tools to train a watermark classifier.
webdataset
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
img2dataset
Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.
crawlingathome
A client library for Crawling@Home's effort to filter CommonCrawl with CLIP, building a large-scale image-text dataset.
crawlingathome-gpu-hcloud
A GPU-controlled swarm of Hetzner Cloud workers for the Crawling@Home project.
SmartDataset
A Dataset, made to be smart and efficient :)
crawlingathome-wheels
The package wheels for the Crawling@Home project
crawlingathome-server
A server powering Crawling@Home's effort to filter CommonCrawl with CLIP, building a large-scale image-text dataset.
firebase-test
A test for Firebase Hosting
crawlingathome-fileserver
A server used for storing CPU workers' uploads, ready for GPU workers to complete.
CLIP
Contrastive Language-Image Pretraining