ARKseal / crawlingathome-server

A server powering Crawling@Home's effort to filter CommonCrawl with CLIP, building a large scale image-text dataset.

Home Page: http://crawlingathome.duckdns.org/

Crawling@Home Server

UPDATE

jobs/open.json is now too big to store on GitHub. You can download it from here.
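Because the jobs file no longer ships with the repository, it can help to confirm it is in place before starting the server. The following is a minimal sketch, not part of the project: the helper name `jobs_file_ready` is hypothetical, and it only checks that a non-empty file exists at `jobs/open.json` relative to the current directory, without validating its contents.

```python
from pathlib import Path


def jobs_file_ready(path="jobs/open.json"):
    """Return True if a non-empty file exists at the given path.

    Hypothetical helper: it does not parse the file, since the real
    jobs/open.json is large and its schema is not described here.
    """
    jobs_path = Path(path)
    return jobs_path.is_file() and jobs_path.stat().st_size > 0


if __name__ == "__main__":
    if jobs_file_ready():
        print("jobs/open.json found.")
    else:
        print("jobs/open.json is missing or empty; download it first.")
```

Running this from the repository root before `python main.py` gives an early, readable message if the download step was skipped.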

Installation

git clone https://github.com/TheoCoombes/crawlingathome-server
cd crawlingathome-server
pip install -r requirements.txt

Usage

The jobs data is already compiled for Common Crawl. To start the server, run main.py:

python main.py

To change the server's host and port, edit config.py.
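As an illustration only, a host and port configuration module of this kind often looks like the sketch below. The names `HOST`, `PORT`, and `bind_address` and the values shown are assumptions, not taken from the repository; check the actual config.py for the real variable names before editing.

```python
# Hypothetical layout for a config module; HOST and PORT are assumed names
# and may not match the variables used in the project's config.py.
HOST = "0.0.0.0"  # listen on all network interfaces
PORT = 8080       # TCP port the server binds to


def bind_address(host=HOST, port=PORT):
    """Format the host and port as a single 'host:port' string."""
    return f"{host}:{port}"


if __name__ == "__main__":
    print(f"Server would listen on {bind_address()}")
```

Setting the host to `127.0.0.1` instead would restrict the server to local connections, which is a common choice while testing.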

About

License: MIT License


Languages

Python 67.5%, HTML 27.1%, JavaScript 5.4%