ArtificialOSS / WebCrawl

Crawls the web to generate a huge dataset for training

WebCrawl

WebCrawl is a free and open-source tool that crawls websites and generates a large dataset you can use to train your AI models.
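A crawl like the one described above can be pictured as a breadth-first traversal. It starts from a seed URL, fetches each page, collects the page's links, and queues any it has not seen yet. The following is a minimal sketch under stated assumptions, not WebCrawl's actual code. The names crawl, extract_links, and LinkParser are hypothetical. The fetch function is passed in so the traversal can run without network access. Links are parsed with the standard library's html.parser instead of BeautifulSoup, and the crawl is assumed to stay on the seed's host.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Hypothetical helper: collects href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(page_url: str, html: str) -> List[str]:
    """Return absolute http(s) links found in html, with #fragments removed."""
    parser = LinkParser()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if urlparse(absolute).scheme in ("http", "https"):
            links.append(absolute)
    return links


def crawl(
    seed: str,
    fetch: Callable[[str], Optional[str]],
    max_pages: int = 100,
) -> List[Dict[str, str]]:
    """Breadth-first crawl restricted to the seed's host (an assumption).

    fetch(url) returns the page's HTML, or None if it could not be retrieved.
    """
    host = urlparse(seed).netloc
    queue = deque([seed])  # FIFO frontier: pages nearest the seed come first
    seen = {seed}          # every URL ever queued, so none is queued twice
    pages: List[Dict[str, str]] = []
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages.append({"url": url, "html": html})
        for link in extract_links(url, html):
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

For real pages, fetch could wrap requests.get(url, timeout=10) and return response.text when the request succeeds, or None otherwise. Injecting fetch keeps the traversal logic testable with an in-memory dictionary of pages. The max_pages cap is a simple way to bound the size of a run.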

It is inspired by CommonCrawl.
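CommonCrawl's crawler (CCBot) honors robots.txt. This README does not say whether WebCrawl does the same. The sketch below only illustrates how a crawler in that spirit might check permission and crawl delay before fetching, using the standard library's urllib.robotparser. The helper names load_rules, may_fetch, and delay_for, and the user-agent string "WebCrawlBot", are hypothetical.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent string; WebCrawl's real one (if any) is not documented here.
USER_AGENT = "WebCrawlBot"


def load_rules(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text that has already been downloaded."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def may_fetch(rules: RobotFileParser, url: str) -> bool:
    """True if the parsed rules allow USER_AGENT to fetch url."""
    return rules.can_fetch(USER_AGENT, url)


def delay_for(rules: RobotFileParser) -> Optional[int]:
    """Crawl-delay in seconds for USER_AGENT, or None if the file sets none."""
    return rules.crawl_delay(USER_AGENT)


if __name__ == "__main__":
    rules = load_rules("User-agent: *\nDisallow: /private/\nCrawl-delay: 2\n")
    print(may_fetch(rules, "https://example.com/private/page"))  # False
    print(may_fetch(rules, "https://example.com/blog/post"))     # True
    print(delay_for(rules))                                       # 2
```

In practice, a crawler would download each host's robots.txt once, cache the parsed rules per host, and wait for the crawl delay between requests to that host when one is given.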

Setup

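The following Python packages are required:

requests
BeautifulSoup (installed from PyPI as beautifulsoup4)
tqdm

You can install them with:

pip install requests beautifulsoup4 tqdm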
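In WebCrawl, BeautifulSoup presumably handles the HTML parsing that turns fetched markup into text for the dataset. To keep the example free of third-party dependencies, the sketch below substitutes the standard library's html.parser. It keeps visible text and skips the contents of script and style elements. The names TextExtractor and html_to_text are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, skipping <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Return the page's visible text fragments joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

With BeautifulSoup, the same step is typically written by calling decompose() on each script and style tag and then soup.get_text(" ", strip=True), which likewise strips each text fragment and joins the non-empty ones with spaces. A tqdm progress bar can wrap whatever loop processes the crawled pages.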

Then run the crawler with:

python main.py

About

License: Creative Commons Zero v1.0 Universal

Languages

Python: 100.0%