tieutantan / 1m-Alexa-Website-Titles-Crawler

Top 1000000 Alexa Website's Titles Crawler - The crawler reads a list of URLs from a .txt file, fetches each site's title, and saves the results to a .txt file, using multiple Python threads.
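A minimal sketch of the read, crawl, and save flow described above. It assumes one URL per line in the input file and a tab-separated `url<TAB>title` output format; both are assumptions, and the project's actual file layout may differ. `read_urls`, `crawl`, and `save_results` are hypothetical names. `fetch_title` is any callable that turns a URL into a page title, so the threading can be tried without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple


def read_urls(path: str) -> List[str]:
    """Read one URL per line from a .txt file, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def crawl(urls: Iterable[str], fetch_title: Callable[[str], str],
          threads: int = 8) -> List[Tuple[str, str]]:
    """Fetch titles concurrently with a thread pool; results keep input order."""

    def safe_fetch(url: str) -> Tuple[str, str]:
        try:
            return url, fetch_title(url)
        except Exception as exc:  # one failing site must not stop the whole run
            return url, f"ERROR: {exc}"

    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() yields results in the same order as the input URLs
        return list(pool.map(safe_fetch, urls))


def save_results(path: str, results: Iterable[Tuple[str, str]]) -> None:
    """Write one 'url<TAB>title' line per site (hypothetical output format)."""
    with open(path, "w", encoding="utf-8") as f:
        for url, title in results:
            # Collapse tabs/newlines inside titles so each site stays on one line
            clean = " ".join(title.split())
            f.write(f"{url}\t{clean}\n")
```

For example, `save_results("titles.txt", crawl(read_urls("urls.txt"), my_fetcher, threads=20))`. Threads suit this job because each worker spends most of its time waiting on the network, so Python's GIL is not the bottleneck.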

1m Alexa Website's Titles Crawler

This package is compatible with Python 3.8.2. You can choose the number of threads at the console prompt, and configure the user agent and crawl timeout in the config.py file.
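One way the settings in config.py might be used, as a sketch only: `USER_AGENT` and `TIMEOUT` are hypothetical names and values, the title is pulled out with the standard library's `html.parser`, and `parse_thread_count` is a hypothetical helper for validating the number typed at the console. None of this is taken from the project's code.

```python
import urllib.request
from html.parser import HTMLParser
from typing import List

# Hypothetical config.py values; the real file's names and defaults may differ.
USER_AGENT = "Mozilla/5.0 (compatible; TitleCrawler/0.1)"
TIMEOUT = 10  # seconds to wait for each site before giving up


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element of a page."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._done = False
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def extract_title(html: str) -> str:
    """Return the first <title> text with whitespace collapsed, or '' if none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


def fetch_title(url: str, user_agent: str = USER_AGENT,
                timeout: float = TIMEOUT) -> str:
    """Download a page with the configured User-Agent and timeout; return its title."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        body = response.read()
    try:
        html = body.decode(charset, errors="replace")
    except LookupError:  # the server sent a charset name Python does not know
        html = body.decode("utf-8", errors="replace")
    return extract_title(html)


def parse_thread_count(raw: str, default: int = 10) -> int:
    """Turn the console answer into a thread count; empty input uses the default."""
    raw = raw.strip()
    if not raw:
        return default
    count = int(raw)  # raises ValueError for non-numeric input
    if count < 1:
        raise ValueError("thread count must be at least 1")
    return count
```

At the console this could be `threads = parse_thread_count(input("Threads: "))`. The timeout keeps one slow site from tying up a worker thread indefinitely.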

1. Download top-1m.csv.zip and unzip top-1m.csv into the project's root folder.
2. Install the required modules.
3. Run it!
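The first step can also be scripted. A minimal sketch using only the standard library, assuming the archive contains a member named `top-1m.csv` as the step describes; `download_zip` and `unzip_csv` are hypothetical helper names, and the download only works if the source URL given in this README is still reachable.

```python
import urllib.request
import zipfile

ZIP_URL = "http://s3.amazonaws.com/alexa-static/top-1m.csv.zip"


def download_zip(url: str = ZIP_URL, dest: str = "top-1m.csv.zip") -> str:
    """Download the archive to dest (requires network access); return dest."""
    urllib.request.urlretrieve(url, dest)
    return dest


def unzip_csv(zip_path: str, root: str = ".", member: str = "top-1m.csv") -> str:
    """Extract the CSV from the archive into the root folder; return its path."""
    with zipfile.ZipFile(zip_path) as zf:
        # extract() returns the normalized path of the file it created
        return zf.extract(member, path=root)
```

Running `unzip_csv(download_zip())` from the project root would leave top-1m.csv where the later steps expect it.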

Source of the top 1M websites: http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
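The setup puts top-1m.csv in the root folder, while the description speaks of reading URLs from a .txt file. A sketch of bridging the two, assuming each CSV row is `rank,domain` (for example `1,google.com`) with no header, and that bare domains should be fetched over `http://`. `iter_sites` and `csv_to_txt` are hypothetical helpers, not part of the project.

```python
import csv
from itertools import islice
from typing import Iterator, Optional, TextIO


def iter_sites(f: TextIO, limit: Optional[int] = None) -> Iterator[str]:
    """Yield one URL per 'rank,domain' row, stopping after `limit` rows if given."""
    for row in islice(csv.reader(f), limit):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        # Assumption: the list holds bare domains, so prefix a scheme before fetching
        yield "http://" + row[1].strip()


def csv_to_txt(csv_path: str, txt_path: str, limit: Optional[int] = None) -> int:
    """Write one URL per line to txt_path; return how many URLs were written."""
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
            open(txt_path, "w", encoding="utf-8") as dst:
        for url in iter_sites(src, limit):
            dst.write(url + "\n")
            count += 1
    return count
```

For instance, `csv_to_txt("top-1m.csv", "urls.txt", limit=1000)` would produce a small test list before committing to all million rows.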

Install modules: pip install -r requirements.txt

Run: python run.py

Thank you for reading!

About

GNU General Public License v3.0

Languages

Python 100.0%