tieutantan / 1m-Alexa-Website-Titles-Crawler

Top 1,000,000 Alexa Websites' Titles Crawler - The crawler reads a list of URLs from a .txt file and saves the results to a .txt file, using multiple Python threads.

1m Alexa Website's Titles Crawler

This package is compatible with Python 3.8.2. You can choose the number of threads on the console, and configure the user agent and crawl timeout in the config.py file.
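As a rough illustration of how a threaded title crawler of this kind can work, here is a minimal sketch using only the standard library. It is not the repository's actual code: the names USER_AGENT, TIMEOUT, extract_title, fetch_html and crawl_titles are hypothetical, and the two constants merely stand in for whatever config.py really defines.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Hypothetical stand-ins for the settings kept in config.py.
USER_AGENT = "Mozilla/5.0 (compatible; TitleCrawler/0.1)"
TIMEOUT = 10  # seconds


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def extract_title(html):
    """Return the page title with runs of whitespace collapsed."""
    parser = TitleParser()
    parser.feed(html)
    return " ".join("".join(parser.parts).split())


def fetch_html(url):
    """Download a page, sending the configured user agent and timeout."""
    request = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(request, timeout=TIMEOUT) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl_titles(urls, threads=8, fetch=fetch_html):
    """Fetch titles concurrently; failed URLs get an empty title."""

    def worker(url):
        try:
            return url, extract_title(fetch(url))
        except Exception:
            return url, ""

    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map keeps the results in the same order as the input URLs.
        return list(pool.map(worker, urls))
```

The fetch parameter is there so the crawl logic can be exercised with a fake downloader, without touching the network. A thread pool suits this workload because each worker spends most of its time waiting on the network rather than on the CPU.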

  1. Download top-1m.csv.zip and unzip top-1m.csv into the root folder.
  2. Install modules.
  3. Run it!

Source of the 1m websites: http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
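To get a feel for the input data, the sketch below reads domains straight out of the downloaded archive. It assumes each row of top-1m.csv has the form rank,domain (for example 1,google.com), which should be checked against the actual file. The helper names read_domains and to_url are hypothetical and not part of this repository.

```python
import csv
import io
import zipfile


def read_domains(zip_path, member="top-1m.csv", limit=None):
    """Return domains from the rank,domain rows inside the archive."""
    domains = []
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.reader(text):
                # Stop once the requested number of domains is collected.
                if limit is not None and len(domains) >= limit:
                    break
                # Skip blank or malformed rows (assumed format: rank,domain).
                if len(row) < 2:
                    continue
                domains.append(row[1].strip())
    return domains


def to_url(domain, scheme="http"):
    """Turn a bare domain into a URL that a crawler can request."""
    return f"{scheme}://{domain}"
```

For example, read_domains("top-1m.csv.zip", limit=100) would return the first 100 domains, which is a convenient way to try a small crawl before processing the full list.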

Install modules: pip install -r requirements.txt

Run: python run.py

Thank you for reading!


License: GNU General Public License v3.0


Languages

Language:Python 100.0%