TANSixu / SimpleCrawler

This is a simple crawler that fetches pages from a search engine.

Simple crawler

This helps to fetch and analyze information from a search engine.

Installation

git clone https://github.com/TANSixu/SimpleCrawler.git
cd SimpleCrawler
conda create -n [&choose_your_own_name&] python=2.7
conda activate [&choose_your_own_name&]
pip install -r requirements.txt

Usage example

  1. Open the source file simple_crawler.py and set kw at line 20 to the search keyword. (The keyword is hard-coded in the source, inelegant as that is, so that kanji searches work on all platforms.)
  2. Run the following command:
python simple_crawler.py
  3. Optional arguments:

-h, --help show this help message and exit

-n NUM, --num NUM number of pages to crawl

-d DIR_NAME, --dir_name DIR_NAME directory name to save the crawled files
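
The options above could be declared with argparse roughly as follows. This is a minimal sketch and not the code in simple_crawler.py: the function names build_parser and encode_keyword, the default values, the int type for NUM, and the sample keyword are all assumptions made for illustration. It also shows one way a hard-coded kanji keyword might be percent-encoded as UTF-8 before being placed in a search URL.

```python
# -*- coding: utf-8 -*-
import argparse

try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2.7

# Hypothetical keyword, hard-coded in the source as the usage steps describe.
kw = u"東京"


def build_parser():
    """Declare the -n/--num and -d/--dir_name options (defaults are assumed)."""
    parser = argparse.ArgumentParser(
        description="Fetch result pages from a search engine.")
    parser.add_argument("-n", "--num", type=int, default=1,
                        help="number of pages to crawl")
    parser.add_argument("-d", "--dir_name", default="pages",
                        help="directory name to save the crawled files")
    return parser


def encode_keyword(keyword):
    """Percent-encode a unicode keyword as UTF-8 for use in a query string."""
    return quote(keyword.encode("utf-8"))
```

For example, build_parser().parse_args(["-n", "3", "-d", "out"]) yields num=3 and dir_name="out". Encoding the keyword inside the script avoids depending on how each platform's shell passes non-ASCII command-line arguments, which is presumably why the keyword is set in the source.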

License: GNU General Public License v3.0


Languages

Language:Python 100.0%