billy-inn / CrawlerForGoogleScholar

Crawler for Google Scholar

Prerequisites

Scrapy
pymongo

Usage

Mode 1: Crawl profiles on Google Scholar via BFS

./run.sh 1 start_url target_database

start_url is the starting point of the breadth-first search (BFS), and target_database is the target collection in MongoDB.
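
The BFS idea behind Mode 1 can be sketched in a few lines of Python. This is only an illustration, not the crawler's actual code: the real crawl runs through Scrapy, and how neighbouring profiles are discovered is not described here. The names bfs_crawl, get_neighbors, max_profiles, and the toy graph are all hypothetical.

```python
from collections import deque


def bfs_crawl(start_url, get_neighbors, max_profiles=100):
    """Visit profiles breadth-first, starting from start_url.

    get_neighbors is a hypothetical callable that returns the profile
    URLs linked from a given profile; a real crawler would fetch and
    parse the page here.
    """
    visited = {start_url}        # URLs already queued, to avoid revisits
    queue = deque([start_url])   # FIFO queue gives breadth-first order
    order = []                   # profiles in the order they were visited
    while queue and len(order) < max_profiles:
        url = queue.popleft()
        order.append(url)
        for nxt in get_neighbors(url):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(nxt)
    return order


# Toy link graph standing in for real profile pages (assumption).
toy_graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["D"], "D": []}
order = bfs_crawl("A", lambda u: toy_graph.get(u, []))
print(order)
```

The visited set keeps the crawl from looping when two profiles link to each other, and the max_profiles cap bounds an otherwise open-ended traversal.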

Mode 2: Crawl the publications of the corresponding profiles

./run.sh 2 profile_database start_index

profile_database is the database populated in Mode 1; start_index is the index of the profile to start from.
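
A start index like this is typically useful for resuming an interrupted crawl. The sketch below shows one way to skip ahead in a sequence of profiles. It assumes start_index counts profiles in iteration order, which is not confirmed here; profiles_from and the sample records are hypothetical, and a plain list stands in for the MongoDB collection.

```python
from itertools import islice


def profiles_from(profiles, start_index):
    """Yield (index, profile) pairs, skipping the first start_index profiles."""
    return islice(enumerate(profiles), start_index, None)


# Stand-in for profiles read from the database (assumption).
profiles = [{"name": "p0"}, {"name": "p1"}, {"name": "p2"}]
resumed = [(i, p["name"]) for i, p in profiles_from(profiles, 1)]
print(resumed)
```

Keeping the original index alongside each profile makes it easy to log where a run stopped and pass that value as start_index next time.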

Mode 3: Crawl the publications via the titles directly

To be updated.
