DuyenDo / data-study-google-scholar

The First Data Study of Google Scholar

[Semester Project] The First Data Study of Google Scholar

Members:

DO Thi Duyen, LE Ta Dang Khoa

Outline

1. Collect data from Google Scholar

  • Packages:
    • Scrapy for crawling: http://doc.scrapy.org/en/latest/

        $ pip install scrapy

    • Selenium and WebDriver for JavaScript actions (e.g. clicking 'Show more'): https://pypi.org/project/selenium/

        $ pip install -U selenium

      Download a WebDriver binary (chromedriver.exe, geckodriver.exe, ...) and put it in google_scholar/libs.

      On a Linux server, also install a browser:

        $ sudo apt-get install -y chromium-browser
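As a rough illustration of the 'Show more' step, the sketch below starts Chrome through Selenium 4 and keeps clicking a button until it becomes disabled. It is not taken from the project's spiders: the helper names `make_chrome_driver` and `click_show_more`, the pause and click limits, and the button id `gsc_bpf_more` are all assumptions that should be checked against the live page.

```python
import time


def make_chrome_driver(driver_path=None, headless=True):
    """Start Chrome through Selenium 4 (requires `pip install selenium`)."""
    # Imported lazily so click_show_more() can be used with any driver object.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    if headless:
        options.add_argument("--headless")
    # Without an explicit path, Selenium falls back to its default driver lookup.
    service = Service(executable_path=driver_path) if driver_path else Service()
    return webdriver.Chrome(service=service, options=options)


def click_show_more(driver, button_id="gsc_bpf_more", pause=1.0, max_clicks=50):
    """Click the button until it is disabled; return the number of clicks.

    The default button_id is an assumption about the profile page markup.
    """
    clicks = 0
    while clicks < max_clicks:
        # "id" is the locator strategy string behind selenium's By.ID.
        button = driver.find_element("id", button_id)
        if not button.is_enabled():
            break
        button.click()
        clicks += 1
        time.sleep(pause)  # give the page time to load the next batch of rows
    return clicks
```

A typical call sequence would be to create the driver with the path of the binary placed in google_scholar/libs, open a profile page with `driver.get(...)`, call `click_show_more(driver)`, read `driver.page_source`, and finally call `driver.quit()`.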

  • Spiders (in google_scholar/spiders):
    • google_scholar/spiders/papers_spider.py: get the list of papers for the given user URLs
    • google_scholar/spiders/citations_spider.py: get the list of papers that cite a given paper
  • Run:

      $ python3 google_scholar/runner.py
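The contents of runner.py are not shown here, so the following is only a sketch of how a runner script could launch both spiders in a single Scrapy process. The spider names "papers" and "citations", the use of `start_urls` as the spider argument, and the helpers `build_jobs` and `run_jobs` are hypothetical; the real spiders may expect different names and arguments.

```python
def build_jobs(user_urls, paper_urls):
    """Return (spider_name, spider_kwargs) pairs, one per spider to run.

    Spider names and the start_urls keyword are illustrative assumptions.
    """
    jobs = []
    if user_urls:
        jobs.append(("papers", {"start_urls": list(user_urls)}))
    if paper_urls:
        jobs.append(("citations", {"start_urls": list(paper_urls)}))
    return jobs


def run_jobs(jobs):
    """Run every job in one Scrapy process (requires `pip install scrapy`)."""
    # Imported lazily so build_jobs() works without Scrapy installed.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    process = CrawlerProcess(get_project_settings())
    for spider_name, spider_kwargs in jobs:
        # Keyword arguments are forwarded to the spider's constructor.
        process.crawl(spider_name, **spider_kwargs)
    process.start()  # blocks until all crawls have finished
```

Calling `run_jobs(build_jobs(user_urls, paper_urls))` from inside the Scrapy project directory would then start the crawl. Looking spiders up by name only works when the project settings can be found, which is why the sketch uses `get_project_settings()`.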

2. Process and analyse

