ashen007 / LiteratureReview

Scraper for various science databases


LiteratureReview

A scraper for various science databases. The supported databases are IEEE Xplore, Science Direct, and ACM. These scraping bots retrieve a link to each search result (i.e., each paper), its title, and other metadata such as keywords, the abstract, and the type of paper (conference, journal, etc.), which makes the systematic literature review process easier.

If you find this work useful, please put a star on this repo ⭐

Prerequisites

  • Python 3.9 or higher
  • Chrome browser
  • ChromeDriver matching your Chrome version (download it from here)
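ChromeDriver only works with Chrome when the two major versions agree (e.g. both 115.x). A quick sanity check, as a hypothetical helper not included in the repo, might look like:

```python
def versions_match(chrome_version: str, driver_version: str) -> bool:
    """ChromeDriver is compatible when its major version equals Chrome's.

    Version strings look like "115.0.5790.170"; only the first
    dot-separated component (the major version) matters here.
    """
    return chrome_version.split(".")[0] == driver_version.split(".")[0]
```

You can find the two version strings by opening `chrome://version` in the browser and running `chromedriver --version` in a terminal.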

How to use

  1. go to the official site's advanced search page and create a search query using its form:

    Science Direct

    IEEE Xplore

    ACM

  2. copy that query text and use it to configure the tool
  3. clone the repo (creating a virtual environment is recommended) and complete the configuration; you can run a single bot, or all the bots from one configuration file
git clone https://github.com/ashen007/LiteratureReview.git
  • all bots in a single configuration
{
  "BINARY_LOCATION": "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",
  "EXECUTABLE_PATH": "D:\\chromedriver.exe",
  "SCIDIR": {
    "search_term": "insert query string here",
    "link_file_save_to": "./temp/scidir_search_term.json",
    "abs_file_save_to": "./abs/scidir_search_term.json",
    "use_batches": true,
    "batch_size": 8,
    "keep_link_file": true
  },
  "ACM": {
    "search_term": "insert query string here",
    "link_file_save_to": "./temp/acm_search_term.json",
    "abs_file_save_to": "./abs/acm_search_term.json",
    "use_batches": true,
    "batch_size": 8,
    "keep_link_file": true
  },
  "IEEE": {
    "search_term": "insert query string here",
    "link_file_save_to": "./temp/ieee_search_term.json",
    "abs_file_save_to": "./abs/ieee_search_term.json",
    "use_batches": false,
    "batch_size": 8,
    "keep_link_file": true
  }
}
  • or configure just one bot
{
  "BINARY_LOCATION": "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",
  "EXECUTABLE_PATH": "D:\\chromedriver.exe",
  "SCIDIR": {
    "search_term": "insert query string here",
    "link_file_save_to": "./temp/scidir_search_term.json",
    "abs_file_save_to": "./abs/scidir_search_term.json",
    "use_batches": true,
    "batch_size": 8,
    "keep_link_file": true
  }
}
  • BINARY_LOCATION: the path to your chrome.exe file

  • EXECUTABLE_PATH: the path to the downloaded and extracted ChromeDriver executable
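Before launching the bots, it can help to catch configuration mistakes early. The sketch below is a hypothetical stdlib-only validator (not part of the repo) that checks a parsed config dict for the top-level paths and the per-bot keys shown in the examples above:

```python
import json

# Per-bot keys expected by the example configurations above.
REQUIRED_BOT_KEYS = {
    "search_term", "link_file_save_to", "abs_file_save_to",
    "use_batches", "batch_size", "keep_link_file",
}


def validate_config(cfg: dict) -> list:
    """Return a list of human-readable problems found in a config dict."""
    problems = []
    for key in ("BINARY_LOCATION", "EXECUTABLE_PATH"):
        if key not in cfg:
            problems.append(f"missing {key}")
    for bot in ("SCIDIR", "ACM", "IEEE"):
        if bot in cfg:
            missing = REQUIRED_BOT_KEYS - cfg[bot].keys()
            if missing:
                problems.append(f"{bot}: missing {sorted(missing)}")
    return problems
```

An empty return value means the config has every expected key; otherwise each entry names what is missing.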

  4. install the dependencies and run main.py
pip install -r ./requirements.txt
python main.py
  5. that's it
  6. save the results into an Excel workbook; they are automatically written to the ./SLR.xlsx file
   from src.utils import to_excel
   to_excel({"acm":'./abs/acm_search_term.json', "ieee": './abs/ieee_search_term.json', "science_direct": './abs/scidir_search_term.json'})
