meflm / bibnet-google-scholar-scraper

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

bibnet-google-scholar-scraper

This is a meteor project, so you'll need to install Meteor to run it: https://www.meteor.com/install

How to make it work

You'll be presented with a text area. You can paste search terms into this area. Each new line constitutes a new search term.

When you click 'Find Papers' Bibnet records each paper or book that is returned by Google Scholar (up to 10 results per term) for each of the search terms. This information generates a list of publications in a database. In the same database, it also records who wrote each publication as a list of authors.

When you click 'Add citations' Using Google Scholar’s ‘search within citations’ it checks to see if any of the authors recorded to the database have cited any of the publications. This process will only trigger 40 queries at a time. Due to rate limiting it should not be run faster than this.

Results

Three tables are provided as part of the GUI. I think the authors and publications speak for themselves. The edges table represents three kinds of edge:

  1. author connects authors to the publications they wrote.

  2. cite connects one paper to another paper which it cites.

  3. citation_checked is a record of which combinations of publication and author have been checked for citations when the 'add citation' button is clicked.

Export

You can export two kinds of .dot file suitable for use in Gephi and also a human readable list of the citations. The exportable text appears in a text box at the end of the page.

Google and rate limiting

This software should be used in compliance with Google's rules. Much as Zotero uses Google Scholar's results pages to populate it's metadata fields, this seems like a reasonable use of their service.

There is a keys.js where you can provide cookie details, so that you are querying Google as a logged in user. I don't think this adds any particular advantage.

About


Languages

Language:JavaScript 94.2%Language:CSS 3.8%Language:HTML 2.0%