marchelbling / papper

A collection of Python tools to crawl, store and transform open-access scholarly material.

Papper

An open-source Python collection of tools to

  • crawl
  • parse
  • store
  • request

open-access scholarly documents.
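The crawl and request steps are not documented here. As a minimal sketch, assuming the input is an Atom feed like the ones returned by arXiv's public query API, the Python standard library can flatten it into JSON-ready records. `parse_atom_feed` and `SAMPLE_FEED` (including its placeholder arXiv ID) are hypothetical illustrations, not part of papper.

```python
import json
import xml.etree.ElementTree as ET

# Atom namespace used in arXiv API responses (assumption: only standard Atom fields are read)
ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical sample feed with a placeholder ID; a real one would come from the arXiv API.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <title>An Example
      Title</title>
    <author><name>A. Author</name></author>
    <author><name>B. Author</name></author>
  </entry>
</feed>"""


def parse_atom_feed(xml_text):
    """Hypothetical helper: turn each Atom <entry> into a plain dict."""
    root = ET.fromstring(xml_text)
    records = []
    for entry in root.findall(f"{ATOM}entry"):
        records.append({
            "id": entry.findtext(f"{ATOM}id", default="").strip(),
            # Collapse the line breaks and indentation that titles often contain
            "title": " ".join(entry.findtext(f"{ATOM}title", default="").split()),
            "authors": [
                author.findtext(f"{ATOM}name", default="").strip()
                for author in entry.findall(f"{ATOM}author")
            ],
        })
    return records


records = parse_atom_feed(SAMPLE_FEED)
print(json.dumps(records, indent=2))
```

Plain dicts keep the storage step independent of any particular database; they can be written out as JSON or loaded into whatever store is used.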

arXiv

The initial goal of this project is to make arXiv more "modern" and to offer

  • a JSON API without throttling
  • an HTML version of articles available under Creative Commons (CC) licences
  • richer metadata parsing, e.g. bibliography or formulas (most arXiv articles come with their LaTeX source, which is much easier to parse than PDF documents).
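As a rough illustration of why LaTeX source is easier to mine than PDF, the sketch below pulls bibliography keys, citation keys and inline formulas out of a `.tex` string with regular expressions and emits JSON. `parse_latex_metadata` and `SAMPLE_TEX` are hypothetical and this is not papper's parser; a regex approach ignores custom macros, `\input` files and separate BibTeX `.bib` files, which a real parser would need to handle.

```python
import json
import re

# Hypothetical patterns: they only cover the simple, common forms of each construct.
BIBITEM_RE = re.compile(r"\\bibitem(?:\[[^\]]*\])?\{([^}]+)\}")
CITE_RE = re.compile(r"\\cite[tp]?\*?(?:\[[^\]]*\])*\{([^}]+)\}")
# Inline $...$ math; an escaped \$ is not treated as an opening delimiter
INLINE_MATH_RE = re.compile(r"(?<!\\)\$([^$]+)\$")

# Hypothetical sample source with placeholder references
SAMPLE_TEX = r"""
We use $a^2 + b^2 = c^2$ as shown in \cite{smith2020, doe2019}.
See also \citep[p.~3]{doe2019}.
\begin{thebibliography}{9}
\bibitem{smith2020} Placeholder reference one.
\bibitem[Doe]{doe2019} Placeholder reference two.
\end{thebibliography}
"""


def parse_latex_metadata(source):
    """Extract bibliography keys, cited keys (deduplicated, in order) and inline formulas."""
    bibliography = BIBITEM_RE.findall(source)
    citations = []
    for group in CITE_RE.findall(source):
        # A single \cite{a,b} may cite several keys at once
        for key in group.split(","):
            key = key.strip()
            if key and key not in citations:
                citations.append(key)
    formulas = [m.strip() for m in INLINE_MATH_RE.findall(source)]
    return {"bibliography": bibliography, "citations": citations, "formulas": formulas}


metadata = parse_latex_metadata(SAMPLE_TEX)
print(json.dumps(metadata, indent=2))
```

The resulting dict maps directly onto a JSON response, which is the kind of output a throttling-free JSON API could serve.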

About

License: MIT License


Languages

Language: Python 100.0%