An open-source collection of Python tools to
- crawl
- parse
- store
- request
open-access scholarly documents.
The initial purpose of this project is to make arXiv more "modern" and offer
- a JSON API without throttling
- HTML versions of articles released under Creative Commons (CC) licences
- richer metadata parsing, e.g. bibliographies or formulas (most arXiv articles come with their LaTeX source, which is much easier to parse than PDF documents)
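To show why LaTeX source is easier to mine than PDF, here is a minimal sketch that pulls citation keys, `\bibitem` keys and `equation`/`align` bodies out of a LaTeX string with regular expressions. It is not this project's parser: the function names `extract_citations`, `extract_bibitems` and `extract_equations` are hypothetical, and the regexes assume simple, well-formed sources. Real arXiv sources (custom macros, `\input` files, BibTeX `.bbl` output) would need a proper LaTeX parser.

```python
import re

# Hypothetical helpers for illustration only, not part of this project's API.
# \cite, \citep, \citet, ... with optional [..] arguments; captures the key list.
CITE_RE = re.compile(r"\\cite\w*\*?(?:\[[^\]]*\])*\{([^}]+)\}")
# \bibitem{key} or \bibitem[label]{key}; captures the key.
BIBITEM_RE = re.compile(r"\\bibitem(?:\[[^\]]*\])?\{([^}]+)\}")
# equation / equation* / align / align* environments; captures the body.
EQUATION_RE = re.compile(
    r"\\begin\{(equation\*?|align\*?)\}(.*?)\\end\{\1\}", re.DOTALL
)


def extract_citations(tex: str) -> list[str]:
    """Return every cited key, splitting comma-separated \\cite lists."""
    keys = []
    for group in CITE_RE.findall(tex):
        keys.extend(key.strip() for key in group.split(","))
    return keys


def extract_bibitems(tex: str) -> list[str]:
    """Return the keys declared in a thebibliography environment."""
    return BIBITEM_RE.findall(tex)


def extract_equations(tex: str) -> list[str]:
    """Return the stripped bodies of equation/align environments."""
    return [body.strip() for _env, body in EQUATION_RE.findall(tex)]


# Made-up sample source, not taken from a real article.
sample = r"""
As shown in \cite{smith2020, doe2021},
\begin{equation}
E = mc^2
\end{equation}
\begin{thebibliography}{9}
\bibitem{smith2020} A. Smith. First example entry.
\bibitem[Doe(2021)]{doe2021} B. Doe. Second example entry.
\end{thebibliography}
"""

if __name__ == "__main__":
    print(extract_citations(sample))  # ['smith2020', 'doe2021']
    print(extract_bibitems(sample))   # ['smith2020', 'doe2021']
    print(extract_equations(sample))  # ['E = mc^2']
```

The same structure is hard to recover from a PDF, where citations, references and formulas are flattened into positioned glyphs.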