soumith/arxiv-tools

arXiv-keyword-searcher

Prerequisites

Only tested with Python 3

ArXiv provides bulk data access through Amazon S3. You need an account with Amazon AWS to be able to download the data.

Downloading and search arXiv documents for keywords

1- Install s3cmd which is a command line tool for interacting with S3

pip install s3cmd

2- Configure your s3cmd by entering credentials found in the account management tab of the Amazon AWS website

s3cmd --configure

3- Install pdfminer.six to get text from a pdf on the fly

pip install pdfminer.six

4- Search arxiv for particular keywords

For example, searching for "resnet", "googlenet" and "alexnet". The keyword search is case-insensitive

python download.py --keywords "resnet,googlenet,alexnet"

We store the results database in a pickle file (Default: db.pkl). When you run download.py again, it checks for this file and skips processing the files from arxiv that were already processed.

soumith / arxiv-tools

arXiv-keyword-searcher

Prerequisites

Downloading and search arXiv documents for keywords

About

Languages