soumith / arxiv-tools

Tool to search all arXiv data for keywords




Only tested with Python 3

arXiv provides bulk data access through Amazon S3. You need an Amazon Web Services (AWS) account to download the data; the bucket is configured as "requester pays", so transfer costs are billed to your account.
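The bulk PDF data is split into tar archives that are listed in an XML manifest. The sketch below lists the tarball names from a manifest you have already downloaded. It assumes the manifest lives at `pdf/arXiv_pdf_manifest.xml` in the `arxiv` bucket and contains one `<file>` element per tarball, each with a `<filename>` child. Check both assumptions against arXiv's bulk-data documentation. `list_pdf_tarballs` is a hypothetical helper and is not part of this repository.

```python
import xml.etree.ElementTree as ET
from typing import List


def list_pdf_tarballs(manifest_path: str) -> List[str]:
    """Return the tarball paths listed in an arXiv bulk-data manifest.

    Assumed layout: one <file> element per tarball, each holding a
    <filename> child such as "pdf/arXiv_pdf_....tar".
    """
    tree = ET.parse(manifest_path)
    names = []
    for entry in tree.getroot().iter("file"):
        name = entry.findtext("filename")
        if name:  # skip malformed entries that have no filename
            names.append(name.strip())
    return names
```

The returned names can then be fetched one at a time with s3cmd, which keeps disk usage bounded to a single archive.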

Downloading and searching arXiv documents for keywords

1- Install s3cmd, a command-line tool for interacting with S3

pip install s3cmd

2- Configure s3cmd by entering the access key and secret key found in the account management section of the AWS website

s3cmd --configure
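`s3cmd --configure` saves its settings to an INI-style file (`~/.s3cfg` by default), with `access_key` and `secret_key` under a `[default]` section. Before a long download, you may want a quick check that the credentials were actually saved. The sketch below does that. `s3cmd_has_credentials` is a hypothetical helper and is not part of s3cmd or this repository.

```python
import configparser
from pathlib import Path


def s3cmd_has_credentials(config_path: Path = Path.home() / ".s3cfg") -> bool:
    """Return True if the s3cmd config file defines non-empty AWS keys."""
    # interpolation=None: .s3cfg contains literal "%(...)s" templates
    # that configparser's default interpolation would try to expand.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(config_path):  # read() returns the files it parsed
        return False
    if not parser.has_section("default"):
        return False
    access = parser.get("default", "access_key", fallback="").strip()
    secret = parser.get("default", "secret_key", fallback="").strip()
    return bool(access and secret)
```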

3- Install pdfminer.six to extract text from PDFs on the fly

pip install pdfminer.six
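pdfminer.six provides `extract_text` in `pdfminer.high_level`. That function accepts either a path or a binary file object. The sketch below extracts text "on the fly", assuming the PDFs arrive packed in tar archives as they do in the bulk dump. It reads each member into memory and passes it to pdfminer without writing anything to disk. `iter_pdf_texts` and its `extractor` parameter are hypothetical names. The parameter exists only so the function can be exercised without real PDFs.

```python
import io
import tarfile
from typing import Callable, Iterator, Optional, Tuple


def iter_pdf_texts(tar_path: str,
                   extractor: Optional[Callable] = None) -> Iterator[Tuple[str, str]]:
    """Yield (member_name, text) for each PDF inside a tar archive,
    reading members into memory instead of extracting them to disk."""
    if extractor is None:
        # Imported lazily so the module loads even without pdfminer.six.
        from pdfminer.high_level import extract_text
        extractor = extract_text
    with tarfile.open(tar_path) as tar:  # mode "r" auto-detects compression
        for member in tar:
            if not (member.isfile() and member.name.lower().endswith(".pdf")):
                continue
            data = tar.extractfile(member).read()
            try:
                text = extractor(io.BytesIO(data))
            except Exception:
                text = ""  # unparseable PDF: record it as empty rather than abort
            yield member.name, text
```

Catching broad exceptions is a deliberate trade-off for bulk processing. A single malformed PDF then costs one empty result instead of the whole run.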

4- Search arXiv for specific keywords

For example, to search for "resnet", "googlenet", and "alexnet" (the keyword search is case-insensitive):

python --keywords "resnet,googlenet,alexnet"
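The sketch below shows how the comma-separated `--keywords` value and the case-insensitive match could work. It assumes simple substring matching on lowercased text and is an illustration only; it is not necessarily how this repository's script implements the search. `parse_keywords`, `count_keywords`, and `build_parser` are hypothetical names.

```python
import argparse
from typing import Dict, List


def parse_keywords(raw: str) -> List[str]:
    """Split a comma-separated --keywords value into lowercase terms."""
    return [k.strip().lower() for k in raw.split(",") if k.strip()]


def count_keywords(text: str, keywords: List[str]) -> Dict[str, int]:
    """Count case-insensitive (substring) occurrences of each keyword."""
    lowered = text.lower()
    return {k: lowered.count(k) for k in keywords}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="arXiv keyword search (sketch)")
    # type=parse_keywords turns the raw string into a list of terms.
    parser.add_argument("--keywords", type=parse_keywords, required=True,
                        help='comma-separated list, e.g. "resnet,googlenet,alexnet"')
    return parser
```

Substring matching also counts "resnets" or "preresnet" as hits for "resnet". A regular expression with word boundaries (`\b`) would be stricter.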

The results database is stored in a pickle file (default: db.pkl). On subsequent runs, the tool checks for this file and skips any arXiv files that were already processed.
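The sketch below illustrates the resume behaviour. It assumes the database is a dict keyed by archive name; the actual layout of `db.pkl` is not documented here. `load_db`, `save_db`, and `process_all` are hypothetical helpers, and `search` stands in for whatever per-archive work the tool performs.

```python
import os
import pickle
from typing import Callable, Dict, Iterable


def load_db(path: str = "db.pkl") -> Dict[str, dict]:
    """Load the results database, or start empty if it does not exist yet."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {}


def save_db(db: Dict[str, dict], path: str = "db.pkl") -> None:
    """Write to a temp file first so a crash cannot leave a truncated db."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(db, f)
    os.replace(tmp, path)  # atomic rename on POSIX


def process_all(names: Iterable[str], search: Callable[[str], dict],
                path: str = "db.pkl") -> Dict[str, dict]:
    """Run `search` on each name not already in the db, saving after each."""
    db = load_db(path)
    for name in names:
        if name in db:
            continue  # already processed in an earlier run
        db[name] = search(name)
        save_db(db, path)
    return db
```

The database is saved after every archive rather than once at the end. As a result, an interrupted run loses at most the archive that was in progress.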



License: Apache License 2.0

