soumith / arxiv-tools

Tool to search all arxiv data for keywords

arXiv-keyword-searcher

Prerequisites

Only tested with Python 3

arXiv provides bulk data access through Amazon S3. You need an AWS account to download the data.

Downloading and searching arXiv documents for keywords

1- Install s3cmd, a command-line tool for interacting with S3

pip install s3cmd

2- Configure s3cmd by entering the credentials found in the account management tab of the AWS website

s3cmd --configure

3- Install pdfminer.six to extract text from PDFs on the fly

pip install pdfminer.six

4- Search arXiv for particular keywords

For example, to search for "resnet", "googlenet" and "alexnet" (the keyword search is case-insensitive):

python download.py --keywords "resnet,googlenet,alexnet"
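The matching logic inside download.py is not shown here. As a rough sketch only, assuming the comma-separated `--keywords` value is split into terms and each term is matched as a plain substring of the lowercased document text, a case-insensitive search could look like the following. `parse_keywords` and `find_keywords` are hypothetical names for illustration, not functions from download.py.

```python
def parse_keywords(arg):
    """Split a comma-separated keyword string into lowercase terms."""
    return [k.strip().lower() for k in arg.split(",") if k.strip()]


def find_keywords(text, keywords):
    """Return the keywords that occur in text, ignoring case."""
    lowered = text.lower()
    return [k for k in keywords if k in lowered]


# Same keyword string as in the command above.
keywords = parse_keywords("resnet,googlenet,alexnet")
sample = "We compare ResNet-50 against AlexNet on ImageNet."
matches = find_keywords(sample, keywords)
print(matches)
```

Substring matching is the simplest choice and also catches variants such as "ResNet-50"; whether the actual script matches whole words instead is not stated in this README.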

The results database is stored in a pickle file (default: db.pkl). On later runs, download.py checks for this file and skips the arXiv files it has already processed.
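The skip-on-rerun behaviour can be illustrated with a small sketch. It assumes the database is a dict keyed by file name, which is a guess; the real layout of db.pkl is not described here. `load_db`, `save_db`, `process_new` and the file names `a.tar`, `b.tar`, `c.tar` are all hypothetical.

```python
import os
import pickle
import tempfile


def load_db(path="db.pkl"):
    """Load the results database, or start empty if the file does not exist."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {}


def save_db(db, path="db.pkl"):
    """Write the results database back to disk."""
    with open(path, "wb") as f:
        pickle.dump(db, f)


def process_new(files, db, search):
    """Run search on files not yet in db; return the names actually processed."""
    done = []
    for name in files:
        if name in db:  # already handled in an earlier run
            continue
        db[name] = search(name)
        done.append(name)
    return done


# Simulate two runs against the same database file (placeholder file names).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "db.pkl")
    db = load_db(path)
    first = process_new(["a.tar", "b.tar"], db, search=lambda name: [])
    save_db(db, path)

    db = load_db(path)
    second = process_new(["a.tar", "b.tar", "c.tar"], db, search=lambda name: [])
```

In this sketch the second run processes only `c.tar`, because the other two names are already present in the reloaded database.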

License: Apache License 2.0
