Download arXiv files by topic?
flckv opened this issue
flckv commented
Great work. Is there a way to download arXiv files for only a specific topic, e.g. physics?
Martin Czygan commented
Yes, that's a feature of the OAI-PMH protocol: you can harvest a specific set. To see which sets an endpoint offers, run the following command (the formatting step requires jq):
$ metha-id http://export.arxiv.org/oai2 | \
jq -r '.sets[] | [.setSpec, .setName] | @tsv' | \
column -t -s $'\t'
cs Computer Science
econ Economics
eess Electrical Engineering and Systems Science
math Mathematics
physics Physics
physics:astro-ph Astrophysics
physics:cond-mat Condensed Matter
physics:gr-qc General Relativity and Quantum Cosmology
physics:hep-ex High Energy Physics - Experiment
physics:hep-lat High Energy Physics - Lattice
physics:hep-ph High Energy Physics - Phenomenology
physics:hep-th High Energy Physics - Theory
physics:math-ph Mathematical Physics
physics:nlin Nonlinear Sciences
physics:nucl-ex Nuclear Experiment
physics:nucl-th Nuclear Theory
physics:physics Physics (Other)
physics:quant-ph Quantum Physics
q-bio Quantitative Biology
q-fin Quantitative Finance
stat Statistics
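Set specs are hierarchical, with a colon separating levels (physics:hep-th sits under physics). If you only want to see the physics subsets, you could filter the tab-separated output of the command above. A minimal sketch, assuming bash and a standard awk; `sets_under` is a hypothetical helper name, not part of metha:

```shell
# Hypothetical helper: read "setSpec<TAB>setName" lines on stdin and print
# the setSpec of the given set plus every set nested under it ("prefix:...").
sets_under() {
  awk -F'\t' -v p="$1" '$1 == p || index($1, p ":") == 1 { print $1 }'
}
```

Usage, reusing the pipeline from above:

$ metha-id http://export.arxiv.org/oai2 | \
jq -r '.sets[] | [.setSpec, .setName] | @tsv' | sets_under physics

Matching on `prefix:` rather than a bare prefix keeps a hypothetical set like "physicsx" from being picked up.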
Then sync only the set you want:
$ metha-sync -set physics http://export.arxiv.org/oai2
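At the protocol level, selective harvesting means adding a `set` argument to the OAI-PMH ListRecords request. The sketch below only builds such a request URL so you can inspect the raw XML with curl; it assumes `oai_dc` as the metadata format, and `oai_list_records_url` is a hypothetical helper, not something metha provides. A complete harvest also has to follow resumptionToken pages, which a harvester like metha takes care of for you.

```shell
# Hypothetical helper: build an OAI-PMH ListRecords URL restricted to one set.
# Args: endpoint, setSpec, optional metadataPrefix (assumed default: oai_dc).
oai_list_records_url() {
  local endpoint=$1 spec=$2 prefix=${3:-oai_dc}
  printf '%s?verb=ListRecords&metadataPrefix=%s&set=%s\n' \
    "$endpoint" "$prefix" "$spec"
}
```

For example, to peek at the first page of the physics set:

$ curl -s "$(oai_list_records_url http://export.arxiv.org/oai2 physics)"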
To inspect the data, pass the same -set flag to metha-cat as well:
$ metha-cat -set physics http://export.arxiv.org/oai2
Currently, metha does not support requesting multiple sets at once.
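A possible workaround is to run one harvest per set. A minimal bash sketch, using only the `-set` flag shown above; `sync_sets` is a hypothetical helper name, and it stops at the first set whose sync fails:

```shell
# Hypothetical helper: harvest several sets from one endpoint by calling
# metha-sync once per set, since a single run accepts only one -set value.
sync_sets() {
  local endpoint=$1
  shift
  local spec
  for spec in "$@"; do
    # Return metha-sync's exit status on the first failure.
    metha-sync -set "$spec" "$endpoint" || return
  done
}
```

Usage:

$ sync_sets http://export.arxiv.org/oai2 physics:hep-th physics:quant-ph

Each set is then inspected separately with metha-cat -set <spec>.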