miku / metha

Command line OAI-PMH harvester and client with built-in cache.

Home Page: https://lab.ub.uni-leipzig.de/metha/

arxiv files download by topic?

flckv opened this issue

Great work. Is there a way to download arXiv files for a specific topic only, e.g. physics?

Yes, that's a feature of the OAI-PMH protocol: you can download a specific set. To see what sets are available from an endpoint, you can run the following command (formatting requires jq):

$ metha-id http://export.arxiv.org/oai2 | \
    jq -r '.sets[] | [.setSpec, .setName] | @tsv' | \
    column -t -s $'\t' # $'\t' is a literal TAB in bash/zsh; elsewhere type CTRL-v TAB between the quotes

cs                Computer Science
econ              Economics
eess              Electrical Engineering and Systems Science
math              Mathematics
physics           Physics
physics:astro-ph  Astrophysics
physics:cond-mat  Condensed Matter
physics:gr-qc     General Relativity and Quantum Cosmology
physics:hep-ex    High Energy Physics - Experiment
physics:hep-lat   High Energy Physics - Lattice
physics:hep-ph    High Energy Physics - Phenomenology
physics:hep-th    High Energy Physics - Theory
physics:math-ph   Mathematical Physics
physics:nlin      Nonlinear Sciences
physics:nucl-ex   Nuclear Experiment
physics:nucl-th   Nuclear Theory
physics:physics   Physics (Other)
physics:quant-ph  Quantum Physics
q-bio             Quantitative Biology
q-fin             Quantitative Finance
stat              Statistics
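
The colon in entries such as physics:hep-th comes from OAI-PMH itself: a setSpec is a colon-separated path through the set hierarchy, so physics:hep-th is a sub-set of physics. As a rough sketch, the tab-separated jq output above could be narrowed to one branch of that hierarchy with awk. The helper name `subsets_of` is made up for this illustration and is not part of metha.

```shell
# subsets_of TOP: read "setSpec<TAB>setName" lines on stdin and print the
# setSpec of TOP itself and of every set nested below it (TOP:child).
# Hypothetical helper for illustration, not a metha command.
subsets_of() {
    awk -F '\t' -v top="$1" '$1 == top || index($1, top ":") == 1 { print $1 }'
}

# Example input in the same shape as the jq @tsv output
printf 'math\tMathematics\nphysics\tPhysics\nphysics:hep-th\tHigh Energy Physics - Theory\n' |
    subsets_of physics
```

With real data, the same helper would sit after the jq step in place of column, since column replaces the tabs with spaces.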

Then only sync what you want with:

$ metha-sync -set physics http://export.arxiv.org/oai2

To inspect the harvested data, pass the same -set option to metha-cat:

$ metha-cat -set physics http://export.arxiv.org/oai2

metha currently does not support requesting multiple sets in a single call.
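
One possible workaround is to call metha-sync once per set in a loop. The sketch below assumes nothing beyond the `-set` option shown above. The function name `sync_sets` and the `DRY_RUN` variable are inventions for this example, not metha features.

```shell
# sync_sets ENDPOINT SET...: run metha-sync once for each given set.
# With DRY_RUN set to a non-empty value, the commands are printed
# instead of executed. Both names are hypothetical, for illustration only.
sync_sets() {
    local endpoint="$1"; shift
    local set
    for set in "$@"; do
        ${DRY_RUN:+echo} metha-sync -set "$set" "$endpoint"
    done
}

# Preview the commands without contacting the endpoint
DRY_RUN=1 sync_sets http://export.arxiv.org/oai2 physics:hep-th physics:quant-ph
```

Dropping `DRY_RUN=1` from the last line would run the harvests for real, one set after another. Each set then has to be read back separately with its own `metha-cat -set ...` call.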