davidar / oai-sync

Simple script for syncing with OAI repositories

This is a quick and dirty script for harvesting metadata from an OAI-PMH-enabled repository.

It relies on the Net::OAI::Harvester Perl module, and is derived from the oai-listrecords script provided in that library's examples.

Example for harvesting arXiv metadata (honouring flow control directives):

mkdir arxiv{0,1,2}

# initial sync
./oai-sync.pl --baseURL=http://export.arxiv.org/oai2 --metadataPrefix=arXiv \
              --dumpDir=./arxiv0/

# resume an interrupted sync with the given token
./oai-sync.pl --baseURL=http://export.arxiv.org/oai2 --metadataPrefix=arXiv \
              --dumpDir=./arxiv1/ --resumptionToken='XXXXXX|XXXXXX'

# sync any new changes since the date of the last sync
./oai-sync.pl --baseURL=http://export.arxiv.org/oai2 --metadataPrefix=arXiv \
              --dumpDir=./arxiv2/ --from='2015-03-14'

# split into individual records
for F in ./arxiv*/*.xml; do [ -s "$F" ] && ./oai-split.sh < "$F"; done
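
For orientation, the three sync invocations above should correspond roughly to the OAI-PMH ListRecords requests sketched below. This is a minimal illustration, not code from oai-sync: `oai_url` is a hypothetical helper, and it assumes the script issues ListRecords requests, as the oai-listrecords example it derives from does. The only escaping done is for the `|` character shown in the placeholder token; a real token may need fuller URL-encoding.

```shell
#!/bin/sh
# Hypothetical helper (not part of oai-sync): print the OAI-PMH ListRecords
# request URL for the given arguments.
# Usage: oai_url BASE_URL METADATA_PREFIX [FROM_DATE] [RESUMPTION_TOKEN]
oai_url() {
    base=$1
    prefix=$2
    from=${3:-}
    token=${4:-}

    if [ -n "$token" ]; then
        # In OAI-PMH, resumptionToken is an exclusive argument: it is sent
        # with the verb only, without metadataPrefix or from.
        # Assumption: '|' is the only character that needs escaping here.
        escaped=$(printf '%s' "$token" | sed 's/|/%7C/g')
        printf '%s?verb=ListRecords&resumptionToken=%s\n' "$base" "$escaped"
    elif [ -n "$from" ]; then
        # Selective harvest: only records changed on or after the given date.
        printf '%s?verb=ListRecords&metadataPrefix=%s&from=%s\n' "$base" "$prefix" "$from"
    else
        # Full harvest from the beginning.
        printf '%s?verb=ListRecords&metadataPrefix=%s\n' "$base" "$prefix"
    fi
}

# initial sync
oai_url http://export.arxiv.org/oai2 arXiv
# resume an interrupted sync with the given token
oai_url http://export.arxiv.org/oai2 arXiv '' 'XXXXXX|XXXXXX'
# sync any new changes since the date of the last sync
oai_url http://export.arxiv.org/oai2 arXiv 2015-03-14
```

Pasting one of the printed URLs into a browser or `curl` is a quick way to check what the repository returns before running a full harvest.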
