stuartyeates / shell-oaiharvester

OAI-PMH harvester in shell.

The OAI-PMH Shell Harvester is able to harvest OAI-PMH targets. It supports multiple configurable targets which can be updated individually. Furthermore, it is able to execute a preset command for each record it updates or deletes.

Installing shell-harvester:
 - dependencies: xsltproc, bc, curl, coreutils (namely for md5sum and shuf)
 - git clone git://github.com/wimmuskee/shell-oaiharvester.git
 - use Makefile or:
  - cp oaiharvester /usr/bin/oaiharvester
  - mkdir /usr/share/shell-oaiharvester
  - cp libs/* /usr/share/shell-oaiharvester/.
  - mkdir /etc/shell-oaiharvester
  - cp config.xml /etc/shell-oaiharvester/config.xml
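
Before installing, it can help to confirm that the listed dependencies are on the PATH. The following is a minimal sketch, not part of shell-oaiharvester; the check_deps helper is a hypothetical name, and md5sum and shuf stand in for coreutils because they are the two tools named above.

```shell
#!/bin/sh
# check_deps is a hypothetical helper, not shipped with the harvester.
# It prints the tools that cannot be found on PATH and returns 1 if any
# are missing.
check_deps() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "all dependencies found"
}

# Tools taken from the dependency list above.
check_deps xsltproc bc curl md5sum shuf ||
    echo "install the tools listed above before running oaiharvester"
```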

Using shell-harvester:

1. Configure the harvester
   Edit the paths in the config.xml file.
   Records are stored in <recordpath>/<repository_id>.

2. Configure a repository
   Edit a repository in the config file. By default, the example file
   in /etc/shell-oaiharvester is used. You can use your own by calling:
   $ oaiharvester -c <config file>

   Some notes on the options:
    - deletecmd: executed before a record is deleted, optional
    - updatecmd: executed after a record is updated, optional
      - ${identifier} holds the identifier
      - ${filename} is the identifier with optional xz extension
      - ${path} holds the absolute path of the record
    - repository-specific update and delete commands are executed
      before the generic ones, if available
    - set, from, until, conditional and recordpath are optional
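
To illustrate the three variables, the sketch below sets them by hand and expands a command string that references them. How the harvester performs this substitution internally is not described here; the use of eval, the run_hook helper, and the sample identifier and directory are all assumptions made for the example.

```shell
#!/bin/sh
# run_hook is a hypothetical helper, not part of shell-oaiharvester.
# It sets the three documented variables and then expands a command
# string that refers to them.
run_hook() {
    cmd="$1"
    identifier="$2"
    recorddir="$3"
    compressed="$4"     # "yes" if records carry the xz extension
    filename="$identifier"
    if [ "$compressed" = "yes" ]; then
        filename="$identifier.xz"
    fi
    path="$recorddir/$filename"
    eval "$cmd"
}

# Single quotes keep the variables unexpanded until run_hook evaluates them.
run_hook 'echo "updated ${identifier} -> ${path}"' \
    'oai:example.org:42' '/tmp/records/example' yes
```

Because the command string is evaluated by the shell, a command built this way should only ever come from a trusted config file.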

3. Run the harvester
   $ oaiharvester -r <repository_id>

4. Only retrieve identifiers
   $ oaiharvester -r <repository_id> -n

5. Validate the repository without harvesting
   $ oaiharvester -r <repository_id> -t
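
Since each target is updated individually with -r, several repositories can be harvested in turn with a small wrapper. This is a sketch only: harvest_all and the HARVESTER override are hypothetical names, and it assumes that oaiharvester exits with a non-zero status on failure, which is not confirmed here.

```shell
#!/bin/sh
# harvest_all is a hypothetical wrapper, not part of shell-oaiharvester.
# It runs the harvester once per repository id and continues with the
# next repository if one of them fails.
# HARVESTER may be set to another command name or path (an assumption
# made so the wrapper can be tried without an installed copy).
harvest_all() {
    failed=0
    for repo in "$@"; do
        if ! "${HARVESTER:-oaiharvester}" -r "$repo"; then
            echo "harvest failed for $repo" >&2
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every repository was harvested.
    [ "$failed" -eq 0 ]
}

# Example (repository ids are placeholders):
#   harvest_all repo_a repo_b
```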

*. Unless customized, the log file of the harvest process is stored at
   /tmp/shell-harvester/log.csv. The logger uses a simple csv format with
   a line for each downloaded page.
   YYYY-MM-DD HH:MM:SS,repository,record count,download time,process time
   
   The logtype can be configured in the harvester config file: either
   "combined", which puts all instance logs in one file (default), or
   "instance", which writes a separate log file for each started instance.
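
Because the log is plain csv with one line per downloaded page, it can be summarized with standard tools. The sketch below totals pages and records per repository; summarize_log is a hypothetical helper, and it assumes the five-field layout shown above with no commas inside the repository name.

```shell
#!/bin/sh
# summarize_log is a hypothetical helper, not part of shell-oaiharvester.
# For each repository (second field) it prints:
#   repository,pages downloaded,total record count
summarize_log() {
    awk -F',' '
        NF >= 5 { records[$2] += $3; pages[$2]++ }
        END {
            for (repo in records)
                printf "%s,%d,%d\n", repo, pages[repo], records[repo]
        }
    ' "$1" | sort
}

# Example, using the default log location mentioned above:
#   summarize_log /tmp/shell-harvester/log.csv
```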

License: GNU General Public License v3.0