miku / metha

Command line OAI-PMH harvester and client with built-in cache.

Home Page:https://lab.ub.uni-leipzig.de/metha/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Implement various harvesting strategies properly.

miku opened this issue · comments

metha should implement various harvesting strategies:

  • normal/default (for standard conform endpoints), harvest windows, daily, monthly, yearly, all
  • single records, so individual records may fail or servers are not overloaded
  • other modes: all at once

Implementation ideas:

Instead of relying only on files, introduce a small manifest.json describing the harvested content (ids, dates, harvesting dates, files).

metha was meant be a very simple program (no database, only files and not even metadata about files). In order to keep it simple, a more resilient harvesting approach has been implemented in a separate program: oaicrawl.