OCR-D / core

Collection of OCR-related Python tools and wrappers from @OCR-D

Home Page: https://ocr-d.de/core/

ocrd workspace bag: use attribute CONTENTIDS if present for image resources

M3ssman opened this issue

Description

As an extension of #1137, when downloading image resources, please look for the CONTENTIDS attribute first and fall back to ID only if it is absent (or provide a parameter to switch to CONTENTIDS).
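A minimal sketch of the requested lookup order, using only Python's standard `xml.etree.ElementTree`: for each physical page `mets:div`, prefer `CONTENTIDS` when present and non-empty, otherwise fall back to `ID`. The helper name `page_identifier` is hypothetical (not part of OCR-D/core's API), the METS fragment is made up for illustration, and taking only the first entry when `CONTENTIDS` lists several space-separated URIs is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace URI of the METS schema, bound to the "mets" prefix for XPath lookups
METS_NS = {"mets": "http://www.loc.gov/METS/"}


def page_identifier(div: ET.Element) -> Optional[str]:
    """Hypothetical helper: prefer CONTENTIDS, fall back to ID."""
    # CONTENTIDS may hold several space-separated URIs; this sketch assumes the first one is wanted
    content_ids = (div.get("CONTENTIDS") or "").split()
    if content_ids:
        return content_ids[0]
    # No (or empty) CONTENTIDS: fall back to the div's ID
    return div.get("ID")


# Made-up physical structMap: one page with CONTENTIDS, one without
SAMPLE = """<mets:structMap xmlns:mets="http://www.loc.gov/METS/" TYPE="PHYSICAL">
  <mets:div TYPE="physSequence">
    <mets:div ID="PHYS_0007" ORDER="7" TYPE="page" CONTENTIDS="urn:nbn:de:gbv:3:1-113129-p0007-8"/>
    <mets:div ID="PHYS_0008" ORDER="8" TYPE="page"/>
  </mets:div>
</mets:structMap>"""

root = ET.fromstring(SAMPLE)
for div in root.iterfind(".//mets:div[@TYPE='page']", METS_NS):
    print(page_identifier(div))
# urn:nbn:de:gbv:3:1-113129-p0007-8
# PHYS_0008
```

A switch as suggested in parentheses above could simply decide whether `page_identifier` or a plain `div.get("ID")` is used.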

For ULB's CONTENTIDS (or any other URN-like value), additional processing is required: please replace every colon (:) with a plus sign (+), so that urn:nbn:de:gbv:3:1-113129-p0007-8 becomes urn+nbn+de+gbv+3+1-113129-p0007-8.
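The colon-to-plus rule can be sketched as a small string transformation. The name `urn_to_filename` is hypothetical; treating a value as URN-like when it starts with `urn:` (case-insensitively, since URI schemes are case-insensitive) is an assumption, and other values, such as resolver URLs, are returned unchanged because the request only specifies a mapping for URN-like values.

```python
def urn_to_filename(value: str) -> str:
    """Hypothetical helper: replace every ':' with '+' for URN-like values."""
    # Assumption: a value counts as URN-like if its scheme is "urn" (case-insensitive)
    if value.lower().startswith("urn:"):
        return value.replace(":", "+")
    # Non-URN values are left as they are; no mapping is specified for them
    return value


print(urn_to_filename("urn:nbn:de:gbv:3:1-113129-p0007-8"))
# urn+nbn+de+gbv+3+1-113129-p0007-8
```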

Many samples can be found in the GT-Testexports that use the GT-Repo-Template.

Note:

Besides ULB Sachsen-Anhalt, SBB also provides this information for its digital objects when issuing OAI records (cf. the OAI record for Theoretischer Anarchismus), although there the value looks more like a URL than a URN, for example:

<mets:div CONTENTIDS="http://resolver.staatsbibliothek-berlin.de/SBB000202D300000095" ID="PHYS_0095" ORDER="95" ORDERLABEL="91" TYPE="page" >

which resolves straight to the expected page https://digital.staatsbibliothek-berlin.de/werkansicht/?PPN=PPN891267093&PHYSID=PHYS_0095.