Extend metha-cat to extract metadata records
nichtich opened this issue · comments
metha-cat should allow extracting harvested metadata records without the OAI envelope (that is, oai:record/oai:metadata/*).
Just curious, is that something documented in the spec?
Another question: aren't there XML tools that would allow specific XML tag extraction? That way, we could keep functionality separate.
As far as I understand the spec, oai:metadata is nested in oai:OAI-PMH/oai:ListRecords/oai:record and it may contain any XML elements.
<complexType name="metadataType">
<annotation>
<documentation>Metadata must be expressed in XML that complies
with another XML Schema (namespace=#other). Metadata must be
explicitly qualified in the response.</documentation>
</annotation>
<sequence>
<any namespace="##other" processContents="strict"/>
</sequence>
</complexType>
I'm only interested in the child element(s) of oai:metadata because this is the actual payload. I'm surprised people keep the OAI envelope.
Sure, a workaround is to apply an XSLT to each record (when using find and xargs) or to the whole set (when using metha-cat), but this is far from a one-liner:
<?xml version="1.0" encoding="UTF-8"?>
<!-- Extract OAI metadata records from OAI-PMH responses -->
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
xmlns:oai="http://www.openarchives.org/OAI/2.0/" exclude-result-prefixes="oai">
<xsl:strip-space elements="*"/>
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/Response">
<records>
<xsl:apply-templates select="ListRecords/oai:record"/>
</records>
</xsl:template>
<xsl:template match="/Records">
<records>
<xsl:apply-templates select="oai:record"/>
</records>
</xsl:template>
<xsl:template match="oai:record">
<xsl:copy-of select="oai:metadata/*"/>
</xsl:template>
</xsl:transform>
I suppose the extraction would only be a few lines and an optional command line flag in metha source code.
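For comparison, here is a minimal sketch of the same extraction outside XSLT, using only the Python standard library. It is not metha code: the function name `extract_payloads` is hypothetical, and it assumes the input is a single XML document whose records are in the OAI-PMH namespace, whatever wrapper element surrounds them.

```python
import xml.etree.ElementTree as ET

# OAI-PMH namespace, as declared in the stylesheet above
OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def extract_payloads(xml_text):
    """Return the child elements of every oai:metadata element as XML strings.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    payloads = []
    # iter() walks the whole tree, so the element wrapping oai:record does not matter
    for metadata in root.iter(f"{{{OAI_NS}}}metadata"):
        for child in metadata:
            # strip() drops the whitespace that follows the element in the source
            payloads.append(ET.tostring(child, encoding="unicode").strip())
    return payloads
```

ElementTree rewrites namespace prefixes (for example to `ns0`) unless they are registered with `ET.register_namespace`, so the output is equivalent XML but not byte-identical to the harvested payload.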
I'm only interested in the child element(s) of oai:metadata because this is the actual payload. I'm surprised people keep the OAI envelope.
Yes, the metadata is the most interesting part. But if the output were just the metadata (oai:record/oai:metadata/*) without the envelope, one would miss the "set specifier", which is sometimes useful.
<record>
<header status="">
<identifier>oai:ojs.pkp.sfu.ca:article/2287</identifier>
<datestamp>2020-06-22T11:34:53Z</datestamp>
<setSpec>EJOEH:E</setSpec>
<setSpec>driver</setSpec>
</header>
<metadata>
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" ....oai_dc.xsd">
<dc:title xml:lang="it-IT">Editoriale / Editorial</dc:title>
<dc:creator>Szmigielski, S.</dc:creator>
...
</oai_dc:dc>
</metadata>
</record>
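One way to drop the envelope without losing the set specifiers is to return them next to the payload. The following is only a sketch under assumptions: `records_with_sets` is a hypothetical name, not a metha function, and the record elements are assumed to be in the OAI-PMH namespace (the excerpt above does not show its namespace declaration).

```python
import xml.etree.ElementTree as ET

# Prefix map for the OAI-PMH namespace
OAI = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def records_with_sets(xml_text):
    """Return one (set_specs, payloads) tuple per oai:record.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(xml_text)
    results = []
    for record in root.iter("{http://www.openarchives.org/OAI/2.0/}record"):
        # All setSpec values from the record header
        set_specs = [s.text for s in record.findall("oai:header/oai:setSpec", OAI)]
        # The payload: every child element of oai:metadata, serialized as a string
        payloads = [
            ET.tostring(child, encoding="unicode").strip()
            for child in record.findall("oai:metadata/*", OAI)
        ]
        results.append((set_specs, payloads))
    return results
```

For a record like the one above, the first tuple element would hold the two set specifiers and the second the serialized oai_dc:dc element.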
Thanks for the XSLT.
but this is far from a one-liner:
I appreciate the desire for one-liners.