miku / metha

Command line OAI-PMH harvester and client with built-in cache.

Home Page: https://lab.ub.uni-leipzig.de/metha/

conflicting namespace prefixes during ListRecords

bertsky opened this issue

If the same prefix is seen with different namespace URIs during a harvest, metha-sync mangles the prefix: it appends a 1 to it but never declares the renamed prefix, so the resulting XML becomes invalid.

For example, if I do

metha-sync -format mets -set 17th-century-prints http://digital.slub-dresden.de/oai/

then I end up with altered, non-well-formed METS files. The cause is that the namespace for the slub extension in our MODS was changed some time ago, so the prefix is now declared as http://www.slub-dresden.de/namespace in some records but as http://www.slub-dresden.de/ in others. For example, in oai:de:slub-dresden:db:id-1840307358, instead of…

               <mods:extension>
                  <slub:slub>
                     <slub:id type="digital">1840307358</slub:id>
                     <slub:id type="source">113051157X</slub:id>
                     <slub:id type="tsl-ats">Mercgeovg</slub:id>
                  </slub:slub>
               </mods:extension>
               <mods:recordInfo>
                  <mods:recordIdentifier source="http://digital.slub-dresden.de/oai/">oai:de:slub-dresden:db:id-1840307358</mods:recordIdentifier>
               </mods:recordInfo>

…(which is what you get for a single GetRecord request) I now see…

               <mods:extension>
                  <slub1:slub>
                     <slub1:id type="digital">1840307358</slub1:id>
                     <slub1:id type="source">113051157X</slub1:id>
                     <slub1:id type="tsl-ats">Mercgeovg</slub1:id>
                  </slub1:slub>
               </mods:extension>
               <mods:recordInfo>
                  <mods:recordIdentifier source="http://digital.slub-dresden.de/oai/">oai:de:slub-dresden:db:id-1840307358</mods:recordIdentifier>
               </mods:recordInfo>

…(which is invalid, because slub1 has never been introduced).

Thanks for the detailed bug report. That's certainly an interesting issue, and I'll try to take a look at it shortly. It may also be an issue in the stdlib, as per golang/go#48641.

I'm afraid this is a Go stdlib XML issue first, cf. golang/go#13400.

But then, metha is mostly concerned with the envelope, and that should be much less problematic. This will require some internal rewrite and may take a while before it is released, just as a heads-up.