DCLP / dclpxsltbox

Sandbox for development, testing, and review of XSLT for DCLP

Home Page: http://dclp.github.io/dclpxsltbox/

Malformed dclp-hybrid values

hcayless opened this issue

There are 233 DCLP documents that have broken dclp-hybrid <idno> values. A full list can be found at https://gist.github.com/hcayless/0c99cb6af2b27239f397ca854e52e677. They all seem to be P.Herc. docs.

This error prevents correct indexing of the documents for search.

@HolgerEssler this needs to be sorted as soon as we can manage. Do you want to have all P.Herc. publications collected under one standardised dclp-hybrid so that they resemble e.g. p.oxy;12;1234?
If the answer is yes then we have to change
<idno type="dclp-hybrid">P.Herc. 1120</idno>
into
<idno type="dclp-hybrid">p.herc;;1120</idno> (that assumes no volume)

If the answer is no, then we replace these dclp-hybrid values with the relevant "na;;23456" value, that is "na" (viz. no author) followed by the TM number.

Yes, please change
<idno type="dclp-hybrid">P.Herc. 1120</idno>
into
<idno type="dclp-hybrid">p.herc;;1120</idno>.
I suppose
<idno type="dclp-hybrid">P.Herc. 1043 + 1045</idno>
should then become
<idno type="dclp-hybrid">p.herc;;1043;1045</idno>
and
<idno type="dclp-hybrid">P.Herc. 419, 697, 1634</idno>
should become
<idno type="dclp-hybrid">p.herc;;419;697;1634</idno>.
Would that be ok?

I would recommend something like:

<idno type="dclp-hybrid">P.Herc. 1043 + 1045</idno> -> <idno type="dclp-hybrid">p.herc;;1043+1045</idno>
and
<idno type="dclp-hybrid">P.Herc. 419, 697, 1634</idno> -> <idno type="dclp-hybrid">p.herc;;419,697,1634</idno>
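The recommended mapping can be sketched as a small Python helper. This is only an illustration, not the script actually used for the fix: the function name `normalize_pherc` is hypothetical, and it assumes every broken value starts with the literal prefix `P.Herc.` followed by inventory numbers joined with `+` or `,`.

```python
import re


def normalize_pherc(value: str) -> str:
    """Turn 'P.Herc. 1043 + 1045' into 'p.herc;;1043+1045'.

    Hypothetical helper: assumes the value begins with 'P.Herc.' and that
    everything after the prefix is the item part (no volume).
    """
    match = re.fullmatch(r"\s*P\.Herc\.\s*(.+?)\s*", value)
    if match is None:
        raise ValueError(f"not a P.Herc. value: {value!r}")
    # Keep the '+' and ',' separators but drop the whitespace around them.
    item = re.sub(r"\s+", "", match.group(1))
    # The empty field between the two semicolons is the (absent) volume.
    return "p.herc;;" + item


if __name__ == "__main__":
    for old in ("P.Herc. 1120", "P.Herc. 1043 + 1045", "P.Herc. 419, 697, 1634"):
        print(old, "->", normalize_pherc(old))
```

Keeping `+` and `,` inside the item field, rather than turning them into `;`, leaves the hybrid at exactly three semicolon-separated fields, as in p.oxy;12;1234.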

pretty sure I know how to fix this and will do so

o.frangé;;438 => o.frange;;438; as in ddbdp
o.wångstedt;;80 => o.wangstedt;;80; will have to be added to collection.rdf
p.genève[horssérie];;1 => p.geneve[horsserie];;1; will have to be added to collection.rdf
p.murabba'ât;2;108 => p.mur;2;108; as in ddbdp
p.demarée;;5 => p.demaree;;5; will have to be added to collection.rdf
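Most of these rewrites amount to dropping diacritics from the collection part, which could be sketched as follows. The names `to_ascii_hybrid` and `COLLECTION_OVERRIDES` are hypothetical. The p.murabba'ât case is not derivable by stripping accents, so this sketch treats it as an explicit override taken from the list above.

```python
import unicodedata

# Assumption: collections whose short form cannot be derived mechanically are
# listed by hand. Only the one case named in this thread is included.
COLLECTION_OVERRIDES = {"p.murabba'\u00e2t": "p.mur"}


def strip_diacritics(text: str) -> str:
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def to_ascii_hybrid(value: str) -> str:
    """Rewrite the collection part of a 'collection;volume;item' hybrid."""
    collection, sep, rest = value.partition(";")
    key = unicodedata.normalize("NFC", collection)
    if key in COLLECTION_OVERRIDES:
        collection = COLLECTION_OVERRIDES[key]
    else:
        collection = strip_diacritics(collection)
    return collection + sep + rest


if __name__ == "__main__":
    for old in ("o.frang\u00e9;;438", "o.w\u00e5ngstedt;;80", "p.murabba'\u00e2t;2;108"):
        print(old, "=>", to_ascii_hybrid(old))
```

Square brackets pass through unchanged here. Whether they are acceptable in a dclp-hybrid value is a separate question that this sketch does not address.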

Is that analysis correct, @hcayless?

I'm not sure how the square brackets will play. We'll have to see.

I have now created a new issue, #324, to keep these two separate.

@hcayless would you please check that
<idno type="dclp-hybrid">p.herc;;.+</idno>
is now fine.
There should now be no more
<idno type="dclp-hybrid">P.Herc.
left in
https://github.com/DCLP/idp.data/tree/master/DCLP

I have made a number of commits to make the required corrections.

Files that still need repair:

./63/62411.xml: P.Herc. 228, 403, 407, 1425, 1581
./63/62425.xml: P.Herc. 495
./63/62426.xml: P.Herc. 558
./63/62476.xml: P.Herc. 1471

change to…

./63/62411.xml: p.herc;;228,403,407,1425,1581
./63/62425.xml: p.herc;;495
./63/62426.xml: p.herc;;558
./63/62476.xml: p.herc;;1471

A case-sensitive search for P.Herc. in the XPath tei:idno[@type='dclp-hybrid'] did not turn up any further idnos of this kind.
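A check of this kind could be reproduced with Python's standard library, roughly as below. This is a sketch, not the search that was actually run: the function name `legacy_pherc_idnos` is hypothetical, and it assumes the files use the standard TEI namespace `http://www.tei-c.org/ns/1.0`.

```python
import xml.etree.ElementTree as ET

# Assumption: the DCLP files declare the standard TEI namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def legacy_pherc_idnos(xml_text: str) -> list:
    """Return dclp-hybrid idno values that still start with 'P.Herc.'."""
    root = ET.fromstring(xml_text)
    hits = []
    for idno in root.findall(".//tei:idno[@type='dclp-hybrid']", TEI_NS):
        text = (idno.text or "").strip()
        # str.startswith is case-sensitive, so 'p.herc;;...' is not reported.
        if text.startswith("P.Herc."):
            hits.append(text)
    return hits


if __name__ == "__main__":
    sample = (
        '<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader>'
        '<idno type="dclp-hybrid">P.Herc. 495</idno>'
        '<idno type="TM">62425</idno>'
        "</teiHeader></TEI>"
    )
    print(legacy_pherc_idnos(sample))
```

Running this over each file's contents and collecting the non-empty results would give a list like the "still need repair" one above.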

Files can be viewed on github and will be picked up with the next sync.