Duplicated entries in Nomisma RDF Dump
Msch0150 opened this issue · comments
This is for information only, but it may be relevant now or become relevant later.
To reproduce, create a Nomisma RDF dump (http://numismatics.org/ocre/nomisma.rdf).
There are several duplicate items in the generated file. Mostly:
ric.1(2).ner.*
ric.10.zeno(1)_e.90*
ric.10.zeno(2)_e.9*
ric.2_3(2).hdn*
I detected this while processing the file with OpenRefine.
I used:
#grep '<nmo:TypeSeriesItem rdf:about="http://numismatics.org/ocre/id/' nomisma.rdf | sort | uniq -d
to get the list of duplicates, and verified it manually for "http://numismatics.org/ocre/id/ric.10.zeno(1)_e.901" in nomisma.rdf.
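For anyone who wants counts per ID rather than just the duplicated lines, the same check could be sketched in Python roughly as follows. This is only a sketch: it assumes each type is declared on a single line in the same form the grep above matches, and the names find_duplicate_ids and report are made up for illustration, not part of any Nomisma or OCRE tooling.

```python
import re
from collections import Counter

# Assumption: every type declaration looks like the line matched by the grep above,
# i.e. <nmo:TypeSeriesItem rdf:about="http://numismatics.org/ocre/id/...">
ABOUT_RE = re.compile(r'<nmo:TypeSeriesItem rdf:about="([^"]+)"')


def find_duplicate_ids(lines):
    """Return {uri: count} for every rdf:about URI declared more than once."""
    counts = Counter()
    for line in lines:
        counts.update(ABOUT_RE.findall(line))
    return {uri: n for uri, n in counts.items() if n > 1}


def report(path):
    """Print each duplicated URI with its count (hypothetical helper)."""
    with open(path, encoding="utf-8") as fh:
        duplicates = find_duplicate_ids(fh)
    for uri, n in sorted(duplicates.items()):
        print(n, uri)
```

Calling report("nomisma.rdf") on a local copy of the dump would print one line per duplicated ID. Unlike sort | uniq -d, this compares only the rdf:about value, so differences in whitespace around the element do not hide a duplicate.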
I'm not sure if we can do anything about this at the moment. Some RIC numbers in the printed corpus used an asterisk in the numbering convention, and we didn't adapt that in a way that avoids conflicts in the file naming.