ewg118 / numishare

Numishare is an open source suite of applications for managing digital cultural heritage artifacts, with a particular focus on coins and medals.

Duplicated entries in Nomisma RDF Dump

Msch0150 opened this issue.

This is for information only, but it may be relevant now or become relevant later.

Steps to reproduce: create a Nomisma RDF dump (http://numismatics.org/ocre/nomisma.rdf).
The generated file contains several duplicate items, mostly matching these patterns:

ric.1(2).ner.*
ric.10.zeno(1)_e.90*
ric.10.zeno(2)_e.9*
ric.2_3(2).hdn*

I noticed this while processing the file with OpenRefine.

I used the following command to list the duplicates:
grep '<nmo:TypeSeriesItem rdf:about="http://numismatics.org/ocre/id/' nomisma.rdf | sort | uniq -d
and then verified the result manually for "http://numismatics.org/ocre/id/ric.10.zeno(1)_e.901" in nomisma.rdf.
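The grep pipeline only works if each `<nmo:TypeSeriesItem rdf:about="...">` opening tag sits on its own line with exactly that attribute layout. A parser-based check does not depend on line layout. The following is a minimal Python sketch, not part of Numishare, assuming the dump is RDF/XML with `rdf` bound to the standard RDF namespace and `nmo` bound to `http://nomisma.org/ontology#`. The function name `find_duplicate_types` is hypothetical.

```python
import sys
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed namespace URIs for the rdf: and nmo: prefixes used in nomisma.rdf.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NMO_NS = "http://nomisma.org/ontology#"
TYPE_TAG = f"{{{NMO_NS}}}TypeSeriesItem"
ABOUT_ATTR = f"{{{RDF_NS}}}about"
PREFIX = "http://numismatics.org/ocre/id/"


def find_duplicate_types(source):
    """Return {uri: count} for OCRE TypeSeriesItem URIs that occur more than once.

    `source` may be a filename or a binary file-like object.
    """
    counts = Counter()
    # iterparse streams the document instead of building it all up front.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == TYPE_TAG:
            uri = elem.get(ABOUT_ATTR, "")
            if uri.startswith(PREFIX):
                counts[uri] += 1
            # Drop the item's children once counted to limit memory use.
            elem.clear()
    return {uri: n for uri, n in counts.items() if n > 1}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_dups.py nomisma.rdf
    for uri, n in sorted(find_duplicate_types(sys.argv[1]).items()):
        print(f"{n}\t{uri}")
```

Unlike `uniq -d`, this also reports how many times each URI occurs.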

I'm not sure we can do anything about this at the moment. Some RIC numbers in the printed corpus include an asterisk as part of the numbering convention, and we didn't adapt them in a way that avoids conflicts in the file naming.
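To illustrate how such a conflict arises, here is a small Python sketch that groups printed catalogue numbers by the identifier they would produce. It assumes, purely for illustration, that the asterisk is simply dropped when the ID is built; the actual Numishare mapping may differ. The functions `naive_id` and `find_collisions`, and the asterisked number in the usage example, are hypothetical.

```python
from collections import defaultdict


def naive_id(printed_number):
    """Hypothetical ID builder that drops '*' (illustration only)."""
    return printed_number.replace("*", "")


def find_collisions(printed_numbers, to_id=naive_id):
    """Group printed numbers that would share one generated ID.

    Returns {generated_id: [printed numbers]} for IDs produced more than once,
    keeping the input order within each group.
    """
    groups = defaultdict(list)
    for number in printed_numbers:
        groups[to_id(number)].append(number)
    return {ident: nums for ident, nums in groups.items() if len(nums) > 1}
```

For example, `find_collisions(["ric.10.zeno(1)_e.901", "ric.10.zeno(1)_e.901*"])` returns `{"ric.10.zeno(1)_e.901": ["ric.10.zeno(1)_e.901", "ric.10.zeno(1)_e.901*"]}`. Passing an alternative mapping through `to_id` shows why any replacement scheme needs the same check: with `to_id=lambda n: n.replace("*", "a")`, the inputs `"x.901*"` and `"x.901a"` collide on `"x.901a"`.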