weso / wdsub

Wikidata Subsetting

Add option to generate RDF dumps

labra opened this issue

At the moment, wdsub takes JSON dumps as input, creates a subset, and generates JSON dumps as output. This is useful because the output of one wdsub process can be fed into another, so several processes can be chained.

Some users have asked whether wdsub could also generate RDF dumps. This seems doable because wdsub is based on Wikidata Toolkit, which already has an option to generate RDF from items.

We can add an option to generate RDF dumps instead of JSON dumps for people who want to work with RDF.

We have implemented a first prototype that generates RDF dumps, but it needs to be improved: it requires a network connection because it resolves information about properties directly from the Wikidata API, which is slow.

The reason is that the prototype uses the class PropertyRegister, which seems to collect information about properties by fetching it from the API.

There seems to be a default implementation, PropertyRegister.getWikidataPropertyRegister(), which returns WIKIDATA_PROPERTY_REGISTER and uses the default Wikidata API connection.

I found that Wikidata Toolkit also defines a MockPropertyRegister for testing. I would like to know whether I can define a more basic property register that works offline, without using the API.
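To make the offline idea concrete, here is a minimal sketch of the lookup such a register could be backed by. It is plain Java and does not use or extend Wikidata Toolkit: the class name OfflinePropertyTypes, the load and datatypeOf methods, and the tab-separated input format are all assumptions made for illustration, and how this table would be plugged into PropertyRegister is still to be checked. The two sample rows are only example data.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical offline table mapping a property ID to its datatype IRI.
 * Not part of Wikidata Toolkit; it only illustrates answering property
 * lookups from local data instead of calling the Wikidata API.
 */
final class OfflinePropertyTypes {
    private final Map<String, String> datatypeByProperty = new HashMap<>();

    /**
     * Reads lines of the assumed form "propertyId<TAB>datatypeIri".
     * Blank lines and lines starting with '#' are skipped.
     */
    static OfflinePropertyTypes load(Reader source) throws IOException {
        OfflinePropertyTypes table = new OfflinePropertyTypes();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#")) {
                    continue;
                }
                String[] fields = line.split("\t");
                if (fields.length != 2) {
                    throw new IOException("Malformed line: " + line);
                }
                table.datatypeByProperty.put(fields[0], fields[1]);
            }
        }
        return table;
    }

    /** Returns the datatype IRI if known; never touches the network. */
    Optional<String> datatypeOf(String propertyId) {
        return Optional.ofNullable(datatypeByProperty.get(propertyId));
    }

    public static void main(String[] args) throws IOException {
        // Illustrative sample; a real table would be read from a local file.
        String sample = "# property\tdatatype\n"
                + "P31\thttp://wikiba.se/ontology#WikibaseItem\n"
                + "P569\thttp://wikiba.se/ontology#Time\n";
        OfflinePropertyTypes table = load(new StringReader(sample));
        System.out.println(table.datatypeOf("P31").orElse("unknown"));
        System.out.println(table.datatypeOf("P9999").orElse("unknown"));
    }
}
```

The table could be built once, for example from the property entities found in the dump itself, and then reused across runs. Returning an empty Optional for unknown properties leaves it to the caller to decide whether to skip the statement or fail.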

An alternative solution would be to generate an RDF serialization without information about properties. For that, I need to check whether the RDF serializer has an option to ignore the property register.