You can access the raw_data and the compiled SQLITE3 database on OSF
Pipeline:
- Extract all works that are instance of sub-instance of work of art (Q838948) (extraction/extract_seed.py)
- Extract related meta-data (extraction/extract_raw.py)
- Insert the data into a SQL Database (data_pipeline.ipynb)
Data related to infrastructures and archaeological sites have been extracted buit not their meta-data. They are available as a json in the OST Forum.
The main table is SubjectLabel that contains all the piece of works in Wikidata. A table if made of a wikidata identifier, the label and often linked to another wikidata property.
In this table, the count_all means that the instance (for instance 'Hitsugi no Chaika') has 6 different instance. An instance means the type of work it is. In order to see the different instances, please go to the table subject_instance:
You will find this result:
Subject | SubjectLabel | Instance |
---|---|---|
Q10855446 | Hitsugi no Chaika | Q220898 |
Q10855446 | Hitsugi no Chaika | Q747381 |
Q10855446 | Hitsugi no Chaika | Q1667921 |
Q10855446 | Hitsugi no Chaika | Q2635894 |
Q10855446 | Hitsugi no Chaika | Q21198342 |
Q10855446 | Hitsugi no Chaika | Q63952888 |
In order to access the instance name, please go to the table instanceLabel. For instance, Q220898 means 'original video animation'
In order to read the table using python, here is an example:
import sqlite3
import pandas as pd
conn = sqlite3.connect('works.db')
# Chose a table to extract
table_name = 'SubjectLabel'
# Load the table as a pandas DataFrame
df = pd.read_sql_query(f"SELECT * FROM {table_name}", conn)
- Time period (P2348)
- Inception (P571)
- culture (P2596)
- architectural style (P149)
- architect (P84)
- founded by (P112)
- creator (P170)
- genre (P136)
- movement (P135)
- country (P17)
- located in the administrative territorial entity (P131)
Count is from item and sub-items (wdt:P279*)
- 'architectural structure' (Q811979): 5,133,991 (include gardens) - (inception before 1900)
- 'work of art' (Q838948): 3,629,404 (include sculptures, paintings etc)
- 'tool' (Q39546): 9,297,105 (include scientific instruments)
- 'archaeological site' (Q839954): 248,550
- 'infrastructure' (Q121359): 1,400,027
'human'(Q5): 10,632,484
If you use this project in your research or wish to refer to the benchmark results, please use the following BibTeX entry.
@misc{cultural_items_wikidata,
author = {de Dampierre, Charles},
title = {Cultural Items Wikidata},
year = {Year},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/charlesdedampierre/cultural-items-wikidata}}
}