A collection of open source tools and resources related to Wikibase knowledge graphs.
- Motivation
- Awesome knowledge graphs
- Awesome Wikibase tutorials
- Wikibase Architecture
- Installing Wikibase
- Data Model
- Data Import
- Federated Properties
- Data wrangling
- Data reconciliation
- Named Entity Linking
- Data validation
- Wikibase Ecosystem
- Wikibase Community
- Wikibase Summaries
- Conferences and workshops
- Awesome Master and PhD theses
- Wikibase-Wikidata papers
- Awesome Wikibase instances
- Notes
- Multiple unlinked datasets often describe the same things (entities).
- Need for agile collaborative data integration (Wikidata).
- Need for a semantic layer in data/information/knowledge management in an organization.
- Knowledge graphs by Aidan Hogan et al. [preprint], [HTML book]
- The Knowledge Graph Cookbook. Recipes that work by Andreas Blumauer & Helmut Nagy [book]
- KIT Knowledge Graphs course by Harald Sack & Mehwish Alam [description]
- Stanford Knowledge Graphs course CS 520 [2020], [2021]
- Knowledge Graphs - Foundations and Applications by Harald Sack [description]
- Knowledge Graphs: Methodology, Tools and Selected Use Cases by Dieter Fensel et al. [book]
- Programmer's guide to Wikibase [guide]
- Wikibase: configure, customize, and collaborate by Dan Scott [tutorial]
- Posts about Wikibase & Wikidata and Tech Lead Digests by Adam 'addshore' Shorland
- Wikibase Install Basic Tutorial by Matt Miller [tutorial]
- Wikibase for Research Infrastructure by Matt Miller [post]
- Vanderbilt Heard Library digital scholarship resources on Wikidata and Wikibase [resources]
- Putting Data into Wikidata using Software by Steve Baskauf [post]
- Learning Wikibase
- Get your own copy of WikiData by Wolfgang Fahl [post]
- Transferring Wikibase data between wikis by Jeroen De Dauw [post]
- Wikibase resources by Olaf Janssen & KB national library of the Netherlands GitHub repo
- Manual installation of the Wikibase Suite
- If you already have MediaWiki, manually install the Wikibase extension
- Run `docker-compose up -d` with the Wikibase Docker Image
- Obsolete: WbStack as a part of the "Wikibase as a service". Ask for an invitation from Adam Shorland
- Wikibase Cloud is "Wikibase as a service".
- Ansible playbook for Wikibase [docs]
- Conceptual data model
- Simplified conceptual data model
- RDF Dump Format
- Canonical PHP implementation of the Wikibase Data Model
- Wikibase DataModel Serialization
Before starting with data import please read the following resources:
- Fast Bulk Import Into Wikibase by Jeroen De Dauw
- A protocol for adding knowledge to Wikidata, a case report
- A protocol for adding knowledge to Wikidata: aligning resources on human coronaviruses
- Creating new items: Click Special Pages on the left-hand menu and then Create a new item
- Creating new properties: Click Special Pages on the left-hand menu and then Create a new property
A recommended way to import data into a Wikibase instance is via the Wikibase API. Many wrappers of the Wikibase API exist:
- With Graphical User Interface (GUI)
- QuickStatements [GUI], [Help]
- OpenRefine [homepage], [docs]
- With Command Line Interface (CLI)
- wikibase-cli is a command-line interface to the JS modules wikibase-edit and wikibase-sdk
- Libraries
- Wikidata-Toolkit is a Java library [overview]
- WikidataIntegrator is a Python library [zenodo]
- wikibase-edit is a NodeJS library [docs]
- WikibaseIntegrator is a Python library
- Pywikibot is a Python library [manual]
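The wrappers above ultimately send requests to the MediaWiki Action API of the Wikibase instance. As a rough sketch of what such a request looks like, the following Python builds the parameters for a `wbeditentity` call that creates a new item. The helper name `build_new_item_request`, the label, and the description are hypothetical and only serve as illustration. A real edit also needs a logged-in session and a valid CSRF token, which are not shown here.

```python
import json


def build_new_item_request(label, description, language="en", token="PLACEHOLDER"):
    """Build POST parameters for a wbeditentity call that creates an item.

    Hypothetical helper for illustration: the token must be replaced with a
    CSRF token obtained from the target Wikibase before sending the request.
    """
    entity = {
        "labels": {language: {"language": language, "value": label}},
        "descriptions": {language: {"language": language, "value": description}},
    }
    return {
        "action": "wbeditentity",
        "new": "item",
        "data": json.dumps(entity),  # the entity is passed as a JSON string
        "token": token,
        "format": "json",
    }


params = build_new_item_request("Example item", "an item created for illustration")
print(params["action"], params["new"])
```

The resulting dictionary would be sent as a POST body to the instance's `api.php`. The libraries listed above handle login, tokens, rate limits, and retries, which is the main reason to prefer them over hand-written requests.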
A non-recommended way to import data into a Wikibase instance is via direct inserts into the MySQL database (MariaDB). Wikibase Query Updater then sends the data from MariaDB to the graph database Blazegraph. This approach is faster but riskier, because undesired inserts might happen by accident.
- wikibase-insert is a Java tool described in the [FactGrid's post]
- RaiseWikibase is a Python tool described in [preprint], [docs], [poster]
Federated properties in Wikibase are still under development:
- Federated properties in Wikibase: [docs]
- Workboard at Phabricator
The current workaround is to fetch basic information about the properties from the Wikidata SPARQL endpoint and to create those properties locally:
- WikidataIntegrator Notebook and its parallel implementation
- miniWikibase.py in RaiseWikibase
- wikibase-tools
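The first half of this workaround can be sketched as follows. The example is a minimal illustration, not the code used by the tools above: it holds a SPARQL query for property labels and datatypes and parses a result in the standard SPARQL JSON results format. The names `PROPERTY_QUERY` and `parse_properties` are hypothetical, and the sample response is hand-written so that the example runs without network access.

```python
# Query for all Wikidata properties with their English labels and datatypes.
PROPERTY_QUERY = """
SELECT ?property ?propertyLabel ?datatype WHERE {
  ?property a wikibase:Property ;
            wikibase:propertyType ?datatype .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def parse_properties(sparql_json):
    """Turn a SPARQL JSON result into a list of (pid, label, datatype) tuples."""
    rows = []
    for binding in sparql_json["results"]["bindings"]:
        pid = binding["property"]["value"].rsplit("/", 1)[-1]
        label = binding.get("propertyLabel", {}).get("value", "")
        datatype = binding["datatype"]["value"].rsplit("#", 1)[-1]
        rows.append((pid, label, datatype))
    return rows


# Hand-written sample mimicking one row of the endpoint's response.
sample = {
    "results": {
        "bindings": [
            {
                "property": {"type": "uri", "value": "http://www.wikidata.org/entity/P31"},
                "propertyLabel": {"type": "literal", "value": "instance of"},
                "datatype": {"type": "uri", "value": "http://wikiba.se/ontology#WikibaseItem"},
            }
        ]
    }
}

print(parse_properties(sample))  # [('P31', 'instance of', 'WikibaseItem')]
```

Each tuple carries what is needed to create the corresponding property in a local Wikibase. Note that the local property will usually receive a different P-number than the one on Wikidata, so a mapping between the two has to be kept.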
Every property is associated with a certain datatype in the Wikibase Data Model. Some of the datatypes are not native and require extensions. See:
- OpenRefine is a Java tool for working with messy data and for improving data [homepage], [docs]
- Wikibase reconciliation interface [code], [API], [paper]
- Reconciliation Service API: A protocol for data matching on the Web
- testbench
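To give an idea of what a reconciliation client exchanges with such a service, here is a minimal sketch. It assumes the batch `queries` form parameter and the `result` list with `id` and `score` fields as described in version 0.2 of the Reconciliation Service API, so check the version your service implements. The helper names and the sample response are hypothetical, and no request is actually sent.

```python
import json


def build_queries(names, type_id=None, limit=3):
    """Build the form payload for a batch of reconciliation queries."""
    queries = {}
    for i, name in enumerate(names):
        query = {"query": name, "limit": limit}
        if type_id is not None:
            query["type"] = type_id  # restrict candidates to one entity type
        queries[f"q{i}"] = query
    return {"queries": json.dumps(queries)}


def best_matches(response):
    """Pick the highest-scoring candidate id per query, or None if there is none."""
    best = {}
    for key, value in response.items():
        candidates = value.get("result", [])
        top = max(candidates, key=lambda c: c["score"], default=None)
        best[key] = top["id"] if top else None
    return best


payload = build_queries(["Douglas Adams", "Unknown Person"], type_id="Q5")

# Hand-written sample response, not the output of a real service.
sample_response = {
    "q0": {"result": [
        {"id": "Q42", "name": "Douglas Adams", "score": 100, "match": True},
        {"id": "Q21454969", "name": "Douglas Adams", "score": 60, "match": False},
    ]},
    "q1": {"result": []},
}

print(best_matches(sample_response))
```

In practice, tools such as OpenRefine perform this exchange for you and let a human review ambiguous candidates instead of always taking the top score.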
Named entity linking is widely used for creating and extending knowledge graphs.
SOTA algorithms can be found at paperswithcode. 16 benchmarks are available.
- GENRE & mGENRE is a multilingual entity linker to Wikidata based on BART and mBART, [paper GENRE], [paper mGENRE], [examples GENRE], [examples mGENRE]
- BLINK links entities to Wikipedia using fine-tuned BERT; links to Wikidata are then obtained for free from Wikipedia [paper], [see a fork]
- entity-fishing is a named entity linker on Wikidata [demo], [docs], [presentation]
- spacyfishing is a spaCy wrapper for entity-fishing
- OpenTapioca is a real-time entity linker to Wikidata [live demo], [paper], [docs]
- Spacy Entity Linker is a simple experimental NELinker to Wikidata using queries to a local database
- spaCyOpenTapioca is a spaCy pipeline for OpenTapioca [spaCy Universe]
- Survey on English Entity Linking on Wikidata is a survey paper
- falcon2.0 is a joint entity and relation linking tool over Wikidata [demo], [paper]
SOTA algorithms for matching tabular data to Wikidata were tested at SemTab 2020: Semantic Web Challenge on Tabular Data to Knowledge Graph Matching and SemTab 2021:
- MTab is the best algorithm for semantic table interpretation [online demo & APIs], [paper]
- bbw is based on meta-lookup (metasearch over SearX) [docs], [paper]
- JenTab is a modular tool [paper]
- MantisTable 4 is a tool with advanced GUI [paper]
See also:
- table-linker is an entity linkage tool which links the given string to Wikidata Q nodes [table-linker-pipelines]
The first mechanism is using constraints:
- Extension:WikibaseQualityConstraints
- For Wikidata see Help:Property constraints portal
The second mechanism is using Entity Schemas and Shape Expressions (ShEx).
- Extension:EntitySchema
- For Wikidata see Wikidata: Schemas and Wikidata:WikiProject Schemas
Tools for entity schemas:
- WikiShape is a playground (visualization, querying, validation & extraction) customized for Wikibase instances [code]
- Wikidata Shape Expressions Inference is a tool for automatic inference of ShEx schemas from a set of items [code]
- sheXer is a tool for automatic inference of ShEx schemas from a set of items [code]
- YASHE is a ShEx editor [code]
- ShExStatements is a tool for simplified writing of shape expressions in Wikidata [paper], [code]
- ShEx2 (aka shex.js) is a simple online validator [code], [zenodo]
- RDFShape is a general RDF playground for data validation and conversion between semantic formats [paper]
- PyShExy is an API to validate RDF entities against ShEx schemas using PyShEx
Relevant papers:
- Using Shape Expressions (ShEx) to Share RDF Data Models and to Guide Curation with Rigorous Validation
- Using logical constraints to validate information in collaborative knowledge graphs: a study of COVID-19 on Wikidata
- Creating Knowledge Graphs Subsets using Shape Expressions
- The first Federated-Wikibase-Workshop: Antwerp, 2018-04-23/25
- Wikibase Workshop in Berlin, 2018
- The Wikibase Summit: New York, 2018
- Ghent University Wikidata and Wikibase Workshop 2019
- Wikidata Workshop 2020 [papers]
- Wikidata Workshop 2021 [papers]
- Wikidata Workshop 2022
- Schema Inference on Wikidata by Lucas Werkmeister [thesis]
- Modelling and Importing Dynamic Data into Wikibase: A Case Study of the Swiss Transportation System by Samuel Meuli [thesis]
- Fudie Zhao, A systematic review of Wikidata in Digital Humanities projects, Digital Scholarship in the Humanities, Volume 38, Issue 2, June 2023, Pages 852–874, https://doi.org/10.1093/llc/fqac083
- Tharani, Karim. "Much more than a mere technology: A systematic review of Wikidata in libraries." The Journal of Academic Librarianship 47.2 (2021): 102326. https://doi.org/10.1016/j.acalib.2021.102326
- Waagmeester, A., Stupp, G., Burgstaller-Muehlbacher, S., Good, B. M., Griffith, M., Griffith, O. L., ... & Su, A. I. (2020). Wikidata as a knowledge graph for the life sciences. Elife, 9, e52614. https://doi.org/10.7554/eLife.52614
- Nielsen, F.Å., Mietchen, D., Willighagen, E. (2017). Scholia, Scientometrics and Wikidata. In: Blomqvist, E., Hose, K., Paulheim, H., Ławrynowicz, A., Ciravegna, F., Hartig, O. (eds) The Semantic Web: ESWC 2017 Satellite Events. ESWC 2017. Lecture Notes in Computer Science, vol 10577. Springer, Cham. https://doi.org/10.1007/978-3-319-70407-4_36
- Turki, H., Shafee, T., Taieb, M.A.H., Aouicha, M.B., Vrandečić, D., Das, D. and Hamdi, H., 2019. Wikidata: A large-scale collaborative ontological medical database. Journal of Biomedical Informatics, 99, p.103292. https://doi.org/10.1016/j.jbi.2019.103292
- Wikidata is a general-purpose Wikibase knowledge graph [SPARQL]
- Wikibase Registry is a Wikibase knowledge graph of Wikibase knowledge graphs [SPARQL], [timeline of Wikibase instances]
- Rhizome Artbase is a Wikibase knowledge graph of born-digital artworks from 1999 to the present day [SPARQL]
- FactGrid is a Wikibase knowledge graph for historical research [SPARQL], [Viewer], [fast search via ringgaard.com]
- Lingua Libre is a Wikibase knowledge graph of audiovisual data [SPARQL]
- OpenStreetMap Metadata is a Wikibase knowledge graph of metadata in OpenStreetMap [SPARQL]
- PersonalData.io is a Wikibase knowledge graph about the personal data ecosystem [SPARQL]
- EU knowledge graph is a Wikibase knowledge graph about the European Union [SPARQL], [Question-Answering over KG], [paper at ISWC2021 "Wikibase as an Infrastructure for Knowledge Graphs: the EU Knowledge Graph"]
- enslaved.org is a Wikibase knowledge graph about people of the historical slave trade [frontend]
- Semlab Wikibase is a Wikibase knowledge graph of Semantic Lab at Pratt Institute with data about their research projects [SPARQL]
- Virus-Taxonomy is a Wikibase knowledge graph of virus taxonomy [SPARQL]
- DataTrek is a Wikibase knowledge graph of open data for Star Trek
- Nonbinary is a Wikibase knowledge graph of concepts relevant to nonbinary identities
- The De Jonge Wiki is a Wikibase knowledge graph of research that has been carried out on the Arenberg Castle
- Biblissima is a Wikibase knowledge graph of the Biblissima authority repositories
- Standartopedia is a Wikibase knowledge graph of Russian legal norms and requirements of standards
- DataCegeSoma is a Wikibase knowledge graph of authority data for CegeSoma / State Archives in Belgium created by Anne Chardonnens as a part of her PhD thesis [SPARQL]
- MaRDi portal is a Wikibase knowledge graph of mathematical research data [SPARQL]
- MiMoTextBase is a Wikibase knowledge graph of the French Enlightenment novel [SPARQL] [MiMoText Project] [Tutorial]
- EURHISFIRM is a sandbox Wikibase knowledge graph of historical high-quality firm level data for Europe [SPARQL], [GitLab]
- Aktienführer is a Wikibase knowledge graph of the German listed stock companies from the Hoppenstedt-Aktienführer from 1956 to 2018 [SPARQL]
More Wikibase instances can be found in the Wikibase Registry and on WikiAPIary.
The initial version of this repo is based on the slides Wikibase knowledge graphs for data management & data science presented at Data Literacy Snacks 2021.