regisrob / openrefine-wikibase

Exposes a reconciliation service for OpenRefine for a Wikibase instance

Home Page: https://tools.wmflabs.org/openrefine-wikidata/

Wikibase reconciliation interface for OpenRefine

An instance of this endpoint for Wikidata can be found at: https://tools.wmflabs.org/openrefine-wikidata/en/api

This is a new reconciliation interface, with the following features:

  • Matching columns with Wikibase properties, to improve the fuzzy matching score;
  • Autocomplete for properties and items;
  • Support for SPARQL-like property paths such as "P17/P297" (which fetches the ISO code of the country of an item);
  • Language selection (use /$lng/api as the endpoint, where $lng is your language code);
  • Reconciliation from sitelinks (Wikipedia in the case of Wikidata).
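
A request that uses a property path could be sketched as follows. This is a minimal sketch, assuming the endpoint accepts the standard OpenRefine reconciliation request format (a form-encoded "queries" parameter holding a JSON batch of queries). The helper names build_query and encode_batch are illustrative and not part of this repository.

```python
import json
from urllib.parse import urlencode


def build_query(name, type_id=None, properties=None, limit=5):
    """Build one reconciliation query (hypothetical helper).

    `properties` maps a property id or property path (e.g. "P17/P297")
    to the value found in the matching column of the row.
    """
    query = {"query": name, "limit": limit}
    if type_id is not None:
        query["type"] = type_id
    if properties:
        query["properties"] = [
            {"pid": pid, "v": value} for pid, value in properties.items()
        ]
    return query


def encode_batch(queries):
    """Encode a batch of queries as a form body for a POST request.

    Assumption: the service reads the batch from a `queries` form field,
    as in the standard reconciliation API.
    """
    return urlencode({"queries": json.dumps(queries)}).encode("utf-8")


# Reconcile the name "Lyon", using the ISO code of its country as extra evidence.
batch = {"q0": build_query("Lyon", properties={"P17/P297": "FR"})}
body = encode_batch(batch)
```

The resulting body could then be sent with a POST request to the endpoint for the chosen language, for instance the /en/api URL given above.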

TODO (Pull requests welcome!)

  • Better scoring;
  • Web-based interface;
  • More optimization for speed.

MIT license.

Configuring for Wikibase instances other than Wikidata

This service can be configured to run against a Wikibase instance other than Wikidata. The Wikibase instance needs an associated SPARQL Query Service, and some properties and items need to be set up on it. All the relevant values must be configured in the config.py file; an example of this file for Wikidata is provided in config_wikidata.py.

Running with Docker

You can run this service with Docker:

docker pull pintoch/openrefine-wikibase
docker run -p 8000:8000 pintoch/openrefine-wikibase

On Windows you will need to accept the Windows Firewall popup to expose the port.

Running manually

It is possible to run this web service locally. You will need Python 3 and a Redis instance.

  • Clone this repository, either with git (git clone https://github.com/wetneb/openrefine-wikibase) or by downloading the repository from GitHub as an archive
  • It is recommended to set up a virtualenv to isolate the dependencies of the software from the other Python packages installed on your computer. On a UNIX system, virtualenv .venv followed by source .venv/bin/activate will do. On a Windows system, python.exe -m venv venv followed by venv\Scripts\activate should work.
  • Install the Python dependencies with pip install -r requirements.txt
  • Copy the configuration file provided: cp config_wikidata.py config.py (copy config_wikidata.py config.py on Windows)
  • Edit the configuration file config.py so that redis_client contains the correct settings to access your Redis instance. The default parameters should be fine if you are running Redis locally on the default port.
  • Finally, run the instance with python app.py. The service will be available at http://localhost:8000/en/api.
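
Before starting the service, it can help to confirm that Redis is actually reachable. The following is a minimal sketch using only the standard library. It assumes an unauthenticated Redis on localhost and the usual default port 6379, and the function name redis_responds is illustrative. It is not how the service itself connects to Redis.

```python
import socket


def redis_responds(host="localhost", port=6379, timeout=2.0):
    """Return True if a server at host:port answers PING with +PONG.

    Sends the Redis inline command PING over a plain TCP connection.
    Any connection error or unexpected reply (for example when a
    password is required) is reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")
            return conn.recv(16).startswith(b"+PONG")
    except OSError:
        return False
```

If redis_responds() returns False, check that the Redis server is running and that the host and port match the settings in config.py.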

On Debian-based systems, the full setup looks as follows:

sudo apt-get install git redis-server python3 virtualenv
git clone https://github.com/wetneb/openrefine-wikibase
cd openrefine-wikibase
virtualenv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp config_wikidata.py config.py
python app.py
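
Once the service is running, a quick way to check it is to request the endpoint and look at the JSON it returns. The sketch below assumes that a plain GET on the endpoint returns a JSON service description, as reconciliation services commonly do. The names api_url and fetch_manifest are illustrative and not part of this repository.

```python
import json
from urllib.request import urlopen


def api_url(lang="en", base="http://localhost:8000"):
    """Build the endpoint URL for a language code, e.g. /en/api or /fr/api."""
    return f"{base.rstrip('/')}/{lang}/api"


def fetch_manifest(url, timeout=10):
    """Fetch the endpoint and parse the response body as JSON.

    Assumption: the service answers a GET without parameters with a
    JSON document describing itself.
    """
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, fetch_manifest(api_url("en")) targets the local instance started above, and passing another language code to api_url selects a different language endpoint.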
