UB-Mannheim / madata

A tool for syncing the dataset-metadata between MADATA and Wikidata

madata

madata syncs the metadata of datasets between MADATA (Mannheim Data Repository) and Wikidata. It provides access to the MADATA metadata records directly in Python.

Installation

pip install madata

or

git clone https://github.com/UB-Mannheim/madata
cd madata/
pip install .

Initialization

On initialization, madata harvests the MADATA OAI-PMH interface, stores the Dublin Core metadata records in records.OAI_DC, and queries the Wikidata SPARQL endpoint for the MADATA metadata records already present at Wikidata. Example:

from madata import Metadata
records = Metadata()
print(records)
[('OAI', 'https://madata.bib.uni-mannheim.de/cgi/oai2'),
 ('MADATA records from OAI-PMH', 163),
 ('MADATA records at Wikidata', 1),
 ('In sync?', False)]

Every record rec in the list records.OAI_DC has the following attributes: rec.metadata (structured metadata record), rec.header (structured header for a metadata record) and rec.raw (raw DC metadata record). The raw header is available via rec.header.raw. Additionally, a pandas dataframe with the metadata records is stored in records.OAI_DC_df.
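To make the header/metadata split concrete, the sketch below parses a single OAI-PMH record in the oai_dc format using only the standard library. It is not madata's own code: the sample record, the parse_record helper and the dict-of-lists layout are assumptions made for illustration, and the real rec.metadata and rec.header objects may be structured differently.

```python
import xml.etree.ElementTree as ET

# Standard namespaces used by OAI-PMH responses carrying Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical sample record, for illustration only (not real MADATA data).
SAMPLE = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:example.org:1</identifier>
    <datestamp>2023-01-01</datestamp>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Example dataset</dc:title>
      <dc:creator>Doe, Jane</dc:creator>
      <dc:creator>Roe, Richard</dc:creator>
    </oai_dc:dc>
  </metadata>
</record>"""


def parse_record(xml_text):
    """Split one OAI-PMH record into a header dict and a metadata dict."""
    root = ET.fromstring(xml_text)
    header = {
        "identifier": root.findtext("oai:header/oai:identifier", namespaces=NS),
        "datestamp": root.findtext("oai:header/oai:datestamp", namespaces=NS),
    }
    metadata = {}
    dc_root = root.find("oai:metadata/oai_dc:dc", NS)
    for child in dc_root:
        # Tags look like "{namespace}title"; keep only the local name.
        field = child.tag.split("}", 1)[1]
        # Dublin Core fields are repeatable, so collect values in lists.
        metadata.setdefault(field, []).append((child.text or "").strip())
    return header, metadata


header, metadata = parse_record(SAMPLE)
print(header["identifier"])
print(metadata["creator"])
```

Values are collected in lists because Dublin Core elements such as creator may occur several times in one record.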

Syncing

To upload the MADATA metadata records to Wikidata, you need a Wikidata account. With an account, run:

from madata import Metadata
records = Metadata()
records._sync()
>>> Wikidata username: 
>>> Wikidata password: 

Type your username and password; madata then starts to sync the metadata records between MADATA and Wikidata.
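The idea behind the 'In sync?' flag can be sketched as a comparison of two identifier sets. This is only an illustration under assumptions: the plan_sync helper and the identifiers are hypothetical, and madata's actual sync logic may compare records differently.

```python
def plan_sync(oai_ids, wikidata_ids):
    """Compare identifiers harvested from OAI-PMH with those found at Wikidata."""
    oai, wd = set(oai_ids), set(wikidata_ids)
    return {
        # Records that would still have to be uploaded to Wikidata.
        "missing_at_wikidata": sorted(oai - wd),
        # Records at Wikidata with no counterpart in the harvest.
        "only_at_wikidata": sorted(wd - oai),
        # True only when both sides hold exactly the same identifiers.
        "in_sync": oai == wd,
    }


# Hypothetical identifiers, for illustration only.
plan = plan_sync(["id-1", "id-2", "id-3"], ["id-2"])
print(plan["missing_at_wikidata"])
print(plan["in_sync"])
```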

SPARQL queries

The MADATA subset at Wikidata: https://w.wiki/6s7R

MADATA datasets and authors at Wikidata: https://w.wiki/6tYB
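Queries like these can also be sent to the Wikidata SPARQL endpoint from Python. The sketch below uses only the standard library and is not part of madata. The helper names, the User-Agent string and the placeholder query are assumptions; the queries behind the short links above are not reproduced here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "https://query.wikidata.org/sparql"

# Trivial placeholder query; replace it with the query you actually need.
PLACEHOLDER_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 1"


def build_request(query):
    """Build a GET request asking the endpoint for JSON results."""
    url = ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
    # A descriptive User-Agent is good practice; this value is hypothetical.
    return Request(url, headers={"User-Agent": "madata-readme-example/0.1"})


def bindings_to_rows(result):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in result["results"]["bindings"]
    ]


def run_query(query):
    """Send the query over the network and return the flattened rows."""
    with urlopen(build_request(query)) as response:
        return bindings_to_rows(json.load(response))


# Offline demonstration with a hand-written response in the SPARQL JSON format.
sample = {
    "head": {"vars": ["s"]},
    "results": {"bindings": [{"s": {"type": "uri", "value": "http://example.org/1"}}]},
}
rows = bindings_to_rows(sample)
print(rows)
```

Calling run_query(PLACEHOLDER_QUERY) performs the real network request; the demonstration above stays offline on purpose.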

About

License: MIT License

Language: Python 100.0%