- 194.045 Data Stewardship 2019S
- Gerald Weber, 0125536
- Helmuth Breitenfellner, 08725866
https://helmuthb.github.io/dmp-tools-actionable/
We set up an instance of RDMO in a virtualized environment using Docker. The installation is available at rdmo.helmuth.at.
For this part of the exercise we used the docker-compose version of RDMO available on GitHub.
We investigated the following possible ways of connecting to an RDMO instance:
- Database connection
- RDMO API
- XML export from RDMO
Using the database connection was considered the last resort, as it would depend on the internal data structures of the tool.
The RDMO API originally seemed very promising. However, we found that existing installations often disable the API (e.g. rdmo-demo.uibk.ac.at/) or do not make it accessible to regular users.
We settled on the third option: using the XML that can be exported from a project in the RDMO user interface. This lets regular users run the export tool even if the API has been disabled in the installation.
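The export can be read with Python's standard `xml.etree.ElementTree` module. The sketch below assumes a simplified, hypothetical layout in which each answer is a `<value>` element with `<attribute>`, `<set_index>` and `<text>` children; the real RDMO export may differ (for example, it may reference attributes by URI), so the tag names and `SAMPLE_XML` are illustrative assumptions, not the documented format.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, simplified layout of an RDMO project export (assumption).
SAMPLE_XML = """<project>
  <title>Demo project</title>
  <values>
    <value>
      <attribute>dataset/id</attribute>
      <set_index>0</set_index>
      <text>Survey data</text>
    </value>
    <value>
      <attribute>dataset/id</attribute>
      <set_index>1</set_index>
      <text>Interview transcripts</text>
    </value>
  </values>
</project>"""


def read_values(xml_text):
    """Group exported answers by attribute path and set (dataset) index."""
    root = ET.fromstring(xml_text)
    values = defaultdict(dict)
    for value in root.iter("value"):
        attribute = value.findtext("attribute", default="").strip()
        set_index = int(value.findtext("set_index", default="0"))
        text = value.findtext("text", default="").strip()
        values[attribute][set_index] = text
    return values
```

Grouping by set index keeps the answers of each dataset together, which the per-dataset mappings further down rely on.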
The result is a simple command-line tool. As a precondition, the user has to export the project as XML from RDMO. The tool can then be started with:
`python3 xml2madmp.py <rdmo-export.xml> <ma-dmp.json>`
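A command-line entry point with these two positional arguments could look roughly like the following sketch using `argparse`. The `convert()` function here is a hypothetical placeholder that only maps the project title into the top-level `dmp` object; it is not the actual conversion logic of `xml2madmp.py`.

```python
import argparse
import json
import xml.etree.ElementTree as ET


def convert(xml_text):
    """Hypothetical placeholder conversion: maps only the project title."""
    root = ET.fromstring(xml_text)
    return {"dmp": {"title": root.findtext("title", default="")}}


def main(argv=None):
    """Parse the two positional arguments, convert, and write the JSON file."""
    parser = argparse.ArgumentParser(
        description="Convert an RDMO XML export into an RDA DMP Common Standard JSON file."
    )
    parser.add_argument("rdmo_export", help="XML file exported from an RDMO project")
    parser.add_argument("ma_dmp", help="path of the JSON file to write")
    args = parser.parse_args(argv)
    with open(args.rdmo_export, encoding="utf-8") as infile:
        madmp = convert(infile.read())
    with open(args.ma_dmp, "w", encoding="utf-8") as outfile:
        json.dump(madmp, outfile, indent=2)
    return 0
```

In a standalone script, `main()` would be called from the usual `if __name__ == "__main__":` guard; accepting `argv` as a parameter keeps the function easy to test.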
The tool is licensed under the MIT license.
For this part of the exercise we used a two-step mapping.
First, we created a questionnaire based on the Horizon 2020 and FWF templates, which maps the questions to the corresponding fields and attributes of the RDMO model.
Then we applied our mapping from RDMO attributes to the RDA DMP Common Standard.
Some of the questions from the Horizon 2020 template do not clearly map to attributes in the RDMO data model. For these, we followed the existing Horizon 2020 view in RDMO, which performs the mapping in the opposite direction.
For the FWF template we encountered similar issues as for Horizon 2020. In addition, we did not cover the last case, in which no data is processed, since no DMP is needed at all then.
RDA DMP Field | RDMO source |
---|---|
`title` | `title` |
`description` | `description` |
`language` | _always set to `en`_ |
`created` | `created` |
`modified` | _latest modification date of any value_ |
`ethical_issues_exist` | _loop through datasets_ |
`contact` | `coordination/name`, split into name and email |
`project.start` | `schedule/project_start` |
`project.end` | `schedule/project_end` |
`cost` | _loop through cost entries_ |
`dataset.title` | `dataset/id` |
`dataset.type` | `dataset/format` |
`dataset.description` | _combined from various fields_ |
`dataset.data_quality_assurance` | `dataset/quality_assurance` |
`dataset.personal_data` | `dataset/sensitive_data/personal_data_yesno/yesno` |
`dataset.sensitive_data` | `dataset/sensitive_data/other/yesno` |
`dataset.distribution.description` | _combined from various fields_ |
`dataset.distribution.license.license_ref` | `dataset/sharing/sharing_license` |
`dataset.distribution.license.start_date` | `dataset/data_publication_date` |
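The one-to-one dataset rows of the table and the splitting of `coordination/name` can be sketched as follows. This assumes the coordination answer is free text containing at most one e-mail address (e.g. `Jane Doe <jane.doe@example.org>`, a hypothetical example); the regular expression and the helper names are illustrative, and `mbox` is the e-mail field of a contact in the RDA DMP Common Standard.

```python
import re

# One-to-one dataset mappings taken from the table (RDMO attribute -> RDA key).
DATASET_FIELDS = {
    "dataset/id": "title",
    "dataset/format": "type",
    "dataset/quality_assurance": "data_quality_assurance",
}

# Simple e-mail pattern; good enough for a sketch, not RFC 5322 complete.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def split_contact(text):
    """Split a free-text coordination answer into name and e-mail (assumption)."""
    match = EMAIL.search(text)
    mbox = match.group(0) if match else None
    # Remove the address, then trim separators such as '<', '>' or ','.
    name = EMAIL.sub("", text).strip(" \t<>,;()")
    return {"name": name, "mbox": mbox}


def build_dataset(values):
    """Copy the one-to-one fields of a single dataset (attribute -> text)."""
    return {rda: values[attr] for attr, rda in DATASET_FIELDS.items() if attr in values}
```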
As the RDMO data model has no field for the language of a project, this field is set to English (`en`) for all DMPs.
In RDMO, the field `updated` roughly corresponds to the `modified` field in the RDA DMP Common Standard. However, since RDMO uses a relational database model, the `updated` field of a project only changes when fields in the corresponding table are updated.
To get a more natural interpretation of the last modification date, the tool looks for the latest modification of any field or value in the RDMO data and uses this value as the `modified` field in the RDA DMP Common Standard.
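Taking the latest modification over the project and all of its values amounts to a maximum over timestamps. The following sketch assumes ISO 8601 timestamps with a UTC offset; the function name and inputs are hypothetical.

```python
from datetime import datetime


def latest_modification(project_updated, value_timestamps):
    """Return the newest of the project's own timestamp and all value timestamps.

    Timestamps are assumed to be ISO 8601 strings with a UTC offset. A trailing
    "Z" is rewritten to "+00:00" because datetime.fromisoformat only accepts
    "Z" from Python 3.11 on.
    """
    candidates = [project_updated, *value_timestamps]
    parsed = [
        datetime.fromisoformat(ts.replace("Z", "+00:00"))
        for ts in candidates
        if ts  # skip missing timestamps
    ]
    return max(parsed).isoformat() if parsed else None
```

Mixing timezone-aware and naive timestamps would make `max()` raise a `TypeError`, which is why the sketch normalizes everything to explicit offsets.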
To determine `ethical_issues_exist`, the tool uses the following logic:

- If the questions on both personal data and sensitive data are answered with `no` for all datasets, the field is set to `no`.
- If at least one of these questions is answered with `yes` for at least one dataset, the field is set to `yes`.
- Otherwise, i.e. if no question is answered with `yes` but at least one is left unanswered for some dataset, the field is set to `unknown`.
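The three rules translate directly into a small function. This sketch assumes each dataset is given as a dict from the two question attributes (see the table) to the strings `"yes"` or `"no"`, with a missing key meaning "not answered"; how RDMO actually stores yes/no answers may differ. Treating an empty dataset list as `unknown` is also an assumption of the sketch.

```python
PERSONAL = "dataset/sensitive_data/personal_data_yesno/yesno"
SENSITIVE = "dataset/sensitive_data/other/yesno"


def ethical_issues_exist(datasets):
    """Derive the tri-state RDA value ("yes", "no" or "unknown")."""
    answers = [ds.get(q) for ds in datasets for q in (PERSONAL, SENSITIVE)]
    if "yes" in answers:
        # At least one dataset contains personal or sensitive data.
        return "yes"
    if answers and all(a == "no" for a in answers):
        # Both questions were answered with "no" for every dataset.
        return "no"
    # No "yes", but some answer is missing (or there are no datasets at all).
    return "unknown"
```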
All cost-related entries in RDMO are parsed. The entries are added as sub-entries of the `cost` element, using the following translation logic:

- `title`: This is mapped to the key in RDMO, e.g. `ipr/non_personnel`.
- `cost_value`: This is mapped to the value in RDMO.
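A minimal sketch of this translation, assuming the cost answers have already been collected into a dict from RDMO key to value (the key `ipr/personnel` used in the test is hypothetical). Skipping empty answers and sorting by key are choices of the sketch, made for clean and reproducible output.

```python
def build_costs(cost_values):
    """Turn RDMO cost entries (key -> value) into RDA cost sub-entries."""
    return [
        {"title": key, "cost_value": value}
        for key, value in sorted(cost_values.items())
        if value not in (None, "")  # ignore unanswered cost questions
    ]
```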
Since some fields in RDMO have no direct mapping to the RDA DMP Common Standard, the field `dataset.description` is filled with data from the following RDMO fields:

- `dataset/description`
- `dataset/interoperability`
- `dataset/creation_methods`
- `dataset/metadata`
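Combining several answers into one description field can be sketched as below. Prefixing each part with the last segment of its attribute path is an assumption of this sketch, chosen so that readers of the maDMP can still tell the original RDMO fields apart; answers are assumed to be strings.

```python
DESCRIPTION_SOURCES = [
    "dataset/description",
    "dataset/interoperability",
    "dataset/creation_methods",
    "dataset/metadata",
]


def combine_fields(values, sources):
    """Join the non-empty answers of the given RDMO fields into one text."""
    parts = []
    for attribute in sources:
        text = (values.get(attribute) or "").strip()
        if text:
            # e.g. "dataset/creation_methods" -> "creation methods"
            label = attribute.rsplit("/", 1)[-1].replace("_", " ")
            parts.append(f"{label}: {text}")
    return "\n".join(parts)
```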
Likewise, the field `dataset.distribution.description` is filled with data from the following RDMO fields:

- `dataset/size/number_files`
- `dataset/versioning_strategy`
- `dataset/structure`
- `dataset/reuse_scenario`
- `dataset/sharing/conditions`
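Putting the distribution-related rows of the table together, a single distribution entry could be assembled as in this sketch. It only covers the fields listed in this document (description, `license_ref`, `start_date`) and leaves out other distribution fields of the RDA DMP Common Standard; the input dict of answers for one dataset and the `attribute: value` line format are assumptions.

```python
DISTRIBUTION_SOURCES = [
    "dataset/size/number_files",
    "dataset/versioning_strategy",
    "dataset/structure",
    "dataset/reuse_scenario",
    "dataset/sharing/conditions",
]


def build_distribution(values):
    """Assemble one RDA distribution entry from the answers of a single dataset."""
    parts = [
        f"{attribute}: {values[attribute]}"
        for attribute in DISTRIBUTION_SOURCES
        if values.get(attribute) not in (None, "")
    ]
    distribution = {"description": "\n".join(parts)}
    license_ref = values.get("dataset/sharing/sharing_license")
    if license_ref:
        # In the RDA standard, "license" is a list of license objects.
        distribution["license"] = [{
            "license_ref": license_ref,
            "start_date": values.get("dataset/data_publication_date"),
        }]
    return distribution
```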