joerivandervelde / fair-schemas

FAIR schemas to enable and encourage data interoperability across systems

fair-schemas

FAIR schemas to enable and encourage data interoperability across systems. This repository contains best-practice tables that are reused for specific projects or applications via profiles. The aim is to be specific enough to be relatable and useful, but not unnecessarily specific. For instance, Subject can be reused across species by replacing its human-biased underlying ontologies.

Rules for tables

  • Tables represent concept archetypes, such as Study, Biobank, Subject, Biosample and Cohort.
  • Tables are supersets of reusable columns that belong to that concept.
  • Identical or similar columns should ideally be merged.
  • Columns can have a partOfStandard attribute to indicate that they are part of an accepted standard. Each such standard must have its own profile with this name.
  • Columns that represent the same concept (e.g. Age) expressed in a different value type (e.g. age in years vs. age range categorical) are not explicitly connected, but should be tagged with the same semantics.
  • Column names are only required to be unique within the context of their table.
  • Table and column names must start with a letter, followed by any number of letters, digits, spaces or underscores ([a-zA-Z][a-zA-Z0-9_ ]*).
  • Not supported:
    • Inheritance, because it is too limiting. Use cases for inheritance can instead be solved as follows:
      • Instead of querying for an inheritance subtree, you indicate which profile of a table you want in your reference.
      • When using refLabel for a reference, the designer of an instance should ensure that all columns it uses exist in the referred-to profiles.
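The two naming rules for tables can be checked mechanically. The following is a minimal Java sketch, not code from this repository; the class NameRules and its methods are hypothetical. It assumes that column-name uniqueness is case-sensitive, which the rules do not state.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper, not part of fair-schemas: checks the table naming rules.
final class NameRules {
    // A letter, followed by any number of letters, digits, spaces or underscores.
    private static final Pattern NAME = Pattern.compile("[a-zA-Z][a-zA-Z0-9_ ]*");

    static boolean isValidName(String name) {
        // matches() requires the whole string to fit the pattern.
        return name != null && NAME.matcher(name).matches();
    }

    // Column names only have to be unique within one table, so call this per table.
    // Assumption: the comparison is case-sensitive.
    static List<String> duplicateColumns(List<String> columnNames) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String name : columnNames) {
            if (!seen.add(name) && !duplicates.contains(name)) {
                duplicates.add(name);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        System.out.println(isValidName("Biosample"));     // true
        System.out.println(isValidName("age in years"));  // true
        System.out.println(isValidName("2nd visit"));     // false: starts with a digit
        System.out.println(duplicateColumns(List.of("id", "age", "id"))); // [id]
    }
}
```

Because uniqueness is only required per table, a Subject.id and a Biosample.id column do not clash; only repeats within one table's column list are reported.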

Rules for profiles

  • Profiles represent specific projects or applications.
  • Profiles can cherry-pick a combination of:
    • Tables (all columns of that table). We sometimes refer to such a table instance as a 'flavor'. E.g. Patient is a flavor of Subject.
    • Columns (selected columns of a table)
    • Profiles (all columns included or defined by that profile)
    • Standards (all columns included or defined by that standard)
  • Reused columns are chosen by referencing only their name.
  • Reused columns are placed in their original table structure.
  • To preserve interoperability, reused columns cannot be altered. This includes relabeling: if a different label is required, provide it via runtime internationalization.
  • Profiles for particular applications can introduce highly specific, non-reusable tables and columns.
  • To add new columns to an existing table, represent that table in the profile by its name only.
  • New columns in new tables should be fully specified, as required for tables.
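How the cherry-picked tables, columns and profiles of a profile combine into one set of columns could be sketched as follows. This is a hypothetical in-memory model, not this repository's code: tables are a map from table name to column names, and each reuse entry carries a type and a name. Results are "table.column" keys, so every reused column stays in its original table. Standards are left out because their columns are derived from partOfStandard, and the cycle check is an added assumption; the rules do not mention cyclic profile references.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical resolver, not part of fair-schemas: collects the columns a profile
// reuses, as "table.column" keys so each column stays in its original table.
final class ProfileResolver {
    record Reuse(String type, String name) {}          // type: "table", "column" or "profile"
    record Profile(String name, List<Reuse> reuseColumns) {}

    private final Map<String, List<String>> tables;    // table name -> column names
    private final Map<String, Profile> profiles;       // profile name -> profile

    ProfileResolver(Map<String, List<String>> tables, Map<String, Profile> profiles) {
        this.tables = tables;
        this.profiles = profiles;
    }

    Set<String> resolve(String profileName) {
        return resolve(profileName, new HashSet<>());
    }

    private Set<String> resolve(String profileName, Set<String> visiting) {
        // Assumption: a profile that (indirectly) includes itself is an error.
        if (!visiting.add(profileName)) {
            throw new IllegalStateException("Cyclic profile reference: " + profileName);
        }
        Profile profile = profiles.get(profileName);
        if (profile == null) {
            throw new IllegalArgumentException("Unknown profile: " + profileName);
        }
        Set<String> result = new LinkedHashSet<>();
        for (Reuse reuse : profile.reuseColumns()) {
            switch (reuse.type()) {
                case "table" -> {  // all columns of the table
                    List<String> columns = tables.get(reuse.name());
                    if (columns == null) {
                        throw new IllegalArgumentException("Unknown table: " + reuse.name());
                    }
                    columns.forEach(column -> result.add(reuse.name() + "." + column));
                }
                case "column" -> {  // one column, referenced by name only as table.columnName
                    String[] parts = reuse.name().split("\\.", 2);
                    if (parts.length != 2 || !tables.getOrDefault(parts[0], List.of()).contains(parts[1])) {
                        throw new IllegalArgumentException("Unknown column: " + reuse.name());
                    }
                    result.add(reuse.name());
                }
                case "profile" -> result.addAll(resolve(reuse.name(), visiting));
                default -> throw new IllegalArgumentException("Unsupported type: " + reuse.type());
            }
        }
        visiting.remove(profileName);
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> tables = Map.of(
            "Subject", List.of("id", "sex", "age"),
            "Biosample", List.of("id", "collection date"));
        Map<String, Profile> profiles = Map.of(
            "Base", new Profile("Base", List.of(new Reuse("column", "Subject.id"))),
            "Registry", new Profile("Registry", List.of(
                new Reuse("profile", "Base"),
                new Reuse("table", "Biosample"))));
        System.out.println(new ProfileResolver(tables, profiles).resolve("Registry"));
        // [Subject.id, Biosample.id, Biosample.collection date]
    }
}
```

Using a set means a column reached through several routes (for example via an included profile and directly) appears only once.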

Rules for standards

  • Standards consist only of columns annotated with the partOfStandard attribute.
  • A standard is described by the columns whose partOfStandard value matches the standard's name.
  • Standards cannot point to additional columns, tables or profiles.
  • Standards cannot introduce additional columns.
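Since a standard cannot list or introduce columns itself, its column set can be derived purely from the partOfStandard annotations. A minimal Java sketch under that reading; StandardIndex, its methods and the standard names "ExampleStandard" and "OtherStandard" are hypothetical. The second method reports partOfStandard values that have no matching standard definition.

```java
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical helper, not part of fair-schemas: a standard has no column list of
// its own; its columns are derived from the partOfStandard annotations.
final class StandardIndex {
    record Column(String table, String name, String partOfStandard) {}  // partOfStandard may be null

    static List<String> columnsOf(String standardName, List<Column> allColumns) {
        return allColumns.stream()
            .filter(c -> standardName.equals(c.partOfStandard()))
            .map(c -> c.table() + "." + c.name())
            .collect(Collectors.toList());
    }

    // partOfStandard values without a matching standard definition.
    static Set<String> undefinedStandards(List<Column> allColumns, Set<String> definedStandards) {
        return allColumns.stream()
            .map(Column::partOfStandard)
            .filter(Objects::nonNull)
            .filter(s -> !definedStandards.contains(s))
            .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        List<Column> columns = List.of(
            new Column("Subject", "sex", "ExampleStandard"),
            new Column("Subject", "age", "ExampleStandard"),
            new Column("Biosample", "id", null),
            new Column("Biosample", "type", "OtherStandard"));
        System.out.println(columnsOf("ExampleStandard", columns));                  // [Subject.sex, Subject.age]
        System.out.println(undefinedStandards(columns, Set.of("ExampleStandard"))); // [OtherStandard]
    }
}
```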

Syntax

Table attributes

Attribute   Description
name        Name of this table. Required.
definedBy   The location of the ontology term that defines this table.
definedAs   The table definition according to the ontology term.
columns     The columns contained in this table, comparable to class attributes or features.

Column attributes

Attribute       Description
name            Name of this column. Required.
definedBy       The location of the ontology term that defines this column.
definedAs       The column definition according to the ontology term.
dataType        Data type of this column. Required.
unit            Ontology term to denote the unit of measurement.
partOfStandard  Name of the accepted standard this column is part of.
example         An example value to guide users.
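One way to carry the table and column attributes into code is as immutable records that enforce the two required column attributes and the required table name. This is a hypothetical Java model, not the repository's own classes: ontology terms and data types are plain strings, null stands for an absent optional attribute, and "integer" is only a placeholder because the allowed data types are not listed here.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical model, not part of fair-schemas: the table and column attributes as
// Java records. Ontology terms and data types are plain strings here (assumption);
// null means an optional attribute is absent.
final class SchemaModel {
    record Column(String name, String definedBy, String definedAs, String dataType,
                  String unit, String partOfStandard, String example) {
        Column {
            Objects.requireNonNull(name, "name is required");
            Objects.requireNonNull(dataType, "dataType is required");
        }
    }

    record Table(String name, String definedBy, String definedAs, List<Column> columns) {
        Table {
            Objects.requireNonNull(name, "name is required");
            columns = columns == null ? List.of() : List.copyOf(columns);  // immutable copy
        }
    }

    public static void main(String[] args) {
        // "integer" is a placeholder data type; the README does not list the allowed types.
        Column age = new Column("age", null, null, "integer", null, null, "42");
        Table subject = new Table("Subject", null, null, List.of(age));
        System.out.println(subject.columns().size());  // 1
        try {
            new Column("sex", null, null, null, null, null, null);
        } catch (NullPointerException e) {
            System.out.println(e.getMessage());       // dataType is required
        }
    }
}
```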

Profile attributes

Attribute      Description
name           Name of this profile. Required.
description    Definition of this profile.
authors        A list of contributing profile authors.
copyright      A copyright statement about the profile.
license        The license under which the profile is released.
reuseColumns   Existing columns reused by this profile.
customColumns  Tables or columns introduced by this profile.

Standard attributes

Attribute    Description
name         Name of this standard. Required.
description  Definition of this standard, usually adapted from an ontology.
authors      A list of contributing authors.
copyright    A copyright statement about the standard.
license      The license under which the standard is released.
url          Link to a web address where more information can be found.

Reuse column attributes

Attribute  Description
name       Name of the reused profile, standard or table; for a column, use table.columnName.
type       One of "profile", "standard", "table" or "column".
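A single reuse entry can be validated on its own before any resolution: the type must be one of the four allowed values, and a column reference must have the table.columnName form. A minimal Java sketch; ReuseEntryCheck, ReuseEntry and validate are hypothetical names. It relies on the naming rule for tables and columns, which excludes dots, so splitting on the dot is unambiguous.

```java
import java.util.Objects;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical check, not part of fair-schemas: validates one reuseColumns entry.
final class ReuseEntryCheck {
    private static final Set<String> TYPES = Set.of("profile", "standard", "table", "column");
    private static final Pattern NAME = Pattern.compile("[a-zA-Z][a-zA-Z0-9_ ]*");

    record ReuseEntry(String name, String type) {
        ReuseEntry {
            Objects.requireNonNull(name, "name is required");
            Objects.requireNonNull(type, "type is required");
        }
    }

    // Empty if the entry is well-formed, otherwise a description of the problem.
    static Optional<String> validate(ReuseEntry entry) {
        if (!TYPES.contains(entry.type())) {
            return Optional.of("unknown type: " + entry.type());
        }
        if (entry.type().equals("column")) {
            // Names cannot contain dots, so table.columnName splits into exactly two parts.
            String[] parts = entry.name().split("\\.", -1);
            boolean ok = parts.length == 2
                && NAME.matcher(parts[0]).matches()
                && NAME.matcher(parts[1]).matches();
            return ok ? Optional.empty() : Optional.of("expected table.columnName, got: " + entry.name());
        }
        return NAME.matcher(entry.name()).matches()
            ? Optional.empty()
            : Optional.of("invalid name: " + entry.name());
    }

    public static void main(String[] args) {
        System.out.println(validate(new ReuseEntry("Subject.age", "column")));  // Optional.empty
        System.out.println(validate(new ReuseEntry("Subject", "column")));      // Optional[expected table.columnName, got: Subject]
        System.out.println(validate(new ReuseEntry("Subject", "flavor")));      // Optional[unknown type: flavor]
    }
}
```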

Author attributes

Attribute  Description
name       Name of this author.
email      Email address of this author.
orcid      ORCID of this author.

Copyright attributes

Attribute  Description
holder     Name of the copyright holder.
years      Years of first publication and latest revision.

License attributes

Attribute  Description
name       Name of the active license.
url        URL where the license text can be found.

Design principles for the syntax

  • We prefer explicit definitions over magic. For example, if you rely on duck typing (e.g. using different flavors of a table), the standard should make explicit which columns you assume to be present (e.g. via refLabel you can indicate which columns you expect in a lookup).
  • Semantic data types.
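Making duck typing explicit means checking, for every profile a reference may point to, that the columns a refLabel relies on are actually present. The syntax of refLabel is not specified here, so this hypothetical Java sketch assumes the column names it uses have already been extracted into a set; RefLabelCheck, missingColumns and the Donor flavor are illustrative only.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical check, not part of fair-schemas: every profile a reference may point
// to must provide the columns its refLabel relies on. Extracting those column names
// from a refLabel is not shown (assumption).
final class RefLabelCheck {
    // Returns, per referenced profile, the expected columns it does not provide.
    static Map<String, Set<String>> missingColumns(Set<String> expectedColumns,
                                                   Map<String, Set<String>> columnsByProfile) {
        Map<String, Set<String>> missing = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : columnsByProfile.entrySet()) {
            Set<String> absent = new TreeSet<>(expectedColumns);
            absent.removeAll(entry.getValue());
            if (!absent.isEmpty()) {
                missing.put(entry.getKey(), absent);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Two hypothetical flavors of Subject that a reference might point to.
        Map<String, Set<String>> flavors = Map.of(
            "Patient", Set.of("id", "sex", "age"),
            "Donor", Set.of("id"));
        System.out.println(missingColumns(Set.of("id", "sex"), flavors));  // {Donor=[sex]}
    }
}
```

An empty result means every referenced flavor can render the lookup; otherwise the designer either adds the missing columns to those profiles or narrows the reference.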

Languages

Java 100.0%