ESIPFed / stc

Repository for the Semantic Technologies Committee

Home Page: http://wiki.esipfed.org/index.php/Semantic_Technologies

use case for data workflow

TWellman opened this issue

Suggest adding a use case for data processing workflow. We have one example to submit.

@lewismc

OK, no problem. I have a branch started but won't edit further or initiate a pull request. The HTML is below.

Use of Ontology Information in Data Processing Workflows

Tristan Wellman, Science Analytics and Synthesis, U.S. Geological Survey

Full use case description:

An ontology is created following the Darwin Core convention to meet the protocol required by OBIS-USA and NOAA NCEI. Its instantiation on ESIP COR provides a stable, publicly available endpoint used in the data processing workflow. As part of the workflow, basic ontology information and external supplementary information describing each variable (term) are written as metadata into NetCDF data files. Real-time feedback could help ensure that variable information and ontology information stay aligned. As terms are added or modified, ontology versioning is needed to support historical data products that reference this resource.
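The alignment feedback mentioned above could be sketched as a simple comparison between the attributes already written to a file and the current ontology record. This is only an illustration: the field names (`label`, `units`, `definition`), the sample term, and the function `find_misaligned_fields` are assumptions, not part of ESIP COR or any NetCDF library, and reading the attributes out of a real NetCDF file is left out.

```python
# Hypothetical ontology record for one term; field names are illustrative.
ontology_terms = {
    "sea_water_temperature": {
        "label": "Sea water temperature",
        "units": "degree_Celsius",
        "definition": "Temperature of sea water.",
    },
}

# Hypothetical attributes previously written into a NetCDF variable.
file_attributes = {
    "sea_water_temperature": {
        "label": "Sea water temperature",
        "units": "K",
        "definition": "Temperature of sea water.",
    },
}


def find_misaligned_fields(file_attrs, terms):
    """Return {variable: [field, ...]} for fields that differ from the ontology."""
    report = {}
    for name, attrs in file_attrs.items():
        term = terms.get(name)
        if term is None:
            # The variable has no matching term in the ontology at all.
            report[name] = ["<term missing from ontology>"]
            continue
        # Compare only the fields that the ontology record defines.
        differing = sorted(k for k in term if attrs.get(k) != term[k])
        if differing:
            report[name] = differing
    return report


print(find_misaligned_fields(file_attributes, ontology_terms))
# prints {'sea_water_temperature': ['units']}
```

An empty report would mean the file metadata and the ontology agree for every checked field.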

User profile: A user or institution that expects to evolve ontology records in an automated workflow and requires reproducibility of the resulting data products that use ontology information.

Scenario: An institution in the Earth science community uses semantic vocabularies stored on public endpoints to describe scientific terms and variables in its data products. When these data products are created or revised, the ontologies should be updated in step. Versioning should make it possible to reproduce the vocabulary information used in historical case studies.

Workflow:

  1. A code-driven analysis package is activated to process a collection of data files.
  2. A series of quality control and processing functions is run as part of the processing workflow.
  3. A processing function calls ESIP COR to match vocabulary terms defined within the cached ontology.
  4. Additional variable (term) information, such as variable type, units, and alias name, is retrieved to enhance default information.
  5. Where vocabulary terms are new or vocabulary information has been revised or enhanced, the ESIP COR instantiation is updated to include the latest publicly available scientific information, potentially in real time.
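Steps 3 to 5 above can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the cached ontology is modeled as a plain dictionary, the `example.org` URI is a placeholder, and the function `match_and_enrich` and its field names (`variable_type`, `units`, `alias`, `uri`) are hypothetical. Fetching the ontology from ESIP COR and pushing updates back are deliberately not shown.

```python
# Hypothetical cached copy of the ontology, keyed by term name.
cached_terms = {
    "depth": {
        "uri": "https://example.org/ontology/depth",  # placeholder URI
        "variable_type": "float",
        "units": "m",
        "alias": "z",
    },
}

# Default metadata the workflow already holds for each variable in a file.
file_variables = {
    "depth": {"units": "unknown"},
    "salinity": {"units": "1"},
}


def match_and_enrich(variables, terms):
    """Match variables against cached terms and merge in term information.

    Returns (enriched, unmatched): enriched metadata for every variable, and
    the names with no matching term, which are candidates for an ontology update.
    """
    enriched = {}
    unmatched = []
    for name, defaults in variables.items():
        merged = dict(defaults)
        term = terms.get(name)
        if term is None:
            unmatched.append(name)  # step 5: new term, needs adding upstream
        else:
            # Step 4: ontology values override the workflow defaults.
            for key in ("variable_type", "units", "alias"):
                if key in term:
                    merged[key] = term[key]
            merged["term_uri"] = term["uri"]
        enriched[name] = merged
    return enriched, unmatched


enriched, unmatched = match_and_enrich(file_variables, cached_terms)
print(enriched["depth"]["units"])  # prints m
print(unmatched)                   # prints ['salinity']
```

Returning the unmatched names separately keeps the decision about whether and when to update the public ontology outside the matching function.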

Requirements implied by this use case:

  1. The ontology portal has automated versioning capabilities used to preserve ontology definitions in real time. Ontologies can be retrieved by version at user request.
  2. The ontology portal allows authenticated users to update, create, or delete ontologies using a simple API, perhaps generating a modified temporary ontology while preserving the original parent ontology until a review has been completed.
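To make these two requirements concrete, here is a small in-memory sketch of retrieval by version together with a temporary draft that leaves the parent ontology untouched until review. The class `VersionedOntologyStore` and its methods are hypothetical and do not describe the ESIP COR API. Authentication and concurrent drafts are omitted: each draft is simply based on the latest version at the time it is proposed.

```python
import copy


class VersionedOntologyStore:
    """Hypothetical store: immutable numbered versions plus reviewable drafts."""

    def __init__(self, initial_terms):
        self._versions = [copy.deepcopy(initial_terms)]  # version 1
        self._drafts = {}
        self._next_draft_id = 1

    @property
    def latest_version(self):
        return len(self._versions)

    def get(self, version=None):
        """Return a copy of the requested version (latest if omitted)."""
        if version is None:
            version = self.latest_version
        if not 1 <= version <= self.latest_version:
            raise KeyError(f"unknown version {version}")
        return copy.deepcopy(self._versions[version - 1])

    def propose(self, changes, removals=()):
        """Create a temporary draft from the latest version; parent is untouched."""
        draft = self.get()
        draft.update(copy.deepcopy(changes))
        for name in removals:
            draft.pop(name, None)
        draft_id = self._next_draft_id
        self._next_draft_id += 1
        self._drafts[draft_id] = draft
        return draft_id

    def approve(self, draft_id):
        """Publish a reviewed draft as a new version and return its number."""
        self._versions.append(self._drafts.pop(draft_id))
        return self.latest_version


store = VersionedOntologyStore({"depth": {"units": "m"}})
draft_id = store.propose({"salinity": {"units": "1"}})
print("salinity" in store.get())   # prints False (draft not yet approved)
print(store.approve(draft_id))     # prints 2
print("salinity" in store.get(1))  # prints False (version 1 is preserved)
```

Because earlier versions are never modified, a historical data product that records the version number it used can always recover the same definitions.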

Excellent, @TWellman. I'll write this up as a PR and commit it to the documentation.

addressed via d52523c
Thank you @TWellman

Perhaps a more generic description in the first sentence would be valuable.

"A base ontology is created to describe term identifiers, labels, and definitions, which are used for processing data records through OBIS-USA and NOAA NCEI."

Thanks @TWellman, this has been accommodated in the current documentation.