isamplesorg / metadata

Collation of metadata examples and notes for the project

Home page: https://isamplesorg.github.io/metadata/

NSF-2004562 NSF-2004815 NSF-2004839 NSF-2004642

metadata

Defines the core metadata model for iSamples.

src/schemas/iSamplesCoreSchema.yml defines the iSamples core model in LinkML. It references vocabularies in src/vocabularies/, which define the terms of the Material Type, Sampled Feature, and Specimen Type vocabularies.

The following artifacts are generated from the LinkML and vocabulary sources:

Development

LinkML and its associated tools require Python 3.9 or newer, and this project uses Poetry for dependency management. Poetry can be installed with pip install poetry.
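Before running poetry install, the two prerequisites above can be checked from Python. This is a minimal sketch, not part of the repository: python_ok and check_prerequisites are hypothetical helper names, and the sketch only checks that a poetry executable is on PATH, not which Poetry version it is.

```python
import shutil
import sys

# Minimum Python version stated in this README for LinkML and its tools.
MIN_PYTHON = (3, 9)


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if (major, minor) of version_info is at least minimum."""
    return tuple(version_info[:2]) >= minimum


def check_prerequisites():
    """Hypothetical helper: return a list of human-readable problems (empty if none)."""
    problems = []
    if not python_ok():
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python {found} found; 3.9 or newer is required")
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("poetry") is None:
        problems.append("poetry not found on PATH; install it with: pip install poetry")
    return problems
```

For example, print(check_prerequisites()) prints an empty list when both checks pass.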

To work on the project contents and run the artifact generators, first clone the repository and switch to the develop branch:

git clone https://github.com/isamplesorg/metadata.git
cd metadata
git checkout develop
git pull

Set up a virtual environment (e.g. using poetry or mkvirtualenv):

poetry shell
poetry install

(To exit the Poetry shell, use exit.)

Artifacts in the generated/ folder are produced by running make or make all.

Documentation is rendered with Quarto rather than the default mkdocs or Sphinx, because Quarto offers many additional features, including support for computed examples, which are planned. To generate the documentation, install Quarto 1.2 or newer, then run make, make all, or make gen-docs.

This generates intermediate markdown files in the build/docs folder, then invokes quarto render to produce the HTML docs in the docs/ folder.
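A small sketch for confirming that the installed Quarto meets the 1.2 minimum before running make gen-docs. It assumes quarto --version prints a dotted version number; parse_version, quarto_ok, and installed_quarto_version are hypothetical helpers, not project tooling.

```python
import re
import shutil
import subprocess

# Minimum Quarto version stated in this README.
MIN_QUARTO = (1, 2)


def parse_version(text):
    """Extract the first dotted version number, e.g. '1.4.550' -> (1, 4, 550)."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def quarto_ok(version_text, minimum=MIN_QUARTO):
    """Return True if the version in version_text is at least minimum."""
    # Tuple comparison is element-wise, so (1, 2, 475) >= (1, 2) is True.
    return parse_version(version_text) >= minimum


def installed_quarto_version():
    """Return the output of `quarto --version`, or None if quarto is not on PATH."""
    if shutil.which("quarto") is None:
        return None
    result = subprocess.run(
        ["quarto", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

For example, call installed_quarto_version() and, if it returns a string rather than None, pass that string to quarto_ok().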

Note that this project uses a version of the LinkML docgen tool and templates modified to render markdown for Quarto. The modified docgen and templates are located in the tools/ folder.

Older notes below


  • background: contains diagrams and information about some existing models that include metadata for samples; files are organized broadly by domain.
  • examples: example metadata documents from different systems. Subfolders are
    • raw: metadata from the originating system
    • test: corresponding records generated manually using the iSamples basic template
    • transform: corresponding records generated by automated ETL process from raw records
  • vocabulary: vocabularies related to sample metadata from various systems

LinkML (current version 1.1.15)

This branch shows how to use LinkML to generate various outputs and run related operations for iSamples.

Current workflow (01/01/2022)


iSamples YAML schema to JSON schema

Use the following command to convert the iSamples YAML schema to JSON Schema:

gen-json-schema -t PhysicalSampleRecord --not-closed iSamplesSchemaBasic0.3.yaml > iSamplesSchemaBasic0.3.schema.json 

In this command, -t PhysicalSampleRecord makes the PhysicalSampleRecord class the top-level class, so the properties of that class become the top-level properties of the JSON Schema. --not-closed leaves additionalProperties open instead of setting it to false, so instances may carry properties that the schema does not define. The resulting JSON Schema file is iSamplesSchemaBasic0.3.schema.json.
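To check the effect of -t and --not-closed on the generated file, here is a short inspection sketch. It assumes the generator writes "properties" and "additionalProperties" at the top level of the schema document; the exact layout can vary between LinkML versions. summarize_schema and summarize_file are hypothetical helper names.

```python
import json
from pathlib import Path


def summarize_schema(schema):
    """Return (sorted top-level property names, additionalProperties value)."""
    properties = sorted(schema.get("properties", {}))
    # JSON Schema treats a missing additionalProperties as allowing extra keys.
    additional = schema.get("additionalProperties", True)
    return properties, additional


def summarize_file(path):
    """Load a JSON Schema file and summarize its top level."""
    schema = json.loads(Path(path).read_text(encoding="utf-8"))
    return summarize_schema(schema)
```

For example, summarize_file("iSamplesSchemaBasic0.3.schema.json") should list the PhysicalSampleRecord properties if the file has the assumed layout.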

Generating JSON-LD context

gen-jsonld-context iSamplesSchemaBasic0.3.yaml > iSampleSchemaBasic0.3.jsonld

The command writes the generated context to the .jsonld file. Once the context has been generated, the entries for enumeration-valued properties must be edited manually.

Modified JSON-LD context example
{
   "@context": {
      "dct": "http://purl.org/dc/terms/",
      "isam": "http://resource.isamples.org/schema/",
      "mat": "http://resource.isamples.org/vocabulary/material/",
      "pur": "http://resource.isamples.org/vocabulary/samplepurpose/",
      "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
      "sf": "http://resource.isamples.org/vocabulary/sampledFeature/",
      "skos": "http://www.w3.org/2004/02/skos/core#",
      "spt": "http://resource.isamples.org/vocabulary/specimentype/",
      "w3cpos": "http://www.w3.org/2003/01/geo/wgs84_pos#",
      "xsd": "http://www.w3.org/2001/XMLSchema#",
      "@vocab": "http://resource.isamples.org/schema/",
      "curation": {
         "@type": "@id"
      },
      "hasContextCategory": {
         "@type":"contextcategory"
      },
      "hasMaterialCategory": {
         "@type":"materialtype"
      },
      "hasSpecimenCategory": {
         "@type":"specimencategory"
      },
      "id": "@id",
      "latitude": {
         "@type": "xsd:decimal"
      },
      "location": {
         "@type": "@id"
      },
      "longitude": {
         "@type": "xsd:decimal"
      },
      "producedBy": {
         "@type": "@id"
      },
      "relatedResource": {
         "@type": "@id"
      },
      "resultTime": {
         "@type": "xsd:date"
      },
      "samplingSite": {
         "@type": "@id"
      }
   }
}
This is an example of a modified JSON-LD context. For each enumeration-valued property (hasContextCategory, hasMaterialCategory, and hasSpecimenCategory), `@type` declares the enumeration type.
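The manual enumeration edit can also be scripted. This is a minimal sketch, assuming the generated .jsonld file is a single JSON object with a top-level "@context" (as in the example above). patch_enum_types, patch_file, and the ENUM_TYPES mapping are hypothetical; the mapping is copied from the modified context example. Existing term definitions keep their other keys (such as an @id) and only gain or replace @type.

```python
import json
from pathlib import Path

# Hypothetical mapping of enumeration-valued slots to the @type each should
# declare, copied from the modified JSON-LD context example in this README.
ENUM_TYPES = {
    "hasContextCategory": "contextcategory",
    "hasMaterialCategory": "materialtype",
    "hasSpecimenCategory": "specimencategory",
}


def patch_enum_types(context_doc, enum_types=ENUM_TYPES):
    """Return a copy of context_doc with @type declared for each enumeration slot."""
    patched = json.loads(json.dumps(context_doc))  # deep copy; input is left untouched
    context = patched.setdefault("@context", {})
    for slot, enum_type in enum_types.items():
        entry = context.get(slot)
        if isinstance(entry, dict):
            entry["@type"] = enum_type  # keep other keys such as @id
        elif isinstance(entry, str):
            # A plain string definition is an IRI mapping; keep it as @id.
            context[slot] = {"@id": entry, "@type": enum_type}
        else:
            context[slot] = {"@type": enum_type}
    return patched


def patch_file(src, dest, enum_types=ENUM_TYPES):
    """Read a generated context from src and write the patched version to dest."""
    doc = json.loads(Path(src).read_text(encoding="utf-8"))
    patched = patch_enum_types(doc, enum_types)
    # indent=3 matches the indentation used in the examples in this README.
    Path(dest).write_text(json.dumps(patched, indent=3) + "\n", encoding="utf-8")
```

For example, patch_file("iSampleSchemaBasic0.3.jsonld", "iSampleSchemaBasic0.3.modified.jsonld") writes a patched copy; the output filename is hypothetical.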

Validating schema and instance file

Before validating the instance files, add the modified JSON-LD context at the beginning of each instance, ahead of its other properties.
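Prepending the context can be automated. This sketch relies on Python dicts preserving insertion order, so json.dumps writes "@context" first; any "@context" already in an instance is replaced. add_context and add_context_to_files are hypothetical helpers, and the sketch assumes the instances are *.json files in one folder. add_context_to_files overwrites files in place, so run it on a copy.

```python
import json
from pathlib import Path


def add_context(instance, context_doc):
    """Return a new dict whose first key is the @context taken from context_doc."""
    merged = {"@context": context_doc["@context"]}
    for key, value in instance.items():
        if key != "@context":  # drop any existing context; the new one wins
            merged[key] = value
    return merged


def add_context_to_files(context_path, instance_dir):
    """Rewrite every *.json file in instance_dir with the context prepended."""
    context_doc = json.loads(Path(context_path).read_text(encoding="utf-8"))
    for path in sorted(Path(instance_dir).glob("*.json")):
        instance = json.loads(path.read_text(encoding="utf-8"))
        merged = add_context(instance, context_doc)
        text = json.dumps(merged, indent=3, ensure_ascii=False) + "\n"
        path.write_text(text, encoding="utf-8")
```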

Full instance example
{
   "@context": {
      "dct": "http://purl.org/dc/terms/",
      "isam": "http://resource.isamples.org/schema/",
      "mat": "http://resource.isamples.org/vocabulary/material/",
      "pur": "http://resource.isamples.org/vocabulary/samplepurpose/",
      "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
      "sf": "http://resource.isamples.org/vocabulary/sampledFeature/",
      "skos": "http://www.w3.org/2004/02/skos/core#",
      "spt": "http://resource.isamples.org/vocabulary/specimentype/",
      "w3cpos": "http://www.w3.org/2003/01/geo/wgs84_pos#",
      "xsd": "http://www.w3.org/2001/XMLSchema#",
      "@vocab": "http://resource.isamples.org/schema/",
      "curation": {
         "@type": "@id"
      },
      "hasContextCategory": {
         "@type":"contextcategory"
      },
      "hasMaterialCategory": {
         "@type":"materialtype"
      },
      "hasSpecimenCategory": {
         "@type":"specimencategory"
      },
      "id": "@id",
      "latitude": {
         "@type": "xsd:decimal"
      },
      "location": {
         "@type": "@id"
      },
      "longitude": {
         "@type": "xsd:decimal"
      },
      "producedBy": {
         "@type": "@id"
      },
      "relatedResource": {
         "@type": "@id"
      },
      "resultTime": {
         "@type": "xsd:date"
      },
      "samplingSite": {
         "@type": "@id"
      }
   },
"@schema": "../../iSamplesSchemaBasic0.2.json",
"@id": "metadata/21547/Car2PIRE_0334",
"label": "PIRE_0334",
"sampleidentifier": "ark:/21547/Car2PIRE_0334",
"description": "",
"hasContextCategory": ["Marine Biome"],
"hasMaterialCategory": ["Organic Material"],
"hasSpecimenCategory": ["Whole Organism"],
"informalClassification": ["Gastropoda"],
"keywords": ["Aceh", "Sumatra","Indonesia","Asia", "Mollusca"],
"producedBy": {
    "@id":"ark:/21547/Cas2INDO_2016_SEU_1B",
    "label": "INDO_2016_SEU_1B",
    "description": "expeditionCode: INDO_PIRE | samplingProtocol: ARMS | taxonomy team: MINV | projectId: 80",
    "hasFeatureOfInterest": "coral reef",
    "responsibility": ["Aji Wahyu Anggoro","Andrianus Sembiring"],
    "resultTime": "2016-08-09",
    "samplingSite": {
        "description": "Shallow, coastal reef. Apparent exposure to current, Porites dominated. Less impacted bleaching site, high recruitment, 12 m.",
        "label": "",
        "location": {
            "elevation": "maximumDepthInMeters: 12",
            "latitude": 5.89430,
            "longitude": 95.25293
        },
        "placeName": ["Pulau Seulako"]
    }
},
"registrant": "Chris Meyer",
"samplingPurpose": "genomic analysis",
"curation": {
    "accessConstraints": "",
    "curationLocation": "",
    "responsibility": ""
},
"relatedResource": {
    "label":"subsample tissue",
    "description":"",
    "target":"ark:/21547/Cat2INDO106431.1",
    "relationship":"subsample"
}

}
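To illustrate how the context gives the plain keys in this instance their IRIs, here is a simplified sketch covering only keyword aliases (such as "id": "@id"), prefix:suffix compact IRIs, and @vocab. It is not the full JSON-LD expansion algorithm; a JSON-LD library such as PyLD should be used for real processing. expand_term and EXAMPLE_CONTEXT are hypothetical, and EXAMPLE_CONTEXT is a subset of the context above.

```python
# A subset of the modified context shown earlier in this README.
EXAMPLE_CONTEXT = {
    "mat": "http://resource.isamples.org/vocabulary/material/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "@vocab": "http://resource.isamples.org/schema/",
    "id": "@id",
    "latitude": {"@type": "xsd:decimal"},
}


def expand_term(term, context):
    """Map a JSON key to an IRI (or keyword) using simplified JSON-LD rules."""
    if term.startswith("@"):
        return term  # JSON-LD keywords pass through unchanged
    definition = context.get(term)
    if isinstance(definition, str):
        term = definition  # alias such as "id" -> "@id", or an IRI mapping
        if term.startswith("@"):
            return term
    elif isinstance(definition, dict) and "@id" in definition:
        term = definition["@id"]
    prefix, sep, suffix = term.partition(":")
    # prefix:suffix is a compact IRI when the prefix is defined in the context
    # and the suffix does not start with "//" (which marks an absolute IRI).
    if sep and isinstance(context.get(prefix), str) and not suffix.startswith("//"):
        return context[prefix] + suffix
    if sep:
        return term  # already absolute, e.g. http://... or ark:/...
    # Otherwise resolve against @vocab; without @vocab the term is returned
    # unchanged here (a real JSON-LD processor would drop it).
    return context.get("@vocab", "") + term
```

With this context, "label" expands to http://resource.isamples.org/schema/label and "id" maps to the @id keyword.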

Use the following commands to validate an instance file against the schemas:

linkml-validate -s iSamplesSchemaBasic0.3.yaml instance.json
jsonschema -i instance.json iSamplesSchemaBasic0.3.schema.json

The first command validates the instance file against the LinkML YAML schema; the second validates it against the generated JSON Schema.
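To validate a whole folder of instance files, the two commands above can be run in a loop. This sketch assumes both CLIs are installed in the active environment, that the instances are *.json files in one folder, and that a nonzero exit code indicates a failed validation. validation_commands and validate_all are hypothetical helper names.

```python
import shutil
import subprocess
from pathlib import Path


def validation_commands(yaml_schema, json_schema, instance):
    """Return the two validation commands from this README as argument lists."""
    return [
        ["linkml-validate", "-s", str(yaml_schema), str(instance)],
        ["jsonschema", "-i", str(instance), str(json_schema)],
    ]


def validate_all(yaml_schema, json_schema, instance_dir):
    """Hypothetical helper: run both validators on each *.json file in instance_dir.

    Returns {instance path: [linkml-validate exit code, jsonschema exit code]}.
    """
    results = {}
    for instance in sorted(Path(instance_dir).glob("*.json")):
        codes = []
        for argv in validation_commands(yaml_schema, json_schema, instance):
            if shutil.which(argv[0]) is None:
                raise FileNotFoundError(f"{argv[0]} is not on PATH")
            codes.append(subprocess.run(argv).returncode)
        results[str(instance)] = codes
    return results
```

Files that need attention are then the entries whose exit codes are not all zero, e.g. {path: codes for path, codes in results.items() if any(codes)}.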

Run tools in a Docker container

The iSamples metadata Docker image is based on the Docker image from the LinkML project (https://hub.docker.com/r/monarchinitiative/linkml/tags).

First, build the image from the repository root: docker build -t isamples_linkml .

Then run it. This opens a bash shell in /work, the container volume that mounts the current directory (your copy of the iSamples metadata repository): docker run -a stdin -a stdout -i -t -v `pwd`:/work isamples_linkml

Then use the following commands to generate LinkML:

  • Command 1
  • Command 2
  • Command 3

To do

  • We are still implementing the iSamples schema to meet LinkML requirements.
  • LinkML itself still has some bugs and unimplemented features.
  • Different platforms may produce different results or errors, so we prefer to run LinkML in Docker. Please follow the LinkML tutorial.
