earthpulse / ml-dataset

STAC extension for ML Training Datasets

Machine Learning Dataset Extension Specification

This document explains the Machine Learning Dataset Extension to the SpatioTemporal Asset Catalog (STAC) specification.

Fields

The fields in the table below can be used in these parts of STAC documents:

  • Catalogs
  • Collections
  • Item Properties
  • Assets (for both Collections and Items, incl. Item Asset Definitions in Collections)
  • Links
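To show where these locations sit in the JSON, here is a rough Python sketch. It walks a STAC document that has been parsed into a plain dict and lists every field carrying the ml-dataset: prefix. It assumes the standard STAC JSON layout: top-level keys for Catalogs and Collections, "properties" for Items, "assets", "item_assets" for Item Asset Definitions in Collections, and "links". The function names and the example Item are hypothetical.

```python
from typing import Any, Dict, List, Tuple

PREFIX = "ml-dataset:"


def _prefixed(obj: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the keys of obj that belong to the ml-dataset extension."""
    return {k: v for k, v in obj.items() if k.startswith(PREFIX)}


def collect_ml_fields(doc: Dict[str, Any]) -> List[Tuple[str, str, Any]]:
    """List (location, field, value) for every ml-dataset field in a STAC dict."""
    found: List[Tuple[str, str, Any]] = []
    # Catalogs and Collections carry the fields at the top level.
    for k, v in _prefixed(doc).items():
        found.append(("top-level", k, v))
    # Items carry the fields inside their "properties" object.
    for k, v in _prefixed(doc.get("properties", {})).items():
        found.append(("properties", k, v))
    # Assets (Items and Collections) and Item Asset Definitions (Collections).
    for section in ("assets", "item_assets"):
        for key, asset in doc.get(section, {}).items():
            for k, v in _prefixed(asset).items():
                found.append((f"{section}/{key}", k, v))
    # Link Objects.
    for i, link in enumerate(doc.get("links", [])):
        for k, v in _prefixed(link).items():
            found.append((f"links/{i}", k, v))
    return found


# Hypothetical Item: its split is recorded in properties, and the
# annotation type is recorded on the label asset.
item = {
    "type": "Feature",
    "id": "tile-001",
    "properties": {"datetime": "2021-01-01T00:00:00Z", "ml-dataset:split": "Training"},
    "assets": {"label": {"href": "label.tif", "ml-dataset:annotations-type": "raster"}},
    "links": [],
}
print(collect_ml_fields(item))
# [('properties', 'ml-dataset:split', 'Training'),
#  ('assets/label', 'ml-dataset:annotations-type', 'raster')]
```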
Field Name                    Type    Description
ml-dataset:name               string  The name of the dataset
ml-dataset:tasks              array   List of (suggested) tasks that can be solved with the dataset
ml-dataset:inputs-type        string  Type of the inputs (text, image, satellite image, video, ..., or a combination)
ml-dataset:annotations-type   string  Type of annotations (raster, vector, ...); not present for unsupervised learning
ml-dataset:quality-metrics    array   List of quality metrics that define the quality of the dataset
ml-dataset:version            string  Version of the dataset
ml-dataset:splits             array   List of split names. Suggested names are Training, Validation, Test, Legacy, Benchmark
ml-dataset:split-items        array   List of the Items that make up the split
ml-dataset:split              string  Name of the split in which the Item is included
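The Python sketch below shows one way these fields could fit together. Dataset-level fields go on a Collection, ml-dataset:split goes on each Item, and a helper checks the splits for consistency. The table does not say what the entries of ml-dataset:split-items are or where that array is stored, so the sketch assumes a list of Item ids per split name. The dataset values, Item ids, and the check_splits helper are hypothetical. A real document would also list the extension's schema URI in stac_extensions, which is omitted here.

```python
from typing import Any, Dict, List

# Dataset-level fields on a Collection (could equally be a Catalog).
# All values here are hypothetical.
dataset: Dict[str, Any] = {
    "type": "Collection",
    "id": "example-dataset",
    "ml-dataset:name": "Example land cover dataset",
    "ml-dataset:tasks": ["semantic segmentation"],
    "ml-dataset:inputs-type": "satellite image",
    "ml-dataset:annotations-type": "raster",
    "ml-dataset:version": "0.1.0",
    "ml-dataset:splits": ["Training", "Validation", "Test"],
}

# Assumption: each split's "ml-dataset:split-items" is a list of Item ids.
split_items: Dict[str, List[str]] = {
    "Training": ["tile-001", "tile-002"],
    "Validation": ["tile-003"],
    "Test": ["tile-004"],
}


def make_item(item_id: str, split: str) -> Dict[str, Any]:
    """Skeleton STAC Item whose properties record the split it belongs to."""
    return {
        "type": "Feature",
        "id": item_id,
        "properties": {"datetime": "2021-01-01T00:00:00Z", "ml-dataset:split": split},
        "assets": {},
        "links": [],
    }


items = [make_item(i, s) for s, ids in split_items.items() for i in ids]


def check_splits(ds: Dict[str, Any], splits: Dict[str, List[str]],
                 all_items: List[Dict[str, Any]]) -> List[str]:
    """Return a list of problems; an empty list means the fields agree."""
    problems: List[str] = []
    declared = set(ds.get("ml-dataset:splits", []))
    by_id = {it["id"]: it for it in all_items}
    owner: Dict[str, str] = {}  # first split each Item id was seen in
    for split, ids in splits.items():
        if split not in declared:
            problems.append(f"split {split!r} is not in ml-dataset:splits")
        for item_id in ids:
            # The same Item in two splits would leak data between them.
            previous = owner.setdefault(item_id, split)
            if previous != split:
                problems.append(f"{item_id} is in both {previous!r} and {split!r}")
            it = by_id.get(item_id)
            if it is None:
                problems.append(f"{item_id} is listed in {split!r} but not found")
            elif it["properties"].get("ml-dataset:split") != split:
                problems.append(f"{item_id} has ml-dataset:split != {split!r}")
    return problems


print(check_splits(dataset, split_items, items))  # []
```

Flagging an Item that appears in two splits is a design choice of this sketch. It is meant to catch leakage between training and evaluation data and is not a rule stated by the extension.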

Relation types

The following types should be used as applicable rel types in the Link Object.

Type                 Description
ml-dataset:splits    Links to the training, validation, and test splits, if defined
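A minimal Python sketch of this relation type in use follows. A dataset Catalog links to one document per split with rel "ml-dataset:splits", and a helper filters those links back out. Only the rel value comes from the table above. The split hrefs, titles, and helper names are hypothetical, and the sketch assumes each split is published as its own JSON document.

```python
from typing import Any, Dict, List

SPLITS_REL = "ml-dataset:splits"


def add_split_link(catalog: Dict[str, Any], href: str, title: str) -> None:
    """Append a Link Object that points at one split document."""
    catalog.setdefault("links", []).append(
        {"rel": SPLITS_REL, "href": href, "type": "application/json", "title": title}
    )


def split_links(catalog: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return only the Link Objects that use the ml-dataset:splits relation type."""
    return [link for link in catalog.get("links", []) if link.get("rel") == SPLITS_REL]


catalog: Dict[str, Any] = {
    "type": "Catalog",
    "id": "example-dataset",
    "links": [{"rel": "self", "href": "./catalog.json"}],
}
for name in ("Training", "Validation", "Test"):
    add_split_link(catalog, f"./{name.lower()}/catalog.json", name)

print([link["href"] for link in split_links(catalog)])
# ['./training/catalog.json', './validation/catalog.json', './test/catalog.json']
```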

Contributing

All contributions are subject to the STAC Specification Code of Conduct. For contributions, please follow the STAC specification contributing guide. Instructions for running tests are copied here for convenience.

Running tests

The same checks that run on PRs are part of the repository and can be run locally to verify that changes are valid. To run tests locally, you'll need npm, which is a standard part of any Node.js installation.

First, install all dependencies with npm (this only needs to be done once). Navigate to the root of this repository and run:

npm install

Then to check markdown formatting and test the examples against the JSON schema, you can run:

npm test

This prints the same output you see in the online checks, so you can then fix your markdown or examples.

If the tests reveal formatting problems with the examples, you can fix them with:

npm run format-examples

About

License: Apache License 2.0