akvaplan-niva / gbif-no-darwin-core

Reproducible Darwin Core data pipelines for GBIF Norway

Darwin Core biodiversity data pipelines

This repository contains data production pipelines for building Darwin Core datasets for publication in the Global Biodiversity Information Facility (GBIF), with permanent archiving in Zenodo.

EcoTaxa

Datasets

Notice: These are pre-production URLs, for testing purposes only

Workflow

  • Export EcoTaxa data as TSV (using DOI export with images)

  • Publish untreated TSV and images to Zenodo

  • Create Darwin Core occurrences in NDJSON from EcoTaxa TSV, using ecotaxa-darwin-core

  • Create unique Darwin Core sampling events in NDJSON by reducing the occurrences

  • @todo Merge with other/authoritative event metadata (e.g. sampling volumes)

  • Create lists of ignored (non-living) and rejected (non-Eukaryota) objects

  • Create lists of rejected events (non-unique, or with invalid or inconsistent metadata)

  • Finish local processing by executing Darwin Core pipelines below

gbif-no-darwin-core$ ./bin/ecotaxa-pipeline 1420
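The event-reduction step of the workflow can be sketched as follows. This is a minimal illustration, not the repository's implementation: it assumes occurrences carry an `eventID` plus a handful of event-level Darwin Core terms, and the `EVENT_FIELDS` list and the function names `parseNdjson` and `reduceToEvents` are hypothetical.

```javascript
// Parse NDJSON text (one JSON object per line), skipping blank lines.
const parseNdjson = (text) =>
  text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));

// Event-level fields copied from each occurrence.
// ASSUMPTION: this field list is illustrative only.
const EVENT_FIELDS = ["eventID", "eventDate", "decimalLatitude", "decimalLongitude"];

// Reduce occurrences to one event per eventID. Events whose occurrences
// disagree on any event-level field are moved to the rejected list.
function reduceToEvents(occurrences) {
  const events = new Map();
  const rejected = new Set();
  for (const occ of occurrences) {
    const candidate = Object.fromEntries(EVENT_FIELDS.map((f) => [f, occ[f]]));
    const existing = events.get(occ.eventID);
    if (existing === undefined) {
      events.set(occ.eventID, candidate);
    } else if (JSON.stringify(existing) !== JSON.stringify(candidate)) {
      rejected.add(occ.eventID); // inconsistent metadata for this event
    }
  }
  return {
    events: [...events.values()].filter((e) => !rejected.has(e.eventID)),
    rejected: [...rejected],
  };
}
```

Keeping the rejected event IDs in a separate list mirrors the workflow's idea of publishing what was dropped alongside what was kept.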

Darwin Core pipelines

Taxonomy

  • Create taxonomy NDJSON by extracting occurrence taxa and checking against GBIF Species API using WoRMS
  • Create lists of possible taxonomy issues (not found or incertae sedis)
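A rough sketch of the taxonomy step is given here, under stated assumptions: names are looked up with a `name` query against the GBIF Species API, restricted to one checklist through a `datasetKey` parameter. The helper names are hypothetical, and the dataset key for WoRMS is deliberately left as a caller-supplied value rather than hard-coded.

```javascript
const GBIF_SPECIES_API = "https://api.gbif.org/v1/species";

// Collect the distinct, non-empty scientific names found in the occurrences.
function uniqueScientificNames(occurrences) {
  const names = occurrences.map((o) => o.scientificName).filter(Boolean);
  return [...new Set(names)].sort();
}

// Build a lookup URL for one name within one checklist dataset.
// ASSUMPTION: datasetKey is whatever key identifies WoRMS in GBIF;
// it is supplied by the caller and not guessed here.
function speciesLookupUrl(name, datasetKey) {
  const params = new URLSearchParams({ name, datasetKey });
  return `${GBIF_SPECIES_API}?${params}`;
}

// Flag a name as a possible issue when the lookup returned nothing,
// or when the name itself is marked "incertae sedis".
function isPossibleIssue(name, results) {
  return results.length === 0 || /incertae sedis/i.test(name);
}
```

The URLs can then be fetched with any HTTP client; the sketch stops short of the network call so that it stays independent of rate limits and response paging.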

Metadata

  • Extract time coverage (start/end, years, months, days, dates)
  • Extract space coverage (bounding box/depths)
  • Extract sampling protocols
  • @todo Create EML XML
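The coverage extraction could look roughly like the sketch below. It assumes each event has an ISO 8601 `eventDate` beginning with `YYYY-MM-DD`, numeric `decimalLatitude` and `decimalLongitude`, and optional numeric depth terms; the function names are hypothetical, and the bounding box is naive (it does not handle datasets crossing the antimeridian).

```javascript
// Temporal coverage: start/end plus the distinct years, months and dates.
// ASSUMPTION: eventDate is ISO 8601 and starts with YYYY-MM-DD.
function temporalCoverage(events) {
  const dates = [...new Set(events.map((e) => e.eventDate.slice(0, 10)))].sort();
  return {
    start: dates[0],
    end: dates[dates.length - 1],
    years: [...new Set(dates.map((d) => d.slice(0, 4)))],
    months: [...new Set(dates.map((d) => d.slice(0, 7)))],
    dates,
  };
}

// Spatial coverage: bounding box and depth range.
function spatialCoverage(events) {
  const lats = events.map((e) => e.decimalLatitude);
  const lons = events.map((e) => e.decimalLongitude);
  const depths = events
    .flatMap((e) => [e.minimumDepthInMeters, e.maximumDepthInMeters])
    .filter((d) => typeof d === "number");
  return {
    west: Math.min(...lons),
    south: Math.min(...lats),
    east: Math.max(...lons),
    north: Math.max(...lats),
    // null when no event reports a depth
    minDepth: depths.length ? Math.min(...depths) : null,
    maxDepth: depths.length ? Math.max(...depths) : null,
  };
}
```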

Archive

Metafile

  • Create meta.xml with file metadata for event core (event.tsv) and extensions (occurrence.tsv, taxonomy.tsv)
  • Set default fields for occurrenceStatus ("present"), basisOfRecord (MO?) and organismQuantityType ("individuals")
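As an illustration of the metafile step, the sketch below generates a small meta.xml following the Darwin Core text guide, with an event core and an occurrence extension. The column lists and function names are assumptions made for the example, only the two unambiguous defaults from the list above are set, and values are not XML-escaped. A taxonomy extension would follow the same pattern.

```javascript
const DWC = "http://rs.tdwg.org/dwc/terms/";

// One <field> per column, mapped by position in the TSV file.
const fieldXml = (term, index) => `    <field index="${index}" term="${DWC}${term}"/>`;
// A default applies to every row and has no column index.
const defaultXml = ([term, value]) => `    <field term="${DWC}${term}" default="${value}"/>`;

// Describe one file; column 0 is the event identifier in both files.
function fileXml(tag, rowType, location, terms, defaults = {}) {
  const idTag = tag === "core" ? "id" : "coreid";
  return [
    `  <${tag} encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n" ignoreHeaderLines="1" rowType="${DWC}${rowType}">`,
    `    <files><location>${location}</location></files>`,
    `    <${idTag} index="0"/>`,
    ...terms.map(fieldXml),
    ...Object.entries(defaults).map(defaultXml),
    `  </${tag}>`,
  ].join("\n");
}

// ASSUMPTION: the column lists below are illustrative, not the real layout.
function metaXml() {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<archive xmlns="http://rs.tdwg.org/dwc/text/">',
    fileXml("core", "Event", "event.tsv", ["eventID", "eventDate"]),
    fileXml(
      "extension",
      "Occurrence",
      "occurrence.tsv",
      ["eventID", "occurrenceID", "scientificName", "organismQuantity"],
      { occurrenceStatus: "present", organismQuantityType: "individuals" }
    ),
    "</archive>",
  ].join("\n");
}
```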

Event Core

  • Reduce occurrences by rolling up to one line per taxon per sample and summing organismQuantity
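The roll-up can be sketched as a simple group-and-sum. The sketch assumes that "sample" corresponds to `eventID` and "taxon" to `scientificName`, and that an occurrence without `organismQuantity` counts as one individual; both are assumptions, and `rollUp` is a hypothetical name.

```javascript
// Roll occurrences up to one row per (eventID, scientificName),
// summing organismQuantity across the grouped rows.
function rollUp(occurrences) {
  const rows = new Map();
  for (const occ of occurrences) {
    const key = JSON.stringify([occ.eventID, occ.scientificName]);
    // ASSUMPTION: a missing quantity means a single individual.
    const quantity = Number(occ.organismQuantity ?? 1);
    const row = rows.get(key);
    if (row) {
      row.organismQuantity += quantity;
    } else {
      // The first occurrence in a group supplies the other fields.
      rows.set(key, { ...occ, organismQuantity: quantity });
    }
  }
  return [...rows.values()];
}
```

Using a JSON-encoded pair as the map key avoids collisions that a plain string concatenation of the two values could produce.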

Occurrences extension

  • Update the resulting occurrences by appending authorship to the scientific name and merging in relevant fields from the taxonomy (in particular taxonID)
  • Publish NDJSON distribution with zipped Darwin Core archive to Zenodo
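The occurrence update described above might be sketched like this. The taxonomy record fields used here (`canonicalName`, `authorship`, `taxonID`) are assumptions about the shape of the taxonomy NDJSON, and `mergeTaxonomy` is a hypothetical name.

```javascript
// Append authorship to scientificName and merge in taxonID from the taxonomy.
// Occurrences without a matching taxon are returned unchanged.
function mergeTaxonomy(occurrences, taxa) {
  // ASSUMPTION: taxa are keyed by the same name string used in occurrences.
  const byName = new Map(taxa.map((t) => [t.canonicalName, t]));
  return occurrences.map((occ) => {
    const taxon = byName.get(occ.scientificName);
    if (!taxon) return occ;
    const scientificName = taxon.authorship
      ? `${taxon.canonicalName} ${taxon.authorship}`
      : taxon.canonicalName;
    return { ...occ, scientificName, taxonID: taxon.taxonID };
  });
}
```

For example, an occurrence named "Calanus finmarchicus" matched to a taxon with authorship "(Gunnerus, 1770)" would come out as "Calanus finmarchicus (Gunnerus, 1770)" with the taxon's identifier attached.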

Taxonomy

Dependencies

@todo

Project

This project was co-funded by GBIF Norway, see Data management plan for further details.

License: MIT License


Languages

Shell 85.1%, JavaScript 14.9%