EMODnet / esas2obis

Darwin Core mapping of ESAS data for publication to OBIS

ESAS2OBIS

Rationale

This repository contains the code to standardize European Seabirds at Sea (ESAS) data to a Darwin Core Archive that can be harvested by OBIS and GBIF.

Workflow

To republish the data:

  1. Clone this repository to your computer.
  2. Download all public ESAS data from ICES.
  3. Unzip the download and move the files to a data/raw directory in the repository. This directory (and the files it contains) is ignored by git, so you will have to create it yourself.
  4. Open the repository in RStudio by opening the esas2obis.Rproj file.
  5. Open the Darwin Core mapping script dwc_mapping.Rmd.
  6. Click Run > Run All to transform the data to Darwin Core files using SQL. This will take a while.
  7. Verify that all steps in the mapping script ran without errors.
  8. Verify in git or GitHub Desktop that the sample data are not affected (changes would indicate updates or issues in the mapping).
  9. Upload the Darwin Core files to the VLIZ "upload" IPT.
  10. Validate the Darwin Core Archive (by EurOBIS staff).
  11. Publish the dataset to OBIS and GBIF (by EurOBIS staff).
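
Step 3 can also be scripted. The following is a minimal Python sketch and is not part of this repository, which does its work in R and SQL. The function name `unpack_esas_download` and any archive file name passed to it are hypothetical; only the data/raw location comes from the workflow above.

```python
from pathlib import Path
import zipfile


def unpack_esas_download(zip_path, repo_root):
    """Extract a downloaded ESAS archive into <repo_root>/data/raw.

    Returns the sorted names of the files found directly in data/raw.
    """
    raw_dir = Path(repo_root) / "data" / "raw"
    # data/raw is ignored by git, so it does not exist in a fresh clone.
    raw_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(raw_dir)
    # Only top-level files are listed; nested folders are not descended into.
    return sorted(p.name for p in raw_dir.iterdir() if p.is_file())
```

Calling `unpack_esas_download("path/to/download.zip", ".")` from the repository root would leave the raw files where the mapping script expects them, assuming the archive stores them at its top level.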

Published dataset

Darwin Core transformation

ESAS data is structured in four hierarchical tables: campaigns, samples, positions and observations.

Event core

The Event core contains three types of events:

  • Campaigns (type=cruise) with an eventID, date range, and remarks.
  • Samples (type=sample) with an eventID, parentEventID (the campaign), single date and remarks.
  • Positions (type=subSample) with an eventID, parentEventID (the sample), datetime and location.

The eventIDs are created by concatenating the parent identifiers, e.g. <campaignID>_<sampleID>_<positionID> for a position. This makes them unique within the dataset and easy to understand.
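
The concatenation scheme can be illustrated with a short Python sketch. This is not the repository's implementation (that lives in the SQL file); the function name `event_ids` and the example identifiers are made up for illustration, and the key `type` simply mirrors the labels used in the list above.

```python
def event_ids(campaign_id, sample_id=None, position_id=None):
    """Build eventID, parentEventID and type for one level of the hierarchy."""
    if position_id is not None and sample_id is None:
        raise ValueError("a position must belong to a sample")
    # Keep only the identifiers that are present, in hierarchical order.
    ids = [str(i) for i in (campaign_id, sample_id, position_id) if i is not None]
    event_type = ("cruise", "sample", "subSample")[len(ids) - 1]
    return {
        "eventID": "_".join(ids),
        # The parent is the same chain minus the last identifier;
        # a campaign has no parent.
        "parentEventID": "_".join(ids[:-1]) or None,
        "type": event_type,
    }


# Hypothetical identifiers for a campaign, a sample and a position.
print(event_ids(110))
print(event_ids(110, 22))
print(event_ids(110, 22, 3))
```

Deriving the parent from the components, rather than by splitting the finished string, avoids assuming that the source identifiers never contain an underscore.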

Record-level terms such as institutionCode, datasetName, license and rightsHolder are included as well.

See the SQL file for the full transformation.

Occurrence extension

The Occurrence extension contains the observations, with the following terms:

  • eventID (the position) and occurrenceID.
  • basisOfRecord (always HumanObservation) and occurrenceStatus (always present).
  • scientificName, scientificNameID (WoRMS identifier), kingdom (always Animalia) and vernacularName.
  • individualCount, sex, lifeStage, behavior, associatedTaxa (also expressed as measurements or facts).
  • occurrenceRemarks.

The occurrenceIDs are created similarly to the eventIDs, as <campaignID>_<sampleID>_<positionID>_<observationID>.
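
As an illustration, the Python sketch below shows how an occurrence links to its position event and which terms are constant. It is a sketch only, not the repository's SQL; the function name `occurrence_record` and the example identifiers are hypothetical, and only a subset of the terms listed above is filled in.

```python
def occurrence_record(campaign_id, sample_id, position_id, observation_id,
                      scientific_name):
    """Return a partial Occurrence extension record for one observation."""
    # The occurrence points to the position event it was recorded at.
    event_id = f"{campaign_id}_{sample_id}_{position_id}"
    return {
        "eventID": event_id,
        "occurrenceID": f"{event_id}_{observation_id}",
        # Constant values, as stated in the term list above.
        "basisOfRecord": "HumanObservation",
        "occurrenceStatus": "present",
        "kingdom": "Animalia",
        "scientificName": scientific_name,
    }


# Hypothetical identifiers; the species name is only an example.
record = occurrence_record(110, 22, 3, 7, "Fulmarus glacialis")
print(record["occurrenceID"])
```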

See the SQL file for the full transformation.

Extended Measurement Or Fact extension (EMOF)

The EMOF extension contains all other ESAS data, with the following terms:

  • eventID: identifier of sample or position (there are no campaign measurements).
  • occurrenceID (where applicable): identifier of the occurrence.
  • measurementType: lowercase description of the measurement.
  • measurementTypeID (where applicable): link to a definition of the measurement. Where possible, we use the BODC Parameter Usage Vocabulary (P01) or fall back to ESAS vocabularies maintained by ICES (e.g. https://vocab.ices.dk/services/rdf/collection/UseOfBinoculars).
  • measurementValue: human readable value or description, lowercased where appropriate.
  • measurementValueID (where applicable): IRI for the value. These mostly link to values in ESAS vocabularies maintained by ICES (e.g. https://vocab.ices.dk/services/rdf/collection/UseOfBinoculars/2), except for platform code (C17), sex (S10) and life stage (S11).
  • measurementUnit (where applicable): unit of the measurement.
  • measurementUnitID: link to a definition of the unit, with XXXX for not applicable and UUUU for dimensionless (e.g. individualCount).
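
To make the shape of an EMOF record concrete, here is a minimal Python sketch. It is not taken from the repository; the function name `emof_record` and the example identifiers are hypothetical. The codes XXXX and UUUU are used bare, as in the list above, and whether the published files store them as bare codes or as full links is an assumption left open here.

```python
NOT_APPLICABLE = "XXXX"  # unit not applicable
DIMENSIONLESS = "UUUU"   # dimensionless quantity, e.g. a count


def emof_record(event_id, measurement_type, value, occurrence_id=None,
                type_id=None, value_id=None, unit=None,
                unit_id=NOT_APPLICABLE):
    """Return one measurement-or-fact record as a dictionary."""
    return {
        "eventID": event_id,
        "occurrenceID": occurrence_id,  # None for sample/position measurements
        "measurementType": measurement_type.lower(),
        "measurementTypeID": type_id,
        "measurementValue": value,
        "measurementValueID": value_id,
        "measurementUnit": unit,
        "measurementUnitID": unit_id,
    }


# A dimensionless measurement on an occurrence (hypothetical identifiers).
count = emof_record("110_22_3", "Individual count", 1,
                    occurrence_id="110_22_3_7", unit_id=DIMENSIONLESS)

# A vocabulary-based measurement on a sample; the type link is the ICES
# collection mentioned above, and the unit is not applicable.
binoculars = emof_record(
    "110_22", "use of binoculars", "example value",
    type_id="https://vocab.ices.dk/services/rdf/collection/UseOfBinoculars",
)
```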

The ESAS terms behaviour and association can contain multiple values for a single observation. These are split into at most three measurement or fact records each.
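
The splitting of a multi-valued term can be sketched in Python as follows. This is an illustration, not the repository's SQL. The separator `~`, the function names, and the choice to reject a fourth value with an error are all assumptions; the source data may use a different delimiter and the actual transformation may handle extra values differently.

```python
MAX_VALUES = 3  # at most three records per term, as described above


def split_multi_value(raw, sep="~"):
    """Split a multi-valued field into a list of at most MAX_VALUES values."""
    if raw is None or raw.strip() == "":
        return []
    values = [v.strip() for v in raw.split(sep) if v.strip()]
    if len(values) > MAX_VALUES:
        # Assumption: fail loudly rather than silently dropping values.
        raise ValueError(f"expected at most {MAX_VALUES} values, got {len(values)}")
    return values


def behaviour_records(event_id, occurrence_id, raw, sep="~"):
    """Return one measurement-or-fact record per behaviour value."""
    return [
        {
            "eventID": event_id,
            "occurrenceID": occurrence_id,
            "measurementType": "behaviour",
            "measurementValue": value,
        }
        for value in split_multi_value(raw, sep)
    ]


# Hypothetical identifiers and values: two behaviours give two records.
records = behaviour_records("110_22_3", "110_22_3_7", "scavenging~resting")
```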

See Table 1 for an overview and the SQL file for the full transformation.

Table 1: ESAS terms that are expressed as measurements or facts

table        measurement or fact     type     example
-----------  ----------------------  -------  -------
sample       platform code           vocab    BELGICA
sample       platform class          vocab    ship
sample       platform side           vocab    left
sample       platform height         number
sample       transect width          integer  300
sample       sampling method         vocab    ship-based transect method with distance estimation and snapshot for flying birds
sample       primary sampling        boolean  True
sample       target taxa             vocab    all species recorded (standard)
sample       distance bins           string   0|50|100|200|300
sample       use of binoculars       vocab    Binoculars used extensively for scanning ahead and to the side, naked eye used for close observations (e.g. for cetacean monitoring)
sample       number of observers     integer  2
position     distance                number   0.7
position     area                    number   0.21
position     wind force              vocab    moderate breeze
position     visibility              vocab    C
position     glare                   vocab    weak
position     sun angle               integer
position     cloud cover             vocab
position     precipitation           vocab    none
position     ice cover               integer  0
position     observation conditions  vocab
observation  group identifier        string   12
observation  in transect             boolean  True
observation  individual count        integer  1
observation  observation distance    vocab    100-200
observation  life stage              vocab    adult
observation  moult                   vocab    active primary moult
observation  plumage                 vocab    non-breeding (winter) plumage
observation  sex                     vocab    female
observation  travel direction        vocab    45
observation  prey                    vocab    medium fish, unidentified (ca. 2-5x bill length)
observation  association x 3         vocab    associated with observation base
observation  behaviour x 3           vocab    scavenging

Repo structure

The repository structure is based on Cookiecutter Data Science and the Checklist recipe. Files and directories indicated with GENERATED should not be edited manually.

├── README.md              : Description of this repository
├── LICENSE                : Repository license
├── esas2obis.Rproj        : RStudio project file
├── .gitignore             : Files and directories to be ignored by git
│
├── src
│   └── dwc_mapping.Rmd    : Darwin Core mapping script
│
├── sql                    : Darwin Core transformations
│   ├── dwc_event.sql
│   ├── dwc_occurrence.sql
│   └── dwc_mof.sql
│
└── data
    ├── processed          : Darwin Core output of mapping script GENERATED
    └── processed_sample   : Darwin Core sample output of mapping script for git comparison GENERATED

License

MIT License for the code and documentation in this repository. The included data is released under another license.
