rebipp / ppi

REBIPP: Plant-Pollinator Interactions Data Vocabulary

Home Page: https://ppi.rebipp.org.br

Defining a data model for Plant-Pollinator Interactions

zedomel opened this issue

I'm creating this issue for discussions related to a data model for Plant-Pollinator Interactions (PPI).

We have discussed many options for fitting PPI into DwC-Archives, along with their advantages and disadvantages. Here I summarize what has been discussed, so that we keep track of this discussion and other people can also participate.

Data Model

  • Event Core Model: uses the dwc:Event class as the core of the DwC-A to represent interactions, so an interaction is a dwc:Event with spatial (dwc:Location) and temporal information (e.g. dwc:eventDate, dwc:eventTime).
  • Resource Relationship: uses the dwc:ResourceRelationship extension, and the core class can be dwc:Occurrence or dwc:Taxon. The terms in the dwc:ResourceRelationship class link dwc:Occurrence or dwc:Taxon records in order to represent an interaction between organisms/species.
  • Interaction extension: create a new DwC extension called the Interaction extension (or PPI extension), which would allow specifying the interaction partner of a core dwc:Occurrence or dwc:Taxon.

Event Core Model

How it works

The same dwc:eventID is used to link two dwc:Occurrence or dwc:Taxon records representing an interaction between the linked resources. The DwC-A contains the dwc:Event class as core and dwc:Occurrence and/or dwc:Taxon as extensions (see figure below).
The dwc:MeasurementOrFact class is then used with the Plant-Pollinator Interactions Vocabulary to describe characteristics of the interacting organisms/species and of the interactions.

Issues:

  1. dwc:Event does not provide a term which can be used to specify the type of an interaction (e.g. visitsFlowersOf, pollinates).
  2. No direction of the recorded interaction can be provided. So, given two occurrences/taxa that share the same dwc:eventID, we don't know which one is the subject and which is the object of the interaction (who visits whom? who pollinates whom?).

Issue 1) can be resolved by defining a term interactionType in the Plant-Pollinator Interactions Vocabulary and using dwc:MeasurementOrFact linked to the dwc:Event, but we still do not have the direction of the interaction.
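To make issue 2 concrete, here is a minimal Python sketch, assuming the occurrence rows of the Event Core sample are held in memory as (eventID, occurrenceID, scientificName) tuples. Grouping by the shared eventID recovers which occurrences took part in the same interaction, but each group is an unordered set, so nothing in the rows says who is the subject.

```python
from collections import defaultdict

# Rows from the Event Core sample occurrences file: (eventID, occurrenceID, scientificName).
occurrences = [
    ("eventID_1", "occ_1", "Xylocopa frontalis"),
    ("eventID_1", "occ_2", "Passiflora edulis"),
]


def partners_by_event(rows):
    """Group occurrences by the eventID they share; each group is one interaction."""
    groups = defaultdict(set)
    for event_id, occurrence_id, name in rows:
        groups[event_id].add((occurrence_id, name))
    return dict(groups)


groups = partners_by_event(occurrences)
# Each group is an unordered set: the rows alone cannot say which partner
# is the subject and which is the object of the interaction.
for event_id, partners in groups.items():
    print(event_id, sorted(name for _, name in partners))
```

An interactionType fact attached to the Event would add the "what", but it still would not order the two partners.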

[Figure: ppi-event-data-model]

Resource Relationship

This looks like the more "natural" way to use DwC to specify a relationship (aka interaction) between two or more resources.
The dwc:ResourceRelationship class solves the issues of the previous data model: dwc:relationshipOfResource gives the interaction type, and the direction of an interaction is given by the terms dwc:resourceID and dwc:relatedResourceID.
The new term dwc:relationshipOfResourceID (tdwg/dwc#283) will allow the adoption of a URI (instead of a literal value), so that a term from a vocabulary or ontology (e.g. RO) can be used here.
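A minimal Python sketch of how one dwc:ResourceRelationship row carries direction. The class and its snake_case field names are illustrative only (not part of any DwC library); each field is commented with the DwC term it stands for. Reading the row as a (subject, predicate, object) triple makes the "who visits whom" question explicit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResourceRelationship:
    """Illustrative subset of the dwc:ResourceRelationship terms, in snake_case."""
    resource_relationship_id: str        # dwc:resourceRelationshipID
    resource_id: str                     # dwc:resourceID (subject)
    related_resource_id: str             # dwc:relatedResourceID (object)
    relationship_of_resource: str        # dwc:relationshipOfResource (interaction type)
    relationship_of_resource_id: Optional[str] = None  # dwc:relationshipOfResourceID (URI), if any

    def as_triple(self):
        """Read the row as a directed (subject, predicate, object) statement."""
        return (self.resource_id, self.relationship_of_resource, self.related_resource_id)


rel = ResourceRelationship("resrelat_1", "occ_1", "occ_2", "visitsFlowersOf")
print(rel.as_triple())  # ('occ_1', 'visitsFlowersOf', 'occ_2')
```

Swapping resource_id and related_resource_id produces a different statement, which is exactly the information the Event Core model lacks. The URI field is left empty here rather than guessing an RO identifier.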

Some criticism has been made about the complexity of using the dwc:ResourceRelationship class, but I don't see it as much different from (or more complex than) using dwc:Event as core, since in any scenario we will need a relational model to capture interactions.

The dwc:MeasurementOrFact class is used in the same way as in the Event Model to specify the characteristics of occurrences/taxa, but the interaction data has to be linked directly to the dwc:Occurrence/dwc:Taxon, since Extended Measurement Or Fact does not define a term for dwc:resourceRelationshipID (as opposed to dwc:occurrenceID). The alternative here is to define a new class, Interaction (next approach).

[Figure: ppi-resource-relat-data-model]

Interaction Measurement Or Fact extension

Since we cannot link dwc:MeasurementOrFact to a dwc:ResourceRelationship, we discussed creating a new extension, Interaction. The idea is similar to the Extended Measurement Or Fact extension, but instead of defining the term dwc:occurrenceID as part of the extension, it would define dwc:resourceRelationshipID.
With that approach, the star schema could be expanded to a snowflake schema, and the relationships (aka interactions) could have measurements and facts attached directly to them.
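The star-versus-snowflake distinction can be sketched in Python by recording, for each table, which table it points to and through which term. The table names are borrowed from the data samples; the dictionaries are a hypothetical description of the proposal's shape, not a DwC-A meta.xml, since a standard archive only links extensions to the core.

```python
# Each table maps to (parent table, term used as the link); None marks the core.
STAR = {
    "occurrences": (None, None),
    "resrelat": ("occurrences", "occurrenceID"),
    "mof": ("occurrences", "occurrenceID"),
}
SNOWFLAKE = {
    "occurrences": (None, None),
    "resrelat": ("occurrences", "occurrenceID"),
    "interactions-mof": ("resrelat", "resourceRelationshipID"),
}


def describe_path(table, schema):
    """Follow the links from a table up to the core and describe each hop."""
    parts = [table]
    parent, key = schema[table]
    while parent is not None:
        parts.append(f"-[{key}]-> {parent}")
        parent, key = schema[parent]
    return " ".join(parts)


print(describe_path("mof", STAR))
print(describe_path("interactions-mof", SNOWFLAKE))
```

In the star layout every fact is one hop from an occurrence; in the snowflake layout a fact can sit on the relationship itself, one hop further out.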

[Figure: ppi-interaction-ext-data-model]

Data sample

Event Data Model

Core: interactions.csv:

eventID,eventDate,locality
eventID_1,2021-01-01 10:00:00,São Paulo

Occurrences extension: occurrences.csv:

eventID,occurrenceID,sex,scientificName
eventID_1,occ_1,female,Xylocopa frontalis
eventID_1,occ_2,hermaphrodite,Passiflora edulis

Extended MoF extension: emof.csv:

eventID,occurrenceID,measurementType,measurementValue
eventID_1,occ_2,flowerColor,purple
eventID_1,,resourceCollected,nectar
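A minimal Python sketch of how a consumer might reassemble the Event Core sample, assuming the three files above are inlined as CSV strings. The function names are hypothetical. Measurement rows with an empty occurrenceID are treated as facts about the event (the interaction) itself, which is how resourceCollected ends up attached to the interaction in this model.

```python
import csv
import io

interactions_csv = """eventID,eventDate,locality
eventID_1,2021-01-01 10:00:00,São Paulo
"""
occurrences_csv = """eventID,occurrenceID,sex,scientificName
eventID_1,occ_1,female,Xylocopa frontalis
eventID_1,occ_2,hermaphrodite,Passiflora edulis
"""
emof_csv = """eventID,occurrenceID,measurementType,measurementValue
eventID_1,occ_2,flowerColor,purple
eventID_1,,resourceCollected,nectar
"""


def read(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def assemble(events, occurrences, emof):
    """Nest occurrences and measurements under their event, keyed by eventID."""
    out = {}
    for ev in events:
        eid = ev["eventID"]
        rows = [m for m in emof if m["eventID"] == eid]
        out[eid] = {
            "event": ev,
            "partners": [o for o in occurrences if o["eventID"] == eid],
            # Empty occurrenceID: the fact describes the event (interaction) itself.
            "event_facts": {m["measurementType"]: m["measurementValue"]
                            for m in rows if not m["occurrenceID"]},
            "occurrence_facts": [m for m in rows if m["occurrenceID"]],
        }
    return out


record = assemble(read(interactions_csv), read(occurrences_csv), read(emof_csv))["eventID_1"]
print(record["event_facts"])  # {'resourceCollected': 'nectar'}
```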

ResourceRelationship Data Model

Core: occurrences.csv:

occurrenceID,sex,scientificName,eventDate,locality
occ_1,female,Xylocopa frontalis,2021-01-01 10:00:00,São Paulo
occ_2,hermaphrodite,Passiflora edulis,2021-01-01 10:00:00,São Paulo

ResourceRelationship extension: resrelat.csv:

occurrenceID,resourceRelationshipID,resourceID,relatedResourceID,relationshipOfResource
occ_1,resrelat_1,occ_1,occ_2,visitsFlowersOf

MeasurementOrFact extension: mof.csv:

occurrenceID,measurementType,measurementValue
occ_2,flowerColor,purple
occ_1,resourceCollected,nectar
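A minimal Python sketch that reads the ResourceRelationship sample back into a sentence, assuming the rows are already loaded as dicts (the helper names are hypothetical). It also shows the limitation noted above: resourceCollected can only be looked up on the bee occurrence, not on the interaction.

```python
occurrences = {
    "occ_1": {"sex": "female", "scientificName": "Xylocopa frontalis"},
    "occ_2": {"sex": "hermaphrodite", "scientificName": "Passiflora edulis"},
}
resource_relationships = [
    {"resourceRelationshipID": "resrelat_1", "resourceID": "occ_1",
     "relatedResourceID": "occ_2", "relationshipOfResource": "visitsFlowersOf"},
]
mof = [
    {"occurrenceID": "occ_2", "measurementType": "flowerColor", "measurementValue": "purple"},
    {"occurrenceID": "occ_1", "measurementType": "resourceCollected", "measurementValue": "nectar"},
]


def describe(rel):
    """Render a relationship as 'subject interactionType object' using taxon names."""
    subject = occurrences[rel["resourceID"]]["scientificName"]
    obj = occurrences[rel["relatedResourceID"]]["scientificName"]
    return f"{subject} {rel['relationshipOfResource']} {obj}"


def facts_for(occurrence_id):
    """Collect the measurements attached to one occurrence."""
    return {m["measurementType"]: m["measurementValue"]
            for m in mof if m["occurrenceID"] == occurrence_id}


print(describe(resource_relationships[0]))  # Xylocopa frontalis visitsFlowersOf Passiflora edulis
print(facts_for("occ_1"))  # {'resourceCollected': 'nectar'}
```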

Interaction extension Data Model

Core: occurrences.csv:

occurrenceID,sex,scientificName
occ_1,female,Xylocopa frontalis
occ_2,hermaphrodite,Passiflora edulis

ResourceRelationship extension: resrelat.csv:

occurrenceID,resourceRelationshipID,resourceID,relatedResourceID,relationshipOfResource,relationshipEstablishedDate
occ_1,resrelat_1,occ_1,occ_2,visitsFlowersOf,2021-01-01 10:00:00

Interaction Measurement Or Fact Extension: interactions-mof.csv:

occurrenceID,resourceRelationshipID,measurementType,measurementValue
occ_2,,flowerColor,purple
occ_1,resrelat_1,resourceCollected,nectar
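A minimal Python sketch of how the Interaction Measurement Or Fact sample could be routed, assuming the file is inlined as a CSV string and using a hypothetical split_facts helper. Rows that carry a resourceRelationshipID describe the interaction; rows without one fall back to describing the occurrence.

```python
import csv
import io

interactions_mof_csv = """occurrenceID,resourceRelationshipID,measurementType,measurementValue
occ_2,,flowerColor,purple
occ_1,resrelat_1,resourceCollected,nectar
"""


def split_facts(text):
    """Separate facts about relationships (interactions) from facts about occurrences."""
    by_relationship, by_occurrence = {}, {}
    for row in csv.DictReader(io.StringIO(text)):
        if row["resourceRelationshipID"]:
            target, key = by_relationship, row["resourceRelationshipID"]
        else:
            target, key = by_occurrence, row["occurrenceID"]
        target.setdefault(key, {})[row["measurementType"]] = row["measurementValue"]
    return by_relationship, by_occurrence


by_rel, by_occ = split_facts(interactions_mof_csv)
print(by_rel)  # {'resrelat_1': {'resourceCollected': 'nectar'}}
print(by_occ)  # {'occ_2': {'flowerColor': 'purple'}}
```

Compared with the ResourceRelationship sample, resourceCollected now belongs to resrelat_1 rather than to the bee occurrence.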

Other options

An Interaction extension could extend dwc:ResourceRelationship to attach geographic information (the dwc:Location class) directly to interactions, along with many other terms relevant to characterizing an interaction.

Questions

  • What is the "best" model for sharing plant-pollinator interactions?
  • What are the other advantages/disadvantages of each model?
  • Could someone provide examples that fit each of the models, and explain which one is best for the purpose?

So, I would like to know the opinion of others about it.

thanks.

I would like to contribute an alternative model, but it may be a while before I can do so. I want to prepare the Darwin Core public review first. I hope to have that finished by 30 April.

Hi @tucotuco

I'm curious to know what you are thinking. Please let me know if there is anything I can help with.

thks.

Here is the model that was missing:
[Figure: Plant-pollinator interaction schema]

That missing one was exactly the one I was thinking of. It isn't as scary when just the tables that are involved are included. This one shows two complete examples in the one diagram.

I think we came to the conclusion that there is not really any such thing as an Occurrence/Taxon interaction, so that simplifies things as well.

Yes, it isn't scary when only the Event and ResourceRelationship classes are included in the model.... :-D

But more complexity arises when we want to represent Interaction Outcomes, which are the results of multiple interactions. In that case, I think a hierarchy of parent-child Events can be used to aggregate child Events (i.e., interactions) into parent Events (i.e., Interaction Outcomes).

I will prepare an example to clarify this.

@tucotuco here is a diagram with an example of parent-child Events:
[Figure: diagram of the parent-child Events example]

The same example in tabular form:

Events:

eventID,parentEventID,eventDate
eventID_1,,2021-05-05 09:12:00
eventID_2,eventID_1,2021-05-01 12:05:00
eventID_3,eventID_1,2021-05-01 15:51:00

Occurrences:

eventID,occurrenceID,sex,scientificName
eventID_2,occ_1,female,Xylocopa frontalis
eventID_2,occ_2,hermaphrodite,Passiflora edulis
eventID_3,occ_3,male,Xylocopa frontalis
eventID_3,occ_4,hermaphrodite,Passiflora edulis

Measurements or facts:

eventID,measurementType,measurementValue
eventID_1,fruitSet,0.66

In other words, the diagram and the example say that the fruit set resulting from two interactions between Xylocopa frontalis and Passiflora edulis is 66%. A parent Event is needed because fruit set is not limited to the flowers of a specific individual plant. Instead, it can be a measurement of flowers from different individuals comprising multiple interactions.
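A minimal Python sketch of how a consumer could walk this hierarchy, assuming the tables are loaded as lists of dicts with the event identifiers written consistently as eventID_1, eventID_2, eventID_3. The helper functions are hypothetical and only follow direct children; a deeper hierarchy would need a recursive walk.

```python
events = [
    {"eventID": "eventID_1", "parentEventID": "", "eventDate": "2021-05-05 09:12:00"},
    {"eventID": "eventID_2", "parentEventID": "eventID_1", "eventDate": "2021-05-01 12:05:00"},
    {"eventID": "eventID_3", "parentEventID": "eventID_1", "eventDate": "2021-05-01 15:51:00"},
]
occurrences = [
    {"eventID": "eventID_2", "occurrenceID": "occ_1", "scientificName": "Xylocopa frontalis"},
    {"eventID": "eventID_2", "occurrenceID": "occ_2", "scientificName": "Passiflora edulis"},
    {"eventID": "eventID_3", "occurrenceID": "occ_3", "scientificName": "Xylocopa frontalis"},
    {"eventID": "eventID_3", "occurrenceID": "occ_4", "scientificName": "Passiflora edulis"},
]
measurements = [
    {"eventID": "eventID_1", "measurementType": "fruitSet", "measurementValue": "0.66"},
]


def child_events(parent_id):
    """Direct children of an Event, found through parentEventID."""
    return [e["eventID"] for e in events if e["parentEventID"] == parent_id]


def interactions_under(parent_id):
    """Map each child Event (one interaction) to the taxa recorded in it."""
    return {
        child: sorted(o["scientificName"] for o in occurrences if o["eventID"] == child)
        for child in child_events(parent_id)
    }


def measurement(event_id, measurement_type):
    """Return a measurementValue attached to an Event as a float, or None if absent."""
    for m in measurements:
        if m["eventID"] == event_id and m["measurementType"] == measurement_type:
            return float(m["measurementValue"])
    return None


print(interactions_under("eventID_1"))
print(measurement("eventID_1", "fruitSet"))  # 0.66
```

The fruit set lives only on the parent, so it is read once for the whole group of interactions rather than copied onto each child.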

What do you think? Should parent-child Events work for those cases?

thanks.

That looks like it should work in principle. I would question the eventDate for Event1, though. As a parent to Event2 and Event3, should it not reflect the time span for all of the child events, explicit or not? So, if there were only those two child Events, wouldn't the eventDate be a range that encompassed them? So, something like 2021-05-01T12:05:00Z/2021-05-01T15:51:00Z (using proper ISO 8601 date formatting). The eventDate for the parent Event could presumably have a greater span than the children it contains if the parent Event meant something like a monitoring period within which the child Events occurred, something like 2021-05-01T00:00:00Z/2021-05-02T00:00:00Z. And of course Event1 would need a lot of other data to explain what it was really about.
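A minimal Python sketch of deriving the parent's eventDate as an ISO 8601 start/end interval from its children. It assumes the sample times are UTC, since the example rows carry no offset; iso_interval is a hypothetical helper.

```python
from datetime import datetime, timezone

child_dates = ["2021-05-01 12:05:00", "2021-05-01 15:51:00"]


def iso_interval(datetimes):
    """Return an ISO 8601 start/end interval spanning all given datetimes (assumed UTC)."""
    parsed = [datetime.strptime(d, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
              for d in datetimes]
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return f"{min(parsed).strftime(fmt)}/{max(parsed).strftime(fmt)}"


print(iso_interval(child_dates))  # 2021-05-01T12:05:00Z/2021-05-01T15:51:00Z
```

This gives the tightest interval; a monitoring-period parent would simply record its wider window instead.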

Sorry for the simplicity of the examples... I didn't pay much attention to the dates, but you get the idea. Yes, the eventDate for Event1 should cover the span from Event2 to Event3. Then, the term dwc:measurementDeterminedDate can be used to capture the date/time when the measurement (fruit set) was taken.

A similar approach is being used by OBIS to group marine surveys at multiple scales/levels https://obis.org/manual/dataformat/.

I will commit some examples of datasets that can be useful for validating the model.

thx.

Some examples here.

Next, I will provide standardized versions of these datasets.

thx.