Defining a data model for Plant-Pollinator Interactions
zedomel opened this issue · comments
I'm creating this issue for discussions related to a data model for Plant-Pollinator Interactions (PPI).
We have discussed many options for fitting PPI into DwC-Archives, along with their advantages and disadvantages. Here I summarize what has been discussed so far, so that we can keep track of the discussion and other people can take part in it.
Data Model
- Event Core Model: uses the `dwc:Event` class as the core of the DwC-A to represent interactions. An interaction is a `dwc:Event` with spatial (`dwc:Location`) and temporal (e.g. `dwc:eventDate`, `dwc:eventTime`) information.
- Resource Relationship: uses the `dwc:ResourceRelationship` extension, with `dwc:Occurrence` or `dwc:Taxon` as the core class. The terms in the `dwc:ResourceRelationship` class link `dwc:Occurrence`s or `dwc:Taxon`s to represent an interaction between organisms/species.
- Interaction extension: creates a new DwC extension, called the Interaction extension (or PPI extension), that specifies the interaction partner of a core `dwc:Occurrence` or `dwc:Taxon`.
Event Core Model
How it works
The same `dwc:eventID` is used to link two `dwc:Occurrence`s or `dwc:Taxon`s, representing an interaction between the linked resources. The DwC-A contains a `dwc:Event` class as core and `dwc:Occurrence` and/or `dwc:Taxon` as extensions (see figure below).
The `dwc:MeasurementOrFact` class is then used with the Plant Pollinator Interactions Vocabulary to describe characteristics of the interacting organisms/species and of the interactions themselves.
Issues:
- `dwc:Event` does not provide a term that can be used to specify the type of an interaction (e.g. `visitsFlowersOf`, `pollinates`).
- The direction of the recorded interaction cannot be expressed. Given two occurrences/taxa that share the same `dwc:eventID`, we don't know which one is the subject and which is the object of the interaction (who visits whom? who pollinates whom?).
The first issue can be resolved by defining a term `interactionType` in the Plant Pollinator Interactions Vocabulary and using a `dwc:MeasurementOrFact` linked to the `dwc:Event`, but we still do not have the direction of the interaction.
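To make the missing-direction problem concrete, here is a minimal Python sketch. It assumes occurrence rows shaped like the Event Core Model sample in this issue; the helper name `interacting_pairs` is hypothetical and not part of any DwC tooling. Grouping by `eventID` can only recover unordered pairs, which is why a `frozenset` is used.

```python
from collections import defaultdict
from itertools import combinations

# Rows mirror the occurrence extension of the Event Core Model sample.
occurrences = [
    {"eventID": "eventID_1", "occurrenceID": "occ_1",
     "scientificName": "Xylocopa frontalis"},
    {"eventID": "eventID_1", "occurrenceID": "occ_2",
     "scientificName": "Passiflora edulis"},
]


def interacting_pairs(rows):
    """Group occurrences by eventID and return the unordered pairs per event.

    A frozenset is used on purpose: a shared eventID says that the two
    occurrences interacted, but not which one is subject and which is object.
    """
    by_event = defaultdict(list)
    for row in rows:
        by_event[row["eventID"]].append(row["occurrenceID"])
    return {
        event_id: [frozenset(pair) for pair in combinations(ids, 2)]
        for event_id, ids in by_event.items()
    }


pairs = interacting_pairs(occurrences)
print(pairs)
```

Nothing in the grouped result distinguishes "occ_1 visits occ_2" from "occ_2 visits occ_1".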
Resource Relationship
This looks like the more "natural" way to use DwC to specify a relationship (aka interaction) between two or more resources.
The `dwc:ResourceRelationship` class solves the issues of the previous data model: `dwc:relationshipOfResource` gives the interaction type, and the direction of the interaction is given by the terms `dwc:resourceID` (subject) and `dwc:relatedResourceID` (object).
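As a rough illustration of how direction is carried, the sketch below models one relationship row as a Python `NamedTuple`. The field names follow the DwC terms mentioned above; the `describe` helper and the `names` lookup are assumptions made for this example only.

```python
from typing import NamedTuple


class ResourceRelationship(NamedTuple):
    resourceRelationshipID: str
    resourceID: str              # subject of the relationship
    relationshipOfResource: str  # interaction type (literal value)
    relatedResourceID: str       # object of the relationship


# Hypothetical lookup from occurrenceID to scientificName.
names = {"occ_1": "Xylocopa frontalis", "occ_2": "Passiflora edulis"}

rel = ResourceRelationship("resrelat_1", "occ_1", "visitsFlowersOf", "occ_2")


def describe(rel, names):
    """Read the row as a subject-predicate-object statement."""
    subject = names[rel.resourceID]
    obj = names[rel.relatedResourceID]
    return f"{subject} {rel.relationshipOfResource} {obj}"


sentence = describe(rel, names)
print(sentence)
```

Swapping `resourceID` and `relatedResourceID` yields a different statement, which is exactly the information the shared-`eventID` approach cannot express.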
The new term `dwc:relationshipOfResourceID` (tdwg/dwc#283 (comment)) will allow a URI to be used instead of a literal value, so a term from a vocabulary or ontology (e.g. RO) can be used here.
Some criticism has been raised about the complexity of using the `dwc:ResourceRelationship` class, but I don't see it as much different from (or more complex than) using `dwc:Event` as core, since in any scenario we will need a relational model to capture interactions.
The `dwc:MeasurementOrFact` class is used in the same way as in the Event Core Model to specify the characteristics of occurrences/taxa, but the interaction data is linked directly to the `dwc:Occurrence`/`dwc:Taxon`, since **Extended Measurement Or Fact does not define a term for** `dwc:resourceRelationshipID` (as opposed to `dwc:occurrenceID`). The alternative here is to define a new class `Interaction` (next approach).
Interaction Measurement Or Fact extension
Since we cannot link `dwc:MeasurementOrFact` to a `dwc:ResourceRelationship`, we discussed the creation of a new extension, `Interaction`. The idea is similar to the Extended Measurement Or Fact extension, but instead of defining the term `dwc:occurrenceID` as part of the extension, it would define `dwc:resourceRelationshipID`.
With that approach the star schema can be expanded to a snowflake schema, and the relationships (aka interactions) could have some measurements and facts directly attached to them.
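A minimal sketch of what the extra link would allow, assuming measurement rows that carry an optional `resourceRelationshipID` as proposed above. The extension does not exist yet, so the row layout and the helper `facts_for_relationship` are hypothetical.

```python
# One relationship row, keyed by its resourceRelationshipID.
relationships = {
    "resrelat_1": {
        "resourceID": "occ_1",
        "relatedResourceID": "occ_2",
        "relationshipOfResource": "visitsFlowersOf",
    },
}

# Measurement rows: resourceRelationshipID is None when the fact describes
# an occurrence rather than an interaction.
interaction_mof = [
    {"occurrenceID": "occ_2", "resourceRelationshipID": None,
     "measurementType": "flowerColor", "measurementValue": "purple"},
    {"occurrenceID": "occ_1", "resourceRelationshipID": "resrelat_1",
     "measurementType": "resourceCollected", "measurementValue": "nectar"},
]


def facts_for_relationship(rel_id, mof_rows):
    """Collect the measurements attached directly to one interaction."""
    return {
        row["measurementType"]: row["measurementValue"]
        for row in mof_rows
        if row["resourceRelationshipID"] == rel_id
    }


interaction_facts = facts_for_relationship("resrelat_1", interaction_mof)
print(relationships["resrelat_1"]["relationshipOfResource"], interaction_facts)
```

The second hop (occurrence, then relationship, then measurement) is what turns the star schema into a snowflake.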
Data sample
Event Data Model
Core: interactions.csv:
eventID | eventDate | locality |
---|---|---|
eventID_1 | 2021-01-01 10:00:00 | São Paulo |
Occurrences extension: occurrences.csv:
eventID | occurrenceID | sex | scientificName |
---|---|---|---|
eventID_1 | occ_1 | female | Xylocopa frontalis |
eventID_1 | occ_2 | hermaphrodite | Passiflora edulis |
Extended MoF extension: emof.csv:
eventID | occurrenceID | measurementType | measurementValue |
---|---|---|---|
eventID_1 | occ_2 | flowerColor | purple |
eventID_1 | | resourceCollected | nectar |
ResourceRelationship Data Model
Core: occurrences.csv:
occurrenceID | sex | scientificName | eventDate | locality |
---|---|---|---|---|
occ_1 | female | Xylocopa frontalis | 2021-01-01 10:00:00 | São Paulo |
occ_2 | hermaphrodite | Passiflora edulis | 2021-01-01 10:00:00 | São Paulo |
ResourceRelationship extension: resrelat.csv:
occurrenceID | resourceRelationshipID | resourceID | relatedResourceID | relationshipOfResource |
---|---|---|---|---|
occ_1 | resrelat_1 | occ_1 | occ_2 | visitsFlowersOf |
MeasurementOrFact extension: mof.csv:
occurrenceID | measurementType | measurementValue |
---|---|---|
occ_2 | flowerColor | purple |
occ_1 | resourceCollected | nectar |
Interaction extension Data Model
Core: occurrences.csv:
occurrenceID | sex | scientificName |
---|---|---|
occ_1 | female | Xylocopa frontalis |
occ_2 | hermaphrodite | Passiflora edulis |
ResourceRelationship extension: resrelat.csv:
occurrenceID | resourceRelationshipID | resourceID | relatedResourceID | relationshipOfResource | relationshipEstablishedDate |
---|---|---|---|---|---|
occ_1 | resrelat_1 | occ_1 | occ_2 | visitsFlowersOf | 2021-01-01 10:00:00 |
Interaction Measurement Or Fact Extension: interactions-mof.csv:
occurrenceID | resourceRelationshipID | measurementType | measurementValue |
---|---|---|---|
occ_2 | | flowerColor | purple |
occ_1 | resrelat_1 | resourceCollected | nectar |
Other options
An `Interaction` extension could extend `dwc:ResourceRelationship` to attach geographic information (the `dwc:Location` class) directly to interactions, along with many other terms that are relevant to characterizing an interaction.
Questions
- What is the "best" model for sharing plant-pollinator interactions?
- What are the other advantages/disadvantages of each model?
- Could someone provide examples that fit each of the models, and try to explain why each one is the best for its purpose?
So, I would like to know the opinion of others about it.
thanks.
I would like to contribute an alternative model, but it may be a while before I can do so. I want to prepare the Darwin Core public review first. I hope to have that finished by 30 April.
Hi @tucotuco
I'm curious to know what you are thinking. Please let me know if there is anything I can help with.
thks.
That missing one was exactly the one I was thinking of. It isn't as scary when just the tables that are involved are included. This one shows two complete examples in the one diagram.
I think we came to the conclusion that there is not really any such thing as an Occurrence/Taxon interaction, so that simplifies things as well.
Yes, it isn't scary when only the Event and ResRelat classes are included in the model.... :-D
But, more complexity comes when we want to represent Interaction Outcomes which are results of multiple interactions. In that case, I think that a hierarchy of parent-child Events can be used to aggregate child Events (ie. interactions) to parent events (ie. Interactions Outcomes).
I will prepare an example to clarify this.
@tucotuco here is a diagram with an example of parent-child Events:
The same example in tabular form:
eventID | parentEventID | eventDate |
---|---|---|
eventID_1 | | 2021-05-05 09:12:00 |
eventID_2 | eventID_1 | 2021-05-01 12:05:00 |
eventID_3 | eventID_1 | 2021-05-01 15:51:00 |
eventID | occurrenceID | sex | scientificName |
---|---|---|---|
eventID_2 | occ_1 | female | Xylocopa frontalis |
eventID_2 | occ_2 | hermaphrodite | Passiflora edulis |
eventID_3 | occ_3 | male | Xylocopa frontalis |
eventID_3 | occ_4 | hermaphrodite | Passiflora edulis |
eventID | measurementType | measurementValue |
---|---|---|
eventID_1 | fruitSet | 0.66 |
In other words, the diagram and the example say that the fruit set resulting from two interactions between *Xylocopa frontalis* and *Passiflora edulis* is 66%. The measurement sits on the parent Event because fruit set is not limited to the flowers of a specific individual plant. Instead, it can be a measurement over flowers from different individuals, covering multiple interactions.
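The aggregation can be sketched in Python as follows, assuming rows shaped like the three tables above. The helpers `child_event_ids` and `occurrences_under` are hypothetical names, and only direct children are resolved (no deeper nesting).

```python
events = [
    {"eventID": "eventID_1", "parentEventID": None},
    {"eventID": "eventID_2", "parentEventID": "eventID_1"},
    {"eventID": "eventID_3", "parentEventID": "eventID_1"},
]
occurrences = [
    {"eventID": "eventID_2", "occurrenceID": "occ_1"},
    {"eventID": "eventID_2", "occurrenceID": "occ_2"},
    {"eventID": "eventID_3", "occurrenceID": "occ_3"},
    {"eventID": "eventID_3", "occurrenceID": "occ_4"},
]
measurements = [
    {"eventID": "eventID_1", "measurementType": "fruitSet",
     "measurementValue": 0.66},
]


def child_event_ids(parent_id, events):
    """Return the eventIDs whose parentEventID is the given parent."""
    return [e["eventID"] for e in events if e["parentEventID"] == parent_id]


def occurrences_under(parent_id, events, occurrences):
    """Return the occurrences recorded in the direct child events of a parent."""
    children = set(child_event_ids(parent_id, events))
    return [o["occurrenceID"] for o in occurrences if o["eventID"] in children]


# Each parent-level measurement is traced back to the contributing occurrences.
for m in measurements:
    contributing = occurrences_under(m["eventID"], events, occurrences)
    print(m["measurementType"], m["measurementValue"], "from", contributing)
```

The outcome measurement stays on the parent Event, while the individual interactions remain recoverable through `parentEventID`.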
What do you think? Should parent-child Events work for those cases?
thanks.
That looks like it should work in principle. I would wonder about the eventDate for Event1. As a parent to Event2 and Event3, should it not reflect the time span of all of the child events, explicit or not? So, if there were only those two child Events, wouldn't the eventDate be a range that encompassed them? So, something like 2021-05-01T12:05:00Z/2021-05-01T15:51:00Z (using proper ISO 8601 date formatting). The eventDate for the parent Event could presumably have a greater span than the children it contains if the parent Event meant something like a monitoring period within which the child Events occurred, something like 2021-05-01T00:00:00Z/2021-05-02T00:00:00Z. And of course Event1 would need a lot of other data to explain what it was really about.
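For illustration, a parent eventDate that spans its children could be derived along these lines. This is only a sketch using the Python standard library: it assumes the child timestamps are already in UTC and formatted as `YYYY-MM-DDTHH:MM:SSZ`, and `spanning_interval` is a hypothetical helper name.

```python
from datetime import datetime, timezone

ISO_UTC = "%Y-%m-%dT%H:%M:%SZ"

# eventDate values of the two child Events, written as ISO 8601 in UTC.
child_dates = ["2021-05-01T12:05:00Z", "2021-05-01T15:51:00Z"]


def parse_utc(value):
    """Parse a 'YYYY-MM-DDTHH:MM:SSZ' string into an aware UTC datetime."""
    return datetime.strptime(value, ISO_UTC).replace(tzinfo=timezone.utc)


def spanning_interval(dates):
    """Return the ISO 8601 start/end interval covering all given dates."""
    parsed = [parse_utc(d) for d in dates]
    return f"{min(parsed).strftime(ISO_UTC)}/{max(parsed).strftime(ISO_UTC)}"


interval = spanning_interval(child_dates)
print(interval)
```

A monitoring-period parent would simply use a wider, independently recorded interval instead of the computed one.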
Sorry for the simplicity of the examples... I didn't pay much attention to the dates... but you got the idea. Yes, the eventDate for Event1 should cover the dates of Event2 and Event3. Then the term `dwc:measurementDeterminedDate` can be used to capture the date/time when the measurement (fruit set) was taken.
A similar approach is being used by OBIS to group marine surveys at multiple scales/levels https://obis.org/manual/dataformat/.
I will commit some example datasets that can be useful for validating the model.
thx.