AtlasOfLivingAustralia / biocache-store

Occurrence processing, indexing and batch processing

Problem with processing of iNaturalist taxonomic entries

Mesibov opened this issue.

I downloaded the 638655-record iNaturalist dataset and found 3 fields with full scientific names:
"scientificName" (here called 1)
"scientificName" (here called 2)
"species" (here called 3)
If I can trust the headings.csv file that came with the records, the first is the raw scientificName, the second is the ALA-processed scientificName and the third is "The species the Atlas has matched this record to in the NSL".

As expected, there are a few records with (1) filled but (2) and (3) blank. Unexpectedly, there are also many records with (1) blank but (2) and (3) filled. Looking at just one of these, the taxonomy gets weird:
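For anyone wanting to reproduce these counts, here is a minimal sketch that tallies which of the three name fields are filled in each record, using the same "-/2/3" style of notation. Because two columns share the heading "scientificName", it addresses columns by position rather than by header (a `csv.DictReader` would keep only the last of two identically named columns). The column indices and the file path are hypothetical and must be looked up in the actual download.

```python
import csv
from collections import Counter
from typing import Iterable, Sequence


def name_pattern(raw: str, processed: str, species: str) -> str:
    """Encode which name fields are filled, e.g. '1/2/3' or '-/2/3'."""
    fields = (raw, processed, species)
    return "/".join(
        str(i) if value.strip() else "-" for i, value in enumerate(fields, start=1)
    )


def count_patterns(
    rows: Iterable[Sequence[str]], idx_raw: int, idx_proc: int, idx_species: int
) -> Counter:
    """Tally fill patterns; columns are addressed by position, not by header."""
    counts: Counter = Counter()
    for row in rows:
        counts[name_pattern(row[idx_raw], row[idx_proc], row[idx_species])] += 1
    return counts


def count_patterns_in_file(
    path: str, idx_raw: int, idx_proc: int, idx_species: int
) -> Counter:
    # Hypothetical indices: find the real positions in the download's header row.
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader)  # skip the header row
        return count_patterns(reader, idx_raw, idx_proc, idx_species)
```

Records in the "-/2/3" bucket are the unexpected ones described above.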
https://biocache.ala.org.au/occurrences/9e4c4726-068c-4860-b142-4ff43ba9fa57

The "original vs processed" table shows that iNaturalist actually did supply a raw ID, namely

Plantae|Tracheophyta|Magnoliopsida|Fabales|Fabaceae|Acacia|Acacia ampliata

Nothing wrong there, but the ALA-processed classification replaces "Tracheophyta" with "Charophyta" (green algae), and "Magnoliopsida" with "Equisetopsida" (horsetails).
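A small sketch of a rank-by-rank comparison between the raw and processed classifications, assuming both are given as pipe-delimited strings in kingdom-to-species order. The processed string in the example is reconstructed from the description above (only phylum and class replaced), not copied from the record page.

```python
from typing import List, Tuple

# Assumed rank order for the pipe-delimited strings.
RANKS = ("kingdom", "phylum", "class", "order", "family", "genus", "species")


def diff_classification(raw: str, processed: str) -> List[Tuple[str, str, str]]:
    """Return (rank, raw value, processed value) for every rank that differs."""
    raw_parts = raw.split("|")
    proc_parts = processed.split("|")
    if len(raw_parts) != len(RANKS) or len(proc_parts) != len(RANKS):
        raise ValueError("expected one value per rank in both classifications")
    return [
        (rank, r, p)
        for rank, r, p in zip(RANKS, raw_parts, proc_parts)
        if r.strip() != p.strip()
    ]


raw = "Plantae|Tracheophyta|Magnoliopsida|Fabales|Fabaceae|Acacia|Acacia ampliata"
processed = "Plantae|Charophyta|Equisetopsida|Fabales|Fabaceae|Acacia|Acacia ampliata"
for rank, before, after in diff_classification(raw, processed):
    print(f"{rank}: {before} -> {after}")
```

Run over many records, a check like this would show whether the phylum and class substitutions are specific to this record or systematic.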

I haven't looked at any more of the "-/2/3" records (raw name blank, processed names filled). Is this a processing failure? How can the phylum and class errors be explained?

Just on the difference in higher classification, ALA's source for Acacia ampliata is here:

https://biodiversity.org.au/nsl/services/rest/node/apni/2919087

which has:

Plantae / Charophyta / Equisetopsida / Magnoliidae / Rosanae / Fabales / Fabaceae / Acacia / Acacia ampliata
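To illustrate why this path would produce the processed values, here is a sketch that reduces the NSL path to the seven major ranks a flat occurrence record keeps. It assumes (not confirmed here) that processing simply drops intermediate ranks such as subclass and superorder. The rank labels attached to each node are hypothetical and should be checked against the actual NSL service response.

```python
from typing import List, Tuple

# Hypothetical rank labels for each node of the NSL path quoted above.
NSL_PATH: List[Tuple[str, str]] = [
    ("Regnum", "Plantae"),
    ("Division", "Charophyta"),
    ("Classis", "Equisetopsida"),
    ("Subclassis", "Magnoliidae"),
    ("Superordo", "Rosanae"),
    ("Ordo", "Fabales"),
    ("Familia", "Fabaceae"),
    ("Genus", "Acacia"),
    ("Species", "Acacia ampliata"),
]

# Assumed mapping from NSL rank labels to the major ranks of a flat record.
MAJOR_RANKS = {
    "Regnum": "kingdom",
    "Division": "phylum",
    "Classis": "class",
    "Ordo": "order",
    "Familia": "family",
    "Genus": "genus",
    "Species": "species",
}


def flatten_to_major_ranks(path: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Drop intermediate ranks (subclass, superorder, ...) and keep the rest in order."""
    return [(MAJOR_RANKS[rank], name) for rank, name in path if rank in MAJOR_RANKS]


print("|".join(name for _, name in flatten_to_major_ranks(NSL_PATH)))
```

Under these assumptions the flattened path matches the ALA-processed classification exactly, with Charophyta as phylum and Equisetopsida as class.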

That explains the errors. Will ALA be querying APC about this? Their hierarchy is defective.
My download was https://doi.org/10.26197/5d9c2f72356e8

Thanks @Mesibov. I've passed that question to the NSL.