ropensci / CoordinateCleaner

Automated flagging of common spatial and temporal errors in biological and palaeontological collection data, for use in conservation, ecology and palaeontology.

Home Page: https://docs.ropensci.org/CoordinateCleaner/

Filter for known defaults of coordinate uncertainty in meters

jhnwllr opened this issue

There are several known default values for coordinate uncertainty in meters.

301: Geolocate default (often a country centroid)
3036: Geolocate default
999: default found in a few datasets (observations.org)
9999: large default

Occurrence counts:
630 353 -- 3036 m
401 507 -- 301 m
370 553 -- 999 m
14 242 -- 9999 m
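Summing these counts shows how many records carry one of the four defaults. A quick tally in Python; the dictionary only restates the numbers listed above:

```python
# Occurrence counts per known default value of coordinate uncertainty (meters),
# copied from the list above.
DEFAULT_COUNTS = {
    3036: 630_353,
    301: 401_507,
    999: 370_553,
    9999: 14_242,
}

total = sum(DEFAULT_COUNTS.values())
print(f"total: {total:,}")

# Share of the affected records contributed by each default value.
for value, count in DEFAULT_COUNTS.items():
    print(f"{value} m: {count / total:.1%}")
```

The first line printed is `total: 1,416,655`, i.e. roughly 1.4 million records whose uncertainty value is one of these defaults.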

I think CoordinateCleaner could have a function for filtering these known defaults. I would be happy to make a PR for such a function.

gbif/pipelines#417
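The proposed check can be sketched as follows. This is written in Python purely for illustration (CoordinateCleaner itself is an R package) and assumes each record is a dict keyed by the Darwin Core term `coordinateUncertaintyInMeters`. The function name `flag_default_uncertainty` is hypothetical and not part of CoordinateCleaner; the default values are the four listed above. As with CoordinateCleaner's `cc_*` functions called with `value = "flagged"`, `True` means the record passed and `False` means it was flagged.

```python
from typing import Any, Iterable, Mapping

# Known default values of coordinate uncertainty (meters), from the list above.
KNOWN_UNCERTAINTY_DEFAULTS = frozenset({301.0, 3036.0, 999.0, 9999.0})


def flag_default_uncertainty(
    records: Iterable[Mapping[str, Any]],
    column: str = "coordinateUncertaintyInMeters",
    defaults: frozenset = KNOWN_UNCERTAINTY_DEFAULTS,
) -> list[bool]:
    """Return one bool per record: True = passed, False = value is a known default.

    Hypothetical helper for illustration only.
    """
    flags = []
    for record in records:
        value = record.get(column)
        try:
            # Values may arrive as int, float or string (e.g. "3036" or "3036.0").
            is_default = value is not None and float(value) in defaults
        except (TypeError, ValueError):
            # Non-numeric values are left to other checks and are not flagged here.
            is_default = False
        flags.append(not is_default)
    return flags


records = [
    {"coordinateUncertaintyInMeters": 3036},
    {"coordinateUncertaintyInMeters": "301"},
    {"coordinateUncertaintyInMeters": 250.0},
    {"coordinateUncertaintyInMeters": None},
    {},  # column missing entirely
]
print(flag_default_uncertainty(records))  # [False, False, True, True, True]
```

Missing or non-numeric values pass this particular test, on the assumption that other checks are responsible for them.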

Hi John,

thanks for the excellent suggestion. I'll implement this for the next version. Two questions:

  • What do you suggest as the default name for the column containing the uncertainty in meters, given that it will be user-provided?
  • My impression is that default values may also cause problems in other fields, for instance individualCount. What do you think about an option to flag those as well?

Thanks!!
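If the option were extended to other fields such as individualCount, the check could generalize to a mapping from column name to suspected default values. A minimal Python sketch, assuming records are dicts; `flag_known_defaults` is a hypothetical name, and because no individualCount defaults are established in this thread, that entry is left as an empty placeholder set rather than filled with invented values.

```python
from typing import Any, Iterable, Mapping


def flag_known_defaults(
    records: Iterable[Mapping[str, Any]],
    defaults_by_column: Mapping[str, set],
) -> list[list[str]]:
    """For each record, list the columns whose value equals a known default.

    An empty list means the record passed every check. Hypothetical helper.
    """
    results = []
    for record in records:
        hits = []
        for column, defaults in defaults_by_column.items():
            value = record.get(column)
            try:
                if value is not None and float(value) in defaults:
                    hits.append(column)
            except (TypeError, ValueError):
                pass  # non-numeric values are not treated as defaults
        results.append(hits)
    return results


DEFAULTS_BY_COLUMN = {
    "coordinateUncertaintyInMeters": {301.0, 3036.0, 999.0, 9999.0},
    # Placeholder: no individualCount defaults are known from this thread.
    "individualCount": set(),
}

records = [
    {"coordinateUncertaintyInMeters": 9999, "individualCount": 1},
    {"coordinateUncertaintyInMeters": 30, "individualCount": 5},
]
print(flag_known_defaults(records, DEFAULTS_BY_COLUMN))
# [['coordinateUncertaintyInMeters'], []]
```

Returning the offending column names, rather than a single boolean, keeps track of which field triggered the flag when several fields are checked at once.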

I don't have any opinions about individualCount right now.

My assumption is that there might be some default values there. GBIF has recently done a good job of cleaning up that column, since GBIF now has the occurrence_status field: https://www.gbif.org/occurrence/search?taxon_key=4689&occurrence_status=present

Regarding "What do you suggest as default name for the column with the uncertainty in meters, since this will be user provided":
I would name the issue or column something like "known_default_coordinate_uncertainty".