ropensci / CoordinateCleaner

Automated flagging of common spatial and temporal errors in biological and palaeontological collection data, for the use in conservation, ecology and palaeontology.

Home Page: https://docs.ropensci.org/CoordinateCleaner/

Some countries always flagged in cc_coun

JDiazCalafat opened this issue · comments

Hi,

First of all, this is a great package. Thanks a lot for the hard work!
I have been using this package for a while, but recently, I have encountered problems when flagging coordinates with the country test in the clean_coordinates() function. I have run this several times, and it appears that all coordinates within certain countries are flagged every time. Specifically, I have been working with a rather large GBIF dataset, and when running the country test, all records within Norway and France are flagged, regardless of the actual location of the coordinates (manual plotting indicates everything is okay). Not sure if this may also be happening for other countries.

Here's a subset of the data I have used (records for Norway and France):
exampledata2.csv. And (just in case it's needed) here's all the data that I have used simultaneously when running the clean_coordinates() function.

I am using version 3.0.1 of CoordinateCleaner, and this is how I call the clean_coordinates() function:

geo_flags <- clean_coordinates(x = data,
                               lon = "decimalLongitude",
                               lat = "decimalLatitude",
                               countries = "countryCode",
                               tests = c("capitals", "centroids", "equal",
                                         "institutions", "seas", "zeros", "countries"),
                               capitals_rad = 1000)

My R session info:

R version 4.3.3 (2024-02-29 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22621)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
system code page: 65001

time zone: Europe/Stockholm
tzcode source: internal

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] countrycode_1.5.0 CoordinateCleaner_3.0.1 lubridate_1.9.3
[4] forcats_1.0.0 stringr_1.5.1 dplyr_1.1.3
[7] purrr_1.0.2 readr_2.1.5 tidyr_1.3.1
[10] tibble_3.2.1 ggplot2_3.4.4 tidyverse_2.0.0
[13] rgbif_3.7.9

loaded via a namespace (and not attached):
[1] gtable_0.3.4 xfun_0.41 lattice_0.22-5
[4] tzdb_0.4.0 vctrs_0.6.4 tools_4.3.3
[7] generics_0.1.3 curl_5.2.1 proxy_0.4-27
[10] fansi_1.0.5 pkgconfig_2.0.3 KernSmooth_2.23-22
[13] data.table_1.15.2 lifecycle_1.0.4 compiler_4.3.3
[16] munsell_0.5.0 terra_1.7-55 codetools_0.2-19
[19] htmltools_0.5.7 class_7.3-22 yaml_2.3.8
[22] lazyeval_0.2.2 pillar_1.9.0 whisker_0.4.1
[25] classInt_0.4-10 rnaturalearthdata_1.0.0 tidyselect_1.2.0
[28] digest_0.6.33 stringi_1.8.3 sf_1.0-14
[31] fastmap_1.1.1 rnaturalearth_1.0.1 grid_4.3.3
[34] colorspace_2.1-0 cli_3.6.1 magrittr_2.0.3
[37] triebeard_0.4.1 crul_1.4.0 utf8_1.2.4
[40] e1071_1.7-13 withr_3.0.0 scales_1.3.0
[43] bit64_4.0.5 sp_2.1-1 oai_0.4.0
[46] timechange_0.3.0 rmarkdown_2.26 httr_1.4.7
[49] bit_4.0.5 hms_1.1.3 evaluate_0.23
[52] knitr_1.45 urltools_1.7.3 rlang_1.1.1
[55] Rcpp_1.0.11 httpcode_0.3.0 glue_1.6.2
[58] geosphere_1.5-18 DBI_1.2.2 xml2_1.3.6
[61] rstudioapi_0.15.0 jsonlite_1.8.7 R6_2.5.1
[64] plyr_1.8.9 units_0.8-4

Hi @JDiazCalafat, thanks for reporting this issue. There seems to be a difference in the rnaturalearth shapefile attributes across operating systems, and it affects the codes of exactly these countries. A quick fix is to change the reference column by setting the country_refcol argument to "iso_a3_eh". Hope it helps.

geo_flags <- clean_coordinates(x = data,
                               lon = "decimalLongitude",
                               lat = "decimalLatitude",
                               countries = "countryCode",
                               tests = c("capitals", "centroids", "equal",
                                         "institutions", "seas", "zeros", "countries"),
                               capitals_rad = 1000,
                               country_refcol = "iso_a3_eh")
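
To see on your own machine whether a reference column is the culprit, a rough check like the one below may help. It is only a sketch: missing_ref_codes() is a hypothetical helper (not part of CoordinateCleaner), and it assumes the country test compares your codes against the attribute table returned by rnaturalearth::ne_countries(), which may differ from what the package loads internally. The codes in my_codes are placeholders for the ISO3 codes in your own data.

```r
# Hypothetical helper (not part of CoordinateCleaner): return the country
# codes from the data that have no match in a given reference column.
missing_ref_codes <- function(codes, ref, ref_col) {
  if (!ref_col %in% names(ref)) {
    stop("Column '", ref_col, "' not found in the reference data")
  }
  setdiff(unique(as.character(codes)), as.character(ref[[ref_col]]))
}

# Guarded so the sketch is skipped where the spatial packages are missing.
pkgs <- c("rnaturalearth", "rnaturalearthdata", "sf")
if (all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))) {
  # Assumption: this is comparable to the reference used by the country test.
  ref <- sf::st_drop_geometry(
    rnaturalearth::ne_countries(scale = "medium", returnclass = "sf")
  )

  # Placeholder codes; replace with unique(data$countryCode).
  my_codes <- c("NOR", "FRA", "SWE")

  # Compare both candidate reference columns, if they exist locally.
  for (col in intersect(c("iso_a3", "iso_a3_eh"), names(ref))) {
    cat(col, ": missing ->", missing_ref_codes(my_codes, ref, col), "\n")
  }
}
```

If a code shows up as missing for "iso_a3" but not for "iso_a3_eh", that would be consistent with every record from that country being flagged under the default reference column.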

Thanks, it works like this! :)