geo() in tidygeocoder returns random NA for "city, state" address arguments
pratham13 opened this issue · comments
Description
The geo function returns NA at random for known "city, state" addresses. Both tidygeocoder::geo and geocode used to return consistent lat and long values, and they still do when a single address is passed, e.g. geo("San Diego, CA"). But when the addresses are passed in a tibble, the lookup fails for some rows.
I have tried updating and reinstalling the package, clearing the cache, clearing the workspace, and restarting R. I have also read the other issues on the bug report page: https://github.com/jessecambon/tidygeocoder/issues
It looks like there was a similar issue in Oct 2022.
Steps to Reproduce
Example data frame s1, where every lookup returned NA:
city_state lat long
<chr> <dbl> <dbl>
1 Costa Mesa, CA NA NA
2 Memphis, TN NA NA
3 Franklin, TN NA NA
4 Detroit, MI NA NA
5 Princeton, NJ NA NA
6 Capitol Heights, MD NA NA
Running geo again on the rows that are still NA:
s1 %>% filter(is.na(lat)) %>% pull(city_state) %>% map_dfr(geo)
address lat long
<chr> <dbl> <dbl>
1 Costa Mesa, CA NA NA
2 Memphis, TN 35.1 -90.1
3 Franklin, TN NA NA
4 Detroit, MI 42.3 -83.0
5 Princeton, NJ 40.3 -74.7
6 Capitol Heights, MD 38.9 -76.9
There are still NAs in this output.
But passing Costa Mesa, CA on its own:
geo('Costa Mesa, CA')
Passing 1 address to the Nominatim single address geocoder
[===============================================================] 1/1 (100%) Elapsed: 2s Remaining: 0s
# A tibble: 1 x 3
address lat long
<chr> <dbl> <dbl>
1 Costa Mesa, CA 33.7 -118.
Works fine. The same issue occurs with the geocode function as well.
Environment
R version 4.2.3 (2023-03-15 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] glue_1.6.2 RPostgres_1.4.5 aws.signature_0.6.0 aws.s3_0.3.21 sf_1.0-9
[6] geosphere_1.5-18 tm_0.7-11 NLP_0.2-1 progress_1.2.2 furrr_0.3.1
[11] future_1.32.0 clipr_0.8.0 openxlsx_4.2.5.2 tictoc_1.1 janitor_2.2.0
[16] readxl_1.4.2 lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0 dplyr_1.1.0
[21] purrr_1.0.1 readr_2.1.4 tidyr_1.3.0 tibble_3.1.8 ggplot2_3.4.1
[26] tidyverse_2.0.0 tidygeocoder_1.0.5 data.table_1.14.8 pacman_0.5.1
loaded via a namespace (and not attached):
[1] httr_1.4.5 jsonlite_1.8.4 vroom_1.6.1 bit64_4.0.5 sp_1.6-0
[6] blob_1.2.3 cellranger_1.1.0 yaml_2.3.7 slam_0.1-50 globals_0.16.2
[11] pillar_1.8.1 lattice_0.20-45 digest_0.6.31 snakecase_0.11.0 colorspace_2.1-0
[16] pkgconfig_2.0.3 listenv_0.9.0 scales_1.2.1 tzdb_0.3.0 timechange_0.2.0
[21] proxy_0.4-27 generics_0.1.3 ellipsis_0.3.2 withr_2.5.0 cli_3.6.0
[26] magrittr_2.0.3 crayon_1.5.2 fansi_1.0.4 parallelly_1.34.0 xml2_1.3.3
[31] class_7.3-21 tools_4.2.3 prettyunits_1.1.1 hms_1.1.2 lifecycle_1.0.3
[36] munsell_0.5.0 zip_2.2.2 compiler_4.2.3 e1071_1.7-13 rlang_1.0.6
[41] classInt_0.4-9 units_0.8-1 grid_4.2.3 rstudioapi_0.14 base64enc_0.1-3
[46] gtable_0.3.1 codetools_0.2-19 DBI_1.1.3 curl_5.0.0 R6_2.5.1
[51] bit_4.0.5 utf8_1.2.3 KernSmooth_2.23-20 stringi_1.7.12 parallel_4.2.3
[56] Rcpp_1.0.10 vctrs_0.5.2 tidyselect_1.2.0
Hi @pratham13, see my reprex below. I'm getting results returned for these addresses. You can see if this code works for you.
library(tidygeocoder)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
df <- tibble::tribble(
  ~city_state,
  "Costa Mesa, CA",
  "Memphis, TN ",
  "Franklin, TN",
  "Detroit, MI",
  "Princeton, NJ",
  "Capitol Heights, MD"
)
# use geocode()
df_geocoded <- df %>%
  geocode(address = city_state, method = 'osm')
#> Passing 6 addresses to the Nominatim single address geocoder
#> Query completed in: 9.8 seconds
print(df_geocoded)
#> # A tibble: 6 × 3
#> city_state lat long
#> <chr> <dbl> <dbl>
#> 1 "Costa Mesa, CA" 33.7 -118.
#> 2 "Memphis, TN " 35.1 -90.1
#> 3 "Franklin, TN" 35.9 -86.9
#> 4 "Detroit, MI" 42.3 -83.0
#> 5 "Princeton, NJ" 40.3 -74.7
#> 6 "Capitol Heights, MD" 38.9 -76.9
# use geo()
addresses_geocoded <- df %>%
  pull(city_state) %>%
  geo(address = ., method = 'osm')
#> Passing 6 addresses to the Nominatim single address geocoder
#> Query completed in: 10.2 seconds
print(addresses_geocoded)
#> # A tibble: 6 × 3
#> address lat long
#> <chr> <dbl> <dbl>
#> 1 Costa Mesa, CA 33.7 -118.
#> 2 Memphis, TN 35.1 -90.1
#> 3 Franklin, TN 35.9 -86.9
#> 4 Detroit, MI 42.3 -83.0
#> 5 Princeton, NJ 40.3 -74.7
#> 6 Capitol Heights, MD 38.9 -76.9
Created on 2023-04-10 with reprex v2.0.2
Hi @jessecambon, random errors are still popping up here. Below is the tibble I am passing to geocode:
# A tibble: 37 x 1
city_state
<chr>
1 El Segundo, CA
2 Normal, IL
3 Bellevue, WA
4 Brooklyn, NY
5 South San Francisco, CA
6 Plymouth, MI
7 Denver, CO
8 Salt Lake City, UT
9 Phoenix, AZ
10 West Sacramento, CA
11 Richmond, VA
12 Minneapolis, MN
13 Costa Mesa, CA
14 North Las Vegas, NV
15 Olathe, KS
16 San Diego, CA
17 Orlando, FL
18 Houston, TX
19 Chicago, IL
20 Cleveland, OH
21 Chelsea, MA
22 Atlanta, GA
23 Memphis, TN
24 Miami, FL
25 Dallas, TX
26 Austin, TX
27 Vancouver, BC
28 San Jose, CA
29 Franklin, TN
30 Cincinnati, OH
31 Detroit, MI
32 Portland, OR
33 Fenton, MO
34 Palmyra, NJ
35 Princeton, NJ
36 Tualatin, OR
37 Capitol Heights, MD
# Applying the exact code snippet from above
sc_nearby %>% select(city_state) %>% geocode(city_state, method = 'osm')
Passing 37 addresses to the Nominatim single address geocoder
[=============>------------------------------------] 10/37 ( 27%) Elapsed: 18s Remaining: 49s
Error: <html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr>
[==================================================] 37/37 (100%) Elapsed: 1m Remaining: 0s
# A tibble: 37 x 3
city_state lat long
<chr> <dbl> <dbl>
1 El Segundo, CA 33.9 -118.
2 Normal, IL 40.5 -89.0
3 Bellevue, WA 47.6 -122.
4 Brooklyn, NY 40.7 -73.9
5 South San Francisco, CA 37.7 -122.
6 Plymouth, MI 45.0 -93.5
7 Denver, CO 39.7 -105.
8 Salt Lake City, UT 40.8 -112.
9 Phoenix, AZ 33.4 -112.
10 West Sacramento, CA 38.6 -122.
11 Richmond, VA NA NA
12 Minneapolis, MN 45.0 -93.3
13 Costa Mesa, CA 33.7 -118.
14 North Las Vegas, NV 36.2 -115.
15 Olathe, KS 38.9 -94.8
16 San Diego, CA 32.7 -117.
17 Orlando, FL 28.5 -81.4
18 Houston, TX 29.8 -95.4
19 Chicago, IL 41.9 -87.6
20 Cleveland, OH 41.5 -81.7
21 Chelsea, MA 42.4 -71.0
22 Atlanta, GA 33.7 -84.4
23 Memphis, TN 35.1 -90.1
24 Miami, FL 25.8 -80.2
25 Dallas, TX 32.8 -96.8
26 Austin, TX 30.3 -97.7
27 Vancouver, BC 49.3 -123.
28 San Jose, CA 37.3 -122.
29 Franklin, TN 35.9 -86.9
30 Cincinnati, OH 39.1 -84.5
31 Detroit, MI 42.3 -83.0
32 Portland, OR 45.5 -123.
33 Fenton, MO 38.5 -90.4
34 Palmyra, NJ 40.0 -75.0
35 Princeton, NJ 40.3 -74.7
36 Tualatin, OR 45.4 -123.
37 Capitol Heights, MD 38.9 -76.9
Warning message:
In query_api(api_url, api_query_parameters, method = method) :
Bad Gateway (HTTP 502).
# Richmond, VA is not geocoded (lat and long come back as NA)
Does it work if you run the query again? Based on the warning produced, it could be the result of an unstable internet connection or a temporary issue on the Nominatim servers. You could also try a different service to see if that helps (e.g. method = 'arcgis').
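If rerunning by hand gets tedious, the retry can be scripted. Below is a minimal sketch, not part of tidygeocoder: retry_missing() is a hypothetical helper that geocodes only the rows whose lat or long is still NA and pauses between attempts. It assumes the geocoder function returns one row per input address, in input order, with lat and long columns, which is the shape geo() returns by default. The flaky() function and its coordinates are made-up stand-ins so the example runs offline.

```r
# Hypothetical helper (not part of tidygeocoder): re-geocode only the rows
# whose coordinates are still missing, up to `max_tries` times.
# `geocode_fn` takes a character vector of addresses and returns a data frame
# with `lat` and `long` columns, one row per address, in the same order.
retry_missing <- function(df, address_col, geocode_fn, max_tries = 3, pause = 5) {
  for (attempt in seq_len(max_tries)) {
    missing <- which(is.na(df$lat) | is.na(df$long))
    if (length(missing) == 0) break      # nothing left to fix
    if (attempt > 1) Sys.sleep(pause)    # give the server time to recover
    res <- geocode_fn(df[[address_col]][missing])
    df$lat[missing]  <- res$lat
    df$long[missing] <- res$long
  }
  df
}

# Offline demonstration with a stand-in geocoder that fails on its first call.
# The coordinates are placeholders, not real API output.
calls <- 0
flaky <- function(addresses) {
  calls <<- calls + 1
  n <- length(addresses)
  if (calls > 1) {
    data.frame(address = addresses, lat = rep(37.5, n), long = rep(-77.4, n))
  } else {
    data.frame(address = addresses, lat = rep(NA_real_, n), long = rep(NA_real_, n))
  }
}

demo <- data.frame(
  city_state = c("Richmond, VA", "Denver, CO"),
  lat  = c(NA, 39.7),
  long = c(NA, -105.0)
)
fixed <- retry_missing(demo, "city_state", flaky, max_tries = 3, pause = 0)
print(fixed)

# With tidygeocoder (needs network access), the same helper could be used as:
# osm    <- function(x) tidygeocoder::geo(address = x, method = "osm")
# arcgis <- function(x) tidygeocoder::geo(address = x, method = "arcgis")
# result <- retry_missing(result, "city_state", osm)
# result <- retry_missing(result, "city_state", arcgis, max_tries = 1)
```

Retrying only the NA rows keeps the number of requests low, which matters on the rate-limited Nominatim service. Falling back to a second service only for the rows that are still NA keeps most results from a single source.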