jessecambon / tidygeocoder

Geocoding Made Easy

Home Page: https://jessecambon.github.io/tidygeocoder

geo function in tidygeocoder returns random NA for city, state address arguments

pratham13 opened this issue

Description

The geo function returns NA at random for known city, state addresses. Both tidygeocoder::geo and geocode return consistent lat and long values when an address is passed individually, e.g. geo("San Diego, CA"). But when the same addresses are passed in a tibble, some of them fail and come back as NA.

I have tried updating and reinstalling the package, clearing the cache and the workspace, and restarting R. I have also read the other issues on the bug report page: https://github.com/jessecambon/tidygeocoder/issues

Looks like there was a similar issue in Oct 2022

Steps to Reproduce

Include a small code example that someone else can run to reproduce the bug:

Example data frame s1, showing the rows that were returned with NA values:

  city_state            lat  long
  <chr>               <dbl> <dbl>
1 Costa Mesa, CA         NA    NA
2 Memphis, TN            NA    NA
3 Franklin, TN           NA    NA
4 Detroit, MI            NA    NA
5 Princeton, NJ          NA    NA
6 Capitol Heights, MD    NA    NA

Running geo again on just the rows that returned NA:

s1 %>% filter(is.na(lat)) %>% pull(city_state) %>% map_dfr(geo)

  address               lat  long
  <chr>               <dbl> <dbl>
1 Costa Mesa, CA       NA    NA  
2 Memphis, TN          35.1 -90.1
3 Franklin, TN         NA    NA  
4 Detroit, MI          42.3 -83.0
5 Princeton, NJ        40.3 -74.7
6 Capitol Heights, MD  38.9 -76.9

This code still produces NAs.

But when Costa Mesa, CA is passed individually:

geo('Costa Mesa, CA')
Passing 1 address to the Nominatim single address geocoder
[===============================================================] 1/1 (100%) Elapsed:  2s Remaining:  0s
# A tibble: 1 x 3
  address          lat  long
  <chr>          <dbl> <dbl>
1 Costa Mesa, CA  33.7 -118.

it works fine. The geocode function has the same issue.
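
For reference, the results of a re-run like the one above can be folded back into s1 so that only the missing coordinates are filled in. This is a minimal sketch, assuming s1 has the columns shown above and that dplyr 1.0.0 or later is installed for rows_patch(). The retried tibble is hand-built from the values printed above as a stand-in for a live geo() call, and s1 is reduced to three rows for brevity.

```r
library(dplyr)

# Three of the rows from s1 above, where the first pass returned NA
s1 <- tibble::tibble(
  city_state = c("Costa Mesa, CA", "Memphis, TN", "Detroit, MI"),
  lat  = c(NA_real_, NA_real_, NA_real_),
  long = c(NA_real_, NA_real_, NA_real_)
)

# Stand-in for the result of re-running geo() on the NA rows, e.g.
# s1 %>% filter(is.na(lat)) %>% pull(city_state) %>% geo()
retried <- tibble::tibble(
  address = c("Costa Mesa, CA", "Memphis, TN", "Detroit, MI"),
  lat  = c(NA, 35.1, 42.3),
  long = c(NA, -90.1, -83.0)
)

# rows_patch() only overwrites NA values in s1, matching rows on city_state
s1_patched <- rows_patch(
  s1,
  rename(retried, city_state = address),
  by = "city_state"
)

# Rows that are still NA after patching need another attempt
still_missing <- filter(s1_patched, is.na(lat))
```

Here still_missing would hold only the Costa Mesa, CA row, which can then be passed to geo again.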

Environment

Post the results of devtools::session_info() :

R version 4.2.3 (2023-03-15 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] glue_1.6.2 RPostgres_1.4.5 aws.signature_0.6.0 aws.s3_0.3.21 sf_1.0-9
[6] geosphere_1.5-18 tm_0.7-11 NLP_0.2-1 progress_1.2.2 furrr_0.3.1
[11] future_1.32.0 clipr_0.8.0 openxlsx_4.2.5.2 tictoc_1.1 janitor_2.2.0
[16] readxl_1.4.2 lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0 dplyr_1.1.0
[21] purrr_1.0.1 readr_2.1.4 tidyr_1.3.0 tibble_3.1.8 ggplot2_3.4.1
[26] tidyverse_2.0.0 tidygeocoder_1.0.5 data.table_1.14.8 pacman_0.5.1

loaded via a namespace (and not attached):
[1] httr_1.4.5 jsonlite_1.8.4 vroom_1.6.1 bit64_4.0.5 sp_1.6-0
[6] blob_1.2.3 cellranger_1.1.0 yaml_2.3.7 slam_0.1-50 globals_0.16.2
[11] pillar_1.8.1 lattice_0.20-45 digest_0.6.31 snakecase_0.11.0 colorspace_2.1-0
[16] pkgconfig_2.0.3 listenv_0.9.0 scales_1.2.1 tzdb_0.3.0 timechange_0.2.0
[21] proxy_0.4-27 generics_0.1.3 ellipsis_0.3.2 withr_2.5.0 cli_3.6.0
[26] magrittr_2.0.3 crayon_1.5.2 fansi_1.0.4 parallelly_1.34.0 xml2_1.3.3
[31] class_7.3-21 tools_4.2.3 prettyunits_1.1.1 hms_1.1.2 lifecycle_1.0.3
[36] munsell_0.5.0 zip_2.2.2 compiler_4.2.3 e1071_1.7-13 rlang_1.0.6
[41] classInt_0.4-9 units_0.8-1 grid_4.2.3 rstudioapi_0.14 base64enc_0.1-3
[46] gtable_0.3.1 codetools_0.2-19 DBI_1.1.3 curl_5.0.0 R6_2.5.1
[51] bit_4.0.5 utf8_1.2.3 KernSmooth_2.23-20 stringi_1.7.12 parallel_4.2.3
[56] Rcpp_1.0.10 vctrs_0.5.2 tidyselect_1.2.0

Hi @pratham13, see my reprex below. I'm getting results returned for these addresses. You can see if this code works for you.

library(tidygeocoder)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df <- tibble::tribble(
  ~city_state,
  "Costa Mesa, CA",
  "Memphis, TN ",
  "Franklin, TN",
  "Detroit, MI",
  "Princeton, NJ",
  "Capitol Heights, MD"
)


# use geocode()
df_geocoded <- df %>%
  geocode(address = city_state, method = 'osm')
#> Passing 6 addresses to the Nominatim single address geocoder
#> Query completed in: 9.8 seconds

print(df_geocoded)
#> # A tibble: 6 × 3
#>   city_state              lat   long
#>   <chr>                 <dbl>  <dbl>
#> 1 "Costa Mesa, CA"       33.7 -118. 
#> 2 "Memphis, TN "         35.1  -90.1
#> 3 "Franklin, TN"         35.9  -86.9
#> 4 "Detroit, MI"          42.3  -83.0
#> 5 "Princeton, NJ"        40.3  -74.7
#> 6 "Capitol Heights, MD"  38.9  -76.9

# use geo()
addresses_geocoded <- df %>% pull(city_state) %>%
  geo(address=., method = 'osm')
#> Passing 6 addresses to the Nominatim single address geocoder
#> Query completed in: 10.2 seconds

print(addresses_geocoded)
#> # A tibble: 6 × 3
#>   address               lat   long
#>   <chr>               <dbl>  <dbl>
#> 1 Costa Mesa, CA       33.7 -118. 
#> 2 Memphis, TN          35.1  -90.1
#> 3 Franklin, TN         35.9  -86.9
#> 4 Detroit, MI          42.3  -83.0
#> 5 Princeton, NJ        40.3  -74.7
#> 6 Capitol Heights, MD  38.9  -76.9

Created on 2023-04-10 with reprex v2.0.2

Hi @jessecambon, random errors are still popping up here. Below is the tibble I am passing to geocode:

# A tibble: 37 x 1
   city_state             
   <chr>                  
 1 El Segundo, CA         
 2 Normal, IL             
 3 Bellevue, WA           
 4 Brooklyn, NY           
 5 South San Francisco, CA
 6 Plymouth, MI           
 7 Denver, CO             
 8 Salt Lake City, UT     
 9 Phoenix, AZ            
10 West Sacramento, CA    
11 Richmond, VA           
12 Minneapolis, MN        
13 Costa Mesa, CA         
14 North Las Vegas, NV    
15 Olathe, KS             
16 San Diego, CA          
17 Orlando, FL            
18 Houston, TX            
19 Chicago, IL            
20 Cleveland, OH          
21 Chelsea, MA            
22 Atlanta, GA            
23 Memphis, TN            
24 Miami, FL              
25 Dallas, TX             
26 Austin, TX             
27 Vancouver, BC          
28 San Jose, CA           
29 Franklin, TN           
30 Cincinnati, OH         
31 Detroit, MI            
32 Portland, OR           
33 Fenton, MO             
34 Palmyra, NJ            
35 Princeton, NJ          
36 Tualatin, OR           
37 Capitol Heights, MD    

# Applying the exact code snippet from above

sc_nearby %>% select(city_state) %>% geocode(city_state, method = 'osm')
Passing 37 addresses to the Nominatim single address geocoder
[=============>------------------------------------] 10/37 ( 27%) Elapsed: 18s Remaining: 49sError: <html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr>
[==================================================] 37/37 (100%) Elapsed:  1m Remaining:  0s
# A tibble: 37 x 3
   city_state                lat   long
   <chr>                   <dbl>  <dbl>
 1 El Segundo, CA           33.9 -118. 
 2 Normal, IL               40.5  -89.0
 3 Bellevue, WA             47.6 -122. 
 4 Brooklyn, NY             40.7  -73.9
 5 South San Francisco, CA  37.7 -122. 
 6 Plymouth, MI             45.0  -93.5
 7 Denver, CO               39.7 -105. 
 8 Salt Lake City, UT       40.8 -112. 
 9 Phoenix, AZ              33.4 -112. 
10 West Sacramento, CA      38.6 -122. 
11 Richmond, VA             NA     NA  
12 Minneapolis, MN          45.0  -93.3
13 Costa Mesa, CA           33.7 -118. 
14 North Las Vegas, NV      36.2 -115. 
15 Olathe, KS               38.9  -94.8
16 San Diego, CA            32.7 -117. 
17 Orlando, FL              28.5  -81.4
18 Houston, TX              29.8  -95.4
19 Chicago, IL              41.9  -87.6
20 Cleveland, OH            41.5  -81.7
21 Chelsea, MA              42.4  -71.0
22 Atlanta, GA              33.7  -84.4
23 Memphis, TN              35.1  -90.1
24 Miami, FL                25.8  -80.2
25 Dallas, TX               32.8  -96.8
26 Austin, TX               30.3  -97.7
27 Vancouver, BC            49.3 -123. 
28 San Jose, CA             37.3 -122. 
29 Franklin, TN             35.9  -86.9
30 Cincinnati, OH           39.1  -84.5
31 Detroit, MI              42.3  -83.0
32 Portland, OR             45.5 -123. 
33 Fenton, MO               38.5  -90.4
34 Palmyra, NJ              40.0  -75.0
35 Princeton, NJ            40.3  -74.7
36 Tualatin, OR             45.4 -123. 
37 Capitol Heights, MD      38.9  -76.9
Warning message:
In query_api(api_url, api_query_parameters, method = method) :
  Bad Gateway (HTTP 502).

# Note that Richmond, VA (row 11) was not geocoded and came back as NA

Does it work if you run the query again? Based on the warning produced, it could be the result of an unstable internet connection or a temporary issue on the Nominatim servers. You could also try a different service to see if that helps (e.g. method = 'arcgis').
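
If re-running by hand gets tedious, something along these lines could automate both suggestions. This is only a sketch: geo_with_retry is a hypothetical helper, not a tidygeocoder function, and the number of tries and the pause length are arbitrary assumptions. It calls geo() with only the address and method arguments, and relies on geo() returning one row per input address in the same order, which is its default behaviour.

```r
# Hypothetical helper (not part of tidygeocoder): geocode a character vector,
# then re-query only the addresses that came back NA, moving on to the next
# service once the tries for the current one are used up.
geo_with_retry <- function(addresses,
                           methods = c("osm", "arcgis"),
                           tries_per_method = 2,
                           pause = 5,
                           geocoder = tidygeocoder::geo) {
  result <- data.frame(address = addresses, lat = NA_real_, long = NA_real_)

  for (method in methods) {
    for (attempt in seq_len(tries_per_method)) {
      missing <- which(is.na(result$lat))
      if (length(missing) == 0) return(result)  # everything resolved

      # Wait before repeating a query against the same service
      if (attempt > 1) Sys.sleep(pause)

      found <- geocoder(address = result$address[missing], method = method)
      result$lat[missing]  <- found$lat
      result$long[missing] <- found$long
    }
  }

  result  # may still contain NA if every attempt failed
}
```

With the tibble from the previous comment this would be called as geo_with_retry(sc_nearby$city_state). The geocoder argument exists only so the function can be tried out with a stand-in function instead of a live service.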