mountainMath / cancensus

R wrapper for calling CensusMapper APIs

Home Page: https://mountainmath.github.io/cancensus/index.html

Encoding issue with French names

SimonCoulombe opened this issue

French characters are not imported correctly. Here's an example:

library(cancensus)
library(dplyr)

data <- get_census(dataset = "CA16", regions = list(PR = "24"),
  vectors = c("pop2016" = "v_CA16_401"),
  level = "CSD", geo_format = "sf")
data %>% filter(GeoUID == 2401023) %>% pull(name)
[1] "Les Îles-de-la-Madeleine (M�)"

@SimonCoulombe sorry for the slow reply on this one. I traced this error to an encoding issue in Jens' CensusMapper data lake conversion of the source STC data. I will look to fix it but can't promise an ETA. I'm also thinking we may want to adjust how we supply the urbanity type data (the codes in the parentheses).

No worries. I think STC data is often Windows-1252 encoded.
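If the source files really are Windows-1252 (a guess from the comment above, not something confirmed in this thread), the symptom can be reproduced in base R. In Windows-1252 the letter É is the single byte 0xC9, which is not valid UTF-8 on its own, so a reader that assumes UTF-8 typically shows a replacement character instead. The sketch below builds the two bytes for "MÉ" by hand and converts them with iconv(). The "CP1252" encoding name is assumed to be supported by your platform's iconv, and this is only an illustration, not how the CensusMapper data was actually repaired.

```r
# Bytes of "M" + E-acute as a Windows-1252 file would store them
# (0x4D is "M", 0xC9 is E-acute in CP1252).
cp1252_bytes <- as.raw(c(0x4d, 0xc9))
raw_name <- rawToChar(cp1252_bytes)

# Interpreted as UTF-8, the lone 0xC9 byte is invalid; this is the kind of
# value that gets displayed with a replacement character.
is_valid_before <- validUTF8(raw_name)

# Declaring the assumed source encoding lets iconv() produce proper UTF-8.
fixed_name <- iconv(raw_name, from = "CP1252", to = "UTF-8")
is_valid_after <- validUTF8(fixed_name)
```

Here `is_valid_before` and `is_valid_after` make the before/after state explicit, so the conversion can be checked without relying on how the console renders the string.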

Was this only an issue for CSDs, or did you notice it at any other level of geography too, @SimonCoulombe? It should be fixed now for 2016 CSDs. You will have to refresh your cache by passing use_cache=TRUE if you have your call cached locally.

Hi!
It was still broken with use_cache = TRUE, but using use_cache = FALSE (which I guess you meant) fixes it.

It apparently is still broken at the CD level.


> data <- get_census(dataset='CA16', regions=list(PR="24"), 
+                    vectors=c("pop2016"= "v_CA16_401"),
+                    level='CSD',geo_format="sf", use_cache = FALSE) 
Querying CensusMapper API...
Downloading: 33 kB
Querying CensusMapper API...
Downloading: 2.9 MB
> data %>% filter(GeoUID == 2401023) %>% pull(name)
[1] "Les Îles-de-la-Madeleine (MÉ)"


> data <- get_census(dataset='CA16', regions=list(PR="24"), 
+                    vectors=c("pop2016"= "v_CA16_401"),
+                    level='CD',geo_format="sf", use_cache = FALSE) 
Querying CensusMapper API...
Downloading: 3.2 kB
Querying CensusMapper API...
Downloading: 140 kB
> data %>% filter(GeoUID == 2401) %>% pull(name)
[1] "Les Îles-de-la-Madeleine (T�)"

Yes, use_cache=FALSE is what I meant. Thanks for checking; the CD level should be fixed now too.
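For anyone who wants to check other geography levels without eyeballing every name, a small helper can flag suspicious values. This is a hypothetical function, not part of cancensus. It only looks for two things: strings that are not valid UTF-8, and strings that already contain the U+FFFD replacement character, so it may miss other kinds of mojibake.

```r
# Hypothetical helper (not part of cancensus): returns TRUE for each element
# that is invalid UTF-8 or contains the U+FFFD replacement character.
find_garbled_names <- function(x) {
  x <- as.character(x)
  invalid <- !validUTF8(x)
  has_replacement <- rep(FALSE, length(x))
  # Only search valid strings, since pattern matching on invalid bytes can fail.
  has_replacement[!invalid] <- grepl("\ufffd", x[!invalid], fixed = TRUE)
  invalid | has_replacement
}

# Sample values modelled on the names in this thread (escapes keep the file ASCII).
sample_names <- c(
  "Les \u00celes-de-la-Madeleine (M\u00c9)",  # correctly encoded
  "Les \u00celes-de-la-Madeleine (T\ufffd)",  # contains a replacement character
  rawToChar(as.raw(c(0x4d, 0xc9)))            # raw Windows-1252 bytes, invalid UTF-8
)
flagged <- find_garbled_names(sample_names)
```

With an API key configured, the same helper could be applied to the `name` column of a fresh `get_census(..., use_cache = FALSE)` result, for example `data$name[find_garbled_names(data$name)]`, repeating the call for each `level` of interest. Only the CSD and CD levels are discussed in this thread.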