cdcepi / zika

Data repository of publicly available Zika data

date format inconsistent

chendaniely opened this issue

Some report_date values are in the 2016-02-20 (YYYY-MM-DD) format and others are in the 2/27/2016 (M/D/YYYY) format.
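
One way to reconcile the two formats is sketched below in R. It assumes that every slash-separated date is month/day/year (as in 2/27/2016) and never day/month/year; normalize_report_date is a hypothetical helper, not part of the repository. Each value is parsed with both formats and whichever parse succeeds is kept.

```r
# Hypothetical helper: turn a character vector that mixes ISO 8601
# (YYYY-MM-DD) and US-style (M/D/YYYY) dates into a single Date vector.
# Assumes slash-separated dates are always month/day/year.
normalize_report_date <- function(x) {
  x <- as.character(x)
  iso <- as.Date(x, format = "%Y-%m-%d")  # NA where the value is not ISO
  us  <- as.Date(x, format = "%m/%d/%Y")  # NA where the value is not M/D/YYYY
  # Keep the ISO parse where it worked, otherwise fall back to M/D/YYYY
  out <- iso
  out[is.na(iso)] <- us[is.na(iso)]
  out
}

normalize_report_date(c("2016-02-20", "2/27/2016"))
```

Parsing each format separately matters here: as.Date's tryFormats argument picks one format based on the first non-NA element and applies it to the whole vector, so it cannot handle a column that mixes both styles.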

Current problematic files:

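cdc_data_commit <- '05e6c978330da18ee5902cceabeab742f54294f2'

# Every dated CSV (e.g. Municipality_Zika_2016-02-27.csv) under the checkout;
# the dot before "csv" is escaped so it only matches a literal "."
files <- list.files(path = sprintf('data/zika-%s', cdc_data_commit),
                    pattern = '[0-9]{4}-[0-9]{2}-[0-9]{2}\\.csv$',
                    recursive = TRUE,
                    full.names = TRUE)

tables <- lapply(files, readr::read_csv)

# readr guesses Date for YYYY-MM-DD values; M/D/YYYY values stay character.
# inherits() is used because class() can return more than one value.
is_date <- vapply(tables, function(tbl) inherits(tbl$report_date, 'Date'),
                  logical(1))
not_dates <- which(!is_date)
files[not_dates]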
 [1] "data/zika-03022e42828e69ce19b448d40fa806545368b348/Colombia/Municipality_Zika/data/Municipality_Zika_2016-02-27.csv"                        
 [2] "data/zika-03022e42828e69ce19b448d40fa806545368b348/Colombia/Municipality_Zika/data/Municipality_Zika_2016-03-05.csv"                        
 [3] "data/zika-03022e42828e69ce19b448d40fa806545368b348/Colombia/Municipality_Zika/data/Municipality_Zika_2016-03-12.csv"                        
 [4] "data/zika-03022e42828e69ce19b448d40fa806545368b348/Colombia/Municipality_Zika/data/Municipality_Zika_2016-03-19.csv"                        
 [5] "data/zika-03022e42828e69ce19b448d40fa806545368b348/Colombia/Municipality_Zika/data/Municipality_Zika_2016-03-26.csv"                        
 [6] "data/zika-03022e42828e69ce19b448d40fa806545368b348/Colombia/Municipality_Zika/data/Municipality_Zika_2016-04-02.csv"                        
 [7] "data/zika-03022e42828e69ce19b448d40fa806545368b348/Colombia/Municipality_Zika/data/Municipality_Zika_2016-04-09.csv"                        
 [8] "data/zika-03022e42828e69ce19b448d40fa806545368b348/Colombia/Municipality_Zika/data/Municipality_Zika_2016-04-16.csv"                        
 [9] "data/zika-03022e42828e69ce19b448d40fa806545368b348/Dominican_Republic/Epidemiological_Bulletin/data/Epidemiological_Bulletin-2016-03-26.csv"
[10] "data/zika-03022e42828e69ce19b448d40fa806545368b348/Dominican_Republic/Epidemiological_Bulletin/data/Epidemiological_Bulletin-2016-04-02.csv"
[11] "data/zika-03022e42828e69ce19b448d40fa806545368b348/Dominican_Republic/Epidemiological_Bulletin/data/Epidemiological_Bulletin-2016-04-09.csv"
[12] "data/zika-03022e42828e69ce19b448d40fa806545368b348/Dominican_Republic/Epidemiological_Bulletin/data/Epidemiological_Bulletin-2016-04-16.csv"
[13] "data/zika-03022e42828e69ce19b448d40fa806545368b348/Dominican_Republic/Epidemiological_Bulletin/data/Epidemiological_Bulletin-2016-04-23.csv"
[14] "data/zika-03022e42828e69ce19b448d40fa806545368b348/United_States/CDC_Report/data/CDC_Report-2016-04-06.csv"                                 

I'll fix them.

Do you want me to add this check to the data validator?

That's ok, I was just going to add it now.
Thanks!

Can you ping me when it's fixed?

I'm going to add another validation script that stacks all the CSV files into one table.
It's the main way I'm finding errors.
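
A minimal sketch of the stacking approach in R, assuming every CSV uses the same column names (rbind() on data frames matches columns by name and stops with an error if they differ). stack_csvs is a hypothetical name. Every column is read as character so that no value is silently coerced before it can be inspected, and a source_file column records where each row came from.

```r
# Hypothetical helper: read every CSV with all columns as character and
# stack them into one data frame, tagging each row with its source file.
# Assumes all files share the same column names.
stack_csvs <- function(files) {
  tables <- lapply(files, function(f) {
    df <- read.csv(f, colClasses = "character", check.names = FALSE)
    df$source_file <- rep(f, nrow(df))  # rep() also handles empty files
    df
  })
  do.call(rbind, tables)
}
```

Once everything is stacked, a single call such as unique(stacked$report_date) lists every report_date value in the repository in one place, so a stray 2/27/2016 stands out, and the source_file column shows which file needs fixing.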

It's a bit of code I run for a dashboard (which is still in its alpha stage).
I'll just push upstream so everyone benefits.

Done!
It would be great to add the stacked data validator.
I'll work on adding the date format check to the smaller validator.
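
A per-file date format check for the smaller validator might look like the R sketch below. The function name and message wording are hypothetical. It reads report_date as plain text, so the result does not depend on readr's type guessing, requires the strict YYYY-MM-DD shape (as.Date with a format alone would also accept 2016-2-5 or ignore trailing characters), and then confirms the value parses as a calendar date.

```r
# Hypothetical per-file check: returns a character vector of problems,
# empty when the file passes.
check_report_date_format <- function(path) {
  df <- read.csv(path, colClasses = "character", check.names = FALSE)
  if (!"report_date" %in% names(df)) {
    return(sprintf("%s: no report_date column", path))
  }
  dates <- df$report_date
  # Strict YYYY-MM-DD shape, and the value must parse as a date
  ok <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", dates) &
    !is.na(as.Date(dates, format = "%Y-%m-%d"))
  bad <- unique(dates[!ok])
  if (length(bad) == 0) {
    return(character(0))
  }
  sprintf("%s: report_date not in YYYY-MM-DD format: %s",
          path, paste(bad, collapse = ", "))
}
```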