Tampa Report double-counts statewide cases since the data format changed on 2020-07-17
mrhellmann opened this issue · comments
It looks like the state data file data/covid-19-florida_arcgis_summary.csv
changed format on July 17th: a county_1 entry for 'State' (all cases in the state?) was added.
Since then, the gtable (and associated plot) on the Tampa Report has been double-counting state cases, with the exception of the 17th, when all previous state cases were added.
A possible solution for covid19-florida/docs/tampa/index.Rmd:
at ~line 52, add filter(county != "State") %>%
covid19-florida/docs/tampa/index.Rmd
Lines 49 to 60 in 86bba4a
tampa_test_summary <-
  dash %>%
  select(timestamp, county = county_1, t_positive_2, t_total, deaths, t_inconc, c_ed_yes) %>%
  filter(county != "State") %>% ### <--- Here
  group_by(day = floor_date(timestamp - hours(8), "day")) %>%
  filter(timestamp == max(timestamp)) %>%
  ungroup() %>%
  select(-timestamp, timestamp = day) %>%
  bind_rows(mutate(., county = "Florida")) %>%
  filter(county %in% c("Hillsborough", "Pinellas", "Florida")) %>%
  group_by(timestamp, county) %>%
  summarize_all(sum) %>%
  ungroup()
A simple filter for 'State' seemed to work for me locally.
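To illustrate why the extra row matters, here is a minimal sketch on made-up data. The `toy` data frame, its numbers, and the `florida_total()` helper are hypothetical and only mimic the bind_rows/sum step of the pipeline above; they are not taken from the real summary file. The sketch assumes the 'State' row holds the statewide total, which is the guess made in this issue.

```r
library(dplyr)

# Hypothetical toy data: two counties plus a statewide "State" row,
# assumed here to equal the sum of the county rows.
toy <- data.frame(
  timestamp = as.Date("2020-07-18"),
  county    = c("Hillsborough", "Pinellas", "State"),
  deaths    = c(10, 5, 15),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the report's approach: every row is copied
# under the label "Florida", then values are summed per timestamp and county.
florida_total <- function(df) {
  df %>%
    bind_rows(mutate(df, county = "Florida")) %>%
    filter(county == "Florida") %>%
    group_by(timestamp, county) %>%
    summarize(deaths = sum(deaths), .groups = "drop") %>%
    pull(deaths)
}

# Without the filter, the "State" row is summed on top of the counties.
unfiltered <- florida_total(toy)

# With the proposed filter, only county rows contribute to "Florida".
filtered <- florida_total(filter(toy, county != "State"))
```

In this toy case `unfiltered` comes out at twice `filtered`, which matches the doubling seen in the report if the 'State' row really is the statewide total.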
Apologies for my lack of GitHub know-how, and great work on the project!
Thanks Matt! You're right. I caught the change when it happened last week, but I forgot to update the Tampa report. Thanks for letting me know!