United Kingdom has duplicate dates
timriffe opened this issue
Thanks for this great package!
I'm seeing 11 values per date/type in the United Kingdom:
```r
library(tidyverse)
# devtools::install_github("RamiKrispin/coronavirus")
library(coronavirus)
data("coronavirus")

coronavirus %>%
  filter(type == "confirmed",
         country == "United Kingdom") %>%
  dplyr::pull(date) %>%
  table()
# it's a bunch of 11s
```
I guess these are nations like England, Scotland, Wales, etc. If so, maybe `country` could distinguish them somehow, e.g. "United Kingdom, Scotland" (or the other way around). Otherwise, one can only infer which series is which from the values, which is risky for any analysis.
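A minimal base R sketch of that kind of label, assuming the rows are told apart by a second column (called `province` here). The toy rows, their made-up case counts, the placeholder names "Region A" and "Region B", and the `make_region_label()` helper are all hypothetical:

```r
# Hypothetical helper: combine country and sub-region into one label,
# falling back to the plain country name when the sub-region is empty.
make_region_label <- function(country, province) {
  ifelse(province == "", country, paste(country, province, sep = ", "))
}

# Toy rows in a long format (one row per date/region/type); counts are made up
toy <- data.frame(
  date     = as.Date(rep("2021-01-04", 3)),
  province = c("", "Region A", "Region B"),
  country  = "United Kingdom",
  type     = "confirmed",
  cases    = c(100, 2, 3),
  stringsAsFactors = FALSE
)

toy$region <- make_region_label(toy$country, toy$province)
print(toy$region)
```

With a label like `region`, each date/type has exactly one row per label, so series can be picked out by name rather than by value.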
Thanks!
Hi @timriffe,
I think the issue is that `dplyr::pull(date)` in your code keeps only the `date` column, which drops the `province` column that tells the rows apart. For the UK, `province` identifies the overseas territories and Crown dependencies reported under the UK:
```r
uk <- coronavirus %>%
  dplyr::filter(type == "confirmed",
                country == "United Kingdom")

unique(uk$province)
```

```
 [1] ""                            "Anguilla"
 [3] "Bermuda"                     "British Virgin Islands"
 [5] "Cayman Islands"              "Channel Islands"
 [7] "Falkland Islands (Malvinas)" "Gibraltar"
 [9] "Isle of Man"                 "Montserrat"
[11] "Turks and Caicos Islands"
```
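To see why the dates repeat, here is a small base R sketch on toy data shaped like the UK subset (only three provinces, and the case counts are made up): counting rows by `date` alone repeats each date once per province, while counting by `date` and `province` gives exactly one row per pair.

```r
# Toy data in the same long format: 2 dates x 3 provinces (counts are made up)
toy <- data.frame(
  date     = rep(as.Date(c("2021-01-04", "2021-01-05")), each = 3),
  province = rep(c("", "Bermuda", "Gibraltar"), times = 2),
  country  = "United Kingdom",
  type     = "confirmed",
  cases    = c(100, 2, 3, 110, 1, 0),
  stringsAsFactors = FALSE
)

# Counting by date alone: each date appears once per province (3 here, 11 in the real data)
rows_per_date <- table(toy$date)
print(rows_per_date)

# Counting by date and province: every combination appears exactly once
rows_per_date_province <- table(toy$date, toy$province)
print(rows_per_date_province)

# anyDuplicated() returns 0 when no (date, province) pair repeats
stopifnot(anyDuplicated(toy[, c("date", "province")]) == 0)
```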
Rows with an empty `province` hold the figures for the UK itself, i.e. England, Scotland, Wales, and Northern Ireland combined:
```r
coronavirus %>%
  dplyr::filter(type == "confirmed",
                country == "United Kingdom",
                province == "") %>%
  tail()
```

```
          date province        country     lat   long      type cases
349 2021-01-04          United Kingdom 55.3781 -3.436 confirmed 58784
350 2021-01-05          United Kingdom 55.3781 -3.436 confirmed 60916
351 2021-01-06          United Kingdom 55.3781 -3.436 confirmed 62322
352 2021-01-07          United Kingdom 55.3781 -3.436 confirmed 52618
353 2021-01-08          United Kingdom 55.3781 -3.436 confirmed 68053
354 2021-01-09          United Kingdom 55.3781 -3.436 confirmed 59937
```
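A minimal base R sketch of two ways to get a single UK series per date, again on made-up toy data in the same long format. Keeping only the empty-`province` rows matches the filter above; summing over all provinces instead folds the territories into the total, which may or may not suit a given analysis:

```r
# Toy data in the package's long format (all case counts are made up)
toy <- data.frame(
  date     = rep(as.Date(c("2021-01-04", "2021-01-05")), each = 3),
  province = rep(c("", "Bermuda", "Gibraltar"), times = 2),
  country  = "United Kingdom",
  type     = "confirmed",
  cases    = c(100, 2, 3, 110, 1, 0),
  stringsAsFactors = FALSE
)

# Option 1: the UK itself only (empty province), one row per date
uk_main <- toy[toy$province == "", c("date", "cases")]

# Option 2: UK plus its territories, summed per date
uk_all <- aggregate(cases ~ date, data = toy, FUN = sum)

print(uk_main)
print(uk_all)
```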
Unfortunately, this dataset does not break the UK down into its constituent countries (England, Scotland, Wales, and Northern Ireland).
Does this help?