ropensci / visdat

Preliminary Exploratory Visualisation of Data

Home Page: https://docs.ropensci.org/visdat/

vis gaps over an index

sa-lee opened this issue

Hi Nick,

Not sure if this is the right place - so feel free to close if not.

Something that came up in the last communicating with data class: when students use vis_dat() on time series data, they often find there are no missing values across observations, but aren't aware that there could be missings over time / an index variable.

Are there any plans to incorporate a vis that looks at gaps or runs? I know you had some stuff going on with Earo, but I'm not sure where that is at.
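To make the concern concrete, here is a minimal base R sketch of an implicit gap. The data frame `obs`, its values, and the column names are made up for illustration; this is not how visdat or naniar work internally, just one way to see rows that are absent rather than NA.

```r
# Hypothetical hourly series: hours 4 and 5 are dropped entirely (rows absent, not NA)
full_index <- seq(
  from = as.POSIXct("2020-01-01 00:00:00", tz = "UTC"),
  by = "hour",
  length.out = 8
)
obs <- data.frame(
  date_time = full_index[-c(4, 5)],
  hourly_counts = c(12, 9, 7, 15, 21, 30)
)

# There are no explicit missing values, so a check over observations looks clean
anyNA(obs)

# Gaps show up once consecutive index values are compared with the expected step
step <- as.numeric(diff(obs$date_time), units = "hours")
gaps <- data.frame(
  gap_start = obs$date_time[-nrow(obs)][step > 1],
  gap_end   = obs$date_time[-1][step > 1],
  n_missing = step[step > 1] - 1
)
gaps

# Make the gaps explicit: join onto the complete index so absent hours become NA
complete <- merge(data.frame(date_time = full_index), obs, all.x = TRUE)
sum(is.na(complete$hourly_counts))
```

Once the gaps are explicit NAs, as in `complete`, the usual missingness plots can pick them up.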

Hey Stuart,

Yes! naniar has miss_var_run() and gg_miss_span(). Do those sort of do what you're looking for?

library(naniar)

miss_var_run(pedestrian, hourly_counts)
#> # A tibble: 35 x 2
#>    run_length is_na   
#>         <int> <chr>   
#>  1       6628 complete
#>  2          1 missing 
#>  3       5250 complete
#>  4        624 missing 
#>  5       3652 complete
#>  6          1 missing 
#>  7       1290 complete
#>  8        744 missing 
#>  9       7420 complete
#> 10          1 missing 
#> # … with 25 more rows
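For intuition about the table above, a similar run summary can be sketched with base R's rle(). This is only an approximation of the idea on a made-up vector `x`, not naniar's actual implementation.

```r
# Made-up vector standing in for a measured variable
x <- c(5, 3, NA, NA, NA, 8, 2, NA, 4)

# Run-length encode the missingness indicator
runs <- rle(is.na(x))

# One row per run, labelled the same way as the miss_var_run() output
run_table <- data.frame(
  run_length = runs$lengths,
  is_na = ifelse(runs$values, "missing", "complete")
)
run_table
```

Note that this only counts runs of explicit NA values, so rows that are absent from the data altogether have to be made explicit first.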

library(ggplot2)
library(dplyr) # filter() and count() below are dplyr verbs; without it stats::filter() is called

# explore the number of missings in a given run
miss_var_run(pedestrian, hourly_counts) %>%
  filter(is_na == "missing") %>%
  count(run_length) %>%
  ggplot(aes(x = run_length,
             y = n)) +
  geom_col()

# look at the number of missing values and the run length of these.
miss_var_run(pedestrian, hourly_counts) %>%
  ggplot(aes(x = is_na,
             y = run_length)) +
  geom_boxplot()

gg_miss_span(pedestrian, hourly_counts, span_every = 3000)

Created on 2020-11-16 by the reprex package (v0.3.0)

yes, thanks!