yutannihilation / gghighlight

Highlight points and lines in ggplot2

Home Page:https://yutannihilation.github.io/gghighlight/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

gghighlight doesn't seem to play well with `tsibble`

njtierney opened this issue · comments

First up - thanks for making such a great package!

Well designed and slick, this is a thing of beauty.

I'm actually getting a bug where gghighlight doens't seem to play well with tsibble data.

I have a reprex below, where in the first instance I have a tsibble, and in the second instance, I convert it to a tibble and get a very different plot.

I'm not sure if I should have maybe posted this under tsibble (cc @earowang), but I am happy to move it there.

library(ggplot2)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(gghighlight)
library(brolgar)

wages_ts %>%
  group_by(id) %>%
  mutate(increase = increasing(ln_wages)) %>%
  ggplot(aes(x = xp,
             y = ln_wages,
             group = id)) + 
  geom_line() + 
  gghighlight(increase)
#> Warning: Unspecified temporal ordering may yield unexpected results.
#> Suggest to sort by `id`, `xp` first.

#> Warning: Unspecified temporal ordering may yield unexpected results.
#> Suggest to sort by `id`, `xp` first.

wages_ts %>%
  group_by(id) %>%
  mutate(increase = increasing(ln_wages)) %>%
  as_tibble() %>%
  ggplot(aes(x = xp,
             y = ln_wages,
             group = id)) + 
  geom_line() + 
  gghighlight(increase)

Created on 2019-07-18 by the reprex package (v0.3.0)

Hmm, it seems gghighlight needs to be aware of indices that tsibble uses. Thanks for this good chance to learn about tsibble...

This warning is meant for interactive analysis such as lag(). It can be safely ignored for plotting, which is probably triggered by dplyr::arrange(). Unfortunately I cannot disable the warning on my side for plotting functions. Alternatively you could as.data.frame() the object up the front to ignore the tsibble object?

Oh, thanks. But, I don't think this is just about warnings...

@njtierney
Btw, which plot is the desired result? The second one? Or are both wrong...?

haven’t noticed the plots difference. i’ll probably look into the issue too.

Thanks for looking into this, @yutannihilation and @earowang, The second plot is the correct result.

Thanks! Then, the problem now seems simpler than I thought.

It seems dplyr::select() errors here with this message (and this is ignored by tryCatch()):

Error: The result is not a valid tsibble.
Do you need `as_tibble()` to work with data frame?

gghighlight uses only the parts of data and doesn't care about whether it's valid as the original class, so probably it needs to convert the data into tibble or a bare data.frame before calculating the predicates.

I will fix this when I got some time (as I'm not immediately sure where to insert as_tibble() (or fortify()?)...). Meanwhile, please use as_tibble() or as.data.frame() as a workaround.

A quick workaround could be using [ to replace dplyr::select(), which is silently converted to a tibble.

This also raises another issue that more and more specialist data classes built on top of tibbles with their associated dplyr methods may give contextual errors or warnings for the package code that is context-free (i.e. the package handles a general tabular data input).

Hi @yutannihilation - did you want a hand with a PR for this? I think @earowang 's point of using [ instead of dplyr::select() would work well for this.

Thanks for the kind reminder! But, I'm still thinking what should happen here. [ seems OK for tsibble, but I'm not fully sure if it's really OK for other data.frame-alikes (e.g. sf).

Let me think a bit more. Nowadays I have more time on coding R, so I expect it won't be so long until I come back to gghighlight :)

No worries - and fair enough! I'm not sure what happens with sf for [. As Earo described, it's an interesting issue with building on top of tibbles.

Thanks again for making such a lovely package :)

To handle this properly, I think I need to consider these cases:

  1. The object should be passed to ggplot2 as the original class, and it's fine to process them as it is (e.g. sf)
  2. The object should be passed to ggplot2 as the original class, and it can't be processed without converting to a bare data.frame
  3. The object can be passed to ggplot2 as either the original class or a bare data.frame, but it can't be processed without some conversion (e.g. tsibble)

While case 1 and case 3 are straightforward, case 2 seems difficult. Probably, I need to wait for vctrs::vec_proxy() and vctrs::vec_restore().

But, I now feel [ can handle most of the cases; it returns sf for sf, and data.frame for invalid tsibble. Let's employ [ for now and reconsider when I find other problems... Thanks for the suggestion, Earo.

@njtierney
Sorry, I found I don't understand the intension of your code. I thought #120 fixes this issue, but it won't...

Do you try to highlight lines whose trends are always increasing? If so, increase is not a proper condition to filter the data by group (so the actual filtering is done by point) and you should write

wages %>%
  ggplot(aes(x = xp,
             y = ln_wages,
             group = id)) + 
  geom_line() +
  gghighlight(increasing(ln_wages))

and the code above highlights nothing because there are no such lines. Is this what you wanted?

library(dplyr, warn.conflicts = FALSE)
library(brolgar)

wages %>%
  group_by(id) %>%
  summarise(increase = increasing(ln_wages)) %>%
  filter(increase)
#> # A tsibble: 0 x 3 [?]
#> # Key:       id [0]
#> # … with 3 variables: id <int>, xp <dbl>, increase <lgl>

Created on 2019-10-20 by the reprex package (v0.3.0)

(I'm confused partly because of this difference about summarise()...)

library(dplyr, warn.conflicts = FALSE)
library(brolgar)

d <- wages %>%
  group_by(id) %>%
  mutate(increase = increasing(ln_wages))

d %>% 
  summarise(increase)
#> # A tsibble: 6,402 x 3 [!]
#> # Key:       id [888]
#>       id    xp increase
#>    <int> <dbl> <lgl>   
#>  1    31 0.015 FALSE   
#>  2    31 0.715 FALSE   
#>  3    31 1.73  FALSE   
#>  4    31 2.77  FALSE   
#>  5    31 3.93  FALSE   
#>  6    31 4.95  FALSE   
#>  7    31 5.96  FALSE   
#>  8    31 6.98  FALSE   
#>  9    36 0.315 FALSE   
#> 10    36 0.983 FALSE   
#> # … with 6,392 more rows

d %>%
  as_tibble() %>%
  group_by(id) %>%
  summarise(increase)
#> Error: Column `increase` must be length 1 (a summary value), not 8

Created on 2019-10-20 by the reprex package (v0.3.0)

Example from http://brolgar.njtierney.com/articles/exploratory-modelling.html:

library(brolgar)
library(gghighlight)
#> Loading required package: ggplot2
library(dplyr, warn.conflicts = FALSE)

wages_slope <- key_slope(wages,ln_wages ~ xp) %>%
  left_join(wages, by = "id") 

# with `as_tibble()` workaround
p1 <- wages_slope %>% 
  as_tibble() %>% # workaround for gghighlight + tsibble
  ggplot(aes(x = xp, 
             y = ln_wages, 
             group = id)) + 
  geom_line() +
  gghighlight(.slope_xp < 0, use_direct_label = FALSE)
#> Warning: Tried to calculate with group_by(), but the calculation failed.
#> Falling back to ungrouped filter operation...

# without `as_tibble()`
p2 <- wages_slope %>% 
  ggplot(aes(x = xp, 
             y = ln_wages, 
             group = id)) + 
  geom_line() +
  gghighlight(.slope_xp < 0, use_direct_label = FALSE)
#> Warning: Tried to calculate with group_by(), but the calculation failed.
#> Falling back to ungrouped filter operation...

patchwork::wrap_plots(p1, p2)

Created on 2020-01-24 by the reprex package (v0.3.0)

I'm still not sure if the result is expected one (shouldn't this be highlighted group-wise?), but now the result is the same with or without as_tibble()...