mjskay / ggdist

Visualizations of distributions and uncertainty

Home Page: https://mjskay.github.io/ggdist/

Bug in `geom_lineribbon()` fill layer order

mccarthy-m-g opened this issue

I'm trying to plot a survival curve with multiple intervals (66% and 95%) but ran into a bug with the fill scale when using geom_lineribbon(). Basically I can't plot both intervals in the darker to lighter fill order typical of ggdist. Instead I can either:

  • Plot the 95% interval in the correct colour, but the 66% interval is hidden in a lower layer (if you set alpha you can see it under the 95% interval)
  • Plot both intervals, but the fill colours are going in the wrong order (lighter to darker)

Here's a reprex:

library(survival)
library(tidyverse)
library(broom)
library(ggdist)

veterans_fit_95 <- survfit(
  Surv(time, status) ~ 1, data = veteran, conf.int = 0.95
)
veterans_fit_66 <- update(veterans_fit_95, conf.int = 0.66)

veterans_tidy <- list(`0.66` = veterans_fit_66, `0.95` = veterans_fit_95) |>
  map_df(tidy, .id = ".width") |>
  mutate(.width = as.numeric(.width))

p <- ggplot(
  veterans_tidy, aes(x = time, y = estimate, ymin = conf.low, ymax = conf.high)
) +
  geom_lineribbon(step = "hv")

# Only the 95% interval is shown
p + scale_fill_brewer(direction = 1)

# If you change the fill direction then both intervals are shown (but the colour
# order is reversed so this isn't a great workaround)
p + scale_fill_brewer(direction = -1)

# This also happens with manual fill
p + scale_fill_manual(values = c("#DEEBF7", "#9ECAE1")) # Doesn't work

p + scale_fill_manual(values = c("#9ECAE1", "#DEEBF7")) # Does work

Created on 2023-03-01 with reprex v2.0.2

Here's another reprex with multiple survival curves in one plot. Adding just in case it's helpful for additional debugging.

library(survival)
library(tidyverse)
library(broom)
library(ggdist)

# Helper function for multiple intervals
dist_tidy <- function(x, conf.type = "log-log", conf.int = 0.95, ...) {
  purrr::map_df(
    purrr::set_names(conf.int),
    function(.x) broom::tidy(update(x, conf.type = conf.type, conf.int = .x)),
    .id = ".width"
  ) |>
    dplyr::mutate(.width = as.numeric(.width)) |>
    dplyr::relocate(.width, .after = dplyr::everything())
}

veterans_fit <- survfit(Surv(time, status) ~ trt, data = veteran)
veterans_fit_tidy <- dist_tidy(veterans_fit, conf.int = c(0.66, 0.95))

p <- ggplot(
  veterans_fit_tidy,
  aes(
    x = time, y = estimate,
    ymin = conf.low, ymax = conf.high,
    colour = strata,
    fill = strata, fill_ramp = factor(.width)
  )
) +
  scale_colour_brewer(palette = "Dark2") +
  scale_fill_brewer(palette = "Set2")

# Interval fills still in the wrong layer order (lighter to darker)
p + geom_lineribbon(step = "hv", alpha = 0.8)

# Changing the `from` colour changes the fill layer order inconsistently. Now
# the 66% interval for trt=2 is below the 95% interval. The fill layer order
# for trt=1 is fine though...
p + geom_lineribbon(step = "hv") + scale_fill_ramp_discrete(from = "gray50")

Created on 2023-03-01 with reprex v2.0.2

Hmm yeah, something weird going on here, likely in the heuristics lineribbon uses to pick draw order. They should be handling this case and I'm not sure why they aren't.

Does make me think that adding something to make draw order explicitly controllable might be a better solution than the current approach anyway, which has always been a hack. I did that recently with geom_dots by allowing draw order to be set using the order aesthetic, maybe I'll do that here too.

Yeah, I was surprised by it. I did a bit more digging and it seems like the problem is caused by CIs that have NA values (see reprex below).

Having the option for an explicit order aesthetic might be nice. Would that make it possible to put all the ribbons from a group above all the ribbons from a different group, or would it just be for the .width level order?


library(survival)
library(tidyverse)
library(broom)
library(ggdist)

# Helper function for multiple intervals
dist_tidy <- function(x, conf.type = "log-log", conf.int = 0.95, ...) {
  purrr::map_df(
    purrr::set_names(conf.int),
    function(.x) broom::tidy(update(x, conf.type = conf.type, conf.int = .x)),
    .id = ".width"
  ) |>
    dplyr::mutate(.width = as.numeric(.width)) |>
    dplyr::relocate(.width, .after = dplyr::everything())
}

# Null model ----
veterans_fit_1 <- survfit(Surv(time, status) ~ 1, data = veteran)
veterans_fit_1_tidy <- dist_tidy(veterans_fit_1, conf.int = c(0.66, 0.95))

# The last time always has an estimate of 0 and CIs of NA. Omitting these rows
# seems to fix things in the null model reprex.
tail(arrange(veterans_fit_1_tidy, desc(estimate)), 2)
#> # A tibble: 2 × 9
#>    time n.risk n.event n.censor estimate std.error conf.high conf.low .width
#>   <dbl>  <dbl>   <dbl>    <dbl>    <dbl>     <dbl>     <dbl>    <dbl>  <dbl>
#> 1   999      1       1        0        0       Inf        NA       NA   0.66
#> 2   999      1       1        0        0       Inf        NA       NA   0.95

# Interval fills are ordered correctly
ggplot(
  na.omit(veterans_fit_1_tidy),
  aes(x = time, y = estimate, ymin = conf.low, ymax = conf.high)
) +
  geom_lineribbon(step = "hv") +
  scale_fill_brewer()

# Group model ----
veterans_fit_2 <- survfit(Surv(time, status) ~ trt, data = veteran)
veterans_fit_2_tidy <- dist_tidy(veterans_fit_2, conf.int = c(0.66, 0.95))

# In the two group example the last time for each group has an estimate of 0
# and CIs of NA. Omitting these rows seems to fix most things but not all things.
tail(arrange(veterans_fit_2_tidy, desc(estimate)), 4)
#> # A tibble: 4 × 10
#>    time n.risk n.event n.censor estimate std.error conf.…¹ conf.…² strata .width
#>   <dbl>  <dbl>   <dbl>    <dbl>    <dbl>     <dbl>   <dbl>   <dbl> <chr>   <dbl>
#> 1   553      1       1        0        0       Inf      NA      NA trt=1    0.66
#> 2   999      1       1        0        0       Inf      NA      NA trt=2    0.66
#> 3   553      1       1        0        0       Inf      NA      NA trt=1    0.95
#> 4   999      1       1        0        0       Inf      NA      NA trt=2    0.95
#> # … with abbreviated variable names ¹​conf.high, ²​conf.low

# Not the desired plot, but the within-group interval fills are ordered
# correctly. The layers are intermingling between groups, but this happens in
# the ggdist vignettes too so that's expected behaviour.
ggplot(
  na.omit(veterans_fit_2_tidy),
  aes(x = time, y = estimate, ymin = conf.low, ymax = conf.high, colour = strata)
) +
  geom_lineribbon(step = "hv") +
  scale_fill_brewer()

# However only the 95% interval seems to be shown if you replace the colour
# aesthetic with the fill aesthetic.
ggplot(
  na.omit(veterans_fit_2_tidy),
  aes(x = time, y = estimate, ymin = conf.low, ymax = conf.high, fill = strata)
) +
  geom_lineribbon(step = "hv", alpha = 1/4) +
  theme_dark()

# Adding fill_ramp gets us the desired result as long as the factor levels for
# width go from high to low.
ggplot(
  na.omit(veterans_fit_2_tidy),
  aes(x = time, y = estimate, ymin = conf.low, ymax = conf.high, fill = strata)
) +
  geom_lineribbon(
    aes(fill_ramp = factor(.width, levels = c(0.95, 0.66))),
    step = "hv"
  )

Created on 2023-03-02 with reprex v2.0.2

> Yeah, I was surprised by it. I did a bit more digging and it seems like the problem is caused by CIs that have NA values (see reprex below).

Ah makes sense! I might adjust the logic to better account for NAs to handle this case then.

> Having the option for an explicit order aesthetic might be nice. Would that make it possible to put all the ribbons from a group above all the ribbons from a different group, or would it just be for the .width level order?

Yeah, I would probably make the current behavior the default but allow arbitrary re-ordering using order, which would include putting all of one group above another.

Both those things would be great!

TODOs for me:

  • ignore NAs when calculating draw order based on widths
  • allow use of order to determine draw order of ribbons
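
To illustrate why NA bounds can upset a width-based draw order, here is a minimal base R sketch. It assumes the heuristic ranks ribbons by their average vertical extent; `mean_extent()` and the toy data are hypothetical and are not ggdist's actual code.

```r
# Toy ribbon data: two interval widths, each with NA bounds at the last time
ribbons <- data.frame(
  time   = rep(1:3, times = 2),
  ymin   = c(0.6, 0.4, NA, 0.5, 0.3, NA),
  ymax   = c(0.8, 0.6, NA, 0.9, 0.7, NA),
  .width = rep(c(0.66, 0.95), each = 3)
)

# Hypothetical heuristic: average vertical extent of each ribbon
mean_extent <- function(data, na.rm) {
  tapply(data$ymax - data$ymin, data$.width, mean, na.rm = na.rm)
}

# With NAs kept, every average is NA, so the ribbons cannot be ranked
extent_with_na <- mean_extent(ribbons, na.rm = FALSE)

# Ignoring NAs restores a usable ranking
extent <- mean_extent(ribbons, na.rm = TRUE)

# Widest ribbon first, so narrower ribbons are drawn on top of it
draw_order <- names(extent)[order(extent, decreasing = TRUE)]
draw_order
```

Under that assumption, a single NA row per interval is enough to make the ranking undefined, which would match the observation that `na.omit()` fixes the null model reprex.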

Your original code should now produce the correct result on the dev version (install via remotes::install_github("mjskay/ggdist@dev")):

[image: resulting plot]

This doesn't implement the arbitrary ordering via an order aesthetic yet, but I will do that at some point.

FYI dev branch now allows draw order of ribbons to be explicitly controlled via the order aesthetic. You could (e.g.) do something like aes(order = interaction(-.width, strata)) to plot all the ribbons for each level of strata together.
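
For anyone landing here later, a sketch of how that `order` mapping might be used with the two-group data from this thread. It assumes a ggdist version that supports the `order` aesthetic on `geom_lineribbon()` (the dev branch at the time of this comment); `tidy_widths` is just an illustrative name.

```r
library(survival)
library(broom)
library(dplyr)
library(ggplot2)
library(ggdist)

fit <- survfit(Surv(time, status) ~ trt, data = veteran)

# One tidy data frame per interval width, stacked with a numeric .width column
tidy_widths <- bind_rows(
  lapply(c(0.66, 0.95), function(w) {
    mutate(tidy(update(fit, conf.int = w)), .width = w)
  })
)

p <- ggplot(
  tidy_widths,
  aes(
    x = time, y = estimate, ymin = conf.low, ymax = conf.high,
    fill = strata, fill_ramp = factor(.width),
    # Within each stratum the wider interval comes first (drawn lower),
    # and all ribbons of one stratum are drawn together
    order = interaction(-.width, strata)
  )
) +
  geom_lineribbon(step = "hv")

p
```

Swapping the arguments to `interaction()` should instead group the ribbons by width first, but I have not checked that variant.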

Awesome, both work great!

Would it be possible to order the lines with their ribbons too?

For example, I find it a bit disorienting that the green line is on top of the orange ribbons here; it would feel more natural if it was under the orange ribbons.

[image: plot in which the green line is drawn on top of the orange ribbons]

(I'm guessing the answer is no, and this would be better handled by adding two geom_lineribbon() layers to the plot, each subset to one level of strata)
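
A rough sketch of that two-layer workaround, in case it is useful. `strata_layer()` and `tidy_widths` are hypothetical names for illustration, and this relies only on ggplot2 drawing layers in the order they are added.

```r
library(survival)
library(broom)
library(dplyr)
library(ggplot2)
library(ggdist)

fit <- survfit(Surv(time, status) ~ trt, data = veteran)
tidy_widths <- bind_rows(
  lapply(c(0.66, 0.95), function(w) {
    mutate(tidy(update(fit, conf.int = w)), .width = w)
  })
)

# Hypothetical helper: one lineribbon layer restricted to a single stratum
strata_layer <- function(level) {
  geom_lineribbon(
    data = filter(tidy_widths, strata == level),
    step = "hv"
  )
}

p <- ggplot(
  mapping = aes(
    x = time, y = estimate, ymin = conf.low, ymax = conf.high,
    fill = strata, fill_ramp = factor(.width)
  )
) +
  strata_layer("trt=1") + # added first: its line and ribbons sit underneath
  strata_layer("trt=2")   # added second: its ribbons cover the trt=1 line

p
```

The trade-off is that each stratum needs its own layer, so this gets clumsy with many groups.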

Hmm yeah. The lines are like that so that other types of displays (particularly gradient-style lineribbons with lots of low-opacity intervals) work well... but maybe I can figure out a reasonable solution that accommodates both.

I was mostly just curious, but if it might be possible, do you want me to open a new issue for it?

Sure, then I can close this one.

Closing since aa1e3b7 and 71f346b resolved the original issue.