mjskay / ggdist

Visualizations of distributions and uncertainty

Home Page: https://mjskay.github.io/ggdist/

annotation layers for thickness and dots scales

mjskay opened this issue · comments

Pinging off of #182, it occurred to me that a solution for this would be to add a layer capable of adding subscale axis labels for thickness and dots geoms. It would be like a legend, but drawn directly on the chart.

This requires knowing geom settings and data from a slab or dots geom, so this would probably have to be tied to the geom. I initially thought a separate layer makes sense, but perhaps an option on a slab is more sensible, because of the inherent ties to the normalization settings of the geom (and, in the case of dots, it would have to be computed after binwidth is determined by the grob, so can't be on a separate layer at all). Something like stat_slab(..., thickness_guide = ...) or stat_slab(..., subaxis = ...) or stat_slab(..., subguide = ...) ...

I think that this could work, and would be helpful. But it needs to be robust to missing/NA values in x, as in palmerpenguins' penguins$bill_length_mm; apparently 2 penguins were not very cooperative with bill measurement that day.

This works as a very simple version (without being robust to missingness in x):

library(tidyverse)
library(ggdist)
set.seed(1234)
x = rnorm(100)  # simulated stand-in data with no missing values

# pick a binwidth such that the tallest stack of dots stays within maxheight
binwidth = find_dotplot_binwidth(na.omit(x), maxheight = 2/3*diff(range(x, na.rm = TRUE)), heightratio = 1)

# lay out the dots; x is passed as-is here, so NAs are not handled
bin_df = bin_dots(x = x, y = 0, binwidth = binwidth, heightratio = 1)

# draw each dot as an ellipse; dividing y by binwidth puts the y axis in units of dots
bin_df %>%
  ggplot(aes(x0 = x, y0 = y / binwidth, a = binwidth/2, b = 1/2, angle = 0)) +
  ggforce::geom_ellipse(fill = "gray") +
  coord_fixed(ratio = binwidth) +
  ylab("Count") +
  xlab("Bill Length in mm") +
  labs(title = "Count Histogram of Penguin Bill Length in mm") +
  theme_classic()

This version seems to be robust to missingness in x:

library(tidyverse)
library(ggdist)
library(palmerpenguins)  # provides the penguins data

x = penguins$bill_length_mm  # contains 2 NA values

# pick a binwidth such that the tallest stack of dots stays within maxheight
binwidth = find_dotplot_binwidth(na.omit(x), maxheight = 2/3*diff(range(x, na.rm = TRUE)), heightratio = 1)

# drop the NAs before laying out the dots
bin_df = bin_dots(x = na.omit(x), y = 0, binwidth = binwidth, heightratio = 1)

# draw each dot as an ellipse; dividing y by binwidth puts the y axis in units of dots
bin_df %>%
  ggplot(aes(x0 = x, y0 = y / binwidth, a = binwidth/2, b = 1/2, angle = 0)) +
  ggforce::geom_ellipse(fill = "gray") +
  coord_fixed(ratio = binwidth) +
  ylab("Count") +
  xlab("Bill Length in mm") +
  labs(title = "Count Histogram of Penguin Bill Length in mm") +
  theme_classic()
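
For anyone wondering why both na.rm = TRUE and na.omit() appear above, here is a minimal base R sketch of how missing values propagate. The vector is made up for illustration (it is not the penguins data), and the variable names are just placeholders.

```r
# Hypothetical measurements with two missing values (not real data)
x = c(39.1, 39.5, NA, 40.3, 36.7, NA, 38.9)

# Without na.rm, the range is NA, and so is anything computed from it
raw_range = range(x)

# With na.rm = TRUE, the range uses the observed values only
x_range = range(x, na.rm = TRUE)

# na.omit() drops the missing entries; as.numeric() strips the
# bookkeeping attributes that na.omit() attaches
x_clean = as.numeric(na.omit(x))

# Number of values that were dropped
n_missing = sum(is.na(x))
```

The same pattern is what the second version applies before calling find_dotplot_binwidth() and bin_dots().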

Thanks, this will be helpful for updating the docs of bin_dots / find_dotplot_binwidth to help other folks with this problem!

If anyone (@ASKurz ?) is interested in trying this out, there is now a prototype implementation of what I am provisionally calling "sub-guides" for annotating thickness and dot counts. You can test it on the "subguide" branch via:

remotes::install_github("mjskay/ggdist@subguide")

Some examples:

library(ggplot2)
library(ggdist)
library(distributional)

df = data.frame(
  x = c(dist_gamma(1:2,1:2), dist_normal(2:3,0.75)),
  group = c("a","a","b","b"),
  subgroup = c("d","e","d","e")
)

df |>
  ggplot(aes(xdist = x, y = group, fill = subgroup)) +
  stat_dots(subguide = "count", position = "dodge", color = NA, justification = 0.5, quantiles = 50)

image

df |>
  ggplot(aes(xdist = x, y = group, fill = subgroup)) +
  stat_dots(
    subguide = subguide_count(title = "count", label_side = "left"), 
    position = "dodgejust", 
    color = NA, 
    quantiles = 50, 
    height = 0.91
  ) +
  scale_x_continuous(expand = expansion(add = 0.6))

image

df |>
  ggplot(aes(xdist = x, y = group, fill = subgroup)) +
  stat_slabinterval(
    subguide = subguide_axis(label_side = "outside", title = "density"), 
    position = "dodgejust", 
    height = 0.9,
    scale = 0.9,
    side = "top"
  ) +
  scale_x_continuous(expand = expansion(add = 1))

image

df |>
  ggplot(aes(xdist = x, y = group, fill = subgroup)) +
  stat_slabinterval(
    subguide = subguide_outside(title = "density"), 
    position = "dodgejust", 
    height = 0.9,
    scale = 0.8,
    side = "top",
    normalize = "groups"
  ) +
  scale_y_discrete(breaks = NULL) +
  ylab(NULL) +
  theme(plot.margin = margin(5.5, 5.5, 5.5, 50))

image

Positioning can be a bit finicky, but I'm not sure there's any way to make that easier without fundamental changes to ggplot2 (see e.g. tidyverse/ggplot2#5609)

Thanks for the heads up @mjskay. I bet @kruschke would like this.

But anyways, so far I really like what I'm seeing.

So I guess this issue is independent from using similar scales across facets?

Notice the different y-axis scales in the left and right facets:

library(tidyverse)  # expand_grid(), mutate(), %>%, ggplot2

expand_grid(
  group = c("a","a","b"),
  subgroup = c("d","d","e"),
  reps = 1:50
) %>% 
  mutate(x = rnorm(n(), group=="a", 1+(subgroup == "d"))) %>% 
  ggplot(aes(x = x, fill = subgroup)) +
  ggdist::geom_dots(
    subguide = ggdist::subguide_count(title = "count", label_side = "left"), 
    position = "dodgejust", 
    color = NA, 
    height = 0.91
  ) +
  scale_x_continuous(expand = expansion(add = 0.6)) +
  facet_grid(cols = vars(group))

image

Yeah the faceting issue is separate unfortunately; trickier to address. See #191.

I think the faceting issue would also apply to my use cases.

For faceting, if the chart isn't dynamic you can just choose a binwidth manually and then everything should line up. The inconsistency is caused by the automatic binwidth algorithm picking different binwidths in different facets.
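
A rough sketch of the manual-binwidth workaround, assuming a ggdist version that includes the subguide feature. The data are simulated and the binwidth of 0.25 (in data units) is an arbitrary value picked for this example, not a recommendation; it would need tuning for real data.

```r
library(ggplot2)
library(ggdist)

set.seed(1234)
# Hypothetical data: two facets with different sample sizes and spreads
df = data.frame(
  group = rep(c("a", "b"), times = c(100, 50)),
  x = c(rnorm(100, mean = 1, sd = 1), rnorm(50, mean = 0, sd = 2))
)

p = ggplot(df, aes(x = x)) +
  geom_dots(
    binwidth = 0.25,  # fixed binwidth in data units (assumed value)
    subguide = subguide_count(title = "count", label_side = "left"),
    color = NA,
    height = 0.91
  ) +
  scale_x_continuous(expand = expansion(add = 0.6)) +
  facet_grid(cols = vars(group))

p
```

Because both panels now use the same binwidth, a dot has the same size in each, so the count sub-guides should be comparable. The trade-off is that a fixed binwidth is not rescaled to fit, so tall stacks may overflow the available height if the value is too large.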

No complaints so far, so this is on master now and will be in the next release.