Scaling of density_bounded() to match the sample size

Question

DominiqueMakowski opened this issue a year ago

Hiya, I am trying to compute densities for several groups of unequal size, and I would like the density plot to reflect those group sizes (at the level of the computed data itself), similar to what a histogram would naturally do.

Does it make sense to simply multiply the y value by the number of observations, like so?

library(tidyverse)
library(ggdist)

data <- rbind(
  mutate(data.frame(v = rnorm(100, 0, 1)), group = "A"),
  mutate(data.frame(v = rnorm(1000, 0.5, 1)), group = "B"),
  mutate(data.frame(v = rnorm(500, -0.5, 1)), group = "C")
)

density_data <- data.frame()
for (group in unique(data$group)) {
  slice <- data[data$group == group, "v"]
  dens <- ggdist::density_bounded(slice)
  density_data <- rbind(
    density_data,
    data.frame(
      x = dens$x,
      y = dens$y * length(slice),  # <--- HERE
      group = group
    )
  )
}

density_data |>
  ggplot() +
  geom_line(aes(x = x, y = y, color = group))

Created on 2023-05-16 by the reprex package (v2.0.1)

It gives a plausible plot, but it seems too simple to be accurate...
Thanks a lot for any tips :)

Matthew Kay · Answer 1 · Wed May 17 2023 12:46:06 GMT+0800 (China Standard Time)

Yup, that is the correct approach!
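One way to see why the scaling works: a density integrates to about 1, so multiplying it by the group size gives a curve whose area is about the number of observations, which is the property a count histogram has (up to the bin width). Below is a minimal sketch of that check. The seed, the sample, and the `trapezoid()` helper are assumptions added for illustration and are not part of ggdist.

```r
library(ggdist)

set.seed(1)  # assumption: seed added only to make the check repeatable
x <- rnorm(500, -0.5, 1)

dens <- density_bounded(x)

# Hypothetical helper: trapezoidal rule over the grid returned by density_bounded()
trapezoid <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

area_density <- trapezoid(dens$x, dens$y)              # should be close to 1
area_scaled  <- trapezoid(dens$x, dens$y * length(x))  # should be close to length(x)
```

Because the scaling is a constant factor per group, it changes only the height of each curve, not its shape.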

If you don't care about the count axis, you can also do this directly within stat_slab() using the n computed variable:

data |>
  ggplot() +
  stat_slab(aes(x = v, thickness = after_stat(pdf * n), color = group), fill = NA)

Or using the probability expression mini-DSL:

data |>
  ggplot() +
  stat_slab(aes(x = v, thickness = !!p_(x) * n, color = group), fill = NA)
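If the count axis does matter, one option is to go back to the manual scaling from the question. A histogram with bin width h has bar heights of roughly n * h * f(x), so multiplying the density by both the group size and the bin width should put the curve on the same scale as geom_histogram(). The following is only a sketch under that assumption; the `binwidth` value, the seed, and the variable names are chosen for illustration.

```r
library(ggplot2)
library(ggdist)

set.seed(1)  # assumption: seed added only for repeatability
x <- rnorm(1000, 0.5, 1)
binwidth <- 0.25  # assumption: any bin width works, as long as both layers use it

dens <- density_bounded(x)

# Scale the density to expected counts per bin: n * binwidth * f(x)
curve <- data.frame(
  x = dens$x,
  count = dens$y * length(x) * binwidth
)

# Overlay the scaled density on a count histogram with the same bin width
p <- ggplot() +
  geom_histogram(aes(x = x), data = data.frame(x = x),
                 binwidth = binwidth, alpha = 0.3) +
  geom_line(aes(x = x, y = count), data = curve)
```

Printing `p` should show the line tracking the tops of the bars. Without the bin width factor, the curve from the question is on a "count per unit of x" scale instead.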