Scaling of density_bounded() to match the sample size
DominiqueMakowski opened this issue
Hiya, I am trying to compute densities for several groups of unequal size, and I would like the density plot to reflect those group sizes (at the level of the data itself), similar to what a histogram does naturally.
Does it make sense to simply multiply the y values by the number of observations, like so:
library(tidyverse)
library(ggdist)
data <- rbind(
  mutate(data.frame(v = rnorm(100, 0, 1)), group = "A"),
  mutate(data.frame(v = rnorm(1000, 0.5, 1)), group = "B"),
  mutate(data.frame(v = rnorm(500, -0.5, 1)), group = "C")
)
density_data <- data.frame()
for(group in unique(data$group)) {
  slice <- data[data$group == group, "v"]
  dens <- ggdist::density_bounded(slice)
  density_data <- rbind(
    density_data,
    data.frame(x = dens$x,
               y = dens$y * length(slice), # <--- HERE
               group = group)
  )
}
density_data |>
  ggplot() +
  geom_line(aes(x = x, y = y, color = group))
Created on 2023-05-16 by the reprex package (v2.0.1)
It gives a plausible plot, but it seems too simple to be accurate...
Thanks a lot for any tips :)
Yup, that is the correct approach!
If you don't care about the count axis, you can also do this directly within stat_slab() using the n computed variable:
data |>
  ggplot() +
  stat_slab(aes(x = v, thickness = after_stat(pdf * n), color = group), fill = NA)
Or using the probability expression mini-DSL:
data |>
  ggplot() +
  stat_slab(aes(x = v, thickness = !!p_(x) * n, color = group), fill = NA)
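For anyone who wants to convince themselves that multiplying by the number of observations is enough, here is a rough sanity check. It is only a sketch: it uses base R's density() as a stand-in for ggdist::density_bounded(), and the helper trapz() and the bin width w are made up for illustration. The idea is that a density integrates to about 1, so the scaled curve should integrate to about n. A count histogram with bin width w has area n * w, so matching one would presumably need the extra factor w.

```r
# Sanity check (sketch): a density scaled by n should integrate to roughly n.
# Assumption: base R's density() stands in for ggdist::density_bounded().
set.seed(1)
v <- rnorm(500, -0.5, 1)

dens <- density(v)               # list with $x (grid) and $y (density values)
scaled_y <- dens$y * length(v)   # same scaling as in the question

# Hypothetical helper: trapezoidal rule over the evaluation grid
trapz <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

area_density <- trapz(dens$x, dens$y)    # should be close to 1
area_scaled  <- trapz(dens$x, scaled_y)  # should be close to length(v)

# To overlay on a count histogram with bin width w, scale by n * w instead
w <- 0.25                                # illustrative bin width
count_y <- dens$y * length(v) * w
```

With w = 1 the two scalings coincide, which is why the plain dens$y * length(slice) version already behaves like counts per unit of x.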