observablehq / plot

A concise API for exploratory data visualization implementing a layered grammar of graphics

Home Page: https://observablehq.com/plot/

Automatic binning when a facet dimension is quantitative?

mbostock opened this issue

It’d be neat if you could use a quantitative dimension for faceting, and we automatically binned it (say using d3.bin) into a reasonable number of facets.
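To make the idea concrete, here is a minimal dependency-free sketch of what "automatically binned" could mean. It assumes Sturges' formula for the number of facets and plain equal-width thresholds; d3.bin would additionally snap the thresholds to nice values. The function names (sturgesCount, equalWidthThresholds, facetIndex) and the sample data are hypothetical, not part of Plot or d3.

```javascript
// Sturges' formula: a common rule of thumb for how many bins to use.
function sturgesCount(values) {
  return Math.ceil(Math.log2(values.length)) + 1;
}

// Equal-width interior thresholds between min and max (count - 1 of them).
// Assumption: unlike d3.bin, no attempt is made to round them to nice values.
function equalWidthThresholds(values, count = sturgesCount(values)) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  const step = (max - min) / count;
  return Array.from({length: count - 1}, (_, i) => min + (i + 1) * step);
}

// Index of the facet a value falls in: the number of thresholds <= value,
// so each facet covers a half-open range [lower, upper).
function facetIndex(value, thresholds) {
  let i = 0;
  while (i < thresholds.length && value >= thresholds[i]) i++;
  return i;
}

// Made-up sample data, binned into 3 facets.
const values = [10, 12, 15, 20, 22, 30, 38, 40];
const thresholds = equalWidthThresholds(values, 3);
const facets = values.map((v) => facetIndex(v, thresholds));
console.log(thresholds, facets);
```

The facet index (or a label derived from it) would then stand in for the quantitative value as the facet key.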

Borrowing from cartography, we would want to support Jenks natural breaks or k-means, not just quantization. This seems particularly relevant for faceting, to avoid creating spurious (almost empty) facets. E.g., if the dimension has 3 modes, we want those modes as the facets.

This would be done, I guess, by specifying the thresholds (or threshold generator) to d3.bin.
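As a rough illustration of cluster-derived thresholds, the sketch below runs plain k-means (Lloyd's algorithm) on one-dimensional data and places each threshold midway between adjacent clusters. This is an assumption-laden stand-in: it is neither Jenks natural breaks nor ckmeans, it is not guaranteed to find the optimal clustering, and the names kMeans1d and clusterThresholds are hypothetical.

```javascript
// Plain k-means (Lloyd's algorithm) on 1D data. Returns non-empty clusters
// in ascending order, each cluster sorted ascending.
// NOTE: a stand-in for Jenks/ckmeans; the result depends on initialization.
function kMeans1d(values, k, iterations = 50) {
  const sorted = [...values].sort((a, b) => a - b);
  // Initialize centers at evenly spaced quantiles of the sorted data.
  let centers = Array.from({length: k}, (_, i) =>
    sorted[Math.floor(((i + 0.5) * sorted.length) / k)]
  );
  let clusters = [];
  for (let iter = 0; iter < iterations; iter++) {
    // Assignment step: each value goes to its nearest center.
    clusters = Array.from({length: k}, () => []);
    for (const v of sorted) {
      let best = 0;
      for (let c = 1; c < k; c++) {
        if (Math.abs(v - centers[c]) < Math.abs(v - centers[best])) best = c;
      }
      clusters[best].push(v);
    }
    // Update step: move each center to the mean of its cluster.
    const next = clusters.map((c, i) =>
      c.length ? c.reduce((s, v) => s + v, 0) / c.length : centers[i]
    );
    if (next.every((c, i) => c === centers[i])) break; // converged
    centers = next;
  }
  return clusters.filter((c) => c.length > 0);
}

// One threshold per gap: the midpoint between the largest value of one
// cluster and the smallest value of the next.
function clusterThresholds(clusters) {
  const out = [];
  for (let i = 1; i < clusters.length; i++) {
    const lo = clusters[i - 1][clusters[i - 1].length - 1];
    const hi = clusters[i][0];
    out.push((lo + hi) / 2);
  }
  return out;
}

// Made-up data with three obvious modes.
const modes = [1, 2, 3, 20, 21, 22, 100, 101, 102];
const breaks = clusterThresholds(kMeans1d(modes, 3));
console.log(breaks);
```

An array like this could then be handed to d3.bin as explicit thresholds, e.g. d3.bin().thresholds(breaks).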

For a relevant example, I combined ac93f58 with simple-statistics' ckmeans method to cluster countries by GDP per capita:
[Screenshot 2020-11-24 at 10:01:51]

These 4 clusters would be my facets.

The default thresholds using d3.ticks have the nice property that the axis documents the threshold values. I wonder whether, if you specify alternative thresholds, there would be a convenient way to use those threshold values as ticks too; it’s hard to tell in the screenshot above exactly where the thresholds are. Though I suppose exactness is not essential, and they’re probably not nice round values anyway.

The https://observablehq.com/d/e87ba37a7b86bb94#ckMeansNiceThresholds function returns "not so ugly" thresholds; I suppose we could use them as ticks, for example [14500, 38000, 80000].
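One way to get "not so ugly" thresholds, sketched under assumptions: any value in the gap between two adjacent clusters separates them equally well, so we can pick the roundest number in that gap. This is not the implementation of the linked ckMeansNiceThresholds function (which is not reproduced here); roundestIn and the gap values are hypothetical.

```javascript
// Return the "roundest" number in the half-open gap (lo, hi]: the first
// multiple found when trying steps of 5, 2, then 1 times a power of ten,
// from the largest power of ten downward.
// Assumption: lo is the largest value of the lower cluster and hi the
// smallest value of the upper cluster, so the result must exceed lo.
function roundestIn(lo, hi) {
  if (!(lo < hi)) return hi; // degenerate gap: nothing to choose from
  const top = Math.ceil(Math.log10(Math.max(Math.abs(lo), Math.abs(hi))));
  for (let exp = top; exp > -20; exp--) {
    for (const m of [5, 2, 1]) {
      const step = m * 10 ** exp;
      // Smallest multiple of step strictly greater than lo.
      const candidate = (Math.floor(lo / step) + 1) * step;
      if (candidate <= hi) return candidate;
    }
  }
  return hi; // fallback if no round value was found
}

// Made-up gaps between four clusters: [max of lower, min of upper].
const gaps = [[13870, 14920], [36100, 39400], [71200, 86500]];
const niceBreaks = gaps.map(([lo, hi]) => roundestIn(lo, hi));
console.log(niceBreaks);
```

With fractional steps, floating-point rounding can make the result slightly inexact, so this sketch is best suited to data on the scale of the GDP example.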

Adding ticks: breaks to the x-axis definition works well if you’re passing in explicit thresholds.

The interval scale option is a great workaround for this issue. It’s not fully automatic, since you still have to choose the interval yourself, but it makes it very easy to bin while faceting. For example:

[Screenshot 2023-04-23 at 1:50:27 PM]

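// Assumes `olympians` (Observable's sample dataset of Olympic athletes) is loaded.
Plot.plot({
  fy: {
    grid: true,
    tickFormat: ".1f",
    interval: 0.1, // bin the facet dimension into 0.1-wide intervals
    reverse: true
  },
  marks: [
    // Drop rows with a missing height before faceting on it.
    Plot.boxX(olympians.filter((d) => d.height), {x: "weight", fy: "height"})
  ]
})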
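To show what binning by interval amounts to, here is a small sketch that floors each height to the start of its 0.1-wide interval and groups rows by that value. It is an illustration under assumptions, not Plot's internals: floorToInterval and groupByInterval are hypothetical names, the rows are made up, and fractional intervals are assumed to have an integer reciprocal (as 0.1 does).

```javascript
// Floor a value to the start of its interval. For fractional intervals,
// multiplying by the reciprocal (10 for 0.1) avoids some floating-point
// drift compared with dividing by 0.1.
function floorToInterval(value, interval) {
  if (interval >= 1) return Math.floor(value / interval) * interval;
  const n = Math.round(1 / interval); // assumes 1 / interval is an integer
  return Math.floor(value * n) / n;
}

// Group rows by the floored value of one field; each key is a facet.
function groupByInterval(rows, field, interval) {
  const groups = new Map();
  for (const row of rows) {
    const key = floorToInterval(row[field], interval);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }
  return groups;
}

// Made-up rows in the shape of the olympians data (height in m, weight in kg).
const rows = [
  {height: 1.87, weight: 80},
  {height: 1.82, weight: 75},
  {height: 1.75, weight: 68},
  {height: 1.7, weight: 64}
];
const byHeight = groupByInterval(rows, "height", 0.1);
console.log([...byHeight.keys()]);
```

Each key corresponds to one facet row, and the ".1f" tick format in the example above would label it with one decimal place.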