bokeh / bokeh

Interactive Data Visualization in the browser, from Python

Home Page:https://bokeh.org

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

[DOC] Boxplot example: some whiskers and vbar are rendered slightly asymmetrically and boldly

dinya opened this issue · comments

Software versions

Python version : 3.11.5
Tornado version : 6.3.3
Bokeh version : 3.4.0

Expected behavior

Some whiskers and vbar are rendered slightly asymmetrically and boldly.

It turned out that the figure displays a lot of whisker and vbar (overlapped), because the source is the original dataframe df with "duplicated" quantiles (q1, q2 and q3), upper and lower.

If you make qs the data source for whisker and vbar, including the calculation of quantiles in qs, then the figure becomes prettier:

Original example "Patched" example
boxplot_example_original boxplot_example_patched

Observed behavior

See above.

Example code

Original example

import pandas as pd

from bokeh.models import ColumnDataSource, Whisker
from bokeh.plotting import figure, show
from bokeh.sampledata.autompg2 import autompg2
from bokeh.transform import factor_cmap

df = autompg2[["class", "hwy"]].rename(columns={"class": "kind"})

kinds = df.kind.unique()

# compute quantiles
qs = df.groupby("kind").hwy.quantile([0.25, 0.5, 0.75])
qs = qs.unstack().reset_index()
qs.columns = ["kind", "q1", "q2", "q3"]
df = pd.merge(df, qs, on="kind", how="left")

# compute IQR outlier bounds
iqr = df.q3 - df.q1
df["upper"] = df.q3 + 1.5*iqr
df["lower"] = df.q1 - 1.5*iqr

source = ColumnDataSource(df)

p = figure(x_range=kinds, tools="", toolbar_location=None,
           title="Highway MPG distribution by vehicle class",
           background_fill_color="#eaefef", y_axis_label="MPG")

# outlier range
whisker = Whisker(base="kind", upper="upper", lower="lower", source=source)
whisker.upper_head.size = whisker.lower_head.size = 20
p.add_layout(whisker)

# quantile boxes
cmap = factor_cmap("kind", "TolRainbow7", kinds)
p.vbar("kind", 0.7, "q2", "q3", source=source, color=cmap, line_color="black")
p.vbar("kind", 0.7, "q1", "q2", source=source, color=cmap, line_color="black")

# outliers
outliers = df[~df.hwy.between(df.lower, df.upper)]
p.scatter("kind", "hwy", source=outliers, size=6, color="black", alpha=0.3)

p.xgrid.grid_line_color = None
p.axis.major_label_text_font_size="14px"
p.axis.axis_label_text_font_size="12px"

show(p)

"Patched" example

import pandas as pd

from bokeh.models import ColumnDataSource, Whisker
from bokeh.plotting import figure, show
from bokeh.sampledata.autompg2 import autompg2
from bokeh.transform import factor_cmap

df = autompg2[["class", "hwy"]].rename(columns={"class": "kind"})

kinds = df.kind.unique()

# compute quantiles
qs = df.groupby("kind").hwy.quantile([0.25, 0.5, 0.75])
qs = qs.unstack().reset_index()
qs.columns = ["kind", "q1", "q2", "q3"]
# Patch 1
#df = pd.merge(df, qs, on="kind", how="left")

# compute IQR outlier bounds
# Patch 2
#iqr = df.q3 - df.q1
#df["upper"] = df.q3 + 1.5*iqr
#df["lower"] = df.q1 - 1.5*iqr
iqr = qs.q3 - qs.q1
qs["upper"] = qs.q3 + 1.5*iqr
qs["lower"] = qs.q1 - 1.5*iqr
df = pd.merge(df, qs, on="kind", how="left")

# Patch 3
#source = ColumnDataSource(df)
source = ColumnDataSource(qs)

p = figure(x_range=kinds, tools="", toolbar_location=None,
           title="Highway MPG distribution by vehicle class",
           background_fill_color="#eaefef", y_axis_label="MPG")

# outlier range
whisker = Whisker(base="kind", upper="upper", lower="lower", source=source)
whisker.upper_head.size = whisker.lower_head.size = 20
p.add_layout(whisker)

# quantile boxes
cmap = factor_cmap("kind", "TolRainbow7", kinds)
p.vbar("kind", 0.7, "q2", "q3", source=source, color=cmap, line_color="black")
p.vbar("kind", 0.7, "q1", "q2", source=source, color=cmap, line_color="black")

# outliers
outliers = df[~df.hwy.between(df.lower, df.upper)]
p.scatter("kind", "hwy", source=outliers, size=6, color="black", alpha=0.3)

p.xgrid.grid_line_color = None
p.axis.major_label_text_font_size="14px"
p.axis.axis_label_text_font_size="12px"

show(p)

After https://discourse.bokeh.org/t/boxplot-example-performance/11177/2

@dinya do you intend to submit a PR?

@bryevdv I'll try. You set milestone 3.x. Can I fork branch-3.5 (it's suggested by GitHub on forking by default) and add the patch there?

Yes any changes should currently go against branch-3.5