whitews / FlowKit

A Python toolkit for flow cytometry analysis supporting GatingML and FlowJo workspaces

Home Page: https://flowkit.readthedocs.io

plot_scatter() density coloring is distorted

gregcourville opened this issue

Describe the bug
When I use Sample.plot_scatter() or Session.plot_scatter() with color_density=True, the colors don't seem to be mapping correctly to the plot points. There is a region where nothing got colored and there are some lonely points that got hot colors. See example image below. Happy to investigate further if you want me to check anything specific.

Code To Reproduce
Code to reproduce the behavior:

from bokeh.io import output_notebook
from bokeh.plotting import show
import flowkit as fk

# Render Bokeh figures inline in the Jupyter notebook
output_notebook()
%matplotlib inline

sample = fk.Sample("data/11-Well-F2.fcs")
p = sample.plot_scatter(
    'SSC-H', 'FSC-H',
    subsample=False,
    y_min=0., y_max=6e5,
    x_min=0., x_max=3e5,
    source='raw',
    color_density=True)
show(p)

Expected behavior
Point colors following local density

Screenshots
Result of above:
[image: bokeh_plot (3)]

Same data plotted in Kaluza for reference:
[image: kaluza]

Desktop (please complete the following information):

  • OS: Debian GNU/Linux 9 (stretch) on WSL2
  • Python version: 3.8.15
  • FlowKit version: 0.9.90b0
  • bokeh version: 2.4.3
  • Jupyter Notebook version: 6.5.2

Hi Greg,

Yeah, that doesn't look right. Can you check the output in the 1st tutorial notebook? You can find it in the docs/notebooks folder of the repo; the file name is:

flowkit-tutorial-part01-sample-class.ipynb

Towards the bottom of that notebook it calls the Sample.plot_scatter() method. This will help narrow down whether the issue is the FCS file or some difference in your environment.

Thanks for reporting the issue!
-Scott

Interesting! It appears to work fine with the example data:
[image: bokeh_plot_01251059]

I also tried a contour plot with my data and it does something similar to what was happening with the density-colored scatter plot. It almost looks like the axes are swapped?
[image: contour]

Here is the FCS file in question:
[attachment: 11-Well-F2_fcs.zip]

Thanks for the info, I'll check out the file and see what I can find.

Hi Greg,

I see what is happening here. First, this data set has some pretty serious outliers, especially in the scatter channels. Those outliers helped to reveal a bug that was present but not visibly apparent in all cases. There is an internal fixed bin size for the color density calculation. The extreme outliers stretched this bin size to a rather large size. With such large bins, the resolution of the colors is nowhere near fine enough to look reasonable.
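
To make the effect of outliers on bin size concrete, here is a minimal sketch. It assumes the density calculation spreads a fixed number of equal-width bins across the full data range; the bin count of 100 and the synthetic data are illustrative assumptions, not values taken from FlowKit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scatter channel: 10,000 typical events plus a few extreme outliers.
typical = rng.normal(loc=100_000, scale=20_000, size=10_000)
outliers = np.array([5e6, 8e6, 1e7])
events = np.concatenate([typical, outliers])

n_bins = 100  # assumed fixed bin count; FlowKit's internal value may differ


def bin_width(values, n_bins):
    """Width of each bin when n_bins equal-width bins span the full data range."""
    return (values.max() - values.min()) / n_bins


width_clean = bin_width(typical, n_bins)
width_with_outliers = bin_width(events, n_bins)

# With the outliers setting the range, count how many typical events
# end up sharing the single most populated bin.
edges = np.linspace(events.min(), events.max(), n_bins + 1)
counts, _ = np.histogram(typical, bins=edges)
largest_share = counts.max() / typical.size

print(f"bin width without outliers: {width_clean:.0f}")
print(f"bin width with outliers:    {width_with_outliers:.0f}")
print(f"share of typical events in one bin: {largest_share:.2f}")
```

In this sketch three outliers are enough to push most of the typical events into one bin, so nearly every point would receive the same density value.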

The bug is that the first bin was not inclusive, or rather what we thought was the first bin really wasn't. In data sets without major outliers and without dense events at the lower bounds this is never apparent. Removing most of the outliers in this data set reduces the bin size and you can see the color density becomes reasonable. However, you can also see the slice of events at the lower bound that are not included in the density calculation.
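
The thread does not show FlowKit's internal code, so the following is only one plausible way a slice of events at the lower bound can drop out of a density calculation. The sketch assumes densities are interpolated from histogram bin centers with scipy's `interpn`: events that lie between the lowest bin edge and the first bin center fall outside the interpolation grid and get no value. The data and bin count are made up for illustration.

```python
import numpy as np
from scipy.interpolate import interpn

# Hypothetical 1-D channel with events spread evenly over [0, 99].
x = np.linspace(0.0, 99.0, 1000)

counts, edges = np.histogram(x, bins=10)
centers = 0.5 * (edges[1:] + edges[:-1])

# Interpolate a per-event density from the bin centers. Points outside
# the span of the centers are not extrapolated; they receive fill_value.
density = interpn(
    (centers,),
    counts.astype(float),
    x[:, None],
    method="linear",
    bounds_error=False,
    fill_value=np.nan,
)

missed = np.isnan(density)
below = missed & (x < centers[0])

print(f"first bin edge:   {edges[0]:.2f}")
print(f"first bin center: {centers[0]:.2f}")
print(f"events below the first center with no density: {below.sum()}")
```

In this sketch the same thing happens above the last bin center. With narrow bins the affected slice is thin and easy to miss; with bins stretched by outliers it becomes a visible band.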

Proposed Solution

  • Implement a fix for the first-bin issue
  • Expose bin size as a keyword argument to the plot scatter methods so users can control the "resolution" of the color mapping
  • Add outlier detection via a simple Z-score method applied to each channel
  • Expose the outlier standard deviation threshold as a keyword argument (default set to 3.0). This value will only affect the behavior of the color density (i.e. the outliers are still displayed). A value of None will turn off outlier detection and you will get output similar to what you reported.
  • Investigate performance & visual differences of scipy's interpn interpolation methods. Currently using 'splinef2d' but 'linear' may be faster.
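
The items above could fit together roughly as follows. This is a sketch under stated assumptions, not the code that was committed: the function name `density_colors` and the parameters `bins`, `z_thresh` and `method` are hypothetical, and clipping event coordinates to the span of the bin centers is just one possible way to keep edge events and outliers in the calculation.

```python
import numpy as np
from scipy.interpolate import interpn


def density_colors(x, y, bins=64, z_thresh=3.0, method="linear"):
    """Return one density value per event for coloring a scatter plot.

    Hypothetical helper, not FlowKit's API. Assumes each channel has
    non-zero variance.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    if z_thresh is None:
        # Outlier detection disabled: every event sets the bin range.
        inlier = np.ones(x.shape, dtype=bool)
    else:
        # Simple per-channel Z-score test.
        zx = np.abs(x - x.mean()) / x.std()
        zy = np.abs(y - y.mean()) / y.std()
        inlier = (zx <= z_thresh) & (zy <= z_thresh)

    # Bin only the inliers so outliers cannot stretch the bin width.
    hist, x_e, y_e = np.histogram2d(x[inlier], y[inlier], bins=bins)
    x_c = 0.5 * (x_e[1:] + x_e[:-1])
    y_c = 0.5 * (y_e[1:] + y_e[:-1])

    # Clip coordinates to the span of the bin centers so that events in
    # the outer half of the edge bins, and the outliers themselves, take
    # the nearest edge density instead of falling outside the grid.
    pts = np.column_stack(
        [np.clip(x, x_c[0], x_c[-1]), np.clip(y, y_c[0], y_c[-1])]
    )
    return interpn((x_c, y_c), hist, pts, method=method)


rng = np.random.default_rng(1)
xs = np.append(rng.normal(size=5000), 1000.0)  # last event is an outlier
ys = np.append(rng.normal(size=5000), 1000.0)

d = density_colors(xs, ys)
print(d[:-1].max(), d[-1])
```

All events, including the outlier, are still returned with a value, so they can still be drawn; the outlier only stops influencing the bin layout. Passing `z_thresh=None` disables the mask and reproduces the stretched-bin behaviour.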

I've got crude versions of some of this and might be able to push some commits later today for testing.

-Scott

Greg,

I think I've come up with a better way to auto-bin the data for all data sets. Can you try out the code at commit 64d04fe and see how it looks?

Thanks,
Scott