scverse / pytometry

Flow & mass cytometry analytics.

Home Page: https://pytometry.readthedocs.io/en/latest/index.html

Dependencies updates

quentinblampey opened this issue

Hello,

In my opinion, all dependencies are very common and/or lightweight, except datashader, which requires dask, xarray and pillow, among others. Since it is used in only one plot function, should we move it into an extra dependency? If someone runs scatter_density without datashader installed, we could raise an error explaining how to install the extra.
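To make the "error explaining how to install the extra" idea concrete, here is a minimal sketch of a lazy import helper. The helper name `require_optional` is hypothetical and not part of pytometry, and the extra name `performance` is only the one floated in this thread; the sketch assumes the extra would be declared in the package metadata.

```python
import importlib


def require_optional(module_name: str, extra: str, package: str = "pytometry"):
    """Import an optional dependency, or fail with an install hint.

    Hypothetical helper: the names and the extra are assumptions for illustration.
    """
    try:
        # Import only when the feature is actually used, so the base
        # install does not need the heavy dependency.
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"'{module_name}' is required for this function. "
            f"Install it with: pip install '{package}[{extra}]'"
        ) from err
```

A plotting function could then call `ds = require_optional("datashader", "performance")` as its first line, so users without the extra only see the error when they call that function.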

Alternatively, instead of throwing an error, we could also plot a subset of cells if datashader is not installed and add a warning like "Cells are subset. To show all cells, install datashader with 'pip install pytometry[performance]'"
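The fallback could look roughly like the sketch below. The function name `cells_to_plot`, the `max_cells` default and the `has_datashader` override are all assumptions made for illustration; only the warning text comes from the proposal above.

```python
import importlib.util
import warnings

import numpy as np


def cells_to_plot(n_cells, max_cells=100_000, has_datashader=None, seed=0):
    """Return the indices of the cells to draw (hypothetical helper).

    All cells are kept when datashader is available or the data is small;
    otherwise a random subset is drawn and a warning is emitted.
    """
    if has_datashader is None:
        # Detect the optional dependency without importing it.
        has_datashader = importlib.util.find_spec("datashader") is not None
    if has_datashader or n_cells <= max_cells:
        return np.arange(n_cells)
    warnings.warn(
        "Cells are subset. To show all cells, install datashader with "
        "'pip install pytometry[performance]'",
        UserWarning,
        stacklevel=2,
    )
    rng = np.random.default_rng(seed)
    # Sample without replacement and sort so the original cell order is kept.
    return np.sort(rng.choice(n_cells, size=max_cells, replace=False))
```

A fixed seed keeps the subset reproducible between calls, which matters if the same plot is regenerated in a notebook.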

But maybe you plan to implement other plot functions with datashader? In that case, I agree that it would be really preferable to keep it in the main dependencies. What do you think @mbuttner, @grst?

Also, what about moving nbproject in the dev dependencies? Is there a reason to have this in the main dependencies?

Hi @quentinblampey

thank you for your suggestions! I agree with the suggestion to move the datashader package inside an extra dependency for the plotting library and only display subsets of cells with a corresponding warning message.
I am personally very keen on showing all cells wherever possible, which poses quite a challenge in flow cytometry and CyTOF. However, let me stress the importance of having a lightweight package first and extending the visualization functionality second. We can still discuss how to integrate the plotting functions at a later stage.

If scatter_density is the only function that requires datashader, I think we can follow @ivirshup's suggestion to get rid of it altogether and replace it with np.histogram2d (see scverse/governance#64 (comment)).
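For reference, a rough sketch of what a histogram2d-based density plot could look like. This is not the scatter_density implementation; the function names, the bin count and the log color scale are assumptions, and matplotlib is imported lazily so that the binning step depends only on NumPy.

```python
import numpy as np


def density_grid(x, y, bins=200):
    """Bin two channels into a 2D count grid (hypothetical helper)."""
    counts, x_edges, y_edges = np.histogram2d(x, y, bins=bins)
    # histogram2d indexes counts as [x_bin, y_bin]; transpose so that
    # rows follow y and columns follow x, as image-style plots expect.
    return counts.T, x_edges, y_edges


def plot_density(x, y, bins=200, ax=None):
    """Draw the count grid with pcolormesh on a log color scale."""
    import matplotlib.pyplot as plt
    from matplotlib.colors import LogNorm

    counts, x_edges, y_edges = density_grid(x, y, bins=bins)
    if ax is None:
        _, ax = plt.subplots()
    # Mask empty bins so they stay blank and do not break the log scale.
    masked = np.ma.masked_equal(counts, 0)
    mesh = ax.pcolormesh(x_edges, y_edges, masked, norm=LogNorm(), cmap="viridis")
    ax.figure.colorbar(mesh, ax=ax, label="cells per bin")
    return ax
```

Because every cell contributes to a bin count, all cells are represented without subsampling, and the drawing cost depends on the number of bins rather than the number of cells.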

UMAPs/embeddings with millions of cells are also slow with matplotlib and can benefit from datashader (with categorical data it's not as trivial as using histogram2d). But for embeddings, in my experience, there's not a lot to gain from showing all cells vs. subsampling. Plus, any performance upgrades for sc.pl.umap should probably be solved on the scanpy side.

About nbproject: it is currently used for testing the package's notebooks, so I suggest keeping it.

Thanks @grst, I'll try the np.histogram2d solution and do a PR if it looks promising.

Concerning nbproject, if it's used only for testing, then we can move it to the "test" dependencies, right?

I think there should be an additional "docs" group with all the packages required to run the tutorial, including nbproject.

Closed as completed.