dask / dask-blog

Dask development blog

Home Page: https://blog.dask.org/


Imaging post on segmentation

mrocklin opened this issue · comments

Following on the loading and deconvolution posts (#26 #27) we would next like to perform image segmentation, and then get out some quantities for each region. What is the right way to do this?

@rxist525 @jakirkham @thewtex @jni

When it comes to segmentation, there are a lot of techniques one could employ. Here's a short list from ITK (requires some drilling down to get to actual implementations). Deep learning is also commonly employed here.

My temptation would be to treat segmentation more or less as a black box with a standard API. IOW, one supplies an image and gets out a label image (objects identified). Alternatively, we could imagine this as two steps: first we compute a mask selecting the interesting portions of the image, and then we label those regions based on connectivity. Though we probably don't want to get much more granular than that. Users will already be more familiar with which segmentation algorithms work well on their data. This is just my opinion though. Others may have thoughts here. 🙂
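To make that two-step view concrete, here is a minimal sketch using scikit-image and SciPy; Otsu thresholding is just an illustrative placeholder for whatever masking step suits the data:

import numpy as np
from scipy import ndimage
from skimage.filters import threshold_otsu

def segment(img: np.ndarray) -> np.ndarray:
    """Toy black-box segmentation: image in, label image out."""
    # Step 1: mask the interesting portions of the image
    mask = img > threshold_otsu(img)
    # Step 2: label the masked regions based on connectivity
    labels, num_labels = ndimage.label(mask)
    return labels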

We can certainly pick one from ITK to make things a bit more concrete. If you have thoughts on this, @rxist525, or if others have thoughts on this, that would be useful.

Segmentation is probably a bit trickier than deconv... You probably want map_overlap with sufficient overlap to have "one full segment" in the overlap zone.
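A rough sketch of what that could look like, reusing the toy segment function from above (the depth of 64 pixels is purely illustrative and would need to exceed the largest expected segment):

import numpy as np
import dask.array as da

darr = da.random.random((4096, 4096), chunks=(1024, 1024))  # stand-in image

# Each chunk is segmented with 64 pixels of context from its neighbors.
# Note that labels from different chunks still collide and would need
# to be resolved afterwards.
segmented = darr.map_overlap(segment, depth=64, dtype=np.int32)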

Yeah, it depends on what our goal is here. With @rxist525's data, this would still be map_blocks, as full frames fit comfortably in memory. Generally this is not true though, and map_overlap would be required (amongst other things). Thoughts on the intended scope here, @mrocklin?

Lots of things to unpack here. Two main points to focus on are types of segmentation and scalability. While the type of data that you are currently working on will fit into memory, some of the operations (like tensor voting) would speed up geometrically with smaller volumes. On the other hand, we do have datasets (4D instead of 5D) that are several terabytes for a single volume.
https://www.biorxiv.org/content/10.1101/374140v1.supplementary-material
Perhaps a quick chat this week to discuss/brainstorm a few avenues to explore?

Previously @jni and I worked on extending connected components to larger than memory data. ( dask/dask-image#94 ) It was doable, but did require a few tricks and generated basically a module worth of code. Connected components was probably easier than segmentation will be as it was clearer (though not exactly straightforward) to resolve pixels near chunk boundaries. With segmentation I'd naively suspect resolving pixels near chunk boundaries is somewhat dependent on the segmentation algorithm used. While it could be interesting to explore this, I'm somewhat inclined to agree with @mrocklin that this is out of scope for a blogpost. It might be a reasonable PR though. Would you agree or do you have other thoughts?

Perhaps a quick chat this week to discuss/brainstorm a few avenues to explore?

Sorry to miss this, @rxist525. I'm up for chatting. I think that people like @jakirkham and @thewtex are probably more important to have in that conversation though.

@rxist525 also, I'm curious at what point you or someone you know might be interested in getting engaged in the development process here. Using the last post as a model, my guess is that embarrassingly parallel map_blocks + ITK workloads are straightforward enough that someone who is comfortable with Numpy and ITK could start engaging.

There would almost surely be problems, but we could work through those together.

From my perspective we're mostly bound by people who know what imaging routines to run being busy, rather than by people who understand Dask.

Yeah just to unpack @mrocklin's point a bit further and following up on a related conversation with @thewtex, I think one can get pretty far with a wrapper function that looks like this. Now one can use map_blocks with whatever one wants. We could do a similar thing with the Numba example at the end of the last blogpost to simplify this further and thus avoid even the use of map_blocks.

def itk_udf_wrapper(udf, img, udf_args, udf_kwargs):
    """ Apply user-defined function to a single chunk of data """
    import itk

    img = img[0, 0, ...]  # remove leading two length-one dimensions
    image = itk.image_view_from_array(img)   # Convert to ITK object

    op_result = udf(image, *udf_args, **udf_kwargs)  # Call ITK-based user function

    result = itk.array_from_image(op_result)  # Convert back to Numpy array
    result = result[None, None, ...]  # Add back the leading length-one dimensions

    return result
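
Usage might then look something like the following; the array shape, the chunking, and the choice of itk.median_image_filter (from ITK 5's functional API) are all illustrative assumptions:

from functools import partial

import dask.array as da
import itk
import numpy as np

# A 4D dask array chunked so that each chunk is one full 2D frame
# behind two leading length-one dimensions
imgs = da.random.random((3, 4, 512, 512), chunks=(1, 1, 512, 512)).astype(np.float32)

smoothed = imgs.map_blocks(
    partial(itk_udf_wrapper, itk.median_image_filter),  # user-chosen ITK function
    udf_args=(),
    udf_kwargs={"radius": 2},
    dtype=np.float32,
)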

That said, the region properties side of the story may be a bit more interesting.

@rxist525 also, I'm curious at what point you or someone you know might be interested in getting engaged in the development process here. Using the last post as a model, my guess is that embarrassingly parallel map_blocks + ITK workloads are straightforward enough that someone who is comfortable with Numpy and ITK could start engaging.

There would almost surely be problems, but we could work through those together.

From my perspective we're mostly bound by people who know what imaging routines to run being busy, rather than by people who understand Dask.

@mrocklin Great question - and good timing. I think we have just finished our infrastructure setup, so we can engage more seriously. We should set up a time to chat next week with my postdoc @ruanxt present. Let's move this discussion to email?

I was chatting with @sofroniewn last week and he mentioned that it might also be interesting to try using a trained PyTorch model for segmentation. I would personally be fine with anything. This might also be a nice addition because this sequence would then have used all of scikit-image, ITK, and Torch to scale out image processing, which would be a nice trifecta.

From my perspective neither scaling nor writing about this is the bottleneck. The bottleneck to doing this is just finding someone who can write up the imaging side of this, probably as a notebook, in such a way that I or someone else can then scale out easily without knowing too much about imaging.

Are there any labels for this dataset that could be used to train the PyTorch model?

We have a couple of datasets with curated labels, if necessary.

@sofroniewn produced this notebook that uses some custom code around PyTorch to perform pixel-featurization. It might be a good base for a next step here.

https://gist.github.com/sofroniewn/2e1d5068a979e4393fd549dff675d543

@sofroniewn two questions about your notebook:

These lines are interesting:

    # remove leading two length-one dimensions
    img = image[0, 0, ...]
    
    # make sure image has four dimensions (b,c,w,h)
    img = np.expand_dims(np.expand_dims(img, 0), 0)
    img = np.transpose(img, (1,0,2,3))

You take away two dimensions, and then seem to add them back in. Was this just because of copy-pasting from the previous post, or was it intentional in some way?

Is there a nice way to produce an image from the output data here? One of the things that I think we could have done better on with the deconvolution post was providing a before-and-after comparison. Presumably pixel-level segmentation would also provide some eye-candy.

@mrocklin - I wasn't sure what to do with those two lines, removing then adding back in the dimensions. On the one hand they accomplish nothing, but in many ways that is by chance, and the reasons they are in there (if explained!) might be informative to people.

My understanding is that the first two lines, where we drop the leading two dimensions, are there because of the shape of the original dask array and the chunking - we have a 4D array but we're interested in extracting 2D images, so we drop those first two length-one dimensions. If we'd had a 5D array of data, we would have had to drop the first three length-one dimensions.

The adding of dimensions back in corresponds to getting the data into the shape that PyTorch expects, which is Batch x Channel x Width x Height; here Batch and Channel are just 1. So in this example we go 4D -> 2D -> 4D, but we might have had to go 5D -> 2D -> 4D.
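In other words, the shape round-trip for a single chunk looks like this (sizes illustrative):

import numpy as np

chunk = np.zeros((1, 1, 512, 512))  # one chunk of the 4D dask array
img = chunk[0, 0, ...]              # drop leading length-one dims -> (512, 512)
img = np.expand_dims(np.expand_dims(img, 0), 0)  # -> (1, 1, 512, 512), now
                                                 # meaning (batch, channel, h, w)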

Maybe all that is just more confusing than necessary, and we can leave the comment # make sure image has four dimensions (b,c,w,h) without actually running that code. Also, that transpose doesn't seem to do anything either in this case (which was a copy-and-paste type of situation!) and so can be dropped too.

As to producing images of the output data, yes it's possible - the output data can actually be used to produce 16 images of the different features, but they don't look so great. Maybe I'll think of something else to show. I agree though, a before/after image would be nice.

the output data can actually be used to produce 16 images of the different features, but they don't look so great.

Is it possible to have the model produce not 16 features, but 3 and then map those to RGB?

but in many ways that is by chance and the reasons they are in there (if explained!) might be informative to people.

Yes, I think that regardless we should explain the dimensions in each chunk, and map them in a way that users understand. My hope is that we can do this with a fair amount of prose, similar to what you have provided in your recent comment :)

I could also just pick 3 random features from the 16 to show as RGB. I'll play around and see what looks good.

As a more general comment, to explain how the featurization I'm doing with the UNet relates to the segmentation - on other datasets I've been using those learned features along with some minimal human input to perform a random-forest-based semantic segmentation, like ilastik does. Once you've got the random forest, you can easily apply it with map_blocks too (see the sketch below), so I could provide the random forest and then go all the way to a multiclass semantic segmentation.
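A rough sketch of that per-chunk step, where featurize stands in for the UNet featurizer from the notebook and clf for a fitted scikit-learn RandomForestClassifier (both assumptions here, as is the imgs array whose chunks are single 2D frames):

import numpy as np

def classify_chunk(img, clf=None):
    """Featurize one 2D chunk and predict a class per pixel."""
    feats = featurize(img)                    # assumed to return (n_features, h, w)
    X = feats.reshape(feats.shape[0], -1).T   # -> (n_pixels, n_features)
    pred = clf.predict(X)                     # one class per pixel
    return pred.reshape(img.shape).astype(np.uint8)

semantic = imgs.map_blocks(classify_chunk, clf=clf, dtype=np.uint8)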

The only slight downside is that the data is much more suited to instance segmentation (finding the individual cells) than semantic segmentation (classifying pixels), as most pixels are either cell or background and a relatively simple threshold can tell those apart. Instance segmentation is just a bit harder, and I don't have anything out of the box right now that would do great. I'll think about it more; curious what direction anyone else wants to see the blog post go in.

@sofroniewn, friendly poke :)

(but no pressure, you have an actual job, I know)

Hi @sofroniewn!
I really like your draft blog post in this gist and think it'd be very useful for other people doing similar work too. What do you think needs to happen so it can be published in the blog? (I'm very happy to help out with tidying up & editing, if you're swamped for time)

I really like your draft blog post in this gist and think it'd be very useful for other people doing similar work too. What do you think needs to happen so it can be published in the blog? (I'm very happy to help out with tidying up & editing, if you're swamped for time)

Wow, I'll be honest, I completely forgot about this!! Eek, my bad!! That demo just did some basic featurization; I think I'd revisit it now to do actual segmentation using either stardist or cellpose (or both!). The movie will look much cooler!!

One example I was thinking of was actually using a multiscale histology image as input and doing segmentation lazily, but that's quite tricky as you have to think about different scales.

Curious if you have ideas @GenevieveBuckley on what would be cool? I'm happy to pick this back up together. I'd love the support around tidying/editing, but I'm also happy to make one more push to make it a little more interesting/useful to people compared to where it is right now.

I'd want to err on the side of putting something out relatively soon, since we can always edit blogposts (or make another follow up post).

I like the idea of combining it with stardist/cellpose. That would be very useful for other people to see. If it can be included without too much extra work I'm all for it, otherwise we can have that be a standalone follow up post.

As I've said, if there's any way I can help here just let me know. You have my email & we can always videochat if it's useful :)

Had a chat with @sofroniewn today, he's going to make a PR in the next couple of days (this comment is mostly for future-me, who often forgets what happened when if it's not logged in a comment thread)

Perhaps the basic segmentation pipeline I wrote for SciPy Japan would be a good fit here? I have a to-do item in my notes about adding it to the dask-examples repository, but maybe we could do both.

You can see the segmentation example if you scroll down here

That would be great! 😄

Do you all have best practices on how labels can be created and then merged across chunks? The segmentation section of the recent blogpost uses a subset of the data, so it doesn't look like the chunked situation gets handled.

For our pipeline, we had to sequentially do 2 very funky map_overlaps -- avoiding trimming, doing array gymnastics, and using block_info to make sure chunks used unique label ranges. This allowed the 2nd map_overlap to peek into neighboring chunks and merge labels across nearest neighbor chunks.

(To fit on a GPU, my blocks are 256x256x256 and take 40 minutes to segment using cellpose. Our datasets can be approx 500 x 20k x 20k.)

Yes, we implement the SciPy equivalent of label in dask-image 😄

Currently this is CPU-only, but could be done on the GPU with some additions to CuPy ( cupy/cupy#4054 ) and corresponding changes in dask-image
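
For reference, its usage mirrors scipy.ndimage.label, except that it works chunk-wise and resolves labels across chunk boundaries:

import dask.array as da
import dask_image.ndmeasure

mask = da.random.random((1024, 1024), chunks=(256, 256)) > 0.99  # toy mask
labels, num_labels = dask_image.ndmeasure.label(mask)
labels = labels.compute()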

I like what you've done, and I think I can re-use some of the connectivity-graph work to avoid my 2 map_overlaps. I'm going to examine that code more.

How much of the code is serial (which I likely need to avoid)? I see the labelling does the chunks serially so that the labels do not overlap. Is anything else serial?

One thing I did to avoid the label collision and run my chunks fully in parallel is use int64 for the labels. I run naive labelling (which is int32), and then just add in a unique chunk_id (derived from block_info) in the upper 32 bits. You could then relabel back to int32 at the end, if desired.
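
A sketch of that trick as I understand it; the linearized chunk id and the exact bit layout below are assumptions:

import dask.array as da
import numpy as np
from scipy import ndimage

def label_chunk(mask, block_info=None):
    """Label one chunk independently, making labels globally unique."""
    labels, _ = ndimage.label(mask)  # naive per-chunk labelling (int32)
    labels = labels.astype(np.int64)
    # linearize the chunk's location into a single chunk id
    chunk_id = np.ravel_multi_index(
        block_info[0]["chunk-location"], block_info[0]["num-chunks"]
    )
    labels[labels != 0] |= np.int64(chunk_id) << 32  # chunk id in upper 32 bits
    return labels

mask = da.random.random((1024, 1024), chunks=(256, 256)) > 0.99
labels = mask.map_blocks(label_chunk, dtype=np.int64)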

Well hopefully it can be used directly and we can work on improvements together 😉

I think it is just resolving collisions.

What happens when a label spans multiple chunks?

Our labels can span multiple chunks, but only direct neighbors (i.e. in 3d, a label may appear in 8 chunks). So the use case might be easier in some sense -- it means we don't have to block for all chunks to be processed for global information. That said, blocking may be a good trade-off in exchange for a simpler algorithm.

Perhaps the only insight I have to contribute back is avoiding the serial labelling of chunks by temporarily using int64 that include block_ids. Parallel processing is crucial in our case.

I think the adjacency graph could be useful in image stitching, as well. It may be hard to implement in full generality, but a useful narrow subset of use cases (where the movements are less than a chunk size) may be feasible.

@chrisroat

Perhaps the only insight I have to contribute back is avoiding the serial labelling of chunks by temporarily using int64 that include block_ids. Parallel processing is crucial in our case.

Very clever! But the labeling is in parallel, the only serial bit is small in the scheme of things?

At any rate, I don't remember exactly how we do the label reassignment after the correspondence graph, but I suspect it might use a NumPy array with the labels as indexing, which would be huge if you use that unique-labels strategy. (Update: indeed we do.) Also, the graph we make is a scipy.sparse.csr_matrix, which likewise needs O(max_label) storage to store the indptr array.

So it's not as minor a tweak as one would initially suspect...

I see why I'm talking past you all -- the ndmeasure label function labels connected components, but other labelling strategies I'm imagining (e.g. watershed, cell segmentation) are more expensive, and parallelization is useful. Sorry!

In this case, the labelling is fairly inexpensive and may scale well -- though I don't know... perhaps there are nasty edge cases. It is done serially, as the labels in block N+1 start where the labels in block N end.

Enforcing a requirement that the labels be sequential is not always necessary. The list of labels is not always needed; and even if it is, keeping a list of labels is not expensive compared to the image size.

Would a O(num_labels) relabelling strategy be feasible if non-sequential labels were allowed? Or is there some vectorization speed-up in the CSR approach that you would lose?

Adding watershed is definitely of interest (though tricky!). Also, generalizing what we have so it can be a framework to plug in other segmentation operations that operate at the chunk level makes sense as well. If you have ideas on what would be useful, it would be good to discuss those too.

I actually misread the labelling as being inside a code loop, but it's delayed and runs in parallel.

In the case of cell segmentation, two or more regions from neighboring chunks could meet at a chunk boundary and should not be merged -- so it's hard to use the two-pixel wide boundary slices in this algorithm to merge regions. Some (but not all) cell segmentation has the advantage that the segmentation is compact and the max extent of a region can be roughly known. Ours is compact and runs in two stages -- it seems to work OK:

  • map_overlap with enough depth to capture a full region (plus more, as the segmentation algorithm is sensitive at the image edges). Any segment is discarded if its center of mass lies outside the nominal chunk boundary (i.e. a region is centered in the overlap region, but a small fraction crosses into the nominal chunk); see the sketch after this list.
  • A careful map_overlap over the untrimmed output of the previous pass allows a chunk to overwrite itself with boundary-crossing segments from neighboring chunks. This was a bit mind-bending for me to write, but I didn't think of a better way.
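
A sketch of that center-of-mass filter from the first pass; the helper name and depth handling are illustrative, not the actual pipeline code:

import numpy as np
from scipy import ndimage

def discard_offcenter_segments(labels, depth):
    """Zero out segments whose center of mass falls in the overlap margin."""
    ids = np.unique(labels)
    ids = ids[ids != 0]  # skip background
    centers = ndimage.center_of_mass(labels != 0, labels=labels, index=ids)
    keep = [
        lab
        for lab, com in zip(ids, centers)
        if all(depth <= c < s - depth for c, s in zip(com, labels.shape))
    ]
    return np.where(np.isin(labels, keep), labels, 0)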

The second pass might benefit from the ideas here about slicing around boundaries, and creating an adjacency graph. But it seems difficult to me at the moment -- a corner of a chunk might get input from 7 of its neighbors.

I actually misread the labelling as being inside a code loop, but it's delayed and runs in parallel.

Yup. 😊

Would a O(num_labels) relabelling strategy be feasible if non-sequential labels were allowed? Or is there some vectorization speed-up in the CSR approach that you would lose?

Yes, what we would lose is the pre-existing implementation of connected components using scipy.sparse.csgraph. We'd have to do our own thing, or use NetworkX. The point is that scipy.sparse.csgraph uses the scipy.sparse.csr_matrix as the data structure, and that contains an indptr array that has length (max_label+1). And actually I think it's int32! 😂

In [1]: import numpy as np
In [2]: from scipy import sparse
In [3]: n = 10                                                                                                                
In [4]: mat = sparse.csr_matrix(np.eye(n, n, n-1))                                                                            
In [5]: mat.indptr                                                                                                            
Out[5]: array([0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)

I think a mapping between 64-bit labels and int32 indices is probably the fastest way to adapt this.
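
Something along these lines, assuming a single np.unique pass over the label volume is affordable:

import numpy as np

# sparse 64-bit labels, e.g. from the chunk-id packing scheme above
labels = np.array([0, (7 << 32) | 3, 0, (2 << 32) | 1, (7 << 32) | 3], dtype=np.int64)

# map the arbitrary labels onto contiguous indices usable with csr_matrix
unique_labels, compact = np.unique(labels, return_inverse=True)
compact = compact.astype(np.int32).reshape(labels.shape)
# unique_labels[compact] reconstructs the original 64-bit labels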

Generally, the graph of overlapping labels is a useful concept and indeed extends to non-identical segmentation stitching -- you just need some overlap threshold. That would be a fun project. =)

Thinking about this point on avoiding sequential label offset updates, and Chris' suggestion: I guess it is arbitrary where we account for the chunk's integer. IOW, we don't necessarily need to put it in the highest bits. Instead we could put it in the lowest bits and just shift the labels up to make room for the block integer. This should still be uint32-friendly (unless we really have so many chunks, and so many labels within a chunk, that we need the promotion to uint64, which should hopefully be rare).
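
A sketch of that variant; the helper below is illustrative and assumes the shifted labels still fit in uint32, per the caveat above:

import numpy as np

def pack_low_bits(local_labels, chunk_id, num_chunks):
    """Shift per-chunk labels up and store the chunk id in the low bits."""
    shift = max(1, int(np.ceil(np.log2(num_chunks))))  # bits needed for chunk ids
    packed = local_labels.astype(np.uint32) << shift
    packed[local_labels != 0] |= np.uint32(chunk_id)   # background stays 0
    return packed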

Great idea @jakirkham! ❤️ I've summarised this discussion in dask/dask-image#199, which is probably a more useful place to keep this information long term. 😊 (Props to @GenevieveBuckley who has modeled good discussion-archiving practices for me often. 😅)

Awesome, thanks for doing that Juan! 😄

I'm going to mark this as closed by #82.