dask / dask-blog

Dask development blog

Home Page: https://blog.dask.org/


Imaging post on segmentation

mrocklin opened this issue · comments

Following on the loading and deconvolution posts (#26 #27) we would next like to perform image segmentation, and then get out some quantities for each region. What is the right way to do this?

@rxist525 @jakirkham @thewtex @jni

When it comes to segmentation, there are a lot of techniques one could employ. Here's a short list from ITK (requires some drilling down to get to actual implementations). Deep learning is also commonly employed here.

My temptation would be to treat segmentation more or less as a black box with a standard API. IOW, one supplies an image and gets out a label image (objects identified). Alternatively, we could imagine this as two steps: first we compute a mask selecting the interesting portions of the image, and then we label those regions based on connectivity. Though we probably don't want to get much more granular than that. Users will already be more familiar with which segmentation algorithms work well on their data. This is just my opinion though. Others may have thoughts here. 🙂
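To make that two-step view concrete, here is a minimal sketch using scikit-image and SciPy; Otsu thresholding is just an illustrative placeholder for whatever masking step suits the data:

import numpy as np
from scipy import ndimage
from skimage.filters import threshold_otsu

def segment(img: np.ndarray) -> np.ndarray:
    """Toy black-box segmentation: image in, label image out."""
    # Step 1: mask the interesting portions of the image
    mask = img > threshold_otsu(img)
    # Step 2: label the masked regions based on connectivity
    labels, num_labels = ndimage.label(mask)
    return labels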

We can certainly pick one from ITK to make things a bit more concrete. If you have thoughts on this, @rxist525, or if others have thoughts on this, that would be useful.

Segmentation is probably a bit trickier than deconv... You probably want map_overlap with sufficient overlap to have "one full segment" in the overlap zone.
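A rough sketch of what that could look like, reusing the toy segment function from above (the depth of 64 pixels is purely illustrative and would need to exceed the largest expected segment):

import numpy as np
import dask.array as da

darr = da.random.random((4096, 4096), chunks=(1024, 1024))  # stand-in image

# Each chunk is segmented with 64 pixels of context from its neighbors.
# Note that labels from different chunks still collide and would need
# to be resolved afterwards.
segmented = darr.map_overlap(segment, depth=64, dtype=np.int32)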

Yeah, it depends on what our goal is here. With @rxist525's data, this would still be map_blocks, as full frames fit comfortably in memory. Generally this is not true though, and map_overlap would be required (amongst other things). Thoughts on the intended scope here, @mrocklin?

Lots of things to unpack here. Two main points to focus on are types of segmentation and scalability. While the type of data that you are currently working on will fit into memory, some of the operations (like tensor voting) would speed up geometrically with smaller volumes. On the other hand, we do have datasets (4D instead of 5D) that are several terabytes for a single volume.
https://www.biorxiv.org/content/10.1101/374140v1.supplementary-material
Perhaps a quick chat this week to discuss/brainstorm a few avenues to explore?

Previously @jni and I worked on extending connected components to larger than memory data. ( dask/dask-image#94 ) It was doable, but did require a few tricks and generated basically a module worth of code. Connected components was probably easier than segmentation will be as it was clearer (though not exactly straightforward) to resolve pixels near chunk boundaries. With segmentation I'd naively suspect resolving pixels near chunk boundaries is somewhat dependent on the segmentation algorithm used. While it could be interesting to explore this, I'm somewhat inclined to agree with @mrocklin that this is out of scope for a blogpost. It might be a reasonable PR though. Would you agree or do you have other thoughts?

Perhaps a quick chat this week to discuss/brainstorm a few avenues to explore?

Sorry to miss this, @rxist525. I'm up for chatting. I think that people like @jakirkham and @thewtex are probably more important to have in that conversation though.

@rxist525 also, I'm curious at what point you or someone you know might be interested in getting engaged in the development process here. Using the last post as a model, my guess is that embarrassingly parallel map_blocks + ITK workloads are straightforward enough that someone who is comfortable with Numpy and ITK could start engaging.

There would almost surely be problems, but we could work through those together.

From my perspective we're mostly bound by people who know what imaging routines to run being busy, rather than by people who understand Dask.

Yeah just to unpack @mrocklin's point a bit further and following up on a related conversation with @thewtex, I think one can get pretty far with a wrapper function that looks like this. Now one can use map_blocks with whatever one wants. We could do a similar thing with the Numba example at the end of the last blogpost to simplify this further and thus avoid even the use of map_blocks.

def itk_udf_wrapper(udf, img, udf_args, udf_kwargs):
    """ Apply user-defined function to a single chunk of data """
    import itk

    img = img[0, 0, ...]  # remove leading two length-one dimensions
    image = itk.image_view_from_array(img)   # Convert to ITK object

    op_result = udf(image, *udf_args, **udf_kwargs)  # Call ITK-based user function

    result = itk.array_from_image(op_result)  # Convert back to Numpy array
    result = result[None, None, ...]  # Add back the leading length-one dimensions

    return result
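
Usage might then look something like the following; the array shape, the chunking, and the choice of itk.median_image_filter (from ITK 5's functional API) are all illustrative assumptions:

from functools import partial

import dask.array as da
import itk
import numpy as np

# A 4D dask array chunked so that each chunk is one full 2D frame
# behind two leading length-one dimensions
imgs = da.random.random((3, 4, 512, 512), chunks=(1, 1, 512, 512)).astype(np.float32)

smoothed = imgs.map_blocks(
    partial(itk_udf_wrapper, itk.median_image_filter),  # user-chosen ITK function
    udf_args=(),
    udf_kwargs={"radius": 2},
    dtype=np.float32,
)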

That said, the region properties side of the story may be a bit more interesting.

@rxist525 also, I'm curious at what point you or someone you know might be interested in getting engaged in the development process here. Using the last post as a model, my guess is that embarrassingly parallel map_blocks + ITK workloads are straightforward enough that someone who is comfortable with Numpy and ITK could start engaging.

There would almost surely be problems, but we could work through those together.

From my perspective we're mostly bound by people who know what imaging routines to run being busy, rather than by people who understand Dask.

@mrocklin Great question - and good timing. I think we have just finished our infrastructure setup, so we can engage more seriously. We should set up a time to chat next week with my postdoc @ruanxt present. Let's move this discussion to email?

I was chatting with @sofroniewn last week and he mentioned that it might also be interesting to try using a trained PyTorch model for segmentation. I would personally be fine with anything. This might also be a nice addition because this sequence would then have used all of scikit-image, ITK, and Torch to scale out image processing, which would be a nice trifecta.

From my perspective neither scaling nor writing about this is the bottleneck. The bottleneck to doing this is just finding someone who can write up the imaging side of this, probably as a notebook, in such a way that I or someone else can then scale out easily without knowing too much about imaging.

Are there any labels for this dataset that could be used to train the PyTorch model?

We have a couple of datasets with curated labels, if necessary.

@sofroniewn produced this notebook that uses some custom code around PyTorch to perform pixel-featurization. It might be a good base for a next step here.

https://gist.github.com/sofroniewn/2e1d5068a979e4393fd549dff675d543

@sofroniewn two questions about your notebook:

These lines are interesting:

    # remove leading two length-one dimensions
    img = image[0, 0, ...]
    
    # make sure image has four dimensions (b,c,w,h)
    img = np.expand_dims(np.expand_dims(img, 0), 0)
    img = np.transpose(img, (1,0,2,3))

You take away two dimensions, and then seem to add them back in. Was this just because of copy-pasting from the previous post, or was it intentional in some way?

Is there a nice way to produce an image from the output data here? One of the things that I think we could have done better on with the deconvolution post was providing a before-and-after comparison. Presumably pixel-level segmentation would also provide some eye-candy.

@mrocklin - I wasn't sure what to do with those two lines, removing then adding back in the dimensions. On the one hand they accomplish nothing, but in many ways that is by chance, and the reasons they are in there (if explained!) might be informative to people.

My understanding is that the first two lines, where we drop the leading two dimensions, are there because of the shape of the original dask array and the chunking - we have a 4D array but we're interested in extracting 2D images, so we drop those first two length-one dimensions. If we'd had a 5D array of data, we would have had to drop the first three length-one dimensions.

The adding of dimensions back in corresponds to getting the data into the shape that PyTorch expects, which is Batch x Channel x Width x Height; here Batch and Channel are just 1. So in this example we go 4D -> 2D -> 4D, but we might have had to go 5D -> 2D -> 4D.
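In other words, the shape round-trip for a single chunk looks like this (sizes illustrative):

import numpy as np

chunk = np.zeros((1, 1, 512, 512))  # one chunk of the 4D dask array
img = chunk[0, 0, ...]              # drop leading length-one dims -> (512, 512)
img = np.expand_dims(np.expand_dims(img, 0), 0)  # -> (1, 1, 512, 512), now
                                                 # meaning (batch, channel, h, w)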

Maybe all that is just more confusing than necessary, and we can leave the comment # make sure image has four dimensions (b,c,w,h) without actually running that code. Also, that transpose doesn't seem to do anything either in this case (which was a copy-and-paste type of situation!) and so can be dropped too.

As to producing images of the output data, yes it's possible - the output data can actually be used to produce 16 images of the different features, but they don't look so great. Maybe I'll think of something else to show. I agree though, a before/after image would be nice.

the output data can actually be used to produce 16 images of the different features, but they don't look so great.

Is it possible to have the model produce not 16 features, but 3 and then map those to RGB?

but in many ways that is by chance and the reasons they are in there (if explained!) might be informative to people.

Yes, I think that regardless we should explain the dimensions in each chunk, and map them in a way that users understand. My hope is that we can do this with a fair amount of prose, similar to what you have provided in your recent comment :)

I could also just pick 3 random features from the 16 to show as RGB. I'll play around and see what looks good.

As a more general comment, to explain how the featurization I'm doing with the UNet relates to the segmentation - on other datasets I've been using those learned features along with some minimal human input to perform a random-forest-based semantic segmentation, like ilastik does. Once you've got the random forest, you can easily apply it with map_blocks too (see the sketch below), so I could provide the random forest and then go all the way to a multiclass semantic segmentation.
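A rough sketch of that per-chunk step, where featurize stands in for the UNet featurizer from the notebook and clf for a fitted scikit-learn RandomForestClassifier (both assumptions here, as is the imgs array whose chunks are single 2D frames):

import numpy as np

def classify_chunk(img, clf=None):
    """Featurize one 2D chunk and predict a class per pixel."""
    feats = featurize(img)                    # assumed to return (n_features, h, w)
    X = feats.reshape(feats.shape[0], -1).T   # -> (n_pixels, n_features)
    pred = clf.predict(X)                     # one class per pixel
    return pred.reshape(img.shape).astype(np.uint8)

semantic = imgs.map_blocks(classify_chunk, clf=clf, dtype=np.uint8)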

The only slight downside is that the data is much more suited to instance segmentation (finding the individual cells) than semantic segmentation (classifying pixels), as most pixels are either cell or background and a relatively simple threshold can tell those apart. Instance segmentation is just a bit harder, and I don't have anything out of the box right now that would do great. I'll think about it more; curious what direction anyone else wants to see the blog post go in.

@sofroniewn, friendly poke :)

(but no pressure, you have an actual job, I know)

Hi @sofroniewn!
I really like your draft blog post in this gist and think it'd be very useful for other people doing similar work too. What do you think needs to happen so it can be published in the blog? (I'm very happy to help out with tidying up & editing, if you're swamped for time)

I really like your draft blog post in this gist and think it'd be very useful for other people doing similar work too. What do you think needs to happen so it can be published in the blog? (I'm very happy to help out with tidying up & editing, if you're swamped for time)

Wow, I'll be honest, I completely forgot about this!! Eek, my bad!! That demo just did some basic featurization; I think I'd revisit it now to do actual segmentation using either stardist or cellpose (or both!). The movie will look much cooler!!

One example I was thinking of was actually using a multiscale histology image as input and doing segmentation lazily, but that's quite tricky as you have to think about different scales.

Curious if you have ideas @GenevieveBuckley on what would be cool? I'm happy to pick this back up together. I'd love the support around tidying/editing, but I'm also happy to make one more push to make it a little more interesting/useful to people compared to where it is right now.

I'd want to err on the side of putting something out relatively soon, since we can always edit blogposts (or make another follow up post).

I like the idea of combining it with stardist/cellpose. That would be very useful for other people to see. If it can be included without too much extra work I'm all for it, otherwise we can have that be a standalone follow up post.

As I've said, if there's any way I can help here just let me know. You have my email & we can always videochat if it's useful :)

Had a chat with @sofroniewn today, he's going to make a PR in the next couple of days (this comment is mostly for future-me, who often forgets what happened when if it's not logged in a comment thread)

Perhaps the basic segmentation pipeline I wrote for SciPy Japan would be a good fit here? I have a to-do item in my notes about adding it to the dask-examples repository, but maybe we could do both.

You can see the segmentation example if you scroll down here

That would be great! 😄

Do you all have best practices on how labels can be created and then merged across chunks? The segmentation section of the recent blogpost uses a subset of the data, so it doesn't look like the chunked situation gets handled.

For our pipeline, we had to sequentially do 2 very funky map_overlaps -- avoiding trimming, doing array gymnastics, and using block_info to make sure chunks used unique label ranges. This allowed the 2nd map_overlap to peek into neighboring chunks and merge labels across nearest neighbor chunks.

(To fit on a GPU, my blocks are 256x256x256 and take 40 minutes to segment using cellpose. Our datasets can be approx 500 x 20k x 20k.)

Yes, we implement the SciPy equivalent of label in dask-image 😄

Currently this is CPU-only, but could be done on the GPU with some additions to CuPy ( cupy/cupy#4054 ) and corresponding changes in dask-image
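
For reference, its usage mirrors scipy.ndimage.label, except that it works chunk-wise and resolves labels across chunk boundaries:

import dask.array as da
import dask_image.ndmeasure

mask = da.random.random((1024, 1024), chunks=(256, 256)) > 0.99  # toy mask
labels, num_labels = dask_image.ndmeasure.label(mask)
labels = labels.compute()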

I like what you've done, and I think I can re-use some of the connectivity-graph work to avoid my 2 map_overlaps. I'm going to examine that code more.

How much of the code is serial (which I likely need to avoid)? I see the labelling does the chunks serially so that the labels do not overlap. Is anything else serial?

One thing I did to avoid the label collision and run my chunks fully in parallel is use int64 for the labels. I run naive labelling (which is int32), and then just add in a unique chunk_id (derived from block_info) in the upper 32 bits. You could then relabel back to int32 at the end, if desired.
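
A sketch of that trick as I understand it; the linearized chunk id and the exact bit layout below are assumptions:

import dask.array as da
import numpy as np
from scipy import ndimage

def label_chunk(mask, block_info=None):
    """Label one chunk independently, making labels globally unique."""
    labels, _ = ndimage.label(mask)  # naive per-chunk labelling (int32)
    labels = labels.astype(np.int64)
    # linearize the chunk's location into a single chunk id
    chunk_id = np.ravel_multi_index(
        block_info[0]["chunk-location"], block_info[0]["num-chunks"]
    )
    labels[labels != 0] |= np.int64(chunk_id) << 32  # chunk id in upper 32 bits
    return labels

mask = da.random.random((1024, 1024), chunks=(256, 256)) > 0.99
labels = mask.map_blocks(label_chunk, dtype=np.int64)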

Well hopefully it can be used directly and we can work on improvements together 😉

I think it is just resolving collisions.

What happens when a label spans multiple chunks?

Our labels can span multiple chunks, but only direct neighbors (i.e. in 3d, a label may appear in 8 chunks). So the use case might be easier in some sense -- it means we don't have to block for all chunks to be processed for global information. That said, blocking may be a good trade-off in exchange for a simpler algorithm.

Perhaps the only insight I have to contribute back is avoiding the serial labelling of chunks by temporarily using int64 that include block_ids. Parallel processing is crucial in our case.

I think the adjacency graph could be useful in image stitching, as well. It may be hard to implement in full generality, but a useful narrow subset of use cases (where the movements are less than a chunk size) may be feasible.

@chrisroat

Perhaps the only insight I have to contribute back is avoiding the serial labelling of chunks by temporarily using int64 that include block_ids. Parallel processing is crucial in our case.

Very clever! But the labeling is in parallel, the only serial bit is small in the scheme of things?

At any rate, I don't remember exactly how we do the label reassignment after the correspondence graph, but I suspect it might use a NumPy array with the labels as indexing, which would be huge if you use that unique-labels strategy. (Update: indeed we do.) Also, the graph we make is a scipy.sparse.csr_matrix, which likewise needs O(max_label) storage to store the indptr array.

So it's not as minor a tweak as one would initially suspect...

I see why I'm talking past you all -- the ndmeasure label function labels connected components, but other labelling strategies I'm imagining (e.g. watershed, cell segmentation) are more expensive, and parallelization is useful. Sorry!

In this case, the labelling is fairly inexpensive and may scale well -- though I don't know... perhaps there are nasty edge cases. It is done serially, as the labels in block N+1 start where the labels in block N end.

Enforcing a requirement that the labels be sequential is not always necessary. The list of labels is not always needed; and even if it is, keeping a list of labels is not expensive compared to the image size.

Would a O(num_labels) relabelling strategy be feasible if non-sequential labels were allowed? Or is there some vectorization speed-up in the CSR approach that you would lose?

Adding watershed is definitely of interest (though tricky!). Also, generalizing what we have so it can be a framework to plug in other segmentation operations that operate at the chunk level makes sense as well. If you have ideas on what would be useful, it would be good to discuss those too.

I actually misread the labelling as being inside a code loop, but it's delayed and runs in parallel.

In the case of cell segmentation, two or more regions from neighboring chunks could meet at a chunk boundary and should not be merged -- so it's hard to use the two-pixel wide boundary slices in this algorithm to merge regions. Some (but not all) cell segmentation has the advantage that the segmentation is compact and the max extent of a region can be roughly known. Ours is compact and runs in two stages -- it seems to work OK:

  • map_overlap with enough depth to capture a full region (plus more, as the segmentation algorithm is sensitive at the image edges). Any segment is discarded if its center of mass lies outside the nominal chunk boundary (i.e. a region is centered in the overlap region, but a small fraction crosses into the nominal chunk); see the sketch after this list.
  • A careful map_overlap over the untrimmed output of the previous pass allows a chunk to overwrite itself with boundary-crossing segments from neighboring chunks. This was a bit mind-bending for me to write, but I didn't think of a better way.
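
A sketch of that center-of-mass filter from the first pass; the helper name and depth handling are illustrative, not the actual pipeline code:

import numpy as np
from scipy import ndimage

def discard_offcenter_segments(labels, depth):
    """Zero out segments whose center of mass falls in the overlap margin."""
    ids = np.unique(labels)
    ids = ids[ids != 0]  # skip background
    centers = ndimage.center_of_mass(labels != 0, labels=labels, index=ids)
    keep = [
        lab
        for lab, com in zip(ids, centers)
        if all(depth <= c < s - depth for c, s in zip(com, labels.shape))
    ]
    return np.where(np.isin(labels, keep), labels, 0)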

The second pass might benefit from the ideas here about slicing around boundaries, and creating an adjacency graph. But it seems difficult to me at the moment -- a corner of a chunk might get input from 7 of its neighbors.

I actually misread the labelling as being inside a code loop, but it's delayed and runs in parallel.

Yup. 😊

Would a O(num_labels) relabelling strategy be feasible if non-sequential labels were allowed? Or is there some vectorization speed-up in the CSR approach that you would lose?

Yes, what we would lose is the pre-existing implementation of connected components using scipy.sparse.csgraph. We'd have to do our own thing, or use NetworkX. The point is that scipy.sparse.csgraph uses the scipy.sparse.csr_matrix as the data structure, and that contains an indptr array that has length (max_label+1). And actually I think it's int32! 😂

In [1]: import numpy as np
In [2]: from scipy import sparse
In [3]: n = 10                                                                                                                
In [4]: mat = sparse.csr_matrix(np.eye(n, n, n-1))                                                                            
In [5]: mat.indptr                                                                                                            
Out[5]: array([0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)

I think a mapping between 64-bit labels and int32 indices is probably the fastest way to adapt this.
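
Something along these lines, assuming a single np.unique pass over the label volume is affordable:

import numpy as np

# sparse 64-bit labels, e.g. from the chunk-id packing scheme above
labels = np.array([0, (7 << 32) | 3, 0, (2 << 32) | 1, (7 << 32) | 3], dtype=np.int64)

# map the arbitrary labels onto contiguous indices usable with csr_matrix
unique_labels, compact = np.unique(labels, return_inverse=True)
compact = compact.astype(np.int32).reshape(labels.shape)
# unique_labels[compact] reconstructs the original 64-bit labels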

Generally, the graph of overlapping labels is a useful concept and indeed extends to non-identical segmentation stitching -- you just need some overlap threshold. That would be a fun project. =)

Thinking about this point on avoiding sequential label offset updates, and Chris' suggestion: I guess it is arbitrary where we account for the chunk's integer. IOW, we don't necessarily need to put it in the highest bits. Instead we could put it in the lowest bits and just shift the labels up to make room for the block integer. This should still be uint32-friendly (unless we really have so many chunks, and so many labels within a chunk, that we need the promotion to uint64, which should hopefully be rare).
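
A sketch of that variant; the helper below is illustrative and assumes the shifted labels still fit in uint32, per the caveat above:

import numpy as np

def pack_low_bits(local_labels, chunk_id, num_chunks):
    """Shift per-chunk labels up and store the chunk id in the low bits."""
    shift = max(1, int(np.ceil(np.log2(num_chunks))))  # bits needed for chunk ids
    packed = local_labels.astype(np.uint32) << shift
    packed[local_labels != 0] |= np.uint32(chunk_id)   # background stays 0
    return packed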

Great idea @jakirkham! ❤️ I've summarised this discussion in dask/dask-image#199, which is probably a more useful place to keep this information long term. 😊 (Props to @GenevieveBuckley who has modeled good discussion-archiving practices for me often. 😅)

Awesome, thanks for doing that Juan! 😄

I'm going to mark this as closed by #82.