seung-lab / connected-components-3d

Connected components on discrete and continuous multilabel 3D & 2D images. Handles 26, 18, and 6 connected variants; periodic boundaries (4, 8, & 6)

Does cc3d also work with memory-mapped numpy arrays and array-like data?

Karol-G opened this issue

Hey,

Thanks for this really cool package. I use it quite frequently by now and it is simply awesome!

I often work with larger-than-memory 3D images, and usually the only way to process them is to memory-map them as a numpy/zarr array.
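For illustration, this is roughly how I open such volumes today (the filename, shape, and dtype are just placeholders):

```python
import numpy as np

# Larger-than-RAM 3D volume stored as a raw binary file on disk.
shape = (2048, 2048, 2048)
labels_in = np.memmap("volume.raw", dtype=np.uint8, mode="r", shape=shape)

# Slicing only reads the pages that are actually touched, not the whole file.
chunk = labels_in[:64, :64, :64]
```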

Is cc3d capable of working with memory-mapped numpy arrays or even array-like data (Zarr, Dask, Tensor, Xarray, ...)? Or will it simply throw an exception or convert it internally to an in-memory numpy array?

If it is not possible, are you aware of other libraries or approaches that could perform operations such as connected component analysis on larger-than-memory data (either memory-mapped or via patchification)?

Best,
Karol

Hi Karol!

Thanks for the kind words! Currently cc3d operates on in-memory arrays internally and doesn't have the ability to work with mmapped arrays. Even if it did, the output array would still be at least uint16, uint32, or uint64 depending on your data, so the output alone might be too much (I suppose it could be modified to use mmapped files for the output too).

However, with respect to larger-than-RAM files, I do have a (beta) solution for that if 6-way connectivity is sufficient. Check out https://github.com/seung-lab/igneous#connected-components-labeling-ccl-beta which has the ability to independently process image cutouts in parallel and perform CCL labeling. You'd have to first convert your image into a Neuroglancer Precomputed format using CloudVolume https://github.com/seung-lab/cloud-volume. However, then you'd be able to visualize your data too.

One warning. This procedure passes my automated tests and I've used it for pretty big volumes successfully. I did see it screw up on one very large volume and I haven't figured out why yet, but odds are it will work fine for you.

The memory mapped numpy array does sound interesting. It would be very nice to just direct people to using that for very large volumes. I'll have to play around with this.

Hey William,

The memory mapped numpy array does sound interesting. It would be very nice to just direct people to using that for very large volumes. I'll have to play around with this.

It would certainly be awesome if cc3d were compatible with memory-mapped arrays in the future. For large images, speed is often less relevant than memory consumption, so even if cc3d functions were slower when applied to memory-mapped data, that shouldn't be a big issue.

However, with respect to larger-than-RAM files, I do have a (beta) solution for that if 6-way connectivity is sufficient. Check out https://github.com/seung-lab/igneous#connected-components-labeling-ccl-beta which has the ability to independently process image cutouts in parallel and perform CCL labeling. You'd have to first convert your image into a Neuroglancer Precomputed format using CloudVolume https://github.com/seung-lab/cloud-volume. However, then you'd be able to visualize your data too.

This also sounds interesting, but it probably involves too much pre- and post-processing overhead and conversion between image formats for my use cases.

I also had the idea to do CCL in a sliding-window manner: the image would be patchified, every patch labeled via CCL, and then everything assembled again. In the naive version of this approach, components that stretch over multiple patches would be split.
To prevent this, a 1-pixel patch overlap could be introduced in order to propagate the labels of components that stretch over multiple patches from the previous patch. It would also require keeping a running counter of the number of components and using this counter to offset the labels of components in every new patch. A rough sketch of what I have in mind is below.
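Something along these lines (just an illustrative sketch; the slab-wise patching, patch size, and helper name are made up, and the overlap-based merging is left out here):

```python
import numpy as np
import cc3d

def patchwise_ccl(volume, out, patch=256):
    """Label z-slabs independently and offset labels with a running counter.

    `out` must be preallocated (e.g. a memory-mapped array) with a dtype
    large enough to hold the total number of components across all slabs.
    """
    next_label = 0
    for z in range(0, volume.shape[0], patch):
        slab = np.array(volume[z:z + patch])              # copy one slab into RAM
        labels, n = cc3d.connected_components(slab, return_N=True)
        labels = labels.astype(out.dtype)
        labels[labels > 0] += next_label                  # keep labels globally unique
        out[z:z + patch] = labels
        next_label += n
    return next_label

# Components crossing a slab boundary are still split at this point; the
# 1-pixel overlap would then be used to detect and merge those label pairs.
```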

Does this approach make sense to you? Are there problems that I have not considered?

Best,
Karol

Hi Karol,

I just gave it a try with an mmapped input file and it seems to work. However, it outputs to an in-memory array which can be several times bigger, since it's 2-8 bytes per voxel. The union-find data structure will also be in memory, but usually it is 10x to 100x smaller than the input image, so I'm not as worried about it.

As for the strategy you suggested, yes, that's pretty much what I did in Igneous. It seems to work. 6-connected is much easier to implement than 26-connected in that scheme. If that interests you, you can look at the CCL code in Igneous for tips on implementing it. I may also just add mmap support shortly so give me a day or two before trying that.
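Very roughly, the merge step across a shared 1-voxel face looks something like the toy sketch below (this is not the actual Igneous code; it collects the label pairs that touch across the face and computes their equivalence classes, with scipy's graph connected_components standing in for a union-find):

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components as graph_cc

def merge_across_face(face_a, face_b, num_labels):
    """face_a/face_b are the matching 2D label slices from two adjacent patches."""
    # Only foreground voxels that touch across the face create an equivalence.
    mask = (face_a > 0) & (face_b > 0)
    pairs = np.unique(np.stack([face_a[mask], face_b[mask]], axis=1), axis=0)

    # Treat every global label as a graph node and each touching pair as an edge.
    graph = coo_matrix(
        (np.ones(len(pairs)), (pairs[:, 0], pairs[:, 1])),
        shape=(num_labels + 1, num_labels + 1),
    )
    _, remap = graph_cc(graph, directed=False)
    return remap  # remap[old_global_label] -> merged component id
```

Applying the remap to each labeled patch (e.g. `remap[patch_labels]`) then gives consistent labels across the whole volume.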

Hi Karol,

I implemented mmap output, and it already worked with mmap input. You can try experimenting with the master branch. I will also be releasing 3.11.0 shortly and you will be able to get it on PyPI. Check the front page README examples for how to use it. Let me know if you have any feedback!
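Roughly, the usage looks like this (the path and shape are placeholders; see the README for the exact invocation and the available options):

```python
import numpy as np
import cc3d

shape = (512, 512, 512)  # placeholder shape
labels_in = np.memmap("input.raw", dtype=np.uint8, mode="r+", shape=shape)

# out_file writes the labels to a memory-mapped file instead of allocating
# an in-memory output array, so neither input nor output has to fit in RAM.
labels_out = cc3d.connected_components(labels_in, out_file="labels_out.bin")
```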

Hey,

Thank you a lot! I first tested it with cc3d.connected_components and it worked like a charm. The memory consumption is virtually non-existent, with only 1-2 GB max when running it on a (2598, 2833, 2857) uint8 array (~21 GB). The function estimated that uint32 would be needed for the output and created a memory-mapped output array with a size of 84 GB. uint16 would actually be completely fine as there are only about ~40,000 components, but that is another topic ;) The connected_components function took maybe ~6 min, which is not a problem for my use case.
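For reference, the arithmetic works out like this:

```python
voxels = 2598 * 2833 * 2857      # ~2.1e10 voxels
print(voxels * 1 / 1e9)          # ~21 GB  input  (uint8)
print(voxels * 4 / 1e9)          # ~84 GB  output (uint32, as estimated by cc3d)
print(voxels * 2 / 1e9)          # ~42 GB  would suffice for uint16 (~40,000 labels)
```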

Something I noticed is that the 'r+' mode is required; opening the input with just 'r' raises an exception when running cc3d.connected_components. Does this mean that the method modifies the input array? This is not a problem in my current code, but it would be something to keep in mind if that is the case.
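Concretely, this is what I mean (simplified; the input path is just a placeholder):

```python
import numpy as np
import cc3d

shape = (2598, 2833, 2857)

# Works: input opened as a writable memory map.
labels_in = np.memmap("input.raw", dtype=np.uint8, mode="r+", shape=shape)
cc3d.connected_components(labels_in, out_file="labels_out.bin")

# Raises an exception for me: the same input opened read-only.
labels_in = np.memmap("input.raw", dtype=np.uint8, mode="r", shape=shape)
cc3d.connected_components(labels_in, out_file="labels_out.bin")
```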

I then ran cc3d.statistics with the memory-mapped output, which sadly seemed to run forever, and I had to quit it after 2-3 hours. On the plus side, it consumed essentially no memory.
Do you think this could be a bug, or is it simply very slow when using a memory-mapped array?

Best,
Karol

So, I looked into it. I think it's actually working, but it's very slow because it's swapping a lot. I was being cautious and used large data types assuming a reasonable number of regions, but with a noise dataset it takes 8 * 6 * N bytes for the bounding boxes, which for a 1000^3 noise dataset with 742,769,605 regions ends up being ~35 GB for the bounds alone. When I pick a dataset small enough not to swap too much (e.g. 700^3), the calculation is quick. When it is big, it is very slow and may get killed by the OOM killer.
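To make that arithmetic concrete:

```python
regions = 742_769_605            # labels in the 1000^3 noise dataset
bytes_per_region = 8 * 6         # six 64-bit bounds values per label
print(regions * bytes_per_region / 1e9)   # ~35.7 GB just for the bounding boxes
```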

What I can do is use tighter data types (uint16 for bounds, uint32 for label counts, and float for centroids), add an option to skip converting the bounds into slices, and make sure to iterate in the contiguous direction. Do you know how many regions you have and how much RAM you've been using for statistics?

Hi Karol,

I did some memory and performance optimization on cc3d.statistics if you get the latest version 3.12.0. You can give it another shot and see if it does any better.

Hey,

Sorry for the late reply and thanks a lot for the optimization! I am at a conference this week and will only be able to test it next week. I will give you an update then :)

Hey,

I was finally able to test the new cc3d.statistics and it runs perfectly now!
The memory consumption was negligible and the runtime was 309 s for a (2598, 2833, 2857) array with ~40,000 components.
This essentially enables my pipeline to be memory efficient throughout every stage without major bottlenecks. Thank you a lot!

Best,
Karol

Hey,

Glad to know that the fix was not too complicated! From my side, the issue is solved. Thank you again for your outstanding support :)

Best,
Karol