tlambert03 / nd2

Full-featured nd2 (Nikon NIS Elements) file reader for Python. Outputs to numpy, dask, and xarray. Exhaustive metadata extraction.

Home Page: https://tlambert03.github.io/nd2


ValueError for a big 2D image in the _get_frame method

GinYoshida opened this issue

  • nd2 version: 0.3.0
  • Python version: 3.10
  • Operating System: Windows 10 (15 GB RAM)

Description

I would like to slice a large image in an ND2 file.

Code

import nd2

dask_array = nd2.imread(file_path, dask=True)
dask_array = dask_array[0, 0:100, 0:100, :]
result_ndarray = dask_array.compute()

Error

  File "C:\{my environment}\.venv\lib\site-packages\nd2\nd2file.py", line 510, in _get_frame
    frame.shape = self._raw_frame_shape
ValueError: cannot reshape array of size 50059620352 into shape (26420,37152,1,3)

What I Did

I ran the code above on the following ND2 file.

The size information

Attributes(bitsPerComponentInMemory=8, bitsPerComponentSignificant=8, componentCount=3, heightPx=26420, pixelDataType='unsigned', sequenceCount=17, widthBytes=111456, widthPx=37152, compressionLevel=None, compressionType=None, tileHeightPx=None, tileWidthPx=None, channelCount=1)
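
For scale, a quick back-of-the-envelope check using the attribute values above (plain arithmetic, not nd2 output): a single fully decoded frame already approaches 3 GB, a large fraction of the 15 GB of RAM mentioned above.

height_px, width_px, components, bytes_per_component = 26420, 37152, 3, 1
frame_bytes = height_px * width_px * components * bytes_per_component
print(frame_bytes / 1e9)  # ~2.94 GB per decoded frame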

Note

Another trial

I tried another file, and it worked well.

Attributes(bitsPerComponentInMemory=8, bitsPerComponentSignificant=8, componentCount=3, heightPx=5530, pixelDataType='unsigned', sequenceCount=16, widthBytes=15984, widthPx=5328, compressionLevel=None, compressionType=None, tileHeightPx=None, tileWidthPx=None, channelCount=1)
Frame shape: (5530, 5328, 1, 3)

Question

Does _get_frame in nd2file.py require a lot of memory when an image is very large in width and height, because it converts the whole frame to an ndarray?
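
One generic way to see where the memory goes, assuming nd2 chunks the dask array one full frame per chunk (the reply below confirms that a full frame must be read before cropping): inspect the chunk size.

import nd2

dask_array = nd2.imread(file_path, dask=True)
# If each chunk spans a whole 2D frame plus channels, even a small crop
# like [0, 0:100, 0:100, :] reads one entire frame into memory first.
print(dask_array.chunksize)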

My status

Sorry to say, I'm a beginner at Python; using the debugger and running a straightforward script is about the most I can do.
Please let me know what you would like me to do for further investigation.

Thanks for the detailed issue @GinYoshida :) very helpful.

Without having access to the file itself, I'm not immediately sure 🤔
That number, 50059620352, is 17.000092544233993 times the number of elements implied by the shape (26420, 37152, 1, 3), which is suspiciously close to sequenceCount (17). So my main question is whether this is a somehow-corrupt file/frame that we need to handle more gracefully, or something else.
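
For reference, the arithmetic behind that observation, using only the numbers already shown above:

raw_elements = 50059620352                 # size of the buffer that failed to reshape
frame_elements = 26420 * 37152 * 1 * 3     # elements implied by (26420, 37152, 1, 3)
print(raw_elements / frame_elements)       # 17.000092544233993, ~= sequenceCount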

Can you try something for me? Use the read_using_sdk flag and let me know if it works for that file:

dask_array = nd2.imread(file_path, dask=True, read_using_sdk=True)
dask_array = dask_array[0, 0:100, 0:100, :]
result_ndarray = dask_array.compute()

Does _get_frame in nd2file.py require a lot of memory when an image is very large in width and height, because it converts the whole frame to an ndarray?

Yeah, unfortunately, I haven't yet implemented subframe chunking. The SDK doesn't provide it directly (i.e. you must read a full 2D-plus-channels chunk of data before cropping), but it's on the list of things to do. It shouldn't be too hard to do this at the level of the mmap around here. I'll add a new issue to track progress on that; a rough sketch of the idea follows below.
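
For illustration, a minimal sketch of what mmap-level sub-frame reading could look like, assuming an uncompressed frame stored row-major at a known byte offset (widthBytes = widthPx * componentCount for this file, so rows are unpadded). read_subframe and frame_offset are hypothetical placeholders, not nd2 API:

import numpy as np

# Hypothetical sketch: frame_offset, height, width, and n_channels stand in
# for values that would come from the file's metadata.
def read_subframe(path, frame_offset, height, width, n_channels,
                  rows, cols, dtype=np.uint8):
    mm = np.memmap(path, dtype=dtype, mode="r", offset=frame_offset,
                   shape=(height, width, n_channels))
    # Slicing a memmap touches only the pages backing the requested region;
    # np.array then copies just the crop into RAM.
    return np.array(mm[rows, cols])

# e.g. a 100x100 crop of the big frame described above:
# crop = read_subframe(file_path, frame_offset, 26420, 37152, 3,
#                      slice(0, 100), slice(0, 100))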

@tlambert03
Thank you for your very quick reply.

Conclusion

Using read_using_sdk works very well. Really appreciated!

nd2.imread(file_path, dask=True, read_using_sdk=True)

Sharing the file

I understand your concern. We cannot share this file, and it must be very hard to work on issues like this without being able to reproduce the phenomenon on your side.
If we find a way to produce this kind of unusual data without anything confidential, we will share it.

The new issue about memory

I appreciate that action as well.
I hope demand for it from other users is not too low.

Hi @GinYoshida, you might give this another try after version 0.4.4. I'm not certain it will fix your issue (when not using read_using_sdk=True) without seeing the file itself, but it might?

Since this issue is hard to tackle without the actual file, and since you have a workaround using the SDK reader, I'm going to close this issue; see #85 for the sub-frame chunking. Feel free to re-open or comment with additional questions.