cgohlke / imagecodecs

Image transformation, compression, and decompression codecs

Home Page: https://pypi.org/project/imagecodecs

IMCD_LZW_TABLE_TOO_SMALL exception with large image

rossbar opened this issue · comments

I'm not sure this is the correct place to open this issue, as the original exception arose while using dask_image to lazily load a large (32 x 50k x 51k) OME-TIFF.

Reproducer + traceback

Update: See subsequent comment for simpler example w/out dask

>>> import dask_image.imread
>>> import numpy as np
>>> img_fname = "foo.ome.tif"
>>> x = dask_image.imread.imread(img_fname)
>>> x.shape
(32, 49380, 51055)
>>> x.dtype
dtype('uint16')
>>> img = np.array(x[-1, ...])  # Attempt to lazily load the last channel
Traceback (most recent call last)
   ...
ImcdError: imcd_lzw_decode returned IMCD_LZW_TABLE_TOO_SMALL
Full traceback:

ImcdError Traceback (most recent call last)
Cell In[5], line 1
----> 1 img = np.array(x[-1, ...])

File ~/envs/base/lib/python3.10/site-packages/dask/threaded.py:89, in get(dsk, keys, cache, num_workers, pool, **kwargs)
86 elif isinstance(pool, multiprocessing.pool.Pool):
87 pool = MultiprocessingPoolExecutor(pool)
---> 89 results = get_async(
90 pool.submit,
91 pool._max_workers,
92 dsk,
93 keys,
94 cache=cache,
95 get_id=_thread_get_id,
96 pack_exception=pack_exception,
97 **kwargs,
98 )
100 # Cleanup pools associated to dead threads
101 with pools_lock:

File ~/envs/base/lib/python3.10/site-packages/dask/local.py:511, in get_async(submit, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, chunksize, **kwargs)
509 _execute_task(task, data) # Re-execute locally
510 else:
--> 511 raise_exception(exc, tb)
512 res, worker_id = loads(res_info)
513 state["cache"][key] = res

File ~/envs/base/lib/python3.10/site-packages/dask/local.py:319, in reraise(exc, tb)
317 if exc.__traceback__ is not tb:
318 raise exc.with_traceback(tb)
--> 319 raise exc

File ~/envs/base/lib/python3.10/site-packages/dask/local.py:224, in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
222 try:
223 task, data = loads(task_info)
--> 224 result = _execute_task(task, data)
225 id = get_id()
226 result = dumps((result, id))

File ~/envs/base/lib/python3.10/site-packages/dask_image/imread/__init__.py:99, in _map_read_frame(x, multiple_files, block_info, **kwargs)
96 else:
97 i, j = block_info[None]['array-location'][0]
---> 99 return _read_frame(fn=fn, i=slice(i, j), **kwargs)

File ~/envs/base/lib/python3.10/site-packages/dask_image/imread/__init__.py:104, in _read_frame(fn, i, arrayfunc)
102 def _read_frame(fn, i, *, arrayfunc=np.asanyarray):
103 with pims.open(fn) as imgs:
--> 104 return arrayfunc(imgs[i])

File ~/envs/base/lib/python3.10/site-packages/slicerator/__init__.py:226, in <genexpr>(.0)
225 def __iter__(self):
--> 226 return (self._get(i) for i in self.indices)

File ~/envs/base/lib/python3.10/site-packages/slicerator/__init__.py:206, in Slicerator._get(self, key)
205 def _get(self, key):
--> 206 return self._ancestor[key]

File ~/envs/base/lib/python3.10/site-packages/slicerator/__init__.py:187, in Slicerator.from_class.<locals>.SliceratorSubclass.__getitem__(self, i)
185 indices, new_length = key_to_indices(i, len(self))
186 if new_length is None:
--> 187 return self._get(indices)
188 else:
189 return cls(self, indices, new_length, propagate_attrs)

File ~/envs/base/lib/python3.10/site-packages/pims/base_frames.py:100, in FramesSequence.__getitem__(self, key)
97 def __getitem__(self, key):
98 """__getitem__ is handled by Slicerator. In all pims readers, the data
--> 100 return self.get_frame(key)

File ~/envs/base/lib/python3.10/site-packages/pims/tiff_stack.py:115, in TiffStack_tifffile.get_frame(self, j)
113 def get_frame(self, j):
114 t = self._tiff[j]
--> 115 data = t.asarray()
116 return Frame(data, frame_no=j, metadata=self._read_metadata(t))

File ~/envs/base/lib/python3.10/site-packages/tifffile/tifffile.py:10261, in TiffFrame.asarray(self, *args, **kwargs)
10254 def asarray(self, *args: Any, **kwargs: Any) -> NDArray[Any]:
10255 """Return image from frame as NumPy array.
10256
10257 Parameters:
10258 **kwargs: Arguments passed to :py:meth:`TiffPage.asarray`.
10259
10260 """

> 10261 return TiffPage.asarray(self, *args, **kwargs)

File ~/envs/base/lib/python3.10/site-packages/tifffile/tifffile.py:8923, in TiffPage.asarray(self, out, squeeze, lock, maxworkers)
8913 out[
8914 s, d : d + shape[0], h : h + shape[1], w : w + shape[2]
8915 ] = segment[
(...)
8918 : keyframe.imagewidth - w,
8919 ]
8920 # except IndexError:
8921 # pass # corrupted file, e.g., with too many strips
-> 8923 for _ in self.segments(
8924 func=func,
8925 lock=lock,
8926 maxworkers=maxworkers,
8927 sort=True,
8928 _fullsize=False,
8929 ):
8930 pass
8932 result.shape = keyframe.shaped

File ~/envs/base/lib/python3.10/site-packages/tifffile/tifffile.py:8737, in TiffPage.segments(self, lock, maxworkers, func, sort, _fullsize)
8729 with ThreadPoolExecutor(maxworkers) as executor:
8730 for segments in fh.read_segments(
8731 self.dataoffsets,
8732 self.databytecounts,
(...)
8735 flat=False,
8736 ):
-> 8737 yield from executor.map(decode, segments)

File /usr/lib/python3.10/concurrent/futures/_base.py:621, in Executor.map.<locals>.result_iterator()
618 while fs:
619 # Careful not to keep a reference to the popped future
620 if timeout is None:
--> 621 yield _result_or_cancel(fs.pop())
622 else:
623 yield _result_or_cancel(fs.pop(), end_time - time.monotonic())

File /usr/lib/python3.10/concurrent/futures/_base.py:319, in _result_or_cancel(failed resolving arguments)
317 try:
318 try:
--> 319 return fut.result(timeout)
320 finally:
321 fut.cancel()

File /usr/lib/python3.10/concurrent/futures/_base.py:451, in Future.result(self, timeout)
449 raise CancelledError()
450 elif self._state == FINISHED:
--> 451 return self.__get_result()
453 self._condition.wait(timeout)
455 if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]:

File /usr/lib/python3.10/concurrent/futures/_base.py:403, in Future.__get_result(self)
401 if self._exception:
402 try:
--> 403 raise self._exception
404 finally:
405 # Break a reference cycle with the exception in self._exception
406 self = None

File /usr/lib/python3.10/concurrent/futures/thread.py:58, in _WorkItem.run(self)
55 return
57 try:
---> 58 result = self.fn(*self.args, **self.kwargs)
59 except BaseException as exc:
60 self.future.set_exception(exc)

File ~/envs/base/lib/python3.10/site-packages/tifffile/tifffile.py:8712, in TiffPage.segments.<locals>.decode(args, decodeargs, decode)
8711 def decode(args, decodeargs=decodeargs, decode=keyframe.decode):
-> 8712 return func(decode(*args, **decodeargs))

File ~/envs/base/lib/python3.10/site-packages/tifffile/tifffile.py:8641, in TiffPage.decode.<locals>.decode_other(data, index, jpegtables, jpegheader, _fullsize)
8638 if decompress is not None:
8639 # TODO: calculate correct size for packed integers
8640 size = shape[0] * shape[1] * shape[2] * shape[3]
-> 8641 data = decompress(data, out=size * dtype.itemsize)
8642 data_array = unpack(data) # type: ignore
8643 # del data

File imagecodecs/_imcd.pyx:1209, in imagecodecs._imcd.lzw_decode()

ImcdError: imcd_lzw_decode returned IMCD_LZW_TABLE_TOO_SMALL

Unfortunately, foo.ome.tif is not publicly available data, so I can't provide a full reproducer. If any metadata from the file would help in debugging, I'd be happy to provide it if possible. Also, if this issue seems more relevant to a different component in the dask/TIFF-reading toolchain, I'm happy to transfer it there.

Environment info

Python 3.10.6 on Ubuntu 22.04
Subset of pip list:

dask                         2023.7.0
dask-image                   2023.3.0
tifffile                     2023.7.10
imagecodecs                  2023.7.10
ome-types                    0.3.4

Quick follow-up: I was able to access a machine with enough RAM that I could in principle load the entire image at once, so I tried a simplified MRE without the dask_image component. I can confirm that I still get the ImcdError: imcd_lzw_decode returned IMCD_LZW_TABLE_TOO_SMALL:

>>> fname = "foo.ome.tif"
>>> import tifffile as tff
>>> data = tff.imread(fname, maxworkers=1)
Traceback (most recent call last)
   ...
ImcdError: imcd_lzw_decode returned IMCD_LZW_TABLE_TOO_SMALL
Full traceback:

Traceback (most recent call last)
Cell In[3], line 1
----> 1 data = tff.imread(fname, maxworkers=1)

File ~/envs/base/lib/python3.10/site-packages/tifffile/tifffile.py:1121, in imread(files, selection, aszarr, key, series, level, squeeze, maxworkers, mode, name, offset, size, pattern, axesorder, categories, imread, sort, container, chunkshape, dtype, axestiled, ioworkers, chunkmode, fillvalue, zattrs, multiscales, omexml, out, out_inplace, _multifile, _useframes, **kwargs)
1119 return store
1120 return zarr_selection(store, selection)
-> 1121 return tif.asarray(
1122 key=key,
1123 series=series,
1124 level=level,
1125 squeeze=squeeze,
1126 maxworkers=maxworkers,
1127 out=out,
1128 )
1130 elif isinstance(files, (FileHandle, BinaryIO)):
1131 raise ValueError('BinaryIO not supported')

File ~/envs/base/lib/python3.10/site-packages/tifffile/tifffile.py:4309, in TiffFile.asarray(self, key, series, level, squeeze, out, maxworkers)
4307 result = page0.asarray(out=out, maxworkers=maxworkers)
4308 else:
-> 4309 result = stack_pages(pages, out=out, maxworkers=maxworkers)
4311 if result is None:
4312 return None

File ~/envs/base/lib/python3.10/site-packages/tifffile/tifffile.py:22285, in stack_pages(pages, tiled, lock, maxworkers, out, **kwargs)
22283 if maxworkers < 2:
22284 for index, page in enumerate(pages):

> 22285 func(page, index)
22286 else:
22287 page0.decode # init TiffPage.decode function

File ~/envs/base/lib/python3.10/site-packages/tifffile/tifffile.py:22280, in stack_pages.<locals>.func(page, index, out, filecache, kwargs)
22278 if page is not None:
22279 filecache.open(page.parent.filehandle)

> 22280 page.asarray(lock=lock, out=out[index], **kwargs)
22281 filecache.close(page.parent.filehandle)

File ~/envs/base/lib/python3.10/site-packages/tifffile/tifffile.py:10328, in TiffFrame.asarray(self, *args, **kwargs)
10321 def asarray(self, *args: Any, **kwargs: Any) -> NDArray[Any]:
10322 """Return image from frame as NumPy array.
10323
10324 Parameters:
10325 **kwargs: Arguments passed to :py:meth:`TiffPage.asarray`.
10326
10327 """

> 10328 return TiffPage.asarray(self, *args, **kwargs)

File ~/envs/base/lib/python3.10/site-packages/tifffile/tifffile.py:8981, in TiffPage.asarray(self, out, squeeze, lock, maxworkers)
8971 out[
8972 s, d : d + shape[0], h : h + shape[1], w : w + shape[2]
8973 ] = segment[
(...)
8976 : keyframe.imagewidth - w,
8977 ]
8978 # except IndexError:
8979 # pass # corrupted file, for example, with too many strips
-> 8981 for _ in self.segments(
8982 func=func,
8983 lock=lock,
8984 maxworkers=maxworkers,
8985 sort=True,
8986 _fullsize=False,
8987 ):
8988 pass
8990 result.shape = keyframe.shaped

File ~/envs/base/lib/python3.10/site-packages/tifffile/tifffile.py:8782, in TiffPage.segments(self, lock, maxworkers, func, sort, _fullsize)
8774 if maxworkers < 2:
8775 for segment in fh.read_segments(
8776 self.dataoffsets,
8777 self.databytecounts,
(...)
8780 flat=True,
8781 ):
-> 8782 yield decode(segment)
8783 else:
8784 # reduce memory overhead by processing chunks of up to
8785 # ~256 MB of segments because ThreadPoolExecutor.map is not
8786 # collecting iterables lazily
8787 with ThreadPoolExecutor(maxworkers) as executor:

File ~/envs/base/lib/python3.10/site-packages/tifffile/tifffile.py:8770, in TiffPage.segments.<locals>.decode(args, decodeargs, decode)
8769 def decode(args, decodeargs=decodeargs, decode=keyframe.decode):
-> 8770 return func(decode(*args, **decodeargs))

File ~/envs/base/lib/python3.10/site-packages/tifffile/tifffile.py:8699, in TiffPage.decode.<locals>.decode_other(data, index, jpegtables, jpegheader, _fullsize)
8696 if decompress is not None:
8697 # TODO: calculate correct size for packed integers
8698 size = shape[0] * shape[1] * shape[2] * shape[3]
-> 8699 data = decompress(data, out=size * dtype.itemsize)
8700 data_array = unpack(data) # type: ignore
8701 # del data

File imagecodecs/_imcd.pyx:1209, in imagecodecs._imcd.lzw_decode()

ImcdError: imcd_lzw_decode returned IMCD_LZW_TABLE_TOO_SMALL
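For a sense of scale, the memory needed to hold this image can be worked out from the shape and dtype reported at the top of the issue. The following is only a back-of-the-envelope sketch; the helper name `nbytes` is made up for illustration and is not part of tifffile or dask.

```python
# Shape and dtype as reported in the issue: (32, 49380, 51055), uint16.
shape = (32, 49380, 51055)
itemsize = 2  # bytes per uint16 sample


def nbytes(shape, itemsize):
    """Return the uncompressed size in bytes of an array with this shape."""
    total = itemsize
    for dim in shape:
        total *= dim
    return total


per_channel = nbytes(shape[1:], itemsize)  # one 49380 x 51055 plane
whole_image = nbytes(shape, itemsize)      # all 32 channels

print(f"one channel:  {per_channel:,} bytes ({per_channel / 2**30:.1f} GiB)")
print(f"all channels: {whole_image:,} bytes ({whole_image / 2**30:.1f} GiB)")
```

A single channel comes to roughly 4.7 GiB and the full stack to roughly 150 GiB, which is why reading everything at once needs an unusually large machine.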

I have not seen that error for a while. If I understand correctly, this should only happen with old-style LZW, but I am surprised that would be used to write OME. So it could simply be a corrupt segment. Is libtiff-based software able to load the whole image? I think libtiff uses the same LZW table size.

Can you share the output and .lzw file produced by this script:

import tifffile
from imagecodecs import lzw_decode

with tifffile.TiffFile('foo.ome.tif') as tif:
    for page in tif.pages:
        # 5 is the TIFF Compression tag value for LZW
        if page.compression != 5:
            print(f'page {page.index} not LZW compressed')
            continue
        # decode every tile/strip of the page individually
        for offset, bytecount in zip(page.dataoffsets, page.databytecounts):
            tif.filehandle.seek(offset)
            encoded = tif.filehandle.read(bytecount)
            try:
                segment = lzw_decode(encoded)
            except Exception as exc:
                # report the failing page and save the raw segment for inspection
                print(page)
                print(page.tags)
                print(page.dataoffsets)
                print(offset, bytecount, exc)
                with open(f'{page.index}_{offset}_{bytecount}.lzw', 'wb') as fh:
                    fh.write(encoded)
                raise
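The script above stops at the first segment that fails to decode. If a list of every failing segment is wanted instead, a variant along the following lines might help. This is a rough sketch only: `scan_segments` and `iter_lzw_segments` are hypothetical helper names, and the tifffile attributes used are the same ones the script above relies on.

```python
def scan_segments(segments, decode):
    """Try to decode every (label, encoded_bytes) pair.

    Returns a list of (label, error message) for the segments that failed.
    """
    failures = []
    for label, encoded in segments:
        try:
            decode(encoded)
        except Exception as exc:
            failures.append((label, str(exc)))
    return failures


def iter_lzw_segments(path):
    """Yield ((page index, offset, bytecount), raw bytes) for LZW pages."""
    import tifffile  # imported lazily so scan_segments works without it

    with tifffile.TiffFile(path) as tif:
        for page in tif.pages:
            if page.compression != 5:  # 5 is the TIFF value for LZW
                continue
            for offset, bytecount in zip(page.dataoffsets, page.databytecounts):
                tif.filehandle.seek(offset)
                yield (page.index, offset, bytecount), tif.filehandle.read(bytecount)


# Possible usage (reads and decodes every segment, so it may take a long time):
# from imagecodecs import lzw_decode
# for label, message in scan_segments(iter_lzw_segments('foo.ome.tif'), lzw_decode):
#     print(label, message)
```

Keeping the decode callable as a parameter means the same loop could be pointed at a different decoder for comparison.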

> If I understand correctly, this should only happen with old-style LZW, but I am surprised that is used to write OME.

I'm relatively new to tiffs/biological imaging so I have no real sense here. I am however in contact with the folks who generated the data, so if there are specific questions that would help shed light on the situation I'd be happy to pass them along.

> So it could simply be a corrupt segment.

I had overlooked this possibility a bit, so after your comments I went back to test whether I was able to load any of the other channels (there are 32). For this particular image, it turns out I am able to read channel 0 (i.e. img = np.array(x[0, ...])), but I get the ImcdError for channels 1+.

There are three other images in this series (two of which I have access to) with roughly the same dimensions. I will experiment with those to see if there are any differences.

> Is libtiff based software able to load the whole image?

I'm not sure, I will give it a try.

> Can you share the output and .lzw file produced by this script:

Log of stdout+stderr from script:

TiffPage 1 @119418264305 49380x51055 uint16 minisblack tiled lzw
TiffTag 256 ImageWidth @119418264313 LONG8 @119418264325 = 51055
TiffTag 257 ImageLength @119418264333 LONG8 @119418264345 = 49380
TiffTag 258 BitsPerSample @119418264353 SHORT @119418264365 = 16
TiffTag 259 Compression @119418264373 SHORT @119418264385 = LZW
TiffTag 262 PhotometricInterpretation @119418264393 SHORT @119418264405 = MINIS
TiffTag 277 SamplesPerPixel @119418264413 SHORT @119418264425 = 1
TiffTag 284 PlanarConfiguration @119418264433 SHORT @119418264445 = CONTIG
TiffTag 305 Software @119418264453 ASCII[22] @119418264601 = OME Bio-Formats 6.
TiffTag 322 TileWidth @119418264473 SHORT @119418264485 = 1024
TiffTag 323 TileLength @119418264493 SHORT @119418264505 = 1024
TiffTag 324 TileOffsets @119418264513 LONG8[2450] @119418264623 = (1918989043,
TiffTag 325 TileByteCounts @119418264533 LONG8[2450] @119418284223 = (1569815,
TiffTag 330 SubIFDs @119418264553 LONG8[8] @119418303823 = (119417710631, 11941
TiffTag 339 SampleFormat
imcd_lzw_decode_size returned IMCD_LZW_TABLE_TOO_SMALL
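As a sanity check on the tags in this log, the tile grid can be recomputed from the image and tile dimensions. The sketch below uses only the numbers printed above. The per-tile byte count mirrors the `size * dtype.itemsize` expression visible in the traceback, under the assumption that the four shape factors are a depth of 1, the tile length, the tile width, and one sample per pixel.

```python
# Values taken from the TIFF tags in the log.
image_length, image_width = 49380, 51055
tile_length, tile_width = 1024, 1024
samples_per_pixel = 1
itemsize = 2  # BitsPerSample = 16


def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


tiles_down = ceil_div(image_length, tile_length)
tiles_across = ceil_div(image_width, tile_width)
tiles_per_page = tiles_down * tiles_across

# Expected size of one decoded tile (assumed shape: 1 x 1024 x 1024 x 1).
decoded_tile_bytes = 1 * tile_length * tile_width * samples_per_pixel * itemsize

print(tiles_down, tiles_across, tiles_per_page)  # 49 50 2450
print(decoded_tile_bytes)                        # 2097152
```

The 2450 tiles per page agree with the `LONG8[2450]` length of the TileOffsets and TileByteCounts tags, so the page structure itself looks consistent; the failure is in the contents of a segment.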

File saved by script:

Here's a google drive link to the output - if you have a different preferred form of file sharing just LMK

https://drive.google.com/file/d/1ZmyyQjBM0FOn7W8oQgAoMetUGZZ8bRFS/view?usp=sharing

The LZW-compressed segment is corrupted. The libtiff library throws a "Using code not yet in table" error, and Bio-Formats 6.16 stops decoding about a quarter of the way through.
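To illustrate how a damaged code stream can trigger either kind of error, here is a toy LZW decoder. It is a simplified model and not the imagecodecs or libtiff implementation: it works on a list of integer codes rather than a packed bit stream, and the exception types and messages are made up. It assumes the TIFF conventions of a clear code of 256, an end-of-information code of 257, and at most 4096 table entries (12-bit codes).

```python
CLEAR, EOI = 256, 257
MAX_ENTRIES = 4096  # largest table addressable with 12-bit codes


def toy_lzw_decode(codes, max_entries=MAX_ENTRIES):
    """Decode a list of integer LZW codes into bytes (toy model)."""
    table = []
    prev = None
    out = bytearray()
    for code in codes:
        if code == CLEAR:
            # Reset: 256 single-byte entries plus placeholders for 256 and 257.
            table = [bytes([i]) for i in range(256)] + [b"", b""]
            prev = None
            continue
        if code == EOI:
            break
        if not table:
            raise ValueError("stream does not start with a clear code")
        if prev is None:
            # First code after a clear must be a literal byte.
            if code >= 256:
                raise ValueError("code not yet in table")
            entry = table[code]
        elif code < len(table):
            entry = table[code]
        elif code == len(table):
            # Special case: the code refers to the entry about to be added.
            entry = prev + prev[:1]
        else:
            # A corrupted stream can reference an entry that was never built.
            raise ValueError("code not yet in table")
        if prev is not None:
            if len(table) >= max_entries:
                # No clear code arrived before the table filled up.
                raise OverflowError("table too small")
            table.append(prev + entry[:1])
        out += entry
        prev = entry
    return bytes(out)


print(toy_lzw_decode([CLEAR, 65, 66, 258, 260, EOI]))  # b'ABABABA'
```

In this model, flipped or missing bits that turn a valid code into an out-of-range one produce the first error, while a lost clear code lets the table grow past its limit and produces the second.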

Which Bio-Formats version was used to produce the file? Was multi-threading enabled? This could be related to the following issues. There was also an issue with multi-threaded export from QuPath, which I cannot find right now.

https://forum.image.sc/t/bioformats-lzw-compression-missing-image-parts/64451
ome/bioformats#3752

Try to re-write the file using the latest versions, or report the issue to the developers of the writing software. Here's a TIFF file containing just the corrupted segment: 1_3455306614_2333616.zip

Thanks for looking into this @cgohlke, I will follow up with the folks who generated the images to see if they have any info on the above. If anything relevant pops up I'll be sure to post it here. Thanks again!