cvat-ai / cvat

Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale.

Home Page: https://cvat.ai


Load large TIFF images

Erotemic opened this issue

It would be useful if CVAT were able to open large (e.g. 40,000 x 40,000 pixel) images. Currently it uses PIL to load images, which is known to fail with large images.

Here is the error I get when I try to load a large tiff:

cvat          | 2019-06-27 21:30:09,025 DEBG 'rqworker_default_1' stderr output:
cvat          | 21:30:09 default: cvat.apps.engine.task._create_thread(4, {'remote_files': [], 'server_files': [], 'client_files': ['a_very_large_image.tif']}) (/api/v1/tasks/4)
cvat          | 
cvat          | 2019-06-27 21:30:09,044 DEBG 'rqworker_default_1' stderr output:
cvat          | [2019-06-27 21:30:09,044] INFO cvat.server: create task #4
cvat          | 
cvat          | 2019-06-27 21:30:09,084 DEBG 'rqworker_default_1' stderr output:
cvat          | 21:30:09 TypeError: function takes at least 3 arguments (1 given)
cvat          | Traceback (most recent call last):
cvat          |   File "/usr/local/lib/python3.5/dist-packages/rq/worker.py", line 812, in perform_job
cvat          |     rv = job.perform()
cvat          |   File "/usr/local/lib/python3.5/dist-packages/rq/job.py", line 588, in perform
cvat          |     self._result = self._execute()
cvat          |   File "/usr/local/lib/python3.5/dist-packages/rq/job.py", line 594, in _execute
cvat          |     return self.func(*self.args, **self.kwargs)
cvat          |   File "/usr/lib/python3.5/contextlib.py", line 30, in inner
cvat          |     return func(*args, **kwds)
cvat          |   File "/home/django/cvat/apps/engine/task.py", line 402, in _create_thread
cvat          |     _copy_images_to_task(upload_dir, db_task)
cvat          |   File "/home/django/cvat/apps/engine/task.py", line 237, in _copy_images_to_task
cvat          |     image = image.convert('RGB')
cvat          |   File "/usr/local/lib/python3.5/dist-packages/PIL/Image.py", line 879, in convert
cvat          |     self.load()
cvat          |   File "/usr/local/lib/python3.5/dist-packages/PIL/TiffImagePlugin.py", line 1054, in load
cvat          |     return super(TiffImageFile, self).load()
cvat          |   File "/usr/local/lib/python3.5/dist-packages/PIL/ImageFile.py", line 204, in load
cvat          |     args, self.decoderconfig)
cvat          |   File "/usr/local/lib/python3.5/dist-packages/PIL/Image.py", line 437, in _getdecoder
cvat          |     return decoder(mode, *args + extra)
cvat          | TypeError: function takes at least 3 arguments (1 given)

Perhaps if the backend image loader were abstracted so it could use either PIL or GDAL/rasterio, CVAT would work for this use case?
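
For illustration, a minimal sketch of what such an abstraction could look like (this is not CVAT's actual code; the load_image_rgb helper and its fallback policy are hypothetical): try PIL first, and fall back to rasterio/GDAL for TIFFs that PIL cannot decode.

import numpy as np
import rasterio
from PIL import Image

def load_image_rgb(path):
    # Hypothetical helper: PIL first, rasterio/GDAL as a fallback for exotic TIFFs.
    try:
        return Image.open(path).convert('RGB')
    except Exception:
        with rasterio.open(path) as src:
            data = src.read()                      # (bands, rows, cols)
            rgb = data[:3] if data.shape[0] >= 3 else np.repeat(data[:1], 3, axis=0)
            rgb = np.transpose(rgb, (1, 2, 0))     # -> (rows, cols, bands)
            if rgb.dtype != np.uint8:
                rgb = (255.0 * rgb / max(float(rgb.max()), 1.0)).astype(np.uint8)
            return Image.fromarray(rgb)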

@Erotemic , please let me know if you can help with the issue.

Tough to say. I've got a lot of things on my plate at the moment. It may be possible that this becomes critically relevant to a project at my workplace, in which case I may be able to lend a few hours. However, at least in the immediate future, I won't be able to give this any direct (as in writing code) attention, but I'm certainly still able to advise, review, and discuss.

I did try to just quickly insert a GDAL reader instead of PIL, but GDAL is a notoriously difficult package to install (without conda), so I stopped after I encountered that resistance.

I'm also not sure that just getting a reader to load large images is sufficient; there may still be issues with display and zooming in and out if care is not taken to load only relevant subregions when zoomed in and lower level-of-detail images when zoomed out. For whoever looks into this, it may be useful to use Cloud Optimized GeoTIFFs as a backend image format.
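
To make the level-of-detail idea concrete, here is a hedged sketch of the access pattern a COG enables through rasterio/GDAL (the file name and window offsets are made up): a decimated read for a zoomed-out view, and a small full-resolution window for a zoomed-in view.

import rasterio
from rasterio.enums import Resampling
from rasterio.windows import Window

with rasterio.open('large_cog.tif') as src:        # hypothetical 40k x 40k COG
    print(src.overviews(1))                        # internal overview factors, e.g. [2, 4, 8, 16]
    # Zoomed out: read the whole image decimated to roughly 1024 x 1024
    overview = src.read(out_shape=(src.count, 1024, 1024),
                        resampling=Resampling.average)
    # Zoomed in: read only a 512 x 512 full-resolution window
    patch = src.read(window=Window(col_off=20000, row_off=20000,
                                   width=512, height=512))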

I am interested in fixing the issue. Do I need to create a new reader for TIFF images?

@souravsingh Instead of using PIL for large tiff (/ geotiff) images it may be better to use GDAL or rasterio (which uses GDAL in the backend but has a nicer interface) as the image reader backend. These readers will have better compatibility with the various types of TIFF images you may find in the wild.

So I just tried to recreate this issue with these large TIF images from NASA: https://visibleearth.nasa.gov/view.php?id=57752 . However, oddly, it was possible (after setting PIL.Image.MAX_IMAGE_PIXELS to a very large number) to load the images with PIL, even though land_shallow_topo_west.tif is 240MB and has 21,600 x 21,600 pixels.

So it seems like not every geotiff image causes this issue with PIL. That being said, I'm not sure that these will work in CVAT, so they may still be good for testing purposes.

However, I was able to trigger an issue by converting the image to a COG TIFF with JPEG compression via gdal_translate -co TILED=YES -co COPY_SRC_OVERVIEWS=YES -co COMPRESS=JPEG land_shallow_topo_8192.tif test.tif . The resulting test.tif fails to open with PIL, so it might be a good test case to reproduce this issue.
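
For reference, a rough sketch of the two checks described above, assuming local copies named as in the NASA download and the gdal_translate command:

from PIL import Image

# PIL refuses very large images by default (DecompressionBombError);
# lifting the limit lets the 21,600 x 21,600 NASA image load.
Image.MAX_IMAGE_PIXELS = None

img = Image.open('land_shallow_topo_west.tif').convert('RGB')   # loads, slowly

# The JPEG-compressed COG from gdal_translate is the failing case
try:
    Image.open('test.tif').convert('RGB')
except Exception as exc:
    print('PIL could not decode this TIFF:', exc)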

Hi @souravsingh , please look at the PR (#434). It will give you an idea of how to add your own extractor. The PR has been merged today.

#557 has an example for PDFs as well.

Some thoughts regarding the issue:

Internally we got a similar request to load huge images. The first idea when you think about the issue is to implement it similarly to how Google/Yandex/Apple maps work. Here is an article: https://medium.com/@marcusasplund/huge-images-on-small-devices-189f13f59014

There is a very popular JS library that solves exactly this problem: https://github.com/Leaflet/Leaflet. It is probably a good way to implement the feature.

Hi all.
Some great work is being done here.
I am interested in loading Whole Slide Images (WSIs).
As far as I can see from Pillow's list of supported file formats
https://pillow.readthedocs.io/en/stable/handbook/image-file-formats.html
it is not possible.
@Erotemic did you manage to open large tiffs?
@souravsingh can you help me add my own extractor?
Thank you all

I think for the various TIFF formats it could be interesting to use either the Bio-Formats reader, which can read BigTIFF files, or Christoph Gohlke's tifffile library, which just saved my PhD thesis because it can handle various types of TIFF files.
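
A minimal example of reading with tifffile (the file name is a placeholder); it copes with BigTIFF, tiled layouts, and many compression schemes:

import tifffile

arr = tifffile.imread('huge.tif')        # whole image as a numpy array
print(arr.shape, arr.dtype)

# For files larger than RAM, inspect the series first and read selectively
with tifffile.TiffFile('huge.tif') as tif:
    series = tif.series[0]
    print(series.shape, series.dtype)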

As for Whole Slide Images, @ChristosSpyropoulos I had the same issue as you, but it is actually known that these images are a pain to exploit. Now I use OpenSlide to get a handle on a specific image bigger than my RAM, and I use the Python API to load successive smaller tiles to run the analyses. That could probably be easily implemented in CVAT by a better programmer than me :-)
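
For anyone curious, a small sketch of that OpenSlide tiled-access pattern (the file name and coordinates are placeholders):

import openslide

slide = openslide.OpenSlide('specimen.svs')        # hypothetical WSI, larger than RAM
print(slide.dimensions, slide.level_count, slide.level_dimensions)

# read_region takes level-0 coordinates and returns an RGBA PIL image
tile = slide.read_region(location=(50000, 30000), level=0, size=(1024, 1024))

# low-resolution overview for display
thumb = slide.get_thumbnail((512, 512))
slide.close()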

https://www.cogeo.org/ - A Cloud Optimized GeoTIFF (COG) is a regular GeoTIFF file, aimed at being hosted on a HTTP file server, with an internal organization that enables more efficient workflows on the cloud. It does this by leveraging the ability of clients issuing HTTP GET range requests to ask for just the parts of a file they need.
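
As an illustration, rasterio/GDAL can open a COG directly by URL through GDAL's /vsicurl/ handler, so a windowed read only fetches the byte ranges it needs (the URL below is a placeholder):

import rasterio
from rasterio.windows import Window

url = 'https://example.com/imagery/scene_cog.tif'   # hypothetical remote COG
with rasterio.open(url) as src:
    patch = src.read(window=Window(col_off=4096, row_off=4096,
                                   width=512, height=512))
    print(patch.shape)                               # (bands, 512, 512)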

I would love to see support for annotation of large TIFF files. However, loading large TIFF files is only a small part of the problem. I see geospatial data, such as satellite or aerial imagery, and biomedical images as relevant areas of application for this feature.

I was able to import a geospatial TIFF file, view it, and create labels. I'm not sure whether the annotation page can deal with the thousands of labels that would be required to label the whole image. Interactive annotation is not working.

As already mentioned in other issues, tiling the imagery before upload is a workaround for this situation. But it creates a new set of problems:

  • artifacts at the tile boundaries due to missing context information when the tiles are stitched back together afterwards (human and automatic labels)
  • assignment of regions of the image to annotators or as a data subset.

Geospatial data usually comes with more than three bands. However, for labeling, the user should decide on a main RGB image file and should be able to optionally upload separate bands as contextual imagery (RGB or greyscale images need to be created by the user before upload). This would allow the user to use the available data for annotation and adhere to the concepts of CVAT.

Parts of the solution:

  • A different image reader for .TIFF files (rasterio or GDAL) and XYZ tile conversion in the backend
  • Present large files as tiled imagery using leaflet.js
  • Present contextual imagery as an overlay with an opacity slider
  • Allow dividing work on a TIFF file into "frames" and jobs, also using the tiling approach (see the sketch after this list). I would assume leaflet tiles != frames, i.e. the user can decide the grid size for frames, while the leaflet tile size is selected for best performance. Instead of overlapping tiles it would be nice if the user could see the annotations of adjacent frames (8-connectedness). The annotation tool should be able to react to annotations of adjacent tiles.
  • Supply only the frames to automated annotation, plus the context of adjacent frames
  • Support an export format that allows the frame annotations to be related to their TIFF file, or export the annotations as a combined mask image
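
A minimal sketch of the frame-tiling part of this proposal, assuming a user-chosen frame size and a hypothetical input file; rasterio windowed reads keep memory use bounded:

import rasterio
from rasterio.windows import Window

FRAME = 2048                                    # user-chosen frame size (assumption)

with rasterio.open('aerial_scene.tif') as src:  # hypothetical large GeoTIFF
    for row in range(0, src.height, FRAME):
        for col in range(0, src.width, FRAME):
            win = Window(col_off=col, row_off=row,
                         width=min(FRAME, src.width - col),
                         height=min(FRAME, src.height - row))
            frame = src.read(window=win)        # (bands, h, w) numpy array
            # each window would become one CVAT "frame"/job in this proposal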

Open questions:

  • Does that fit in the current architecture?
  • Should jobs be a on a regular frame grid?
  • What input formats to support?
  • Annotation import?

FWIW: The FOSS kwcoco library I've been developing at Kitware for the last few years has been rapidly evolving and has good support for MSI images (i.e. images with more than 3 channels, where those channels may exist at different spatial resolutions). In combination with the delayed-image module (both of which are on pypi) it can quickly and randomly sample arbitrary combinations of bands at any resolution in any spatial location. It does this via GDAL's COG support and a customized image operation tree in delayed-image.

The kwcoco API is great for backend reading / writing of data and annotations, but it does not have a good viewer / editor.