NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

Home Page: https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html

DALI 2022 roadmap

JanuszL opened this issue · comments

The following represents a high-level overview of our 2022 plan. You should be aware that this roadmap may change at any time and the order below does not reflect any type of priority.

We strongly encourage you to comment on our roadmap and provide feedback here in this issue.

Some of the items mentioned below are a continuation of the 2021 effort (#2978).

Improving Usability:

  • eager mode - introducing DALI operators callable as standalone entities to simplify debugging, prototyping, and improve adoption: #3648, #3734, #4016
  • conditional execution - ability to conditionally apply each operation, providing AutoAugment-style capabilities - #3701, #4359, #4405, #4358
  • intra-pipeline batch size variability - providing the ability to change the batch size from operator to operator inside the execution graph
  • support for Hopper GPU architecture - #4308
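
The conditional-execution item above can be illustrated in plain Python. This is only a sketch of the concept (per-sample decisions inside a batched pipeline), not DALI's actual API; the names here are made up:

```python
import random

def maybe_apply(op, sample, prob, rng):
    """Apply `op` to `sample` with probability `prob`.

    In a batched engine like DALI the same decision has to be made
    independently for every sample in the batch, which is what makes
    conditional execution non-trivial to support in a static graph.
    """
    return op(sample) if rng.random() < prob else sample

rng = random.Random(0)
batch = [1, 2, 3, 4]
# Conditionally negate each sample with 50% probability.
out = [maybe_apply(lambda x: -x, s, 0.5, rng) for s in batch]
```

Each sample either passes through unchanged or gets the augmentation, decided per sample rather than per batch.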

Extending input format support:

  • extending support for formats and containers, including variable frame rate videos: #3615, #3668, #4184, #4296, #4302, #4351, #4354, #4327, #4424
  • image decoding operators with support for higher dynamic ranges - #4223

Performance:

  • reducing memory consumption by utilizing fast, pool based, memory allocator: #3728, #3667, #3670, #3678, #3754, #3759
  • operators performance optimizations:
    • audio resampling optimization for ARM64 platforms - #3745
    • audio resampling for GPU - #3884, #3914 and #3911
    • GPU nonsilent-region operator - #3874
    • slice operator optimization - #3568, #3557, #3600
    • transpose operator - #3730
    • cast operator optimizations - #3541
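
As a rough illustration of what the audio-resampling items cover, here is a naive linear-interpolation resampler in NumPy. This is only a sketch of the input/output relationship; DALI's operator uses proper filtering and optimized SIMD/GPU kernels:

```python
import numpy as np

def resample_linear(signal, in_rate, out_rate):
    """Naive sample-rate conversion by linear interpolation.

    Maps each output sample position back onto the input time axis
    and interpolates between the two nearest input samples.
    """
    n_out = int(round(len(signal) * out_rate / in_rate))
    # Positions of the output samples expressed in input-sample units.
    t_out = np.arange(n_out) * (in_rate / out_rate)
    return np.interp(t_out, np.arange(len(signal)), signal)

x = np.arange(8, dtype=np.float32)   # toy "signal" at 8 kHz
y = resample_linear(x, 8000, 16000)  # upsample to 16 kHz
```

Doubling the rate doubles the number of samples, with new samples falling halfway between the originals.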

New transformations:

We are constantly extending the set of operations supported by DALI. Currently, this section lists the most notable additions in our areas of interest that we plan to do this year. This list is not exhaustive, and we plan on expanding the set of operators as needs or requests arise.

  • new transformations for general data processing
    • Histogram operator - #3502
    • inflate operator that enables decompression of LZ4 compressed input - #4366
    • support for broadcasting in arithmetic operators (CPU and GPU) - #4348
  • new transformations for image processing
  • new transformations for video processing - extending existing operators to support video sequences, including temporal (per-frame parameterized) augmentations
    • support for processing video and handling of temporal arguments to color-manipulation operators and affine transform operators - #3937, #3946, #3917
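
The broadcasting item above (#4348) refers to NumPy-style shape alignment in arithmetic operators. A minimal NumPy illustration of the semantics being adopted:

```python
import numpy as np

# A (3, 1) column broadcast against a (1, 4) row yields a (3, 4) result:
# each dimension of size 1 is stretched to match the other operand,
# so out[i, j] == col[i, 0] + row[0, j].
col = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4).reshape(1, 4)   # shape (1, 4)
out = col + row                    # shape (3, 4)
```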

Hi @JanuszL, I'm interested in a more seamless integration between DALI and torchvision for better end-to-end model training and inference times.

Relevant PRs

In particular, we don't necessarily need to integrate everything, including the data loader. At the very least, I think the accelerated image decoding and specialized preprocessing kernels would be of huge value, gated behind another vision backend: https://github.com/pytorch/vision#image-backend

The integration would probably look similar to the one made with accimage.

I'm guessing torchaudio and facebookresearch/mmf (multimodal) would also be similar.

Hi @msaroufim,

Thank you for your feedback regarding our 2022 roadmap.
It would be nice to accelerate existing TorchVision pipelines; however, I'm not convinced that the way DALI works can be combined in the suggested fashion. DALI relies on a processing graph; we plan to extract some of its operators into callable entities, but this will not match the performance of the pipeline execution model. It will also lead to less efficient GPU memory utilization.
I would wait until this effort is at least partially completed and let the TorchVision community experiment with it and see what works best.

Hi, @JanuszL,
Is there a timeline for a stable version of DALI with support for Python 3.10? I saw this warning:

Warning: DALI support for Python 3.10 is experimental and some functionalities may not work.
deprecation_warning("DALI support for Python 3.10 is experimental and some functionalities "

Your answer and guide will be appreciated!

Hi @songyuc,

This warning is there mostly because we don't have full test coverage for Python 3.10, although it should work fine.
I cannot commit to a particular timeline, but we hope to do it sooner rather than later.

Hello, @JanuszL ,

I'm currently looking at DALI for medical image processing. Might there be plans for DICOM support?

Edit: nevermind I see it was mentioned in #3275.

Hi @csheaff,

Thank you for reaching out. We don't have a short-term plan to support DICOM. As I understand it, the usual workflow is a conversion from DICOM to NumPy (which includes offline preprocessing, like normalization), and then the NumPy files are used for the training. The conversion is done only once, while the NumPy files are reused multiple times. That is why we have this item low on our priority list.
Still, can you describe your workflow so we can reprioritize it if needed?

Thanks for the response @JanuszL. I've just discovered DALI and I'm sort of new to MLOps, so perhaps my ideas are off here. If you have better suggestions on workflow I'm happy to hear them.

My priority is low end-to-end latency in medical imaging applications. Triton is great for inference, but I'm looking for ways to handle data loading and pre-processing on the GPU as well, to be used in tandem with an inference model served by Triton.

It's true that there can be a heavy amount of pre-processing and metadata extraction with DICOMs. The metadata extraction for purposes other than the main processing pipeline will likely always be there. Perhaps this means it doesn't make sense for DALI to handle DICOMs directly.

As for the normalization, my understanding is that DALI would be able to handle such tasks. Perhaps I'm mistaken.

Hi @csheaff,

As for the normalization, my understanding is that DALI would be able to handle such tasks.

DALI is mostly useful for online augmentation and, in general, for things you need to do on each iteration of your training/inference process. In the case of DICOM, the conversion to NumPy can be done just once as part of offline preparation; doing it every iteration wouldn't yield any value and would be wasteful from a resource point of view.
Nevertheless, I agree that it would be nice to handle that seamlessly in DALI.
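
The offline DICOM-to-NumPy workflow described above could look roughly like this. It's a sketch: the conversion step assumes pydicom is installed (its import is guarded), and the normalization shown is just one plausible preprocessing choice:

```python
import numpy as np

try:
    import pydicom  # third-party; only needed for the actual DICOM read
except ImportError:
    pydicom = None

def normalize(pixels):
    """Offline preprocessing step: scale pixel data to [0, 1] as float32."""
    pixels = np.asarray(pixels, dtype=np.float32)
    lo, hi = pixels.min(), pixels.max()
    return (pixels - lo) / (hi - lo) if hi > lo else np.zeros_like(pixels)

def convert(dicom_path, npy_path):
    """One-time conversion: read a DICOM file, normalize, save as .npy.

    The .npy files are then re-read cheaply on every training epoch,
    which is why this step doesn't belong inside the per-iteration
    DALI pipeline.
    """
    ds = pydicom.dcmread(dicom_path)
    np.save(npy_path, normalize(ds.pixel_array))

# The normalization itself works on any array:
arr = normalize(np.array([[0, 50], [100, 200]]))
```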

@JanuszL

It looks like DALI at least partially supports DICOM now, by way of nvJPEG2000 and a bit of a hack, e.g.: https://www.kaggle.com/competitions/rsna-breast-cancer-detection/discussion/371534 (notebook here: https://www.kaggle.com/code/tivfrvqhs5/decode-jpeg2000-dicom-with-dali?scriptVersionId=113466193)

Kaggle isn't releasing a decoded dataset for the code competition, so folks are having to decode on each run (training takes about 10 minutes, inference about 7 hours!). The DALI speedup is likely to be a huge win, but as noted it only works for the dcmfile.file_meta.TransferSyntaxUID .90 standard, and .70 makes up about half of the images. Here's the breakdown:

1.2.840.10008.1.2.4.70 29519
1.2.840.10008.1.2.4.90 25187
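
From the counts above, the non-DALI-decodable .70 files are actually the slight majority of the dataset:

```python
# Transfer-syntax counts from the breakdown above.
counts = {
    "1.2.840.10008.1.2.4.70": 29519,  # JPEG Lossless: no GPU decode path yet
    "1.2.840.10008.1.2.4.90": 25187,  # JPEG 2000: decodable via nvJPEG2000
}
total = sum(counts.values())
lossless_share = counts["1.2.840.10008.1.2.4.70"] / total  # ~0.54
```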

Any thoughts on how we might be able to get .70 in there as well? Are there fundamental limitations as to why it can't be supported?

Hi @blazespinnaker,

I'm glad to see that the community made that work. I think that it should be possible to use the external operator to extract DICOM data and pass it directly to the decoder instead of writing it to the disk.
Can you tell me more about the .70 standard? How is it encoded? It may just be that DALI doesn't support such a format yet.
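
The external-source approach mentioned above might be sketched like this. It assumes nvidia-dali and a DICOM reader are available (the DALI import is guarded so the snippet still loads without it), and the pipeline names below are illustrative, not a tested integration:

```python
import numpy as np

try:
    from nvidia.dali import pipeline_def, fn
except ImportError:  # DALI not installed; only the helper below runs
    pipeline_def = fn = None

def encoded_to_array(raw: bytes) -> np.ndarray:
    """Wrap an encoded bitstream (e.g. the JPEG 2000 fragment extracted
    from a DICOM PixelData element) as a uint8 array, which is the form
    DALI's decoder expects from an external source."""
    return np.frombuffer(raw, dtype=np.uint8)

if fn is not None:
    @pipeline_def(batch_size=4, num_threads=2, device_id=0)
    def decode_pipeline(source):
        # `source` yields batches of encoded streams, so nothing is
        # written to disk between DICOM extraction and GPU decoding.
        encoded = fn.external_source(source=source)
        return fn.decoders.image(encoded, device="mixed")

arr = encoded_to_array(b"\x00\x01\x02")
```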

Looks like .70 is the rarely used JPEG Lossless standard: https://crnl.readthedocs.io/jpeg_formats/index.html

If I had to guess, I'd say the question is whether nvjpeg can support it.

Maybe if we can get the pydicom folks to help support a pipeline to nvjpeg / nvjpeg2000 this can be done.

I just checked with the nvJPEG team and this format is not supported yet. In this case, DALI should fall back to the CPU libjpeg-turbo decoder. I'm sorry but I don't think we can do much now.

Hmm, I think libjpeg might be a better fallback for 1.2.840.10008.1.2.4.70? I believe libjpeg-turbo does not yet support lossless JPEG.

libjpeg-turbo/libjpeg-turbo#638

Hi @blazespinnaker,

Currently, we fully rely on libjpeg-turbo for JPEG decoding. If it fails, DALI cannot decode the image.
What we could do is try to fall back to OpenCV (although I believe it may use libjpeg-turbo as well) here: https://github.com/NVIDIA/DALI/blob/main/dali/image/jpeg.cc#L80. We would be more than happy to accept a PR adding such functionality.

Please check #4578 for the 2023 roadmap.