NVlabs / flip

A tool for visualizing and communicating the errors in rendered images.

Separable filters?

phoekz opened this issue

Hey, forgive the newbie question; image filters are not my strongest skill :). I have been evaluating your work a bit and it does seem to fit my case really well! I am going to continue evaluating, but at the moment my main concern is performance, as I plan to run this on hundreds of thousands of image pairs. At least in the C++ implementation, most of the time is spent in convolution, so I wonder if there is a way to separate the "spatial", "point" and "edge" filters? Unfortunately GPU acceleration is not available on the instances I plan to run this on :(.

Hello there!
We had it on our TODO list to improve the speed of both the C++ and CUDA versions.
What kind of CPU will you be running on?
We will make three different optimizations:

  1. use OpenMP to parallelize the for-loops (this will help quite a lot if you have a CPU with many cores).
  2. run several convolution filters at the same time.
  3. separate the filters (rough sketch below).

We'll write here when we have made progress. Number 1 should be ready later today.
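To give a rough idea of what 3 buys: a separable 2D kernel can be applied as a horizontal 1D pass followed by a vertical 1D pass, dropping the per-pixel cost from (2r+1)^2 multiply-adds to 2*(2r+1). Here is a minimal sketch of that idea, using a plain Gaussian as a stand-in for the actual FLIP filter weights (image layout and clamp-to-edge borders are assumptions for illustration, not the repository code):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Build a normalized 1D Gaussian kernel of the given radius.
static std::vector<float> gaussianKernel1D(int radius, float sigma)
{
    std::vector<float> k(2 * radius + 1);
    float sum = 0.0f;
    for (int i = -radius; i <= radius; i++)
    {
        k[i + radius] = std::exp(-0.5f * float(i * i) / (sigma * sigma));
        sum += k[i + radius];
    }
    for (float& w : k)
        w /= sum;
    return k;
}

// Apply the same 1D kernel horizontally and then vertically (clamp-to-edge
// borders). Per pixel this costs 2 * (2 * radius + 1) multiply-adds instead
// of (2 * radius + 1)^2 for the equivalent full 2D convolution.
void convolveSeparable(const std::vector<float>& src, std::vector<float>& dst,
                       int width, int height, const std::vector<float>& kernel)
{
    const int radius = int(kernel.size()) / 2;
    std::vector<float> tmp(src.size());
    dst.resize(src.size());

    // Horizontal pass: src -> tmp.
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
        {
            float acc = 0.0f;
            for (int i = -radius; i <= radius; i++)
            {
                const int xx = std::min(std::max(x + i, 0), width - 1);
                acc += kernel[i + radius] * src[y * width + xx];
            }
            tmp[y * width + x] = acc;
        }

    // Vertical pass: tmp -> dst.
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
        {
            float acc = 0.0f;
            for (int i = -radius; i <= radius; i++)
            {
                const int yy = std::min(std::max(y + i, 0), height - 1);
                acc += kernel[i + radius] * tmp[yy * width + x];
            }
            dst[y * width + x] = acc;
        }
}
```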
Thanks for reaching out!
/Tomas

Thanks for the quick reply! Yeah, I was planning to do 1. for now; my machine has 48 cores. That will work for now, and with the default settings from your paper the kernel size is just 21x21, so it's not totally out of control yet :). With bigger kernels, and once I distribute this across many machines (which will have far fewer than 48 cores), faster filters would help a ton!

Ok, so I pushed some #pragma omp parallel for which helped performance quite a lot for me. I got about 10x speedup on both LDR and HDR images.
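For reference, the change has roughly this shape (an illustrative sketch, not the exact loops in the repository); the pragma goes on the outer loop over rows, since each output row is independent:

```cpp
// Illustrative only -- the shape of the change, not the repository's actual loops.
void convolveRows(const float* src, float* dst, int width, int height,
                  const float* kernel, int radius)
{
    // Each output row depends only on the input image, so rows can be
    // computed by different threads without any synchronization.
    #pragma omp parallel for
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            float acc = 0.0f;
            for (int i = -radius; i <= radius; i++)
            {
                int xx = x + i;                 // clamp-to-edge borders
                if (xx < 0) xx = 0;
                if (xx >= width) xx = width - 1;
                acc += kernel[i + radius] * src[y * width + xx];
            }
            dst[y * width + x] = acc;
        }
    }
}
```

Note that OpenMP has to be enabled at compile time (e.g. -fopenmp for GCC/Clang, /openmp for MSVC); otherwise the pragma is ignored and the loop simply runs serially.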

Are you using the LDR or HDR version, btw?

I am using the LDR version.

I realized there were some more places where I could add #pragma omp parallel for. These were just pushed to the repo.
LDR-FLIP now takes 0.35 s of evaluation time on the CPU on my machine for 1920x1080 (including reading the files, it takes 1.1 s).

(I am the same person as phoekz). Thanks for the recent improvements! I have a couple of updates that might be interesting to hear:

  1. I ran FLIP on a couple thousand image pairs, and it was immediately clear that FLIP pretty much beats SSIM (our current metric for comparing images) in every respect. With SSIM we originally saw a lot of strange results: it reported huge errors that were imperceptible unless you zoomed in to 1000%, and it also missed errors that were clearly perceptible. FLIP fixed all of these issues :)! The error map is also just more pleasant to look at in general.
  2. Our use case is basically evaluating 3D reconstructions made from real photographs. Our system captures images in a sequence while moving. I noticed that with SSIM the results were not coherent over sequential captures and the error map sometimes flickered a lot, while with FLIP the results were much more stable. Looking at the original image pairs, you only notice the change in contrast when you zoom in to around 500%.
  3. I eventually ended up translating the C++ LDR version into Rust. It simplified building and binding into my app, and since the original code wasn't too big, it wasn't that bad. The translation only took me a couple of hours because the original code was really clear :). I looked at your recent commits, threw in the Rust equivalent of OpenMP, and achieved similar timings, about 1 s for 5 MP photos.

Oh, so very nice to hear!
I just added one more optimization that was worthwhile. It improved speeds further: from 0.25-0.30 s to 0.18-0.22 s for LDR and from 5.1 s to 2.9 s for HDR.

Nice! Can confirm that this sped up the algorithm quite a bit.

I went ahead and did a similar optimization to the spatial filter. I convolved ref and test with the spatial filter at the same time and got another 25% speed gain on top of the feature filter optimizations.
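In rough form that optimization looks like this (a sketch with made-up names, row-major layout, and clamp-to-edge borders assumed, not the actual FLIP code): both images are accumulated inside the same kernel loop, so the weight lookup and index arithmetic are computed once and shared, and the second image's neighborhood is already hot in cache.

```cpp
#include <algorithm>

// Sketch only: names, layout, and border handling are assumptions, not the FLIP API.
// The reference and test images share one pass over the kernel window.
void convolveRefAndTest(const float* ref, const float* test,
                        float* refOut, float* testOut,
                        int width, int height,
                        const float* kernel, int radius)
{
    const int kernelWidth = 2 * radius + 1;
    #pragma omp parallel for
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            float accRef = 0.0f;
            float accTest = 0.0f;
            for (int j = -radius; j <= radius; j++)
            {
                const int yy = std::min(std::max(y + j, 0), height - 1);
                for (int i = -radius; i <= radius; i++)
                {
                    const int xx = std::min(std::max(x + i, 0), width - 1);
                    const float w = kernel[(j + radius) * kernelWidth + (i + radius)];
                    const int idx = yy * width + xx;
                    accRef += w * ref[idx];   // reference image
                    accTest += w * test[idx]; // test image
                }
            }
            refOut[y * width + x] = accRef;
            testOut[y * width + x] = accTest;
        }
    }
}
```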

Now if we can separate the filters, FLIP would run so fast that it's no longer a concern :). At least in my app everything else will run much slower in comparison.

Nice! I will add that after the winter break -- cool! Pontus will continue with separable filters as well after his break.

More optimizations are coming in FLIP v1.2, which should be up quite soon. Closing this for now.

FLIP v1.2 has now been released... with even more perf optimizations!

Thank you so much! I can't wait to check out your improvements!

Thank you for the inspiration and ideas! :)