NVlabs / flip

A tool for visualizing and communicating the errors in rendered images.

Separable filters?

phoekz opened this issue

Hey, forgive the newbie question; image filters are not my strongest skill :). I have been evaluating your work a bit and it does seem to fit my case really well! I am going to continue evaluating, but at the moment my main concern is performance, as I plan to run this on hundreds of thousands of image pairs. At least in the C++ implementation, most of the time is spent in convolution, so I wonder if there is a way to separate the "spatial", "point" and "edge" filters? Unfortunately GPU acceleration is not available on the instances I plan to run this on :(.

Hello there!
We had it on our TODO list to improve the speed of both the C++ and CUDA versions.
What kind of CPU will you be running on?
We will make three different optimizations:

  1. use OpenMP to parallelize the for-loops (this will help quite a lot if you have a CPU with many cores).
  2. run several convolution filters at the same time.
  3. separate the filters (rough sketch below).

We'll write here when we have made progress. Number 1 should be ready later today.
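To give a rough idea of what 3 buys: a separable 2D kernel can be applied as a horizontal 1D pass followed by a vertical 1D pass, dropping the per-pixel cost from (2r+1)^2 multiply-adds to 2*(2r+1). Here is a minimal sketch of that idea, using a plain Gaussian as a stand-in for the actual FLIP filter weights (image layout and clamp-to-edge borders are assumptions for illustration, not the repository code):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Build a normalized 1D Gaussian kernel of the given radius.
static std::vector<float> gaussianKernel1D(int radius, float sigma)
{
    std::vector<float> k(2 * radius + 1);
    float sum = 0.0f;
    for (int i = -radius; i <= radius; i++)
    {
        k[i + radius] = std::exp(-0.5f * float(i * i) / (sigma * sigma));
        sum += k[i + radius];
    }
    for (float& w : k)
        w /= sum;
    return k;
}

// Apply the same 1D kernel horizontally and then vertically (clamp-to-edge
// borders). Per pixel this costs 2 * (2 * radius + 1) multiply-adds instead
// of (2 * radius + 1)^2 for the equivalent full 2D convolution.
void convolveSeparable(const std::vector<float>& src, std::vector<float>& dst,
                       int width, int height, const std::vector<float>& kernel)
{
    const int radius = int(kernel.size()) / 2;
    std::vector<float> tmp(src.size());
    dst.resize(src.size());

    // Horizontal pass: src -> tmp.
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
        {
            float acc = 0.0f;
            for (int i = -radius; i <= radius; i++)
            {
                const int xx = std::min(std::max(x + i, 0), width - 1);
                acc += kernel[i + radius] * src[y * width + xx];
            }
            tmp[y * width + x] = acc;
        }

    // Vertical pass: tmp -> dst.
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
        {
            float acc = 0.0f;
            for (int i = -radius; i <= radius; i++)
            {
                const int yy = std::min(std::max(y + i, 0), height - 1);
                acc += kernel[i + radius] * tmp[yy * width + x];
            }
            dst[y * width + x] = acc;
        }
}
```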
Thanks for reaching out!
/Tomas

Thanks for the quick reply! Yeah, I was planning to do 1. for now; my machine has 48 cores. That will work for now, and with the default settings from your paper the kernel size is just 21x21, so it's not totally out of control yet :). With bigger kernels, and once I distribute this across many machines (which will have far fewer than 48 cores), faster filters would help a ton!

Ok, so I pushed some #pragma omp parallel for which helped performance quite a lot for me. I got about 10x speedup on both LDR and HDR images.
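For reference, the change has roughly this shape (an illustrative sketch, not the exact loops in the repository); the pragma goes on the outer loop over rows, since each output row is independent:

```cpp
// Illustrative only -- the shape of the change, not the repository's actual loops.
void convolveRows(const float* src, float* dst, int width, int height,
                  const float* kernel, int radius)
{
    // Each output row depends only on the input image, so rows can be
    // computed by different threads without any synchronization.
    #pragma omp parallel for
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            float acc = 0.0f;
            for (int i = -radius; i <= radius; i++)
            {
                int xx = x + i;                 // clamp-to-edge borders
                if (xx < 0) xx = 0;
                if (xx >= width) xx = width - 1;
                acc += kernel[i + radius] * src[y * width + xx];
            }
            dst[y * width + x] = acc;
        }
    }
}
```

Note that OpenMP has to be enabled at compile time (e.g. -fopenmp for GCC/Clang, /openmp for MSVC); otherwise the pragma is ignored and the loop simply runs serially.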

Are you using the LDR or HDR version, btw?

I am using the LDR version.

I realized there were some more places where I could add #pragma omp parallel for. These were just pushed to the repo.
LDR-FLIP now takes 0.35 s of evaluation time on the CPU on my machine for 1920x1080 (including reading the files, it takes 1.1 s).

(I am the same person as phoekz). Thanks for the recent improvements! I have a couple of updates that might be interesting to hear:

  1. I ran FLIP on a couple thousand image pairs, and it was immediately clear that FLIP pretty much beats SSIM (our current metric for comparing images) in every respect. With SSIM we originally saw a lot of strange results: it reported huge errors that were imperceptible unless you zoomed in to 1000%, and it also missed errors that were clearly perceptible. FLIP fixed all of these issues :)! The error map is also just more pleasant to look at in general.
  2. Our use case is basically evaluating 3D reconstructions made from real photographs. Our system captures images in a sequence while moving. I noticed that with SSIM the results were not coherent over sequential captures and the error map sometimes flickered a lot, while with FLIP the results were much more stable. Looking at the original image pairs, you only notice the change in contrast when you zoom in to around 500%.
  3. I eventually ended up translating the C++ LDR version into Rust. It simplified building and binding into my app, and since the original code wasn't too big, it wasn't that bad. The translation only took me a couple of hours because the original code was really clear :). I looked at your recent commits, threw in the Rust equivalent of OpenMP, and achieved similar timings, about 1 s for 5 MP photos.

Oh, so very nice to hear!
I just added one more optimization that was worthwhile. It improved speeds further: from 0.25-0.30 s to 0.18-0.22 s for LDR and from 5.1 s to 2.9 s for HDR.

Nice! Can confirm that this sped up the algorithm quite a bit.

I went ahead and did a similar optimization to the spatial filter. I convolved ref and test with the spatial filter at the same time and got another 25% speed gain on top of the feature filter optimizations.
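In rough form that optimization looks like this (a sketch with made-up names, row-major layout, and clamp-to-edge borders assumed, not the actual FLIP code): both images are accumulated inside the same kernel loop, so the weight lookup and index arithmetic are computed once and shared, and the second image's neighborhood is already hot in cache.

```cpp
#include <algorithm>

// Sketch only: names, layout, and border handling are assumptions, not the FLIP API.
// The reference and test images share one pass over the kernel window.
void convolveRefAndTest(const float* ref, const float* test,
                        float* refOut, float* testOut,
                        int width, int height,
                        const float* kernel, int radius)
{
    const int kernelWidth = 2 * radius + 1;
    #pragma omp parallel for
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            float accRef = 0.0f;
            float accTest = 0.0f;
            for (int j = -radius; j <= radius; j++)
            {
                const int yy = std::min(std::max(y + j, 0), height - 1);
                for (int i = -radius; i <= radius; i++)
                {
                    const int xx = std::min(std::max(x + i, 0), width - 1);
                    const float w = kernel[(j + radius) * kernelWidth + (i + radius)];
                    const int idx = yy * width + xx;
                    accRef += w * ref[idx];   // reference image
                    accTest += w * test[idx]; // test image
                }
            }
            refOut[y * width + x] = accRef;
            testOut[y * width + x] = accTest;
        }
    }
}
```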

Now if we can separate the filters, FLIP would run so fast that it's no longer a concern :). At least in my app everything else will run much slower in comparison.

Nice! I will add that after the winter break -- cool! Pontus will continue with separable filters as well after his break.

More optimizations are coming in FLIP v1.2, which should be up quite soon. Closing this for now.

FLIP v1.2 has now been released... with even more perf optimizations!

Thank you so much! I can't wait to check out your improvements!

Thank you for the inspiration and ideas! :)