jrmuizel / raqote

Rust 2D graphics library

Rasterization of complex images 10x slower than Cairo

Shnatsel opened this issue · comments

Rendering the image from https://gitlab.gnome.org/GNOME/librsvg/-/issues/574 with resvg takes 10 seconds with Cairo backend and 30 seconds with Raqote backend on my machine (end-to-end, including SVG processing by resvg). It's a 2400 by 2650px SVG with map data. As far as I can tell it only has fill and stroke with no interesting features.

A quick perf run shows that most of the time is spent in raqote::rasterizer::Rasterizer::rasterize when called from fill and stroke routines.

I had a closer look at this. The bulk of the time in rasterize() is being spent in insert_starting_edges(). The insertion sort in it can be O(n^2), but even if I move the sorts with large counts to a separate function, a lot of the time remains in insert_starting_edges().
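To make the O(n^2) concern concrete, here is a minimal sketch of the two strategies. It is not raqote's code and not the fix that eventually landed: the `Edge` type and both function names are hypothetical, and real edges carry more state than an x position.

```rust
/// Hypothetical edge record: only the x position on the current scanline matters here.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Edge {
    x: i32,
}

/// Insert each starting edge into the already sorted active list one at a time.
/// Each insertion scans and shifts the list, so n insertions into a list of
/// length m cost O(n * (n + m)) in the worst case.
fn insert_one_by_one(active: &mut Vec<Edge>, starting: &[Edge]) {
    for &e in starting {
        // Find the first active edge strictly to the right of the new one.
        let pos = active
            .iter()
            .position(|a| a.x > e.x)
            .unwrap_or(active.len());
        active.insert(pos, e);
    }
}

/// Append all starting edges, then sort once: O((n + m) log(n + m)).
fn insert_then_sort(active: &mut Vec<Edge>, starting: &[Edge]) {
    active.extend_from_slice(starting);
    // sort_by_key is a stable sort, so edges with equal x keep their relative order.
    active.sort_by_key(|e| e.x);
}
```

Both functions leave `active` sorted by `x`. The one-by-one version is usually fine when only a few edges start on a scanline, and the batch version only pays off when many start at once, which is roughly the split described above.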

I figured out what the problem was here. It was dumb.

9734d12 improves the time it takes to rasterize that svg for me from 23 seconds to 6.5 seconds. Do you mind retesting?

End-to-end rendering with resvg now takes 15 seconds with raqote vs 11 seconds with cairo. Most of the difference is gone!

Most of the time spent in raqote is now taken up by __memset_avx2_unaligned_erms called by __libc_calloc called by raqote::draw_target::DrawTarget::fill. Interactive profile with flamegraph is here.

That's likely caused by MaskSuperBlitter::new() (https://github.com/jrmuizel/raqote/blob/master/src/blitter.rs#L41). We currently allocate a Vec whose size is the fill bounds' height * width. We don't actually need it to be that big: we could rasterize in chunks so that we only ever allocate up to some maximum height. Another option is to reuse the allocation.
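A rough sketch of what reusing the allocation could look like follows. The `MaskBuffer` type and its methods are hypothetical and are not part of raqote; the real blitter holds more state than a byte buffer.

```rust
/// Hypothetical reusable coverage buffer, kept alive across fill calls.
struct MaskBuffer {
    data: Vec<u8>,
}

impl MaskBuffer {
    fn new() -> Self {
        MaskBuffer { data: Vec::new() }
    }

    /// Prepare a zeroed width * height mask, reusing the existing heap
    /// allocation whenever it is already large enough.
    fn reset(&mut self, width: usize, height: usize) -> &mut [u8] {
        // clear() drops the length to zero but keeps the capacity.
        self.data.clear();
        // resize() writes zeros and only reallocates if the capacity is too small.
        self.data.resize(width * height, 0);
        &mut self.data
    }

    /// Capacity currently held by the backing Vec.
    fn capacity(&self) -> usize {
        self.data.capacity()
    }
}
```

This still zeroes the bytes on every `reset`, so it does not remove the memset cost entirely; it only avoids allocating and freeing a fresh buffer for each fill. Capping the height, as suggested above, would additionally bound how many bytes have to be cleared per chunk.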

Reusing the allocation sounds good. For completeness, another possible approach is to reserve a vector with a certain capacity and append to it via .extend() or .extend_from_slice() instead of casting it to a slice and writing by index. This would bypass the need for initializing memory altogether.
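As an illustration of the reserve-and-append idea, here is a small sketch. The function name and the `coverage` callback are made up for the example and do not correspond to anything in raqote.

```rust
/// Build a width * height mask by appending rows instead of zero-filling first.
/// `coverage` is a hypothetical callback returning the value for pixel (x, y).
fn build_mask_by_appending(
    width: usize,
    height: usize,
    coverage: impl Fn(usize, usize) -> u8,
) -> Vec<u8> {
    // with_capacity reserves the memory but does not initialize it.
    let mut mask = Vec::with_capacity(width * height);
    for y in 0..height {
        // Each byte is written exactly once, in row-major order.
        mask.extend((0..width).map(|x| coverage(x, y)));
    }
    mask
}
```

This only works if every pixel's final value is known at the moment it is appended, in order. A rasterizer that accumulates coverage into arbitrary positions needs an initialized buffer to add into, so this approach may not apply directly there.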

Turns out there was another mistake causing a performance problem. 0b8d17c fixes it. Can you try testing again?

8891da8 will help too.

With those two additional changes, my rendering time goes down to 3.6 seconds.

Yup, on this image raqote is now faster than cairo! Thanks!