fishfolk / punchy

A 2.5D side-scrolling beat-em-up, made in Bevy

Home Page: https://fishfolk.github.io/punchy/player/latest

Project is slow to compile; double check that there isn't anything unintentionally slow

64kramsystem opened this issue · comments

Description

Compiling the project is (relatively) very time consuming. Even small changes cause slow compiles on my laptop (8 threads): 35+ seconds. I'm even using the mold linker, which is about as fast as linkers get.

I have no idea what causes such a slowdown. The compilation could be profiled, but maybe somebody has an idea (based on experience) of why it takes so long.

Alternatives & Prior Art

I've tried the dynamic Bevy feature, but it doesn't bring any improvement.
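For reference, a sketch of how the feature is usually enabled (the feature name is `dynamic` in Bevy versions contemporary with this issue, and `dynamic_linking` in later ones):

```toml
# Cargo.toml — opt in to Bevy dynamic linking for local development only
[features]
dev = ["bevy/dynamic"]
```

It can also be passed ad hoc, e.g. `cargo run --features bevy/dynamic`.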

I've noticed the same thing. My times are much better with 12 cores, ending up at 14 seconds at worst, and often better depending on the change, but that's still not at the usual Bevy project level of roughly 2-6 seconds.

The only thing I can think of right off the bat is that I always put all my code in lib.rs (and submodules), and the main.rs file is always a shim that simply calls my_crate::run().

I do it because you can't integration test main.rs, since it's outside the crate, but maybe it also affects compile times.
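A single-file approximation of that layout (names are hypothetical; in a real project the `game` module would be the library crate in src/lib.rs, and `main` would live in src/main.rs):

```rust
// `game` stands in for the library crate (src/lib.rs); integration
// tests under tests/ can reach everything through the crate's public API.
mod game {
    /// All real logic lives here; in a Bevy game this would build and run the App.
    pub fn run() -> u32 {
        42 // placeholder exit value, just to make the shim demonstrable
    }
}

// src/main.rs equivalent: a one-line shim into the library.
fn main() {
    let code = game::run();
    println!("game exited with code {code}");
}
```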

I think the times are better on nightly with the share-generics flag.
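The flag in question is likely `-Zshare-generics` (nightly-only); a sketch of enabling it via a cargo config, assuming a Linux target:

```toml
# .cargo/config.toml — nightly-only; shares monomorphized generics between crates
[target.x86_64-unknown-linux-gnu]
rustflags = ["-Zshare-generics=y"]
```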

I've also tried LLD and mold in an attempt to improve times, but apparently the blocker isn't the linker. I think there's something like a -Z timings flag you can pass to Cargo to get some stats, but I'll have to see exactly how to do it because I can't remember.
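The flag being half-remembered is most likely cargo's timings report; on nightlies of that era it was spelled `-Ztimings`, and it has since stabilized as `--timings`:

```shell
# nightly spelling at the time of this issue
cargo +nightly build -Ztimings

# stabilized form in newer cargo; writes an HTML report under target/cargo-timings/
cargo build --timings
```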

Some very good compilation time debugging advice here by fasterthanlime and in the followup article.

Please let us know what helped!

lto = true: this will cause extremely slow release builds. Are you seeing this problem for debug builds too?

In general, I would only turn on lto for final releases that are going to be in the hands of players.
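A minimal sketch of the suggested setup, with fat LTO confined to the release profile used for player builds:

```toml
# Cargo.toml
[profile.dev]
# no lto here: keep iteration builds fast

[profile.release]
lto = true   # fat LTO: slow to build, only worth it for shipped binaries
```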

> lto = true: this will cause extremely slow release builds. Are you seeing this problem for debug builds too?
>
> In general, I would only turn on lto for final releases that are going to be in the hands of players.

Hello! Yes, I'm experiencing this on debug (default) builds.

> Some very good compilation time debugging advice here by fasterthanlime and in the followup article.
>
> Please let us know what helped!

Thanks for the links, I'll check them out. It's strange because the only crate compiled (when I make changes to the project) is the project crate itself (this is why Bevy dynamic linking doesn't help), which is not large, so there must be something subtle going on.

> I've also tried LLD and mold in an attempt to improve times, but apparently the blocker isn't the linker. I think there's something like a -Z timings flag you can pass to Cargo to get some stats, but I'll have to see exactly how to do it because I can't remember.

I'm trying the self-profile flag. Timings doesn't reveal anything in this case unfortunately, because the only crate compiled is the project one.
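A sketch of how such a profile is gathered (nightly-only; the measureme repository ships a `summarize` tool for reading the .mm_profdata output):

```shell
# profile rustc itself while building
RUSTFLAGS="-Zself-profile" cargo +nightly build

# summarize the resulting .mm_profdata file
# (install with: cargo install --git https://github.com/rust-lang/measureme summarize)
summarize summarize punchy-<pid>.mm_profdata
```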

Some preliminary data. It seems that a significant part of the time is taken by LTO.

Raw data: punchy-0630965.mm_profdata.zip

Table:

| Item | Self time | % of total time | Time | Item count | Incremental load time | Incremental result hashing time |
|---|---|---|---|---|---|---|
| LLVM_module_codegen_emit_obj | 40.79s | 26.958 | 40.79s | 233 | 0.00ns | 0.00ns |
| LLVM_lto_optimize | 28.00s | 18.506 | 28.00s | 232 | 0.00ns | 0.00ns |
| LLVM_module_optimize | 24.02s | 15.875 | 24.02s | 107 | 0.00ns | 0.00ns |
| LLVM_passes | 17.02s | 11.250 | 17.10s | 1 | 0.00ns | 0.00ns |
| LLVM_thin_lto_import | 11.30s | 7.466 | 11.30s | 232 | 0.00ns | 0.00ns |
| finish_ongoing_codegen | 10.70s | 7.073 | 10.70s | 1 | 0.00ns | 0.00ns |

Flamegraph: rustc

First finding: disabling LTO in dev build saves a considerable time.
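For what it's worth, fully disabling LTO in the dev profile looks like this; note that Cargo's default of `lto = false` still permits thin *local* LTO across a crate's codegen units, while `"off"` disables even that:

```toml
# Cargo.toml
[profile.dev]
lto = "off"   # "off" also disables thin-local LTO; false (the default) does not
```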

Why was LTO even enabled for the dev build? We only put lto = true in the [profile.release]. How did you disable it?

> Why was LTO even enabled for the dev build? We only put lto = true in the [profile.release]. How did you disable it?

Without checking, AFAIR, there are multiple types of LTO (thin/fat). I think thin LTO is enabled by default also on dev - but I may be wrong 😬

I couldn't find anything obvious (I'm no expert in the field).

The cargo llvm-lines tool shows what seems to be a high number of lines for Bevy methods, but I don't know if that's the cause (I think that, regardless, the number itself is to be expected due to the nature of system parameters).
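For reference, the tool is a third-party cargo subcommand, run roughly like this:

```shell
cargo install cargo-llvm-lines
# rank functions by how many lines of LLVM IR they expand to
cargo llvm-lines | head -20
```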

Cranelift compiles the project and shaves off roughly another 35% of the time, but that doesn't say anything about the root cause.
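A hedged sketch of building with the Cranelift backend (nightly-only; how the backend is obtained has varied over time, so treat the component name as an assumption):

```shell
rustup component add rustc-codegen-cranelift-preview --toolchain nightly
RUSTFLAGS="-Zcodegen-backend=cranelift" cargo +nightly build
```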

I wonder which parts of the project could be "heavy", e.g. Serde, but I don't have experience on this.

Interesting. Cranelift compiles faster, but produces a noticeably slower binary (e.g. slow to start, slow to load assets, etc.). Sadly, this means it likely can't be used to compile for running, but it may (not 100% sure) be usable in IDEs (VSC) to make IDE operations (e.g. checking) faster.

I'm not sure that checking actually uses the codegen backend. I think checking is all rustc and no LLVM or cranelift, so that wouldn't end up speeding up IDE stuff as far as I understand. I could definitely be wrong, though.

> I'm not sure that checking actually uses the codegen backend. I think checking is all rustc and no LLVM or cranelift, so that wouldn't end up speeding up IDE stuff as far as I understand. I could definitely be wrong, though.

🤬😂

Some data. I've profiled the compilation with perf. The procedure was to clean-build master, then move to a branch with a few changes (remove_camera_manual_commands) and build again.

Something very interesting that stands out is that rustc takes only 25% of the time (after disabling thin LTO, the build now takes ~120 single-thread seconds on my laptop). The rest of the time is taken by mysterious opt_* calls.

I've posted this on the Rust discord help channel (link); maybe this could be key, or help anyway.

In the meanwhile, setting opt-level=0 for both the main crate and the deps yields a build (of the branch) of approx. 64 single-thread seconds, which is a big win. Now, opt-level=0 on the deps is highly undesirable, but this gives some indication.

Data (missing from the previous comment 😬); taken via:

git checkout master
cargo clean
RUSTFLAGS="-Clink-arg=-fuse-ld=/home/saverio/local/mold/bin/mold" cargo +nightly build

git checkout remove_camera_manual_commands
RUSTFLAGS="-Clink-arg=-fuse-ld=/home/saverio/local/mold/bin/mold" perf record --call-graph dwarf --freq 100 cargo +nightly build
perf script | inferno-collapse-perf | inferno-flamegraph > perf.svg

perf.data.gz


Ok, so, building the project is considerably sped up by disabling all the optimizations from the project crate (leaving the deps to opt-level=3 is ok). This still doesn't explain what exactly is slow in the compilation (the graph doesn't seem to have an obvious bottleneck).

The surprising part is that opt-level=1 can cause such slowness (on the (few) other projects I've worked on, it was almost as fast as opt-level=0). The opt_* calls on the graph are therefore LLVM optimizations (I was expecting not to see any, due to the low optimization level).

Given all the above, I think we should remove all the optimizations from the dev profile (that is, leave it at the default opt-level=0) until anybody experiences runtime slowdowns. This may never happen, since the deps are built with maximum optimization.
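Spelled out, the proposed configuration mirrors the common Bevy recipe: leave the game crate at the default opt-level and optimize only the dependencies, which are compiled once and then cached:

```toml
# Cargo.toml
[profile.dev]
opt-level = 0            # the default: fastest rebuilds for the game crate

[profile.dev.package."*"]
opt-level = 3            # dependencies (Bevy etc.) are built once, so optimize them
```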

Splitting the project into crates would probably help narrow down the actual problem, but that may be a bit time consuming and is lower priority than other tasks.

I think setting opt-level=0 is reasonable.

There's nothing we're doing in the game so far that probably needs much optimization. Bevy's doing all of our iteration for us and such, so unless we run into problems, we probably don't need it.

Yeah, if opt-level=1 was causing the slowdown and it's becoming annoying, def axe it in favor of opt-level=0.