fishfolk / punchy

A 2.5D side-scrolling beat-em-up, made in Bevy

Home Page: https://fishfolk.github.io/punchy/player/latest

Project is slow to compile; double check that there isn't anything unintentionally slow

64kramsystem opened this issue · comments

Description

Compiling the project is (relatively) very time consuming. Even small changes cause slow compiles on my laptop (8 threads): 35+ seconds. I'm even using the mold linker, which is about as fast as linkers get.

I have no idea what causes such a slowdown. The compilation could be profiled, but maybe somebody has an idea (based on experience) of why it takes so long.

Alternatives & Prior Art

I've tried the dynamic Bevy feature, but it doesn't bring any improvement.
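For reference, a sketch of how the feature is usually enabled (the feature name is `dynamic` in Bevy versions contemporary with this issue, and `dynamic_linking` in later ones):

```toml
# Cargo.toml — opt in to Bevy dynamic linking for local development only
[features]
dev = ["bevy/dynamic"]
```

It can also be passed ad hoc, e.g. `cargo run --features bevy/dynamic`.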

I've noticed the same thing. My times are much better with 12 cores, ending up at 14 seconds at worst, and often better depending on the change, but that's still not at the usual Bevy project level of roughly 2-6 seconds.

The only thing I can think of right off the bat is that I always put all my code in lib.rs (and submodules), and the main.rs file is always a shim that simply calls my_crate::run().

I do it because you can't integration test main.rs, since it's outside the crate, but maybe it also affects compile times.
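A single-file approximation of that layout (names are hypothetical; in a real project the `game` module would be the library crate in src/lib.rs, and `main` would live in src/main.rs):

```rust
// `game` stands in for the library crate (src/lib.rs); integration
// tests under tests/ can reach everything through the crate's public API.
mod game {
    /// All real logic lives here; in a Bevy game this would build and run the App.
    pub fn run() -> u32 {
        42 // placeholder exit value, just to make the shim demonstrable
    }
}

// src/main.rs equivalent: a one-line shim into the library.
fn main() {
    let code = game::run();
    println!("game exited with code {code}");
}
```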

I think the times are better on nightly with the share-generics flag.
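The flag in question is likely `-Zshare-generics` (nightly-only); a sketch of enabling it via a cargo config, assuming a Linux target:

```toml
# .cargo/config.toml — nightly-only; shares monomorphized generics between crates
[target.x86_64-unknown-linux-gnu]
rustflags = ["-Zshare-generics=y"]
```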

I've also tried LLD and mold in an attempt to improve times, but apparently the blocker isn't the linker. I think there's something like a -Z timings flag you can pass to Cargo to get some stats, but I'll have to see exactly how to do it because I can't remember.
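The flag being half-remembered is most likely cargo's timings report; on nightlies of that era it was spelled `-Ztimings`, and it has since stabilized as `--timings`:

```shell
# nightly spelling at the time of this issue
cargo +nightly build -Ztimings

# stabilized form in newer cargo; writes an HTML report under target/cargo-timings/
cargo build --timings
```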

Some very good compilation time debugging advice here by fasterthanlime and in the followup article.

Please let us know what helped!

lto = true: this will cause extremely slow release builds. Are you seeing this problem for debug builds too?

In general, I would only turn on lto for final releases that are going to be in the hands of players.
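A minimal sketch of the suggested setup, with fat LTO confined to the release profile used for player builds:

```toml
# Cargo.toml
[profile.dev]
# no lto here: keep iteration builds fast

[profile.release]
lto = true   # fat LTO: slow to build, only worth it for shipped binaries
```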

> lto = true: this will cause extremely slow release builds. Are you seeing this problem for debug builds too?
>
> In general, I would only turn on lto for final releases that are going to be in the hands of players.

Hello! Yes, I'm experiencing this on debug (default) builds.

> Some very good compilation time debugging advice here by fasterthanlime and in the followup article.
>
> Please let us know what helped!

Thanks for the links, I'll check them out. It's strange because the only crate compiled (when I make changes to the project) is the project crate itself (this is why Bevy dynamic linking doesn't help), which is not large, so there must be something subtle going on.

> I've also tried LLD and mold in an attempt to improve times, but apparently the blocker isn't the linker. I think there's something like a -Z timings flag you can pass to Cargo to get some stats, but I'll have to see exactly how to do it because I can't remember.

I'm trying the self-profile flag. Timings doesn't reveal anything in this case unfortunately, because the only crate compiled is the project one.
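A sketch of how such a profile is gathered (nightly-only; the measureme repository ships a `summarize` tool for reading the .mm_profdata output):

```shell
# profile rustc itself while building
RUSTFLAGS="-Zself-profile" cargo +nightly build

# summarize the resulting .mm_profdata file
# (install with: cargo install --git https://github.com/rust-lang/measureme summarize)
summarize summarize punchy-<pid>.mm_profdata
```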

Some preliminary data. It seems that a significant part of the time is taken by LTO.

Raw data: punchy-0630965.mm_profdata.zip

Table:

| Item | Self time | % of total time | Time | Item count | Incremental load time | Incremental result hashing time |
|---|---|---|---|---|---|---|
| LLVM_module_codegen_emit_obj | 40.79s | 26.958 | 40.79s | 233 | 0.00ns | 0.00ns |
| LLVM_lto_optimize | 28.00s | 18.506 | 28.00s | 232 | 0.00ns | 0.00ns |
| LLVM_module_optimize | 24.02s | 15.875 | 24.02s | 107 | 0.00ns | 0.00ns |
| LLVM_passes | 17.02s | 11.250 | 17.10s | 1 | 0.00ns | 0.00ns |
| LLVM_thin_lto_import | 11.30s | 7.466 | 11.30s | 232 | 0.00ns | 0.00ns |
| finish_ongoing_codegen | 10.70s | 7.073 | 10.70s | 1 | 0.00ns | 0.00ns |

Flamegraph: rustc

First finding: disabling LTO in dev build saves a considerable time.
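For what it's worth, fully disabling LTO in the dev profile looks like this; note that Cargo's default of `lto = false` still permits thin *local* LTO across a crate's codegen units, while `"off"` disables even that:

```toml
# Cargo.toml
[profile.dev]
lto = "off"   # "off" also disables thin-local LTO; false (the default) does not
```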

Why was LTO even enabled for the dev build? We only put lto = true in the [profile.release]. How did you disable it?

> Why was LTO even enabled for the dev build? We only put lto = true in the [profile.release]. How did you disable it?

Without checking, AFAIR, there are multiple types of LTO (thin/fat). I think thin LTO is enabled by default also on dev - but I may be wrong 😬

I couldn't find anything obvious (I'm no expert in the field).

The cargo llvm-lines tool shows what seems to be a high number of lines for Bevy methods, but I don't know if that's the cause (I think that, regardless, the number itself is to be expected due to the nature of system parameters).
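For reference, the tool is a third-party cargo subcommand, run roughly like this:

```shell
cargo install cargo-llvm-lines
# rank functions by how many lines of LLVM IR they expand to
cargo llvm-lines | head -20
```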

Cranelift compiles the project and shaves off roughly another 35% of the time, but that doesn't say anything about the root cause.
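A hedged sketch of building with the Cranelift backend (nightly-only; how the backend is obtained has varied over time, so treat the component name as an assumption):

```shell
rustup component add rustc-codegen-cranelift-preview --toolchain nightly
RUSTFLAGS="-Zcodegen-backend=cranelift" cargo +nightly build
```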

I wonder which parts of the project could be "heavy", e.g. Serde, but I don't have experience on this.

Interesting. Cranelift compiles faster, but produces a noticeably slower binary (e.g. slow to start, slow to load assets, etc.). Sadly, this means it likely can't be used to compile for running, but it may (not 100% sure) be usable in IDEs (VSC) to make IDE operations (e.g. checking) faster.

I'm not sure that checking actually uses the codegen backend. I think checking is all rustc and no LLVM or cranelift, so that wouldn't end up speeding up IDE stuff as far as I understand. I could definitely be wrong, though.

> I'm not sure that checking actually uses the codegen backend. I think checking is all rustc and no LLVM or cranelift, so that wouldn't end up speeding up IDE stuff as far as I understand. I could definitely be wrong, though.

🤬😂

Some data. I've profiled the compilation with perf. The procedure was to clean-build master, then move to a branch with a few changes (remove_camera_manual_commands) and build again.

Something very interesting that stands out is that rustc takes only 25% of the time (after disabling thin LTO, the build now takes ~120 single-thread seconds on my laptop). The rest of the time is taken by mysterious opt_* calls.

I've posted this on the Rust discord help channel (link); maybe this could be key, or help anyway.

In the meanwhile, setting opt-level=0 for both the main crate and the deps yields a build (of the branch) of approx. 64 single-thread seconds, which is a big win. Now, opt-level=0 on the deps is highly undesirable, but this gives some indication.

Data (missing from the previous comment 😬); taken via:

git checkout master
cargo clean
RUSTFLAGS="-Clink-arg=-fuse-ld=/home/saverio/local/mold/bin/mold" cargo +nightly build

git checkout remove_camera_manual_commands
RUSTFLAGS="-Clink-arg=-fuse-ld=/home/saverio/local/mold/bin/mold" perf record --call-graph dwarf --freq 100 cargo +nightly build
perf script | inferno-collapse-perf | inferno-flamegraph > perf.svg

perf.data.gz


Ok, so, building the project is considerably sped up by disabling all the optimizations from the project crate (leaving the deps to opt-level=3 is ok). This still doesn't explain what exactly is slow in the compilation (the graph doesn't seem to have an obvious bottleneck).

The surprising part is that opt-level=1 can cause such slowness (on the (few) other projects I've worked on, it was almost as fast as opt-level=0). The opt_* calls on the graph are therefore LLVM optimizations (I was expecting not to see any, due to the low optimization level).

Given all the above, I think we should remove all the optimizations from the dev profile (that is, leave it at the default opt-level=0) until anybody experiences runtime slowdowns. This may never happen, since the deps are built with maximum optimization.
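Spelled out, the proposed configuration mirrors the common Bevy recipe: leave the game crate at the default opt-level and optimize only the dependencies, which are compiled once and then cached:

```toml
# Cargo.toml
[profile.dev]
opt-level = 0            # the default: fastest rebuilds for the game crate

[profile.dev.package."*"]
opt-level = 3            # dependencies (Bevy etc.) are built once, so optimize them
```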

Splitting the project into crates would probably help narrow down the actual problem, but that may be a bit time consuming and is lower priority than other tasks.

I think setting opt-level=0 is reasonable.

There's nothing we're doing in the game so far that probably needs much optimization. Bevy's doing all of our iteration for us and such, so unless we run into problems, we probably don't need it.

Yeah, if opt-level=1 was causing the slowdown and it's becoming annoying, def axe it in favor of opt-level=0.