Generate LLVM IR and object files in parallel

Question

yorickpeterse opened this issue 8 months ago · comments

Description

LLVM IR and the resulting object files are generated sequentially. LLVM is quite slow, and as a result we spend about 80-85% of the time in LLVM when compiling the standard library tests.

#520 discusses caching the object files and skipping modules if the cache is still valid. While this is something we probably need to do, it's tricky to get right. Most notably, if module B depends on A and creates a new type specialization of a type stored in A, then we store that specialization in A. This requires flushing the cache of A, and keeping track of what specializations we created in the past (so we can tell new specializations apart from ones that already existed in a previous run).

Generating the LLVM IR/objects in parallel can further help speed things up, because even with caching I expect the LLVM side of things to still take up a decent amount of time.

To perform LLVM generation in parallel, there are several things we need to take care of:

- The type database must be Sync so an immutable reference to it can be used by the LLVM threads. This is currently not the case due to TypePlaceholder using Cell.
- LLVM data can't be shared in any way between threads, as this results in segmentation faults. This poses a problem for our Layouts type as it depends on a Context, and LLVM contexts aren't Sync, nor are the various Inkwell types (e.g. StructType).
  - We'd have to split Layouts into Layouts and Methods, the former storing only LLVM types and the latter storing the MethodInfo objects.
  - We can then share MethodInfo, but re-generate Layouts for every thread once.
  - TargetData and TargetMachine aren't Sync either, so these need to be created on a per-thread basis.
- State isn't Sync because Config contains Box<dyn Presenter> and this isn't Sync either. Given presenter types are immutable, we can probably just flag it as Sync.
- When scheduling the modules we can take advantage of the fact that we have an existing list of modules indexed using an integer. This means the queue can just be an atomic integer that's incremented using a CAS, with threads processing the module for which they successfully incremented the value.
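
The scheduling idea in the last point can be sketched roughly as follows. This is only an illustration and not the compiler's actual code: `Module`, `ThreadState` and `compile_all` are hypothetical names, and `ThreadState` merely stands in for the per-thread LLVM data (Context, Layouts, TargetMachine) that can't be shared. It also uses `fetch_add` rather than an explicit CAS loop, as a plain atomic increment is enough to hand each index to exactly one thread.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Hypothetical stand-in for a compiler module.
struct Module {
    name: String,
}

// Hypothetical stand-in for the per-thread LLVM state. Each worker creates
// its own instance, so nothing LLVM-related crosses a thread boundary.
struct ThreadState {
    processed: Vec<String>,
}

// Processes every module exactly once and returns, per thread, the names of
// the modules that thread handled.
fn compile_all(modules: &[Module], threads: usize) -> Vec<Vec<String>> {
    // The "queue": the index of the next module nobody has claimed yet.
    let next = AtomicUsize::new(0);

    thread::scope(|s| {
        let mut handles = Vec::new();

        for _ in 0..threads {
            handles.push(s.spawn(|| {
                let mut state = ThreadState { processed: Vec::new() };

                loop {
                    // Claim an index. fetch_add returns the previous value,
                    // so no two threads ever observe the same index.
                    let idx = next.fetch_add(1, Ordering::Relaxed);
                    let Some(module) = modules.get(idx) else { break };

                    // Placeholder for "generate LLVM IR and the object file".
                    state.processed.push(module.name.clone());
                }

                state.processed
            }));
        }

        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

Because the module list is only read, the workers need nothing beyond a shared reference to it and the counter, so there is no lock or channel involved.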

Related work

#674
#520

Yorick Peterse · Answer 1 · Mon Dec 25 2023 01:34:31 GMT+0800 (China Standard Time)

> We'd have to split Layouts into Layouts and Methods, the former storing only LLVM types and the latter storing the MethodInfo objects

This may be a bit tricky. The MethodInfo types contain an LLVM FunctionType, which isn't Sync. I think we'd have to split MethodInfo into MethodTypeInfo and CallInfo, or something along those lines. MethodTypeInfo would only contain the LLVM data, and CallInfo the hashing/index information. This way we don't have to regenerate the hashing information for every thread.
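
To make the proposed split a bit more concrete, here's a rough sketch. The field names and `FakeFunctionType` are made up for illustration: the latter stands in for Inkwell's FunctionType and uses a `PhantomData` over a raw pointer to make it neither Send nor Sync, similar to the real LLVM handles. `arities` is a placeholder for whatever shared type information the per-thread LLVM types would be built from.

```rust
use std::marker::PhantomData;
use std::thread;

// Hypothetical: the hashing/index data. Plain data, so it is Send + Sync
// and can be computed once and shared between all threads.
#[derive(Debug, Clone, Copy, PartialEq)]
struct CallInfo {
    hash: u64,
    index: u16,
}

// Hypothetical stand-in for an LLVM function type. The raw pointer marker
// makes it !Send and !Sync, so it can't leave the thread that created it.
struct FakeFunctionType {
    arity: usize,
    _not_sync: PhantomData<*const ()>,
}

// Hypothetical: only the LLVM data, rebuilt once per thread.
struct MethodTypeInfo {
    signature: FakeFunctionType,
}

fn build_type_info(arities: &[usize]) -> Vec<MethodTypeInfo> {
    arities
        .iter()
        .map(|&arity| MethodTypeInfo {
            signature: FakeFunctionType { arity, _not_sync: PhantomData },
        })
        .collect()
}

// Every thread borrows the shared CallInfo table, but builds its own
// MethodTypeInfo table. Each thread returns a pair of sums purely to show
// that both tables are usable from within the thread.
fn compile_on_threads(
    calls: &[CallInfo],
    arities: &[usize],
    threads: usize,
) -> Vec<(u64, usize)> {
    thread::scope(|s| {
        let mut handles = Vec::new();

        for _ in 0..threads {
            handles.push(s.spawn(|| {
                // Thread-local and never sent elsewhere.
                let types = build_type_info(arities);
                let call_sum: u64 =
                    calls.iter().map(|c| c.hash + c.index as u64).sum();
                let arity_sum: usize =
                    types.iter().map(|t| t.signature.arity).sum();

                (call_sum, arity_sum)
            }));
        }

        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

Trying to return `types` from the spawned closure fails to compile, which is the sort of compile-time guarantee we'd want here instead of segmentation faults at runtime.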

Yorick Peterse · Answer 2 · Mon Dec 25 2023 06:03:12 GMT+0800 (China Standard Time)

An additional note to the above: splitting the type DB might not be viable. There are various methods where we transform types (e.g. by resolving them), and those operations would require the individual components we'd split the type database into (e.g. the types and placeholders). This would require a massive amount of changes, and even then I'm not sure it wouldn't cause problems elsewhere.