Generate LLVM IR and object files in parallel
yorickpeterse opened this issue · comments
Description
LLVM IR and the resulting object files are generated sequentially. LLVM is quite slow, and as a result we spend about 80-85% of the time in LLVM when compiling the standard library tests.
#520 discusses caching the object files and skipping modules if the cache is still valid. While this is something we probably need to do, it's tricky to get right. Most notably, if module B depends on A and creates a new type specialization of a type stored in A, then we store that specialization in A. This requires flushing the cache of A, and keeping track of which specializations we created in the past (so we can tell new specializations apart from ones that already existed in a previous run).
Generating the LLVM IR/objects in parallel can further help speed things up, because even with caching I expect the LLVM side of things to still take up a decent amount of time.
To perform LLVM generation in parallel, there are several things we need to take care of:
- The type database must be `Sync` so an immutable reference to it can be used by the LLVM threads. This is currently not the case due to `TypePlaceholder` using `Cell`.
- LLVM data can't be shared in any way between threads, as this results in segmentation faults. This poses a problem for our `Layouts` type as it depends on a `Context`, and LLVM contexts aren't `Sync`, nor are the various Inkwell types (e.g. `StructType`).
  - We'd have to split `Layouts` into `Layouts` and `Methods`, the former storing only LLVM types and the latter storing the `MethodInfo` objects
  - We can then share `MethodInfo`, but re-generate `Layouts` for every thread once
  - `TargetData` and `TargetMachine` aren't `Sync` either, so these need to be created on a per-thread basis
- `State` isn't `Sync` because `Config` contains `Box<dyn Presenter>` and this isn't `Sync` either. Given presenter types are immutable, we can probably just flag it as `Sync`.
- When scheduling the modules we can take advantage of the fact that we have an existing list of modules indexed using an integer. This means the queue can just be an atomic integer that's incremented using a CAS, with threads processing the module for which they successfully incremented the value
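The scheduling idea in the last point could look roughly like the sketch below. It is only an illustration using the standard library: `claim_next`, `compile_module` and `compile_all` are hypothetical names, and `compile_module` is a stand-in for the real LLVM work, not the compiler's actual code.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Claims the next unprocessed module index using a CAS loop, or returns
/// `None` once every index has been handed out.
fn claim_next(next: &AtomicUsize, total: usize) -> Option<usize> {
    let mut current = next.load(Ordering::Relaxed);
    loop {
        if current >= total {
            return None;
        }
        match next.compare_exchange_weak(
            current,
            current + 1,
            Ordering::AcqRel,
            Ordering::Relaxed,
        ) {
            // We won the race for `current`, so this thread owns that module.
            Ok(_) => return Some(current),
            // Another thread got there first; retry with the value it left.
            Err(actual) => current = actual,
        }
    }
}

/// Hypothetical stand-in for lowering one module to LLVM IR and an object file.
fn compile_module(name: &str) -> String {
    format!("{name}.o")
}

/// Processes all modules on `threads` worker threads and returns the object
/// file names in module order.
fn compile_all(modules: &[String], threads: usize) -> Vec<String> {
    let next = AtomicUsize::new(0);
    let mut outputs: Vec<(usize, String)> = thread::scope(|scope| {
        let handles: Vec<_> = (0..threads)
            .map(|_| {
                scope.spawn(|| {
                    let mut done = Vec::new();
                    while let Some(index) = claim_next(&next, modules.len()) {
                        done.push((index, compile_module(&modules[index])));
                    }
                    done
                })
            })
            .collect();
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    });
    // Threads finish in arbitrary order, so restore the module order.
    outputs.sort_by_key(|(index, _)| *index);
    outputs.into_iter().map(|(_, path)| path).collect()
}
```

Each index is claimed by exactly one thread, so no locking or separate queue structure is needed. A plain `fetch_add` would probably work just as well; the CAS loop is used here because that is what the description mentions, and it keeps the counter from running past the end of the list.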
Related work
> We'd have to split `Layouts` into `Layouts` and `Methods`, the former storing only LLVM types and the latter storing the `MethodInfo` objects

This may be a bit tricky. The `MethodInfo` types contain an LLVM `FunctionType`, which isn't `Sync`. I think we'd have to split `MethodInfo` into `MethodTypeInfo` and `CallInfo`, or something along those lines. `MethodTypeInfo` would only contain the LLVM data, and `CallInfo` the hashing/index information. This way we don't have to regenerate the hashing information for every thread.
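To make the split a bit more concrete, here is a rough sketch of the shape it might take. The field names are assumptions, and `FakeFunctionType` is a made-up stand-in (wrapping an `Rc` so it is neither `Send` nor `Sync`) rather than Inkwell's real `FunctionType`; `Methods`, `build_type_info` and `run_workers` are likewise hypothetical.

```rust
use std::collections::HashMap;
use std::rc::Rc;
use std::thread;

/// Thread-safe half: hashing/index data, computed once and shared by reference.
#[derive(Debug, Clone, Copy, PartialEq)]
struct CallInfo {
    index: u16,
    hash: u64,
}

/// Stand-in for an LLVM function type. The `Rc` makes it neither `Send` nor
/// `Sync`, mimicking the restriction on the real LLVM types.
struct FakeFunctionType(Rc<String>);

/// Thread-local half: only the LLVM data, rebuilt by every worker thread.
struct MethodTypeInfo {
    signature: FakeFunctionType,
}

/// Shared across threads; contains no LLVM data, so it is `Sync`.
struct Methods {
    calls: HashMap<String, CallInfo>,
}

/// Rebuilds the LLVM-only data. This runs once per thread, and the result
/// never leaves the thread that created it.
fn build_type_info(methods: &Methods) -> HashMap<String, MethodTypeInfo> {
    methods
        .calls
        .keys()
        .map(|name| {
            let signature = FakeFunctionType(Rc::new(format!("fn {name}()")));
            (name.clone(), MethodTypeInfo { signature })
        })
        .collect()
}

/// Each worker borrows the shared `Methods` and pairs it with its own
/// freshly built `MethodTypeInfo` map.
fn run_workers(methods: &Methods, threads: usize) -> Vec<Vec<String>> {
    thread::scope(|scope| {
        let handles: Vec<_> = (0..threads)
            .map(|_| {
                scope.spawn(|| {
                    let types = build_type_info(methods);
                    let mut lines: Vec<String> = methods
                        .calls
                        .iter()
                        .map(|(name, call)| {
                            let ty = &types[name.as_str()];
                            format!(
                                "{} index={} hash={}",
                                ty.signature.0, call.index, call.hash
                            )
                        })
                        .collect();
                    lines.sort();
                    lines
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

The compiler enforces the boundary here: moving a `MethodTypeInfo` out of the spawned closure would fail to compile because of the `Rc`, while the `CallInfo` map can be borrowed by every thread without being regenerated.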
An additional note to the above: splitting the type DB might not be viable. There are various methods where we transform types (e.g. by resolving them), and those operations would require the individual components we'd split the type database into (e.g. the types and placeholders). This would require a massive amount of changes, and even then I'm not sure it wouldn't cause problems elsewhere.