EnzymeAD / Enzyme

High-performance automatic differentiation of LLVM and MLIR.

Home Page:https://enzyme.mit.edu

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Performance of type-analysis

vchuravy opened this issue · comments

@swilliamson7 code from EnzymeAD/Enzyme.jl#1284 (comment) highlights very long compilation time in type-analysis. Happy to share the vtunes file if needed.

The hottest function are all related to type-tree:

image

in particular VisitGEPOperator seems to be dominating

image

So at least some of the shiftindices calls within GEP should be able to be made extremely quick, which should cut down on this quite a bit.

I see almost all of the visitCallBase is hitting the julia rule, which means this isn't running interprocedurally (which is good) so this is possible in theory to extract to a single llvm function which has the same behavior

The version in michel2323/ShallowWaters.jl@6341944 where @michel2323 fixed Checkpointing.jl takes ~4m and shows a second bottleneck in the call to julia_type_rule. I wonder if we could cache those calls?

image

Even better than caching the call we could also just use type metadata and never call back into julia.

Enzyme now supports the ability to parse type data on args/insts/etc from metadata/attributes, that we could do a pass first within julia to add to all julia args that are known, then just still tell TA in Julia to be disabled [and it should still use the llvm native metadata].

here's an example where we do so currently for an alloca: https://github.com/EnzymeAD/Enzyme.jl/blob/1829706d90c4fa2110edb8ac341a8aed68ed3f6f/src/compiler.jl#L4130

we already push the julia type as attributes here: https://github.com/EnzymeAD/Enzyme.jl/blob/1829706d90c4fa2110edb8ac341a8aed68ed3f6f/src/compiler.jl#L4497
might as well also push the typetrees

I've found (and am completing a fix for) some accidentally quadratic behavior in ShiftIndices. Hopefully this should substantially cut down on that overhead, or at least reduce it.

@vchuravy let's split this the ShiftIndicies and julia typetree into separate issues. Can you open a new issue for the latter on Enzyme.jl?

And can you check to see if the linked PR resolved the ShiftIndices part of the flamegraph?