iree-org / iree-llvm-sandbox


Listeners in Passes and Functional Patterns

Mogball opened this issue

Based on what I have discussed with @nicolasvasilache and @ftynse and from what I've seen in this repo, I have 3 proposals, in order of increasing complexity/difficulty, of infra changes that might be useful. I'll just dump them here to start the discussion.

Pattern groups and registration

There should be some additional level of organization of patterns between individual patterns and the RewritePatternSet class, which is more of an unordered collection of patterns than a "set". A pattern "group" would be a set of patterns (perhaps uniqued by name) that can be combined with others or registered to a name.

E.g. mlir-opt --apply-greedily="group_a,group_b"
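
To make the registration idea concrete, a registry of named groups could mirror pass registration. The following is only a sketch; PopulateFn, registerPatternGroup, and populateGroups are hypothetical names, not upstream MLIR API.

#include <functional>

#include "mlir/IR/PatternMatch.h"
#include "llvm/ADT/StringMap.h"

using namespace mlir;

// A "group" is a named populate function that fills a RewritePatternSet and
// can be merged with other groups.
using PopulateFn = std::function<void(RewritePatternSet &)>;

static llvm::StringMap<PopulateFn> &patternGroupRegistry() {
  static llvm::StringMap<PopulateFn> registry;
  return registry;
}

void registerPatternGroup(llvm::StringRef name, PopulateFn populate) {
  patternGroupRegistry()[name] = std::move(populate);
}

// An -apply-greedily pass would look up the requested group names, merge
// their patterns into one set, and hand that to the greedy driver.
LogicalResult populateGroups(ArrayRef<std::string> names,
                             RewritePatternSet &patterns) {
  for (const std::string &name : names) {
    auto it = patternGroupRegistry().find(name);
    if (it == patternGroupRegistry().end())
      return failure(); // Unknown group name.
    it->second(patterns);
  }
  return success();
}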

Passes and Rewrite Drivers should accept listeners

Passes and, importantly, the greedy rewrite driver should accept listeners so that the caller can monitor certain events, e.g. notifyOpReplaced. This should at least be implemented in the most common passes in MLIR. The rewrite driver should accept a listener directly, whereas listeners can be passed into passes via the pass manager. On a side note, there should be more rewrite driver "primitives", e.g. applyOnce.
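
As a strawman, the listener the driver would accept might look like the following. This is only a sketch: upstream currently has only OpBuilder::Listener, which is limited to insertion/block-creation callbacks, so the replacement and erasure hooks below are hypothetical.

#include "mlir/IR/Builders.h"

using namespace mlir;

// Hypothetical extension of OpBuilder::Listener with the rewrite events a
// caller would want to monitor; none of these hooks exist upstream today.
struct RewriteListener : public OpBuilder::Listener {
  virtual ~RewriteListener() = default;

  // Called when all uses of `op` are replaced by the `replacement` values.
  virtual void notifyOpReplaced(Operation *op, ValueRange replacement) {}

  // Called right before the driver erases `op`.
  virtual void notifyOpErased(Operation *op) {}
};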

There is currently an applyOpPatternsAndFold driver that applies a set of patterns "locally" on either a single operation or a set of operations. It provides some feedback as to whether an operation was erased. Does this not satisfy some of your requirements? Perhaps we can iterate on it to start.

Functional Patterns

Functional patterns (and by extension functional pattern groups) would enable structured composition of patterns. Alex's Linalg Transform interpreter is moving in this direction by having a pattern "sequence" return operations that are passed into other patterns, although this is orchestrated by tagging operations with an attribute, finding these operations, and then mapping them to SSA values in a map.

Functional patterns could be achieved in C++ as just a paradigm: a bunch of functions that return FailureOr<...> and accept a PatternRewriter. An apply-once driver primitive could then be implemented as

template <typename PatternT, typename... Args>
auto applyOnce(PatternT pattern, Operation *root, PatternRewriter &rewriter,
               Args &&...args) {
  // The result type is whatever the functional pattern returns, e.g.
  // FailureOr<...>; a default-constructed FailureOr is a failure.
  decltype(pattern(std::declval<Operation *>(), rewriter, args...)) result{};
  // `root` bounds the walk and stands in for the `module` the original
  // sketch assumed was in scope. Stop at the first successful application.
  root->walk([&](Operation *op) {
    auto patternResult = pattern(op, rewriter, args...);
    if (failed(patternResult))
      return WalkResult::advance();
    result = std::move(patternResult);
    return WalkResult::interrupt();
  });
  return result;
}
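
A functional pattern in this paradigm is then just a free function returning a handle to the IR it matched or produced; for instance (purely illustrative):

// Match an op whose first operand has a defining op and return a handle to
// that producer; a real pattern would also rewrite via `rewriter`.
static FailureOr<Operation *> findProducer(Operation *op,
                                           PatternRewriter &rewriter) {
  if (op->getNumOperands() == 0)
    return failure();
  Operation *producer = op->getOperand(0).getDefiningOp();
  if (!producer)
    return failure();
  return producer;
}

// Applied at most once anywhere under `root`; the returned handle can be
// fed into further functional patterns.
// FailureOr<Operation *> producer = applyOnce(findProducer, root, rewriter);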

But I think it would be more interesting to implement this in PDL by

  1. Allowing patterns to return values (PDLValue)
  2. Allowing patterns to "call" other patterns or pattern groups. The "call" would be a rewrite driver primitive, e.g. pdl.apply_once to apply a pattern at most once, or pdl.apply_until to apply the pattern/pattern group until there are no more successes; different pattern groups can define their own semantics, e.g. an ordered pattern group is just a list of patterns in decreasing benefit. (A C++ analogue of apply_until is sketched below.)
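
Pending the PDL work, the apply_until semantics are easy to phrase in C++ on top of the applyOnce sketch above (applyUntil is hypothetical, a stand-in for the proposed pdl.apply_until):

// Re-apply a functional pattern until it no longer matches anywhere under
// `root`.
template <typename PatternT, typename... Args>
void applyUntil(PatternT pattern, Operation *root, PatternRewriter &rewriter,
                Args &&...args) {
  // FailureOr<...> converts to LogicalResult, so `succeeded` works here.
  while (succeeded(applyOnce(pattern, root, rewriter, args...))) {
  }
}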

@Mogball thanks for looking into this and starting the discussion.

On our end, we have so far been iterating on a top-down breakdown of "the transformations + state via attribute encoding" that we already know we want to apply. This is maybe more of a "try and reduce technical debt and see what principles appear" exercise.

What you propose makes sense and feels like a nice and clear "bottom-up" way of providing the components that would enable what we are trying to achieve here. It feels like you already have good ideas about the core principles that we are still discovering.

More manipulable pattern sets, groups, and names sound like a no-brainer, and I imagine they would play nicely with the upcoming PDLL work.

Re drivers and listeners, my issue has been that the existing abstractions are very load-bearing, so rather than extend something that I do not understand deeply, experimenting with something new that I do understand made sense to me. But now that you seem engaged and confident about the core technical direction, I'd happily follow what you propose! As usual with top-down/bottom-up, we need to make sure the abstractions connect, so it is probably still worthwhile to continue the prototype until it is well connected, lower the level of abstraction a bit from the current one, and make sure the Python level still connects to e2e execution. But if you get something working and connected faster, I'll happily adopt early.

Re. functional patterns, @apaszke also had a bunch of comments and ideas on the direction. One thing to note is that most transformations we are talking about, as implemented in the compiler, return the "IR handles of interest" that the transformation produced, to make the APIs chainable and composable.

Thanks for your proposals, I am looking forward to playing with some of this!

the transformations + state via attribute encoding

I'm not principally against using attributes to "mark" operations, encode state, or otherwise store information to be used by other patterns. In fact, I think this can lead to some interesting capabilities, e.g. using patterns to analyze operations and store analysis results as attributes. The discussion around how passes and transformations should handle attributes does not seem to have been resolved.

An approach I have in mind that would not require reworking attribute handling entirely would be to have important attributes implement an interface. I think the current mechanism for this is getDialectAttrs and setDialectAttrs, although it's not widely used.

In any case, Alex already has "transform-aware" passes. Perhaps those can be a starting point for upstream changes that implement "listeners" to passes.

The discussion around how passes and transformations should handle attributes does not seem to have been resolved.

The general discussion sounds a bit hard to reconcile with all possible interactions in the compiler; it seems a bit unlikely to converge.

An approach I have in mind that would not require reworking attribute handling entirely would be to have important attributes implement an interface. I think the current mechanism for this is getDialectAttrs and setDialectAttrs, although it's not widely used.

I have not thought about this particular part enough. I was wondering whether we could take a pattern set + a driver and register a list of attributes / interfaces that are known to "just propagate"? It seems like propagation + cleanups could be implemented directly in the driver hooks, alleviating the need for much deeper spreading, for now?

I have not thought about this particular part enough. I was wondering whether we could take a pattern set + a driver and register a list of attributes / interfaces that are known to "just propagate"? It seems like propagation + cleanups could be implemented directly in the driver hooks, alleviating the need for much deeper spreading, for now?

I like this idea, but it might be difficult to implement generically. I don't think it's practical to track which ops are getting erased/replaced, and which new ops their attributes should be propagated to in the driver.

It might have to be the responsibility of the canonicalization pattern itself to propagate attributes. This can be assisted with various new helper functions, e.g. rewriter.replaceOpWithNewOpAndPropAttrs (one could probably invent a better name for such a function).
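
A sketch of what such a helper could look like (the name comes from the suggestion above and is hypothetical, as is the policy of copying every attribute the new op does not already define):

#include "mlir/IR/PatternMatch.h"

using namespace mlir;

// Create the replacement op, forward the old op's attributes to it, and
// then perform the replacement.
template <typename OpTy, typename... Args>
OpTy replaceOpWithNewOpAndPropAttrs(PatternRewriter &rewriter, Operation *op,
                                    Args &&...args) {
  OpTy newOp =
      rewriter.create<OpTy>(op->getLoc(), std::forward<Args>(args)...);
  // Assumed policy: propagate any attribute the new op does not already set.
  for (NamedAttribute attr : op->getAttrs())
    if (!newOp->hasAttr(attr.getName()))
      newOp->setAttr(attr.getName(), attr.getValue());
  rewriter.replaceOp(op, newOp->getResults());
  return newOp;
}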

Answering the original post first.

Thanks, these ideas look interesting!

The pattern group and registration idea sounds like it could replace the boilerplate-y -convert-X-to-Y passes. Some downstream projects create their own uber-conversion passes that amount to -convert-A-and-B-and-C-and-D-and-E-and-F-to-Y, for example. The caveat is making this work cleanly with pass management + dialect registration. Passes must declare the dialects they can produce, and it's hard for me to see how to do that for a generic -apply-pattern-groups pass unless each group/pattern lists the dialects it produces.

Accepting listeners is a good idea in general IMO. We need to be wary of the overhead though. Currently, GreedyPatternRewriterDriver itself is an OpBuilder::Listener and a PatternRewriter. Figuring out the mechanism for chaining listeners is probably a good start; today we can only have one.

There is currently an applyOpPatternsAndFold driver that applies a set of patterns "locally" on either a single operation or a set of operations. It provides some feedback as to whether an operation was erased. Does this not satisfy some of your requirements? Perhaps we can iterate on it to start.

These are still recursive, i.e. they will keep applying the same pattern to the ops produced by applying this pattern. This recursion is undesirable. Consider a pattern that tiles an operation: we don't want it to keep tiling indefinitely, it has to apply once or a specified fixed number of times and stop. There are also performance considerations of not maintaining a worklist and avoiding spurious checks.

That being said, we may have a use case for "local" canonicalizations as we move forward with finer-grained transformation targeting.

Functional patterns could be achieved in C++ as just a paradigm: a bunch of functions that return FailureOr<...> and accept a PatternRewriter

This is quite interesting. Do you think such patterns could be compatible with the "main" pattern rewriter infrastructure (just discarding the non-failure result)?

I'm not principally against using attributes to "mark" operations, encode state, or otherwise store information to be used by other patterns. In fact, I think this can lead to some interesting capabilities, e.g. using patterns to analyze operations and store analysis results as attributes. The discussion around how passes and transformations should handle attributes does not seem to have been resolved.

An approach I have in mind that would not require reworking attribute handling entirely would be to have important attributes implement an interface. I think the current mechanism for this is getDialectAttrs and setDialectAttrs, although it's not widely used.

I am opposed to using attributes :) Well, not really, but relying on attributes is currently not composable with how most "core" transformations work. That is, they can drop non-inherent attributes at any point, so relying on such attributes is a liability that I'd like to minimize at worst and avoid completely at best. (That being said, putting attributes on as a debugging aid for humans is still interesting.) Until there is some mechanism to propagate attributes across transformations (presumably a core IR change + an interface that describes how exactly an attribute is propagated in cases other than a 1->1 rewrite; IMO the generic propagation is the hardest part), the only failproof way of communicating through attributes is rolling your own passes, or regularly auditing the third-party passes you run, so that attributes are treated in the way you expect. Not to scare you away from thinking about it, but my first email thread about non-discardable attributes dates back to July 2019, and the issue is still there.

Passes must declare the dialects they can produce

Passes that are just wrappers around pattern groups declare the dependent dialects for those patterns anyways. No reason why patterns can't also declare dependent dialects.

This is quite interesting. Do you think such patterns could be compatible with the "main" pattern rewriter infrastructure (just discarding the non-failure result)?

I don't see why not (as I've written them). There are a lot of options to customize and define patterns now, beyond just extending matchAndRewrite endlessly.

Passes that are just wrappers around pattern groups declare the dependent dialects for those patterns anyways. No reason why patterns can't also declare dependent dialects.

Technically, they can. Practically, we are talking about changing hundreds (thousands?) of pattern classes to list dependent dialects. These things have a cost. If all patterns were declarative, we would have just inferred the dependent dialects from the declaration.

I don't see why not (as I've written them). There are a lot of options to customize and define patterns now, beyond just extending matchAndRewrite endlessly.

There are always pesky little C++ template and overloading things :) At a glance, this may work, but I haven't tried to code that.

Practically, we are talking about changing hundreds (thousands?) of pattern classes to list dependent dialects.

We don't have to change existing patterns, but just give patterns the option to list dependent dialects. Most patterns will continue to be used the same way they always have.

I also put up two patches to add listeners to the CSE and canonicalizer passes. River has some reservations about it, and I wasn't able to sync with him before the holidays, so that will be blocked upstream until afterwards. I've thought about how one would go about implementing functional patterns in PDL, and while I'm confident it can be done and be done nicely, it's not a trivial effort.

I'll try building "functional patterns" in C++ for the time being and see if that's the sort of thing that would solve the application control problem.

We don't have to change existing patterns, but just give patterns the option to list dependent dialects. Most patterns will continue to be used the same way they always have.

Then it creates two implicit kinds of patterns: those that correctly list dependent dialects and can therefore be used with the wrapper pass, and those that don't. I expect River and others to have reservations about it too; I would myself :)

I also put up two patches to add listeners to the CSE and canonicalizer passes. River has some reservations about it, and I wasn't able to sync with him before the holidays, so that will be blocked upstream until afterwards.

Yeah, that is expected and one of the reasons why I forked these things instead. The upstream change would be to just make the greedy rewriter class visible in the header so one could derive it.

I'll try building "functional patterns" in C++ for the time being and see if that's the sort of thing that would solve the application control problem.

Thanks! I suggest implementing a quick prototype here and seeing if that helps before trying to put it upstream. There are two long design discussions to be had: (1) evolving pass management to support more granularity (FWIW, I don't see a conceptual difference between a pass that runs the PDL interpreter and a pass that runs another interpreter on my Linalg transform dialect) and (2) dealing with dependent dialects in patterns, more specifically in canonicalization patterns that became cross-dialect after splitting the standard dialect.

Hi everyone! Yeah, I'd love to see more functionally-inspired rewriting infrastructure to appear in MLIR. I started working on a prototype similar to the one that @Mogball proposed, but I hit one fundamental difficulty: mutability.

As it is today, single pattern applications are perfectly composable, because one gets a match method that lets you verify the match without any side effects. But consider what happens when you try to build a larger pattern out of two smaller patterns (let's call them a and b). How do you write a match for that? If a.match(op) succeeds, you can't follow up with b.match(op), since that b.match invocation would not see the IR changes that a.rewrite(op, rewriter) would apply. But if we opportunistically perform a.rewrite(op, rewriter) and then b fails to match, we have a problem, since we have already mutated the IR and it doesn't seem trivial to invert that (barring excessive copying).

I'm sure that if we can work out the compositionality issues, we can make a very nice rewriting framework and even embed pattern application in MLIR. And compositionality is the critical feature of functional rewriting systems IMO. IR traversals can be expressed neatly using failable recursive computations (as in e.g. ELEVATE), but managing failability is the part that has stopped me so far.

Any ideas as to how such rollbacks could be implemented? I would imagine that it might be doable since the dialect conversion framework does maintain two parallel copies of the IR, but I don't know enough details to actually have a good understanding of how this works.

Any ideas as to how such rollbacks could be implemented? I would imagine that it might be doable since the dialect conversion framework does maintain two parallel copies of the IR, but I don't know enough details to actually have a good understanding of how this works.

The dirty but foolproof way is to clone on the first pattern application and later discard the part we don't want anymore. This of course won't work in the general case of non-scoped transforms, but for compositions of locally scoped transforms this has worked for me in the past.
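
A minimal sketch of that clone-then-commit idea, assuming the transform only touches IR nested under a single `scope` op and reports success/failure (all names are illustrative):

#include "mlir/IR/Builders.h"

using namespace mlir;

// `transform` is any callable from Operation* to LogicalResult. Clone
// `scope`, run the transform on the original, and either commit (drop the
// backup) or roll back (swap the untouched clone back in).
template <typename TransformFn>
LogicalResult applyWithRollback(Operation *scope, TransformFn transform) {
  Operation *backup = scope->clone();
  if (succeeded(transform(scope))) {
    backup->destroy(); // Commit: the backup is no longer needed.
    return success();
  }
  // Roll back: insert the pristine clone next to the mutated original,
  // redirect uses of the original's results, and erase the original.
  OpBuilder builder(scope);
  builder.insert(backup);
  scope->replaceAllUsesWith(backup);
  scope->erase();
  return failure();
}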

What's a locally scoped transform? The difficulty I see is that the rewrite method can have side effects that are potentially unbounded (e.g. it can replace all uses of a value in the whole program).

I don't have a good definition for it, but basically if you have a set1{ops} that gets transformed to set2{ops} ... setn{ops}, and at every step the only modifications to the IR are to set1{ops} and their operands + results, then you are good.

Something like CSE, which escapes outside of "set1{ops} + operands + results", is much more annoying and requires genuflections to track (see some of the recent @ftynse commits re CSE).

also @michel-steuwer and @Bastacyclop once they accept the invite

(e.g. it can replace all uses of a value in the whole program)

Maybe 1-hop is a better analogy: we can do "1-hop replacements of operands and results" with a simple clone, but not more.

The cleanup procedure is usually along the lines of:

  1. replace the current list of results by the currently valid one
  2. if the transform failed to apply, you may want to rewriter.eraseOp the created ops yourself, in reverse order of creation, to avoid surprises if you introduced side effects that the basic DCE + fold cannot clean up (see the sketch below).

Note that this needs to be done at the "transformation" level, and transformations compose (but patterns do not), since you'll need better APIs. The pattern itself is then a simple wrapper around a composition of transforms.
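
For step 2 above, the cleanup could be as simple as the following (a sketch; newOps stands in for whatever bookkeeping the transformation keeps of the ops it created, in creation order):

// Erase speculatively created ops in reverse creation order, so users are
// always removed before their producers.
static void eraseCreatedOps(PatternRewriter &rewriter,
                            ArrayRef<Operation *> newOps) {
  for (Operation *op : llvm::reverse(newOps))
    rewriter.eraseOp(op);
}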

Right, I think that if you limit the scope of what the patterns can do, then you might be able to implement reasonable rollbacks. It would be an interesting exercise to try to enforce those restrictions syntactically in the body of the handler, but an assumption whose violation is UB might be a good place to start 🤷

Dialect conversion postpones the replacement of all uses until the end of the entire conversion process, so you always match on the original IR (unless patterns do in-place modifications), but that comes with its own set of issues. I have a pretty good understanding of how these things currently work if you want to discuss it in more detail, but this will have to wait until the end of the holidays.

Yeah, that is expected and one of the reasons why I forked these things instead. The upstream change would be to just make the greedy rewriter class visible in the header so one could derive it.

That wouldn't solve the problem of tracking changes through CSE though, or any other non-pattern pass. Although I would assume that even a patch just exposing the greedy rewriter will face objections. I think once the holidays are over, we'll have a design discussion about how to support more advanced codegen strategies in MLIR.

That being said, among the other things I think should be exposed publicly is the save/revert mechanism behind dialect conversion. Dialect conversion does unwind in-place modifications, and it does apply patterns to generated ops (otherwise A -> B and B -> C where only C is legal would not work); that's why conversion patterns accept an array of operands, so that they can match across the original IR and the "pending" IR. That being said, there are restrictions on conversion patterns, e.g. block manipulation has to be done through the conversion rewriter's APIs.

Nice, that gives me more confidence that we could actually get this to work, as long as people obey the rules around IR mutation. Perhaps the rewrite patterns would never get access to any real Operation instance, but would always be expected to work with views/adaptors such that they wouldn't ever be able to tell if the adaptor corresponds to a real operation or not, and would have to go through the limited and invertible rewriter API. How about we schedule some time in Jan and try to sketch a prototype together?

Perhaps the rewrite patterns would never get access to any real Operation instance, but would always be expected to work with views/adaptors such that they wouldn't ever be able to tell if the adaptor corresponds to a real operation or not, and would have to go through the limited and invertible rewriter API.

I've been wanting an "OperationView" class for a while for this purpose. It's essentially just a class one step above op adaptors.

The other option is to clone the scope of the transformation and swap it with the original if the chain of patterns succeeded, otherwise just drop it. We don't have to clone the entire IR -- passes are already scoped to certain operations and their regions e.g. FunctionPass, the contract being that passes can only modify within their scope, cannot query sibling operations (due to threading), but can query parent operations. This sidesteps the problem (for now) of trying to replicate dialect conversion's mechanisms, which get a little bit hairy.

Currently, scopes are limited to the regions of a single operation, but we can probably expand that to be a set of operations and their nested ops.

How about we schedule some time in Jan and try to sketch a prototype together?

SGTM.

@ftynse @Mogball side note: I think I saw some code fly by that was trying to limit the scope of pattern application to ops nested under another op. Rather than build this into the driver, it is likely much easier to add a matching constraint that checks isProperAncestor.

I'm not principally against using attributes to "mark" operations, encode state, or otherwise store information to be used by other patterns. In fact, I think this can lead to some interesting capabilities, e.g. using patterns to analyze operations and store analysis results as attributes. The discussion around how passes and transformations should handle attributes does not seem to have been resolved.

@Mogball I don't see that discussion converging; the proposal is way too broad.
How about instead we have some simpler TransformationTrackableOpInterface with an attribute that must always be set?
The attribute is elided from parsing/printing if it is the default value, but it always needs to be cloned, otherwise verification fails.

But I am unclear whether leaving attributes around is what we should strive for, as I don't find that it scales.
Tracking the IR handles we want explicitly feels like a nicer approach.

In principle, this isn't very different from transformations/passes that update an analysis (e.g. for efficiency reasons).
In practice, in our cases of interest, updating the analysis (i.e. tracked IR handles) is much much easier than rediscovering them on transformed IR.

Very interesting discussion here, and thanks @nicolasvasilache for tagging us.

In ELEVATE we achieve composability by seeing a rewrite as a function. Applying this function constitutes performing the rewrite, and you get back either the rewritten program or a failure. As @apaszke pointed out, this works well for us because of the immutable setting we worked in.

I would be very curious to see how a similar design could be achieved in MLIR.

I need a bit more explanation to understand the difference between a transformation and a pattern in the discussion here.

One thing that is very clear in MLIR is the separation between the RewritePattern (i.e. the matching and the rewrite) and the PatternRewriter that is driving the application of patterns. In ELEVATE, all of these pieces are combined into a single function.

To achieve composability, we will need to break up this separation in MLIR. That's why I like the "Allow patterns to "call" other patterns" idea.
Which raises the question: when do you want which mode of pattern application?

Last year, we had started playing around with an ELEVATE inspired interface in MLIR:
https://github.com/rise-lang/mlir/blob/elevate/mlir/include/mlir/Elevate2/core.h

We opted to create a new subclass of RewritePattern that is composable, as it performs the rewrite directly and returns the rewritten IR directly. (I don't think we properly address the mutability challenges that @apaszke raised in this early prototype). You can compose these types of patterns, with operators resulting in new rewrite patterns that you can compose again. We didn't pursue this prototype super far, as @martin-luecke, the PhD student working on this, is currently on a break.

I need a bit more explanation to understand the difference between a transformation and a pattern in the discussion here.

Transformation is a very loose term referring to any modification of the IR. Pattern is a RewritePattern or something similar. A transformation can be expressed by a pattern, but it can also be significantly more complex.

To achieve composability, we will need to break up this separation in MLIR. That's why I like the "Allow patterns to "call" other patterns" idea.
Which raises the question: when do you want which mode of pattern application?

I wouldn't necessarily require merging the patterns and the driver as a precondition for composability. MLIR patterns are essentially (stateless) functions. The driver contains the state, including the IR being rewritten. It is possible to play the conceptual trick of treating the "pattern function" as accepting the driver state and returning a new driver state. Being able to "call" patterns sounds like the most important part.

Being able to "call" patterns sounds like the most important part.

Would "calling" a pattern mean then invoking the driver, and if so, which one (if there are multiple)?

and you get back either the rewritten program or a failure. As @apaszke pointed out, this works well for us because of the immutable setting we worked in.

This actually sounds like a decent idea for a problem I brought up in #149 where @apaszke introduced his strategy dialect and this would work well with "scoped transformations". I.e., the result of a pattern application is a failure or a mutated copy of the nearest scope.

The mutability problem could be solved by keeping a copy of the scope from before the strategy was applied and reverting back to it. Since the strategy is in control of the IR scope, then the problem of invalidating IR references in any drivers with clone/revert is avoided.

Would "calling" a pattern mean then invoking the driver, and if so, which one (if there are multiple)?

This could be as simple as directly invoking the pattern on an operation pattern.matchAndRewrite(op, rewriter) or invoking some driver (e.g. apply the pattern once to the IR, apply it N times, or apply it greedily, although greedy application only makes sense for pattern sets).

This could be as simple as directly invoking the pattern on an operation pattern.matchAndRewrite(op, rewriter)

In our experiments we basically did that. Then we use the ideas of ELEVATE, which explain how to compose rewrites. The key to making that work is that a "call" to apply a pattern has to return a handle to the modified IR.

We use, for example, the seq rewrite pattern for the composition of two rewrite patterns which are to be applied one after another:

// Illustrative: RewritePattern is abstract, so we hold concrete pattern
// instances and compose them by reference.
RewritePattern1 s1;
RewritePattern2 s2;

SeqRewritePattern(s1, s2).matchAndRewrite(op, rewriter);

with

RewriteResult SeqRewritePattern::matchAndRewrite(Operation *op,
                                                 PatternRewriter &rewriter) {
  // Apply s1 first; on success, chain s2 on the op that s1 produced.
  RewriteResult result1 = s1.matchAndRewrite(op, rewriter);
  if (isSuccess(result1))
    return s2.matchAndRewrite(result1.op, rewriter);
  return result1; // failure
}

First the rewrite s1 is applied to op and, if successful, s2 is applied to the resulting modified IR.

Following this design and with some small syntax helpers it is easy to compose more complex rewrites:

seq(topdown(fuseReduceMap()), normalize(betaReduction())).matchAndRewrite(op, rewriter);

Here, for instance, starting at op we traverse the program topdown until we can apply the fuseReduceMap rewrite. Then we apply the betaReduction rewrite at all possible occasions using normalize.
Note: topdown and normalize here are RewritePatterns as well, which control how and where the rewrites supplied to them are applied.
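
For illustration, a topdown-style combinator in the same spirit could be sketched like this (RewriteResult, isSuccess, and the inner member mirror the pseudocode above and are not real MLIR API):

// Walk the IR from `op` downwards and apply the wrapped rewrite at the
// first operation where it succeeds, mirroring ELEVATE's topdown strategy.
RewriteResult TopDownRewritePattern::matchAndRewrite(Operation *op,
                                                     PatternRewriter &rewriter) {
  // Assumes RewriteResult default-constructs to failure, as in the snippet
  // above.
  RewriteResult result;
  op->walk([&](Operation *nested) {
    result = inner.matchAndRewrite(nested, rewriter);
    if (isSuccess(result))
      return WalkResult::interrupt();
    return WalkResult::advance();
  });
  return result;
}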

Ergo this system is based on RewritePatterns being able to call other RewritePatterns.