NVIDIA / Fuser

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")

Initial top-down presegmentation phase

jacobhinkle opened this issue · comments

In #2146 (comment), @wujingyue brought up an interesting idea. The proposal is:

to modify segmentation to first attempt to segment only at segment_set and to write a pre-seg pass to add segment_set to the right place for this particular pattern.

This would let us enforce a segmentation by inserting those ops when the fusion is defined. We could also extend this to ops that currently require their inputs to be segment inputs, namely reshape and resize ops. In those cases we might arrive at a good segmentation more quickly, and possibly at a better one than we find today (TODO: find an example where reshapes determine a segmentation that would be difficult to find in our current scheme).

To implement this we could replace the current first step in segmentation where we try scheduling the complete fusion. Instead we would:

  1. Tentatively merge groups across every edge that is not produced by a LoadStoreOpType::SegmentSet (or excluded by other conditions, such as the ViewOp inputs mentioned above). Note that this might merge everything back into the complete fusion when the segment sets do not form a proper graph cut-set.
  2. Try to schedule each resulting group.
  3. For any group that failed to schedule properly, shatter it back into its individual Expr groups.
  4. Proceed with our existing scheduling algorithm. Note that we don't need to try fusing the already-accepted groups since we know they are maximal.
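
To make the steps above concrete, here is a rough sketch of steps 1 to 3 on a toy graph in plain Python. Everything in it is an assumption made for illustration: `Expr`, `presegment`, and the `can_schedule` callback are hypothetical names, not nvFuser APIs, and the sketch skips details a real pass would need, such as checking that the merged groups still form a DAG.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """Toy stand-in for a fusion expression (hypothetical, not nvFuser's Expr)."""
    name: str
    op: str              # e.g. "add", "sum", "segment_set"
    inputs: tuple = ()   # names of producer Exprs


def presegment(exprs, can_schedule):
    """Steps 1-3: merge across non-segment_set edges, schedule, shatter failures."""
    by_name = {e.name: e for e in exprs}
    parent = {e.name: e.name for e in exprs}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Step 1: merge producer and consumer unless the producer is a segment_set.
    for e in exprs:
        for p in e.inputs:
            if p in by_name and by_name[p].op != "segment_set":
                parent[find(p)] = find(e.name)

    groups = {}
    for e in exprs:
        groups.setdefault(find(e.name), []).append(e)

    # Steps 2 and 3: keep groups that schedule, shatter the rest into single Exprs.
    accepted, shattered = [], []
    for members in groups.values():
        if can_schedule(members):
            accepted.append(frozenset(m.name for m in members))
        else:
            shattered.extend(frozenset([m.name]) for m in members)
    return accepted, shattered


def at_most_one_reduction(members):
    # Toy scheduling rule (an assumption): a group may hold at most one "sum".
    return sum(m.op == "sum" for m in members) <= 1


exprs = [
    Expr("t1", "add"),
    Expr("t2", "sum", ("t1",)),
    Expr("t3", "segment_set", ("t2",)),
    Expr("t4", "mul", ("t3",)),
    Expr("t5", "sum", ("t4",)),
    Expr("t6", "sum", ("t5",)),
]
accepted, shattered = presegment(exprs, at_most_one_reduction)
```

Here the group ending at the `segment_set` is accepted as-is, while the group after it holds two reductions, fails the toy rule, and is shattered. Step 4 would then run the existing merging over the shattered pieces only.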

We could also extend this to ops that currently require their inputs to be segment inputs, namely reshape and resize ops.

Those are requirements of the schedulers. Realizing them as a preseg pass would tightly couple the pass to each scheduler, which may be error-prone to maintain.

What about presegmenting in some way based on the higher-level torch operations? Our schedulers are loosely mapped to a set of torch operations. Thunder could insert these annotations before running nvFuser.
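
As a very rough illustration of that idea, the sketch below walks a linear list of torch-level op names and drops a boundary marker wherever the scheduler family changes. The `SCHEDULER_FAMILY` table and `annotate_boundaries` are hypothetical and invented for this example; they do not reflect how Thunder or nvFuser actually classify ops, and a real trace is a graph rather than a list.

```python
# Hypothetical op -> scheduler family table (an assumption, not nvFuser's mapping).
SCHEDULER_FAMILY = {
    "add": "pointwise",
    "mul": "pointwise",
    "relu": "pointwise",
    "sum": "reduction",
    "layer_norm": "normalization",
    "linear": "matmul",
}


def annotate_boundaries(op_names):
    """Insert a "segment_set" marker before an op whose family conflicts with the region."""
    out = []
    current = None  # non-pointwise family of the region being built, if any
    for name in op_names:
        family = SCHEDULER_FAMILY[name]
        if family != "pointwise":
            # Pointwise ops join any region; other families start a new region
            # when the current region already belongs to a different family.
            if current is not None and current != family:
                out.append("segment_set")
            current = family
        out.append(name)
    return out


annotated = annotate_boundaries(["linear", "relu", "layer_norm", "mul", "sum"])
```

In this toy trace a marker lands before `layer_norm` and before `sum`, since each one switches to a family different from the region already being built.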