ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.

Home Page: https://ziglang.org

introduce `@branchWeight` builtin

andrewrk opened this issue · comments

Originally specified at #489 (comment).

This proposal supplants:

Summary:

  • No "expect" builtin or similar
  • Each branch gets a default weight of 10
  • Error branches (#84) get a default weight of 1
  • Branch weight can be overridden with the `@branchWeight` builtin.
  • Branch weights are of type u32

This generalizes better to switch expressions.

This allows source code to be annotated with PGO data.

The only issue I see here is that this doesn't work so well with #8220, because @branchWeight tells us nothing about the source of the branch; perhaps it is unlikely for a given prong to be reached from the initial switch, but very likely to be reached from a labeled continue. You could also argue that this applies to loops: maybe a loop body is unlikely to be initially entered, but highly likely to be repeated once entered (although this case can't really be nicely modeled with @expect either). However, on the whole I agree that this is a better model than @expect.

commented

@mlugg If I understand your comment correctly, the actual distinction here would be to mark the branches (i.e. the code paths) rather than the target block they lead to.
I think this could be modeled by introducing additional blocks along the way, and only specifying @branchWeight for these intermediate blocks.

Current syntax might make this more verbose than would be ideal, but here are some examples anyway:

if (iterator.hasNext()) {
  @branchWeight(2); // unlikely to enter
  while (true) {
    const next = iterator.next().?;

    // actual logic - no @branchWeight directly here

    if (!iterator.hasNext()) {
      @branchWeight(1); // very unlikely to exit
      break;
    }
    @branchWeight(20); // very likely to continue looping
  }
}

switch (nextToken()) {
  .end => {},
  else => |tok| {
    switch (tok) {
      .a => @branchWeight(1), // unlikely to be .a initially
      // other cases...
    }
    inner: switch (tok) {
      .a => {

        // actual logic - no @branchWeight directly here

        const next_token = nextToken();
        switch (next_token) {
          .a => @branchWeight(20), // very likely to be .a as a followup
          // other cases...
        }
        continue :inner next_token; // or alternatively inline this continue into the switch above with `@branchWeight` to make it easier for the optimizer to understand this?
      },
      // other cases...
    }
  },
}

commented

The idea of a builtin on the block instead of the condition makes sense to support cases where the condition isn't visible (or would be annoying to make visible) in the source code like captures (while (x) |y|, if (x) |y|) and monadic control flow (orelse {}, catch {}).

One thing that still feels fuzzy is the idea of having arbitrary weight values. The original proposal mentions that probability values are hard to keep consistent across code edits. But more concretely, there's an argument to be made about switch cases and profiling:

Switch cases in reality have little control over how they're dispatched: if a case is marked as "likely/unlikely" but a jump table would be ideal, what should codegen do? If cases are sparse (so a jump table is unavailable), what is the codegen for two equally "likely/unlikely" cases? Branch hints primarily affect which path of a conditional falls through and which is jumped to. Switch statements don't execute comparisons in source order by default (you can have else => first), so there's no concept of fall-through. Labeled continue may allow expressing this, but branching within it is still a matter of likeliness (binary), not probability/weight.

Profiling (Profile-Guided Optimization) tracks which branches are taken to form a probability model, but how this affects codegen is another story: PGO can either 1) reorganize branches so that the most/least probable ones fall through or are jumped to, or 2) eliminate the (conditional) branch entirely. The latter is aggressive and only correct when the optimized program will run again with inputs similar to those profiled. The former is still a binary decision, not a probabilistic/weighted one.

Note

It's also unclear how PGO could take advantage of "source code annotations" here, since it bases optimizations on what's executed, not what's speculated. Maybe it could report when a branch marked unlikely was profiled to be more likely? Either way, there would be no codegen changes that this builtin wouldn't already produce.


I propose the builtin be renamed to something like @{set}branchHint(x), which takes a comptime enum with variant names like .likely/.unlikely or .fall_through/.jumped_to. The semantic edge cases could then be defined:

  • Q: How to simulate the old @setCold(x) behavior?
    A: The old "coldness" is more a property of inlining (a type of branching, with more implications). At the function level, use noinline, or callconv(if (x) .NoInline else .Auto) if x is comptime-computed (this requires a new CallingConvention.NoInline). At the scope level, use this branch hint. We could also instead define branchHint at the function level, analogous to C's inline and GCC's __attribute__((cold)) respectively.

  • Q: What happens if two branchHint()s are called in the same scope?
    A: We should mimic whatever @setRuntimeSafety does here. I would propose simply ignoring every call after the first.

  • Q: What happens if branchHint occurs at the function level of an inline fn?
    A: Inline function bodies, IIRC, are semantically copied to the caller's invocation site. The hint should then either be cancelled out (as noted above) or affect the branch hint for the caller's block containing the invocation.

  • Q: What to do if branching to the same location must be jumped-to in one path and fall-through in another?
    A: Borrowing from @rohlem, add the respective branch hints to the conditional block at each path before branching to the same location. Example:

    while (true) {
        if (x) break; // unlikely
        if (y) break; // likely
    }
    // branchHint(???)
    sameLocation();
    
    // should become
    
    while (true) {
        if (x) { branchHint(.unlikely); break; }
        if (y) { branchHint(.likely); break; }
    }
    sameLocation();

I’m not a fan of “weight” values with undefined semantics (in general), because it leads to paranoid developers doing things like @branchWeight(999999999) or @branchWeight(4_294_967_295).

(It may be too early, but) it would be nice if the documentation guaranteed that:

  1. the compiler uses these values only relative to each other, and never implements thresholds where weights above some magic number trigger different codegen; and
  2. these values are compared only locally, so if I call a function that uses large branch weights, I don't have to increase the branch weights in my own code.

These are just the concerns that pop into my head, I don’t know how it would actually be implemented.

Each branch gets a default weight of 10

This seems a little bit arbitrary. Why not use floats and specify 1.0 as the default weight?

LLVM's branch weight metadata encodes probability. To those who suggest tossing out this data: can you explain why the LLVM devs decided to include this information, and why they were mistaken to do so?

LLVM optimizes branches using weights in order to be compatible with PGO:

Branch weights might be fetched from the profiling file, or generated based on __builtin_expect.

The weights are also influenced by factors such as what's inside the branch. LLVM internally warns against using arbitrary weights, given the clash with implementation thresholds. We should take their advice and prefer the binary likely/unlikely API.