google / souper

A superoptimizer for LLVM IR

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Bogus redefinition in synthesized RHS?

fitzgen opened this issue · comments

I'm seeing synthesized RHSes that redefine variables from the LHS like this:

%3:i1 = var
%4:i32 = select %3, 2:i32, 1:i32
infer %4
%2:i32 = zext %3
%3:i32 = add 1:i32, %2
result %3

Is that redefinition of %3 valid? It doesn't look like it should be.

+cc @jubitaneja

aie!!! looks wrong. please tell us how to repro.

I'll leave steps to repro to @jubitaneja since this was in a set of results that she synthesized, and I'm not 100% sure exactly what flags she used.

I modified the enumerative synthesis and cache_infer script to run the synthesis experiment. Here is the branch that has the changes on the souper's side.
https://github.com/jubitaneja/souper/tree/cranelift

I took LHSes harvested from Nick's souper-harvester that is integrated in wasmtime/cranelift and just inferred the RHSes using modified cache_infer script.

Looking through the results, it looks like this only happens when the LHS doesn't start at %0 or aren't contiguous (eg the LHS defines %0 and %3).

@fitzgen you pointed out the right pattern. I tested this test case individually as well with master branch of souper and the issue can be reproduced.

$ cat s1.opt 
%3:i1 = var
%4:i32 = select %3, 2:i32, 1:i32
infer %4

$ souper-check -infer-rhs -souper-enumerative-synthesis-max-instructions=3 s1.opt 
; RHS inferred successfully
%2:i32 = zext %3
%3:i32 = add 1:i32, %2
result %3

I think there is some logic in writing RHSes with Lvalue numbering that counts the number of instructions on LHS and starts the count from there and hence, we are seeing the first instruction on RHS with lvalue label number as %2.

ok, I'll explain how this works (hopefully correctly -- it's been a couple of years since I messed with this part of souper)

internally, these instruction numbers don't exist, they're purely an artifact of printing

when you print souper to text, there's a context object that keeps track of names. if you want the names to be consistent across a LHS and RHS, you have to use the same context object.

now in the message above, we're seeing two different tool executions -- so they're clearly not sharing a context object. so there's no reason to expect consistent numbering.

I could easily be missing something, just taking a quick look here. but Jubi, if you want to show us a souper bug, you have to show the LHS and RHS being printed together, incorrectly. if you're printing them separately, you shouldn't expect any name consistency.

We should probably change souper-check so that the whole replacement is printed after enumerative synthesis, not just the RHS.

We should probably change souper-check so that the whole replacement is printed after enumerative synthesis, not just the RHS.

pretty sure there's a command line argument for doing exactly that

@zhengyang92 pointed out the option to print the entire replacement after the synthesis is done instead of writing the RHSes only. I tested with -print-replacement-split and it worked fine for this test case. So, for the testing we can add this option to cache_infer script.

$ cat s1.opt
%3:i1 = var
%4:i32 = select %3, 2:i32, 1:i32
infer %4

$ souper-check -infer-rhs -print-replacement-split -souper-enumerative-synthesis-max-instructions=3 s1.opt 
; RHS inferred successfully
%0:i1 = var
%1:i32 = select %0, 2:i32, 1:i32
infer %1
%2:i32 = zext %0
%3:i32 = add 1:i32, %2
result %3