eurecom-s3 / symcc

SymCC: efficient compiler-based symbolic execution

Home Page:http://www.s3.eurecom.fr/tools/symbolic_execution/symcc.html

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

symcc fails to solve simple equality constraints

yiyunliu opened this issue · comments

The following program reads a long int, assigns the long int to an unsigned long, then branches on whether the unsigned long is equal to the constant 10086:

#include <unistd.h>

int main() {
  long ival;
  unsigned long abs_ival;
  read(STDIN_FILENO, &ival, sizeof ival);

  abs_ival = (unsigned long)ival;
  
  if(abs_ival == 10086) {
    return 9;
  }

  return 10;
}

For this program, symcc is able to generate an input that's equal to 10086 and causes the program to return 9 instead of 10.

On the other hand, if we add in a conditional branch on the sign of ival, symcc fails to generate a byte sequence that corresponds to the constant 10086.

#include <unistd.h>
int main() {
  long ival;
  unsigned long abs_ival;
  read(STDIN_FILENO, &ival, sizeof ival);

  if (ival < 0) {
    abs_ival = 0U - (unsigned long)ival;
  }
  else {
    abs_ival = (unsigned long)ival;
  }
  
  if(abs_ival == 10086) {
    return 9;
  }

  return 10;
}

I'm aware that symcc uses optimistic solving under the hood, but this program is simple and the pure_concolic_execution.sh helper script is stuck at a fixed point of 2 inputs, neither of which corresponds to the constant 10086.

We found this issue when trying to symbolically execute python with symcc. The program above is a fragment of Python's long -> big int conversion function.

After investigating this issue further, I found that at -O0 level, the second program actually works.

I looked into the llvm IR of the second program at different optimization levels.
At -O0, there is a snippet that compares abs_ival to the constant:

  %44 = call i8* @_sym_build_integer(i64 10086, i8 64)
  %45 = call i8* @_sym_build_equal(i8* %40, i8* %44)

At -O3, I don't see a similar snippet that performs the comparison. That's probably why I don't see the solution of abs_ival == 10086 being generated by symcc.

I recall that according to the paper, symcc inserts its calls to the runtime at an earlier stage of the compilation pipeline so optimization levels shouldn't have a large impact on its behavior. Has that changed in the current implementation?

Here are links to the full LLVM code:
-O0: https://gist.github.com/yiyunliu/ae48fbfb0b7fdfc5977ccd7699c07dd0
-O3: https://gist.github.com/yiyunliu/a7e57838150a8f6ca99eff5e98731e2f