eurecom-s3 / symcc

SymCC: efficient compiler-based symbolic execution

Home Page:http://www.s3.eurecom.fr/tools/symbolic_execution/symcc.html

Wrong handling of i1 in visitCastInst

ercoppa opened this issue

Consider this example (inspired by real-world code):

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

int bar(unsigned char a) {
  if (a == 0xCA) return -1;
  else return 0;
}

int main() {
  unsigned char input = 0;
  read(0, &input, sizeof(input));
  int r = bar(input);
  if (r == -1) printf("Bingo!\n");
  else printf("Ok\n");
  return r;
}

With -O1, Clang emits the following IR for bar (with -O2, bar is inlined, which hides the bug):

define dso_local i32 @bar(i8 zeroext %0) local_unnamed_addr #0 {
  %2 = icmp eq i8 %0, -54
  %3 = sext i1 %2 to i32
  ret i32 %3
}

Notice the sext operation. When instrumenting with SymCC, we get:

define dso_local i32 @bar(i8 zeroext %0) local_unnamed_addr #0 {
  call void @_sym_notify_basic_block(i64 18040285541467748) #5
  %2 = call i8* @_sym_get_parameter_expression(i8 0) #5
  %3 = icmp eq i8* %2, null
  br i1 %3, label %7, label %4

4:                                                ; preds = %1
  %5 = call i8* @_sym_build_integer(i64 202, i8 8) #5
  %6 = call i8* @_sym_build_equal(i8* nonnull %2, i8* nonnull %5) #5
  br label %7

7:                                                ; preds = %1, %4
  %8 = phi i8* [ null, %1 ], [ %6, %4 ]
  %9 = icmp eq i8 %0, -54
  %10 = icmp eq i8* %8, null
  br i1 %10, label %13, label %11

11:                                               ; preds = %7
  %12 = call i8* @_sym_build_bool_to_bits(i8* nonnull %8, i8 32) #5
  br label %13

13:                                               ; preds = %7, %11
  %14 = phi i8* [ null, %7 ], [ %12, %11 ]
  %15 = sext i1 %9 to i32
  call void @_sym_set_return_expression(i8* %14) #5
  ret i32 %15
}

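The problem is that _sym_build_bool_to_bits builds an if-then-else expression of the form if (cond, 0x0...01, 0x0...0), which is correct for a zext operation but not for a sext operation: sign-extending a true i1 yields all ones (0xF...FF, i.e., -1), not 1. With the zext-style expression, the symbolic return value can never equal -1, so the branch r == -1 appears unreachable. Indeed, SymCC is not able to generate an alternative input for the example: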

SYMCC_OUTPUT_DIR=`pwd`/out ./main < input.txt
This is SymCC running with the QSYM backend
Reading program input until EOF (use Ctrl+D in a terminal)...
[STAT] SMT: { "solving_time": 0, "total_time": 531 }
[STAT] SMT: { "solving_time": 285 }
[STAT] SMT: { "solving_time": 285, "total_time": 1115 }
[STAT] SMT: { "solving_time": 498 }
Ok
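
To make the mismatch concrete, here is a small C sketch that models both extensions on plain integers and counts how many inputs can make the return value of bar equal to -1 under each model. The helper names are illustrative and not part of SymCC; the sketch only assumes the standard meaning of zext and sext applied to a 1-bit value.

```c
#include <stdint.h>

/* Zero-extend a 1-bit value to 32 bits: true becomes 0x00000001. */
static uint32_t zext_i1_to_i32(int bit) {
    return bit ? UINT32_C(1) : UINT32_C(0);
}

/* Sign-extend a 1-bit value to 32 bits: true becomes 0xFFFFFFFF (-1). */
static uint32_t sext_i1_to_i32(int bit) {
    return bit ? UINT32_C(0xFFFFFFFF) : UINT32_C(0);
}

/* Count the inputs a in [0, 255] for which the modelled return value
 * of bar equals -1, given a model of the i1 -> i32 extension. */
static int count_inputs_reaching_bingo(uint32_t (*extend)(int)) {
    int count = 0;
    for (int a = 0; a < 256; a++) {
        uint32_t r = extend(a == 0xCA);
        if (r == UINT32_C(0xFFFFFFFF)) {
            count++;
        }
    }
    return count;
}
```

Under the zext model no input reaches the "Bingo!" branch, whereas under the sext model exactly one input (0xCA) does, which is the input a solver would need to find.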

One possible fix is to provide a new runtime function, e.g., _sym_build_bool_to_sign_bits, and use it in visitCastInst for the i1 case if and only if the instruction is Instruction::SExt.
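
As a rough illustration of this idea, the C sketch below models the two variants on concrete values instead of symbolic expressions. The function names, the concrete-value signatures, and the is_sext flag are assumptions made for this sketch and do not reflect SymCC's actual runtime API.

```c
#include <stdint.h>

/* Model of the existing behaviour: ITE(cond, 1, 0), regardless of width. */
static uint64_t bool_to_bits(int cond, unsigned bits) {
    (void)bits; /* the width does not change the value in this model */
    return cond ? UINT64_C(1) : UINT64_C(0);
}

/* Model of the hypothetical sign variant: ITE(cond, all ones, 0),
 * where "all ones" fills the low `bits` bits (bits in 1..64). */
static uint64_t bool_to_sign_bits(int cond, unsigned bits) {
    uint64_t all_ones =
        (bits >= 64) ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
    return cond ? all_ones : UINT64_C(0);
}

/* Dispatch as the cast handling would: the sign variant only for sext. */
static uint64_t cast_i1(int cond, unsigned bits, int is_sext) {
    return is_sext ? bool_to_sign_bits(cond, bits)
                   : bool_to_bits(cond, bits);
}
```

For the example, cast_i1(1, 32, 1) evaluates to 0xFFFFFFFF, which matches the -1 that bar returns for the input 0xCA.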

Let me know if you want a PR along these lines or if we should design a slightly different fix.

Nice find! And thanks for all the debug information 😊

I'm wondering if _sym_build_bool_to_bits is doing more than necessary 🤔 How about we make it return an expression for i1 unconditionally, which we then feed to either _sym_build_sext or _sym_build_zext? The downside would be an additional call into the runtime, but since there's no branching the CPU should be able to handle it rather well. And the code would fit nicely into visitCastInst... What do you think?
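
A toy model of this alternative, again in C on concrete values: the boolean is first turned into a 1-bit value, and a generic extension then widens it. The BitVec type, the function names, and the convention that the extension takes the number of bits to add are all assumptions of this sketch, not SymCC's API.

```c
#include <stdint.h>

/* A concrete bit-vector: `bits` low bits of `value` are meaningful (1..64). */
typedef struct {
    uint64_t value;
    unsigned bits;
} BitVec;

/* Mask with the low `bits` bits set. */
static uint64_t mask(unsigned bits) {
    return (bits >= 64) ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
}

/* Step 1: turn a boolean into a 1-bit value, with no width argument. */
static BitVec bv_from_bool(int cond) {
    BitVec r = { cond ? UINT64_C(1) : UINT64_C(0), 1 };
    return r;
}

/* Step 2a: zero extension adds `extra` zero bits at the top. */
static BitVec bv_zext(BitVec v, unsigned extra) {
    BitVec r = { v.value, v.bits + extra };
    return r;
}

/* Step 2b: sign extension copies the top bit into the `extra` new bits. */
static BitVec bv_sext(BitVec v, unsigned extra) {
    unsigned new_bits = v.bits + extra;
    uint64_t sign = (v.value >> (v.bits - 1)) & UINT64_C(1);
    uint64_t high = sign ? (mask(new_bits) & ~mask(v.bits)) : UINT64_C(0);
    BitVec r = { v.value | high, new_bits };
    return r;
}
```

In this model, bv_sext(bv_from_bool(1), 31) yields the 32-bit value 0xFFFFFFFF and bv_zext(bv_from_bool(1), 31) yields 1, so the cast handling only has to pick the extension that matches the instruction.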

Your example is a really nice candidate for the test suite too. I can add it with the fix.

I have opened PR #110. Let me know if it is OK :)