klee / klee

KLEE Symbolic Execution Engine

Home Page: https://klee-se.org/

Track source-level program state when debug info is present

jryans opened this issue

Context

KLEE tracks program state at the LLVM IR level. For some applications, it would be helpful to know how this state maps back to source-level state in whichever language was compiled to IR.

For example, the following C function...

int example(int n) {
  int y = 0;
  for (unsigned int i = 0; i < n; i++) {
    y += 4 + n;
  }
  return y;
}

...becomes something like the following IR using Clang 13 (-O1)...

define i32 @example(i32 %0) local_unnamed_addr #0 {
  %2 = icmp eq i32 %0, 0
  br i1 %2, label %9, label %3

3:                                                ; preds = %1
  %4 = add i32 %0, -1
  %5 = add i32 %0, 4
  %6 = mul i32 %4, %5
  %7 = add i32 %6, %0
  %8 = add i32 %7, 4
  br label %9

9:                                                ; preds = %3, %1
  %10 = phi i32 [ 0, %1 ], [ %8, %3 ]
  ret i32 %10
}

...which makes no mention of source-level variables like y, so KLEE is unable to follow them as it executes. It also means KLEE cannot report errors in terms of source-level variables.
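To see why y vanishes, note that the optimiser has replaced the loop with a closed form: (n - 1)(n + 4) + n + 4 = n(n + 4). The following is a minimal sketch, not part of KLEE, that mirrors the IR instruction by instruction and compares it against the loop for small non-negative n. The names example_closed_form and agree_up_to are made up for illustration, and unsigned arithmetic is assumed to stand in for the wrapping add / mul in the IR.

```c
/* The source-level function from the issue. The cast makes explicit the
   int-to-unsigned conversion that C applies implicitly in `i < n`. */
int example(int n) {
  int y = 0;
  for (unsigned int i = 0; i < (unsigned int)n; i++) {
    y += 4 + n;
  }
  return y;
}

/* Hypothetical helper that follows the optimised IR one value at a time. */
int example_closed_form(int n) {
  unsigned int u = (unsigned int)n;
  if (u == 0u) {              /* %2 = icmp eq i32 %0, 0 */
    return 0;                 /* phi takes 0 from the entry block */
  }
  unsigned int v4 = u - 1u;   /* %4 = add i32 %0, -1 */
  unsigned int v5 = u + 4u;   /* %5 = add i32 %0, 4  */
  unsigned int v6 = v4 * v5;  /* %6 = mul i32 %4, %5 */
  unsigned int v7 = v6 + u;   /* %7 = add i32 %6, %0 */
  unsigned int v8 = v7 + 4u;  /* %8 = add i32 %7, 4  */
  return (int)v8;             /* phi takes %8 from block 3 */
}

/* Returns 1 if both versions agree for every n in [0, limit], else 0.
   Keep limit small so the loop version cannot overflow int. */
int agree_up_to(int limit) {
  for (int n = 0; n <= limit; n++) {
    if (example(n) != example_closed_form(n)) {
      return 0;
    }
  }
  return 1;
}
```

For instance, both functions return 45 for n = 5. No value in the closed form corresponds to y at any intermediate loop iteration, which is why the IR alone cannot answer "what is y right now?".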

Desired outcome

Compilers like Clang can add debug info to the LLVM IR (enabled via the -g flag), which traditionally is emitted to a native binary and then read by debuggers like GDB, LLDB, etc. KLEE currently uses the file / line / column annotations in debug info when reporting stack traces, but it could go further. As a future enhancement, it would be great for KLEE to use the variable debug info to map its IR-level program state up to source-level constructs when reporting to the user.

Workaround

While it's not the same as a real mapping of variables using debug info, you can get a modestly better view if your compiler names IR values based on source-level constructs. With Clang, you can add -fno-discard-value-names to achieve this, which gives something like the following...

define i32 @example(i32 %n) local_unnamed_addr #0 {
entry:
  %cmp7.not = icmp eq i32 %n, 0
  br i1 %cmp7.not, label %for.cond.cleanup, label %for.cond.cleanup.loopexit

for.cond.cleanup.loopexit:                        ; preds = %entry
  %0 = add i32 %n, -1
  %1 = add i32 %n, 4
  %2 = mul i32 %0, %1
  %3 = add i32 %2, %n
  %4 = add i32 %3, 4
  br label %for.cond.cleanup

for.cond.cleanup:                                 ; preds = %for.cond.cleanup.loopexit, %entry
  %y.0.lcssa = phi i32 [ 0, %entry ], [ %4, %for.cond.cleanup.loopexit ]
  ret i32 %y.0.lcssa
}

...where some of the IR values (such as %n for the function argument) appear with their source-level names. To be clear, this changes only the names. An unoptimised version would also have a %y IR value for the source-level variable y, but that value was removed by the optimiser, so we no longer see that name here. Source-level variables move through numerous IR values and memory locations during computation, so this value naming workaround is not enough to follow source-level program state.
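One way to picture what a real mapping would need is a location list, similar in spirit to what variable debug info records: for each source variable, a set of program ranges and whatever holds the variable in each range. The sketch below is purely illustrative. The struct, the lookup function, the table contents, and the instruction indices (counted from 0 through the named IR listing above) are all assumptions invented here, not KLEE or LLVM data structures.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record: for instruction indices in [begin, end), the source
   variable `var` is described by `holder` (an IR value, a constant, ...). */
struct loc_entry {
  const char *var;
  int begin;
  int end;
  const char *holder;
};

/* Illustrative entries for `y` in the optimised example. The entry block
   covers indices 0-1, the loopexit block 2-7, and the cleanup block 8-9. */
static const struct loc_entry y_locations[] = {
  {"y", 0, 2, "constant 0"},
  {"y", 2, 8, "<no single IR value>"},
  {"y", 8, 10, "%y.0.lcssa"},
};

/* Returns the holder of `var` at instruction index `pc`, or NULL if the
   table has no entry covering that point. */
const char *lookup(const struct loc_entry *table, size_t count,
                   const char *var, int pc) {
  for (size_t i = 0; i < count; i++) {
    if (strcmp(table[i].var, var) == 0 &&
        pc >= table[i].begin && pc < table[i].end) {
      return table[i].holder;
    }
  }
  return NULL;
}
```

With this table, looking up y at index 8 yields %y.0.lcssa, while a variable that has no entries yields NULL. Value names alone give only the last row; the other rows are what variable debug info would have to supply.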

I am currently working on this source-level support in KLEE as part of my ongoing research. I hope to eventually contribute it back here once it's ready for general use.

@jryans That sounds super interesting.

Just to clarify, KLEE supports debug information as long as your bitcode is compiled with it, i.e. clang-13 -O1 -g -c -emit-llvm would emit debug information as part of the IR as well, i.e. stack traces will contain the correct file/line(/column) information.

But I guess you are more focusing on the variable names? You plan to utilise the llvm.dbg.* intrinsics (https://llvm.org/docs/SourceLevelDebugging.html#format-common-intrinsics) in a more sophisticated way and map them to specific variables?

Sounds great and useful! 😄

> Just to clarify, KLEE supports debug information as long as your bitcode is compiled with it, i.e. clang-13 -O1 -g -c -emit-llvm would emit debug information as part of the IR as well, i.e. stack traces will contain the correct file/line(/column) information.

Ah of course, I forgot about this use of debug info when writing up the issue. 😅 I have edited my original post to acknowledge this existing support as part of stack trace reporting, so hopefully that will avoid any confusion. 🙂

> But I guess you are more focusing on the variable names? You plan to utilise the llvm.dbg.* intrinsics (llvm.org/docs/SourceLevelDebugging.html#format-common-intrinsics) in a more sophisticated way and map them to specific variables?

Yes, exactly. Glad to hear it sounds useful! 😄