klee / klee

KLEE Symbolic Execution Engine

Home Page: https://klee-se.org/

Wrong variable name observed in `.ktest` file

YikeZhou opened this issue

Bug description

Two symbolic variables, array_0 and array_1, were defined. KLEE completed and exited successfully. However, the variable names recorded in the .ktest files are wrong.

  • .kquery: both array_0 and array_1 appeared (GOOD)
  • .ktest: 2 objects shared the same name array_1 (BAD)

Example code

#include <alloca.h>
#include <stdio.h>
#include <string.h>

#include <klee/klee.h>

int main() {
  int array[2];

  for (int i = 0; i < 2; i++) {
    char *s = (char *)alloca(50);
    sprintf(s, "%s", "array");
    sprintf(s + strlen(s), "_%d", i);

    int temp;
    klee_make_symbolic(&temp, sizeof(temp), s);
    array[i] = temp;
  }

  int identical = 0;
  if (array[0] == array[1])
    identical++;

  return 0;
}

Compiled with clang:

clang-11 -emit-llvm -c -g -O0 -Xclang -disable-O0-optnone main.c

KLEE cmdline:

klee --libc=uclibc --posix-runtime --write-kqueries main.bc

Output of KLEE:

KLEE: NOTE: Using POSIX model: /usr/local/lib/klee/runtime/libkleeRuntimePOSIX64_Debug+Asserts.bca
KLEE: NOTE: Using klee-uclibc : /usr/local/lib/klee/runtime/klee-uclibc.bca
KLEE: output directory is "/home/zyk/Projects/C/klee_examples/for_loop/klee-out-0"
KLEE: Using Z3 solver backend
KLEE: WARNING: executable has module level assembly (ignoring)
KLEE: WARNING ONCE: calling external: syscall(16, 0, 21505, 94780079623312) at klee/runtime/POSIX/fd.c:1012 10
KLEE: WARNING ONCE: Alignment of memory from call "malloc" is not modelled. Using alignment of 8.
KLEE: WARNING ONCE: calling __klee_posix_wrapped_main with extra arguments.

KLEE: done: total instructions = 39295
KLEE: done: completed paths = 2
KLEE: done: partially completed paths = 0
KLEE: done: generated tests = 2

Inspecting klee-last/test000001.ktest with ktest-tool:

$ ktest-tool klee-last/test000001.ktest
ktest file : 'klee-last/test000001.ktest'
args       : ['main.bc']
num objects: 3
object 0: name: 'model_version'
object 0: size: 4
object 0: data: b'\x01\x00\x00\x00'
object 0: hex : 0x01000000
object 0: int : 1
object 0: uint: 1
object 0: text: ....
object 1: name: 'array_1'
object 1: size: 4
object 1: data: b'\xff\x00\x00\x00'
object 1: hex : 0xff000000
object 1: int : 255
object 1: uint: 255
object 1: text: ....
object 2: name: 'array_1'
object 2: size: 4
object 2: data: b'\x00\x00\x00\x00'
object 2: hex : 0x00000000
object 2: int : 0
object 2: uint: 0
object 2: text: ....

Content of klee-last/test000001.kquery:

array array_0[4] : w32 -> w8 = symbolic
array array_1[4] : w32 -> w8 = symbolic
array model_version[4] : w32 -> w8 = symbolic
(query [(Eq 1
             (ReadLSB w32 0 model_version))
         (Eq false
             (Eq (ReadLSB w32 0 array_0)
                 (ReadLSB w32 0 array_1)))]
        false)

This archive file contains the C source file along with bitcode and KLEE's output:
for_loop.tar.gz

Platform information

OS version: Ubuntu 22.04.1 LTS

Output of klee --version:

KLEE 3.0-pre (https://klee.github.io)
  Build mode: RelWithDebInfo (Asserts: ON)
  Build revision: 667ce0f1ef33c32fbe2d1836fc1b334066e244ca

LLVM (http://llvm.org/):
  LLVM version 11.1.0
  
  Optimized build.
  Default target: x86_64-pc-linux-gnu
  Host CPU: znver1

Just a quick guess: I think it's the name handling here:

mo->setName(name);
temp is hoisted, hence the same mo is renamed.

Yes, this is the issue: the code is essentially making the same variable symbolic twice, due to the way the LLVM code is generated.
A workaround is to allocate space for temp on each iteration on the heap.
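
The hoisting can be observed without KLEE. The following is a minimal sketch, not taken from the thread: the function name record_temp_addresses is made up for illustration. It records the address of a loop-local variable on every iteration. The C standard does not promise that the addresses match, but with a single stack slot per variable (as in the alloca shown in the LLVM IR of this issue) they are expected to be identical.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: stores the address of the loop-local `temp`
   for each of the n iterations into addrs[0..n-1]. */
void record_temp_addresses(uintptr_t addrs[], int n) {
  for (int i = 0; i < n; i++) {
    int temp = i;                 /* declared inside the loop body */
    addrs[i] = (uintptr_t)&temp;  /* address converted while temp is alive */
  }
}

/* Hypothetical helper: prints the recorded addresses for inspection. */
void print_temp_addresses(const uintptr_t addrs[], int n) {
  for (int i = 0; i < n; i++)
    printf("iteration %d: temp at 0x%llx\n", i, (unsigned long long)addrs[i]);
}
```

Calling record_temp_addresses with n = 2 and printing the result typically shows the same address twice, which is why the second klee_make_symbolic call sees the memory object that the first call already named.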

The hoisting is indeed an issue.
@YikeZhou Can you use the option --klee-call-optimisation=false and check if this solves your problem?

There is a longer discussion here #1059 and here #1008.

Thank you for your advice. It’s been very helpful!

A workaround is to allocate space for temp on each iteration on the heap.

Following this, I modified the for-loop in the example, and it worked!

for (int i = 0; i < 2; i++) {
  char s[50];
  sprintf(s, "%s_%d", "array", i);

  int *temp = (int *)malloc(sizeof(int)); // <-- heap space allocated here
  klee_make_symbolic(temp, sizeof(*temp), s);
  array[i] = *temp;
  free(temp); // <-- and freed here
}

@MartinNowack I tried --klee-call-optimisation=false, but this problem still exists.

After looking into the related discussions you mentioned, I tried to compare:

  1. main() in main.bc (by clang directly)
  2. __klee_posix_wrapped_main() found in assembly.ll (generated by KLEE)

And the only difference (excluding debug info) I could find was this:
(TL;DR) One br instruction was removed.

; Function Attrs: noinline nounwind uwtable
define dso_local i32 @main() #0 !dbg !7 {
  %1 = alloca i32, align 4
  %2 = alloca [2 x i32], align 4
  %3 = alloca i32, align 4
  %4 = alloca [50 x i8], align 16
  %5 = alloca i32, align 4                        ; ■■■ The variable "temp" ■■■
  %6 = alloca i32, align 4

  ; ... omitted ...

10:                                               ; preds = %7
  call void @llvm.dbg.declare(metadata [50 x i8]* %4, metadata !24, metadata !DIExpression()), !dbg !30
  %11 = getelementptr inbounds [50 x i8], [50 x i8]* %4, i64 0, i64 0, !dbg !31
  %12 = load i32, i32* %3, align 4, !dbg !32
  %13 = call i32 (i8*, i8*, ...) @sprintf(i8* %11, i8* getelementptr inbounds ([6 x i8], [6 x i8]* @.str, i64 0, i64 0), i8* getelementptr inbounds ([6 x i8], [6 x i8]* @.str.1, i64 0, i64 0), i32 %12) #4, !dbg !33
  call void @llvm.dbg.declare(metadata i32* %5, metadata !34, metadata !DIExpression()), !dbg !35
  %14 = bitcast i32* %5 to i8*, !dbg !36
  %15 = getelementptr inbounds [50 x i8], [50 x i8]* %4, i64 0, i64 0, !dbg !37
  call void @klee_make_symbolic(i8* %14, i64 4, i8* %15), !dbg !38 ; ■■■ Make "temp" symbolic here ■■■
  %16 = load i32, i32* %5, align 4, !dbg !39
  %17 = load i32, i32* %3, align 4, !dbg !40
  %18 = sext i32 %17 to i64, !dbg !41
  %19 = getelementptr inbounds [2 x i32], [2 x i32]* %2, i64 0, i64 %18, !dbg !41
  store i32 %16, i32* %19, align 4, !dbg !42
  br label %20, !dbg !43                          ;             ■■■ This was optimized out in assembly.ll ■■■

20:                                               ; preds = %10 ■■■ This was optimized out in assembly.ll ■■■
  %21 = load i32, i32* %3, align 4, !dbg !44
  %22 = add nsw i32 %21, 1, !dbg !44
  store i32 %22, i32* %3, align 4, !dbg !44
  br label %7, !dbg !45, !llvm.loop !46

  ; ... omitted ...
}

Then I looked into the line pointed out by @251. Here is my guess:

After having executed klee_make_symbolic twice, two Arrays (named array_0 and array_1 respectively) were bound to the same MemoryObject named array_1.

klee/lib/Core/Executor.cpp

Lines 4308 to 4317 in 667ce0f

// Find a unique name for this array. First try the original name,
// or if that fails try adding a unique identifier.
unsigned id = 0;
std::string uniqueName = name;
while (!state.arrayNames.insert(uniqueName).second) {
  uniqueName = name + "_" + llvm::utostr(++id);
}
const Array *array = arrayCache.CreateArray(uniqueName, mo->size);
bindObjectInState(state, mo, false, array);
state.addSymbolic(mo, array);

void ExecutionState::addSymbolic(const MemoryObject *mo, const Array *array) {
  symbolics.emplace_back(ref<const MemoryObject>(mo), array);
}

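However, the KTest objects take their names from the MemoryObject (via Executor::getSymbolicSolution) rather than from the Array. Since both entries refer to the same MemoryObject, which was renamed by the second call, both objects are reported as array_1.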

res.push_back(std::make_pair(state.symbolics[i].first->name, values[i]));

Would it be OK to simply replace first with second here, so that the Array's name is used instead of the MemoryObject's?
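
To make the guess concrete, here is a small stand-alone model in C. It is only a sketch under assumptions: the Model* types and the functions below are hypothetical stand-ins, not KLEE's classes, and the unique-name loop is left out. It mimics one memory object being made symbolic twice and then reads the name back through either half of the stored pair.

```c
#include <stdio.h>

#define MODEL_MAX_SYMBOLICS 8
#define MODEL_NAME_LEN 32

/* Hypothetical, simplified stand-ins for MemoryObject and Array. */
typedef struct { char name[MODEL_NAME_LEN]; } ModelMemoryObject;
typedef struct { char name[MODEL_NAME_LEN]; } ModelArray;

/* Mirrors the (MemoryObject, Array) pair stored per symbolic. */
typedef struct {
  const ModelMemoryObject *mo;
  const ModelArray *array;
} ModelSymbolic;

typedef struct {
  ModelArray arrays[MODEL_MAX_SYMBOLICS];
  ModelSymbolic symbolics[MODEL_MAX_SYMBOLICS];
  int count;
} ModelState;

/* Renames the memory object, creates a fresh array with the same name,
   and records the pair. Returns 0 on success, -1 if the state is full. */
int model_make_symbolic(ModelState *st, ModelMemoryObject *mo,
                        const char *name) {
  if (st->count >= MODEL_MAX_SYMBOLICS)
    return -1;
  snprintf(mo->name, MODEL_NAME_LEN, "%s", name); /* overwrites old name */
  ModelArray *arr = &st->arrays[st->count];
  snprintf(arr->name, MODEL_NAME_LEN, "%s", name);
  st->symbolics[st->count].mo = mo;
  st->symbolics[st->count].array = arr;
  st->count++;
  return 0;
}

/* Name lookup through the memory object (the "first" of the pair). */
const char *model_name_via_memory_object(const ModelState *st, int i) {
  return st->symbolics[i].mo->name;
}

/* Name lookup through the array (the "second" of the pair). */
const char *model_name_via_array(const ModelState *st, int i) {
  return st->symbolics[i].array->name;
}
```

In this model, calling model_make_symbolic twice on the same ModelMemoryObject with "array_0" and then "array_1" makes model_name_via_memory_object return "array_1" for both entries, while model_name_via_array returns "array_0" and "array_1". Whether the same change is safe inside KLEE itself is a question for the maintainers.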

Looking forward to your response and thanks in advance!