eurecom-s3 / symcc

SymCC: efficient compiler-based symbolic execution

Home Page: http://www.s3.eurecom.fr/tools/symbolic_execution/symcc.html

Use musl to overcome the problem of incomplete libc wrappers

tiedaoxiaotubie opened this issue

Hi, I noticed that issue #23 mentioned we can try using musl to replace some libc functions during instrumentation. My question is: suppose we are instrumenting a large-scale program whose build configuration we are not familiar with. If I want to use musl to replace a specific libc function in the target program (e.g., use musl's implementation of qsort in place of the qsort in libc), is there any convenient approach?

What if I first use symcc to instrument qsort, and then use LD_PRELOAD to replace every qsort call with our instrumented qsort.so? However, the implementation of qsort is not self-contained, so I'm not sure whether this is doable.
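To make the idea concrete, here is a rough sketch of a self-contained qsort that could be compiled into a shared object and preloaded. This is not musl's implementation: it is a plain insertion sort standing in for it, and compare_ints is a hypothetical comparator added only for illustration. Building it with the symcc wrapper as a shared library and loading it through LD_PRELOAD is the assumption being explored here, not something tested.

```c
#include <stddef.h>

/* Swap two elements of `size` bytes each, one byte at a time. */
static void swap_bytes(unsigned char *a, unsigned char *b, size_t size) {
    while (size--) {
        unsigned char tmp = *a;
        *a++ = *b;
        *b++ = tmp;
    }
}

/* Same signature as the libc qsort, so a preloaded copy can take its place.
 * Insertion sort is used only to keep the sketch self-contained; it is a
 * stand-in for musl's real implementation. */
void qsort(void *base, size_t nmemb, size_t size,
           int (*compar)(const void *, const void *)) {
    unsigned char *arr = base;
    for (size_t i = 1; i < nmemb; i++) {
        /* Move element i left until the element before it is not greater. */
        for (size_t j = i;
             j > 0 && compar(arr + (j - 1) * size, arr + j * size) > 0;
             j--) {
            swap_bytes(arr + (j - 1) * size, arr + j * size, size);
        }
    }
}

/* Hypothetical comparator for int arrays, used only to exercise the sketch. */
int compare_ints(const void *a, const void *b) {
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}
```

A caller would use it exactly like the libc version, e.g. qsort(values, count, sizeof values[0], compare_ints).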

I believe if you use LD_PRELOAD for qsort then the symbolic runtime will also use the instrumented version of qsort. This will add unnecessary overhead.

I'm trying to use partial linking and objcopy --localize-symbols to address the problem you were describing. At the moment, I can link one executable against both glibc and musl. Here's my minimal setup:

  • func.c contains a single function func that simply wraps strlen. The goal is for this strlen call to resolve to the glibc implementation. This file models the symcc use case where the symbolic runtime itself isn't symbolically executed (and therefore shouldn't be linked against the instrumented musl).
  • main.c calls both func and strlen: func still calls the copy of strlen from glibc, while main uses the strlen from musl.

You can fine-tune which functions you'd like to call from musl by modifying the localization list.
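For readers who want to picture the two sources, here is a rough single-file sketch. The actual files are not shown in this thread, so everything below is an assumption: lengths_agree is a hypothetical stand-in for the body of main, and the link steps (partial link, then localizing the musl symbols listed in the localization file) are only described in comments.

```c
#include <stddef.h>
#include <string.h>

/* func.c side: models the symbolic runtime. Its strlen call is meant to
 * stay bound to glibc, because the runtime must not be instrumented. */
size_t func(const char *s) {
    return strlen(s);
}

/* main.c side: models the target program. After partial linking against
 * musl and localizing the listed symbols, this direct strlen call is the
 * one intended to resolve to musl's copy.
 * lengths_agree is a hypothetical helper standing in for main's body. */
int lengths_agree(const char *s) {
    size_t via_runtime = func(s);   /* glibc strlen, through the wrapper */
    size_t via_target = strlen(s);  /* musl strlen, once localized */
    return via_runtime == via_target;
}
```

Compilers may expand strlen as a builtin and never emit a call, so an experiment like this may need -fno-builtin to actually see which copy gets linked.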

Unfortunately, I can't see an easy way to integrate this into a large build system (there's some extra work required for partial linking). It also segfaults when I replicate the same approach with actual symcc.

What about using #define strlen musl_strlen in the target program's source code? Since we only modify the source code of the target program, it won't affect symcc. musl_strlen is an extern function, a wrapper around the strlen implementation in musl. This way, the original strlen is replaced by musl_strlen during compilation.
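A minimal sketch of that idea, under some assumptions: musl_strlen is defined inline here as a stand-in so the example is self-contained, whereas in the real setup it would be an extern provided by an instrumented musl build. name_length is a hypothetical piece of target code.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the prefixed musl export. In the proposed setup this would
 * be declared extern and come from the instrumented musl, not defined here. */
size_t musl_strlen(const char *s) {
    const char *p = s;
    while (*p) {
        p++;
    }
    return (size_t)(p - s);
}

/* Redirect every later use of strlen in this translation unit.
 * The #undef guards against a libc header that already defines strlen
 * as a macro. */
#undef strlen
#define strlen musl_strlen

/* Hypothetical target code: written against strlen, compiled against
 * musl_strlen because of the macro above. */
size_t name_length(const char *name) {
    return strlen(name);
}
```

Note that the C standard reserves library names such as strlen, so redefining them as macros is technically outside what it guarantees, even though common compilers accept it.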

I think that would work. The macros can be defined in the musl headers, so the target program doesn't have to change. All we need is a version of musl in which every exported function/symbol carries the musl_ prefix. I really wish there were a more generic approach for avoiding conflicts. The instrumented version of libc++ works fine alongside libstdc++ because libc++ always renames its functions internally.