avast / retdec-regression-tests-framework

A framework for writing and running regression tests for RetDec and related tools.

Nondeterministic C source code parsing between platforms

PeterMatula opened this issue

The C sources under test are parsed using clang. The problem is that the result of this parsing (the AST) is not always the same on all supported platforms (Linux, Windows, macOS). Differences can probably occur even between machines running the same platform, and even when the same version of clang is used. It looks like system includes play a role here. The problem is most prominent in the parsing of call expressions, but it can probably occur in other situations as well.

Example:

#include <stdlib.h>

#include <arpa/inet.h>
#include <ctype.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stropts.h>
#include <sys/prctl.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

int main()
{
        int32_t set;
        sigaddset((struct _TYPEDEF_sigset_t *)&set, SIGINT);
}

  • Linux parses it correctly and recognizes the sigaddset call.
  • macOS does not parse the call at all; it ignores it completely.
  • macOS without the long list of includes parses it correctly and recognizes the sigaddset call.
  • macOS without the type cast (i.e. sigaddset(&set, SIGINT)) parses the call as __sigbits.
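
One mechanism that would be consistent with the observations above is a system header that defines the function name as a function-like macro wrapping a helper. This is an assumption, not something confirmed in this issue. The sketch below uses invented names (my_sigset_t, my_sigbits, my_sigaddset, add_signal) rather than real system declarations. After preprocessing, the only call expression left in add_signal is the call to my_sigbits, so an AST-based tool would never see a call named my_sigaddset.

```c
#include <assert.h>
#include <stddef.h>  /* NULL */

/* Hypothetical stand-ins, invented for illustration only; none of these
   names are taken from a real system header. */
typedef unsigned int my_sigset_t;

/* Helper that maps a signal number to its bit in the set.
   Signals are numbered from 1; out-of-range numbers map to no bit. */
static inline unsigned int my_sigbits(int signo)
{
    return (signo < 1 || signo > 32) ? 0u : (1u << (signo - 1));
}

/* Function-like macro: the "function" the caller writes is not a function. */
#define my_sigaddset(set, signo) (*(set) |= my_sigbits(signo), 0)

int add_signal(my_sigset_t *set, int signo)
{
    assert(set != NULL);
    /* Expands to: (*(set) |= my_sigbits(signo), 0)
       The AST contains a call to my_sigbits, not to my_sigaddset. */
    return my_sigaddset(set, signo);
}
```

With a macro of this shape, passing a pointer to an incomplete struct type would make the expansion dereference an incomplete type, which is an error, so the whole statement may drop out of the AST.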

Another example is the parsing of a strlcpy() call without a proper type signature. Linux parses it correctly, but macOS does not. macOS parses the call only if the function (and its calls) have the full signature of

size_t strlcpy(char * restrict dst, const char * restrict src, size_t dstsize);
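
The effect of having the full prototype in scope can be sketched with a self-contained translation unit that declares the function itself, so the parse does not depend on what the platform's headers provide. The names my_strlcpy and copy_name are hypothetical. my_strlcpy stands in for strlcpy so that the sketch does not clash with a libc that already ships its own strlcpy, and the fallback body is an illustrative assumption, not the libc implementation.

```c
#include <assert.h>
#include <stddef.h>  /* size_t, NULL */

/* Full prototype, mirroring the strlcpy signature quoted above.
   my_strlcpy is a hypothetical stand-in name. */
size_t my_strlcpy(char * restrict dst, const char * restrict src, size_t dstsize);

/* Hypothetical caller: with the prototype in scope, this is an ordinary,
   fully typed call expression regardless of which system headers exist. */
size_t copy_name(char *out, size_t out_size, const char *name)
{
    assert(out != NULL && name != NULL);
    return my_strlcpy(out, name, out_size);
}

/* Illustrative definition so the sketch links on any platform:
   copies at most dstsize - 1 bytes, always terminates dst when
   dstsize > 0, and returns the length of src. */
size_t my_strlcpy(char * restrict dst, const char * restrict src, size_t dstsize)
{
    size_t len = 0;
    while (src[len] != '\0')
        len++;
    if (dstsize > 0) {
        size_t n = (len < dstsize - 1) ? len : dstsize - 1;
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
        dst[n] = '\0';
    }
    return len;
}
```

Calling copy_name(buf, sizeof buf, "hello") with a 4-byte buffer stores "hel" and returns 5, the length of the source string.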

Solutions:

  • Can we force clang to use some custom set of includes that would be the same everywhere?
  • Can we remove the #include statements from the C sources before parsing them? (=> I don't think so; without them, some other function calls may not get parsed.)
  • Are includes the only problem?