Nondeterministic C source code parsing between platforms
PeterMatula opened this issue
The C sources under test are parsed with clang. The problem is that the resulting AST is not always the same on all supported platforms (Linux, Windows, macOS). Differences can probably occur even between machines running the same platform, and even when the same version of clang is used. System includes appear to play a role. The problem is most prominent in the parsing of call expressions, but it can probably occur in other situations as well.
Example:
#include <stdlib.h>
#include <arpa/inet.h>
#include <ctype.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stropts.h>
#include <sys/prctl.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>
int main()
{
    int32_t set;
    sigaddset((struct _TYPEDEF_sigset_t *)&set, SIGINT);
}
- Linux parses it ok and recognizes the sigaddset call.
- macOS does not parse the call at all - it completely ignores it.
- macOS without the long list of includes parses it ok and recognizes the sigaddset call.
- macOS without the type cast (i.e. sigaddset(&set, SIGINT)) parses the call as __sigbits.
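The __sigbits result would be consistent with sigaddset being shadowed by a function-like macro in the headers of one platform, though the issue does not confirm this. The sketch below illustrates that mechanism with hypothetical stand-ins (demo_sigset_t, demo_sigaddset, demo_sigbits), not the real signal.h definitions. It also shows that parenthesizing the function name suppresses macro expansion, so the same source line can produce two different call expressions.

```c
#include <assert.h>

/* Hypothetical stand-ins, not the real <signal.h> definitions. */
typedef unsigned int demo_sigset_t;

int function_calls = 0; /* times the ordinary function ran */
int helper_calls = 0;   /* times the macro's helper ran */

/* Helper that a macro-based header might call internally. */
static unsigned int demo_sigbits(int signo)
{
    helper_calls++;
    return 1u << (signo - 1);
}

/* The ordinary function, as one platform's header might declare it. */
static int demo_sigaddset(demo_sigset_t *set, int signo)
{
    function_calls++;
    *set |= 1u << (signo - 1);
    return 0;
}

/* Another platform's header might additionally shadow it with a macro. */
#define demo_sigaddset(set, signo) (*(set) |= demo_sigbits(signo), 0)

/* Makes the same logical call twice: once through the macro, once past it. */
int demo_calls(demo_sigset_t *set)
{
    int rc;

    assert(set != 0);
    rc = demo_sigaddset(set, 2);    /* macro expands: the parser sees demo_sigbits */
    rc |= (demo_sigaddset)(set, 3); /* parenthesized name: a real function call */
    return rc;
}
```

After preprocessing, the first call contains no reference to demo_sigaddset at all, which is the kind of difference an AST comparison between platforms would surface.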
Another example is the parsing of a strlcpy() call (without a proper type signature). Linux parses it ok, but macOS does not. macOS parses the call only if the function (and its calls) has the full signature of
size_t strlcpy(char * restrict dst, const char * restrict src, size_t dstsize);
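The "full signature" workaround might look like the following sketch. The name demo_strlcpy is a hypothetical stand-in, chosen so the example does not clash with platforms whose libc already provides strlcpy, and its body is a simplified reimplementation rather than the system one. The point is only that a complete prototype is in scope before the call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Full prototype, visible before any call, mirroring the strlcpy signature.
   demo_strlcpy is a hypothetical name chosen to avoid clashing with libc. */
size_t demo_strlcpy(char *restrict dst, const char *restrict src, size_t dstsize);

/* With the prototype in scope, the parser sees a complete declaration for
   this call regardless of what the system headers provide. */
size_t copy_label(char *out, size_t out_size, const char *label)
{
    assert(out != NULL && label != NULL);
    return demo_strlcpy(out, label, out_size);
}

/* Simplified stand-in body: copy at most dstsize - 1 bytes, always
   NUL-terminate when dstsize > 0, and return strlen(src). */
size_t demo_strlcpy(char *restrict dst, const char *restrict src, size_t dstsize)
{
    size_t src_len = strlen(src);

    if (dstsize > 0) {
        size_t n = src_len < dstsize - 1 ? src_len : dstsize - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}
```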
Solutions:
- Can we force clang to use some custom set of includes that would be the same everywhere?
- Can we remove the #include statements from C sources before parsing them? (=> I don't think so; without them, some other function calls may not get parsed.)
- Are includes the only problem?
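To experiment with the second idea, stripping #include lines before parsing, a small filter along these lines could be used. This is a minimal sketch: strip_includes and is_include_line are hypothetical helpers, not part of any existing tool. It works line by line, so it does not understand comments, string literals, or line continuations. The caveat from the list still applies: without the headers, other calls may stop being parsed.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the line starting at `line` looks like an #include directive. */
static int is_include_line(const char *line)
{
    while (*line == ' ' || *line == '\t')
        line++;
    if (*line != '#')
        return 0;
    line++;
    while (*line == ' ' || *line == '\t')
        line++;
    if (strncmp(line, "include", 7) != 0)
        return 0;
    /* Reject longer directive names such as #include_next. */
    return !(isalnum((unsigned char)line[7]) || line[7] == '_');
}

/* Copies `src` into `out`, blanking every #include line but keeping its
   newline so that line numbers in diagnostics stay the same.
   Returns the number of lines blanked, or -1 if `out` is too small. */
int strip_includes(const char *src, char *out, size_t out_size)
{
    size_t used = 0;
    int removed = 0;

    assert(src != NULL && out != NULL);
    if (out_size == 0)
        return -1;

    while (*src != '\0') {
        const char *end = strchr(src, '\n');
        size_t len = end ? (size_t)(end - src) : strlen(src);
        int drop = is_include_line(src);
        size_t keep = drop ? 0 : len;
        size_t need = keep + (end ? 1 : 0);

        if (used + need + 1 > out_size) /* +1 for the terminating NUL */
            return -1;
        memcpy(out + used, src, keep);
        used += keep;
        if (end)
            out[used++] = '\n';
        removed += drop;
        src += len + (end ? 1 : 0);
    }
    out[used] = '\0';
    return removed;
}
```

Blanking the lines instead of deleting them is a deliberate choice here: source locations in the resulting AST still match the original file.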