Give more info when a UB trap is detected in zig cc debug mode

Question

Give more info when a UB trap is detected in zig cc debug mode

rivten opened this issue 4 years ago · comments

I'm sorry to jump out on that sort of issue again, because it was partially treated in issues such as #4830.

To comment @andrewrk in the previous issue :

Figuring out why such checks are always enabled in zig cc is left as an exercise for the reader.

I can speak to this. zig cc is intentionally an interface to the zig compiler, a bit higher level than using clang directly. With no optimization flags specified, zig cc infers debug mode. As you know from writing Zig code, debug mode has safety checks to prevent undefined behavior at runtime. This applies for C code as well, taking advantage of clang's UBSAN. The fact that debug mode is "default" is entirely intentional. I expect this to identify many bugs in existing codebases as people use zig cc out of convenience or curiosity and their code gets vetted by UBSAN for the first time.

The part about detecting issues in other code is 100% true. I've been using zig as a C compiler for some personal project and I've been detected already two UB in major code (nuklear and stb_printf).
But while this is incredibly useful at detecting faulty code, I think it could be better at giving out info about what's wrong. In the two cases, my workflow had been :

See that my code crashes in Debug build.
Launch gdb, trying to understand what went wrong, check out my user code several times.
Finally understand that the SIGILL triggered might be linked to clang UBSAN.
Recompile the whole project without build.zig using only clang -fsanitize=undefined, without the trap flag
Now run again, and see the exact error that clang outputs (even if it keeps the program running with UB). Now have a precise clue about what part of the C spec that has been violated by the library code that I'm using.

Just a plain crash without info is pretty harsh. I'm 100% in favor of not having UB code running, and I'm 100% in favor of detecting more UB code in the future, but I think right now we lack the tool to properly investigate the exact issue encountered by the UB sanitizer.

Maybe there's a flag that I'm missing. From clang doc, I tried -fno-sanitize-recover=undefined in my compile flags, but it did not worked.

Lachlan Easton · Answer 1 · Wed Sep 16 2020 16:22:00 GMT+0800 (China Standard Time)

This is high on my wish list. Locally I've commented out args.append("-fsanitize-trap=undefined"); from src/codegen.cpp and added exe.linkSystemLibrary("ubsan"); to my build.zig which gives the desired /home/lachlan/bx/include/tinystl/buffer.h:143:17: runtime error: applying non-zero offset 72 to null pointer for example. Without it, the trap location is at the end(ie. nowhere near the call site) of the (hundreds of lines long) calling function in a completely different file, which all makes it very hard to debug.
I could easily see people being turned away when there code that worked when compiled with clang suddenly fails without explanation when compiled with zig.

LemonBoy · Answer 2 · Wed Sep 16 2020 19:27:50 GMT+0800 (China Standard Time)

I had started a pure-Zig port of miniubsan in #5165

theinternetftw · Answer 3 · Thu Apr 01 2021 11:12:32 GMT+0800 (China Standard Time)

I just wanted to share this, both for users looking for solutions to getting more information out of these UBSan traps and who come across this page as I did, as well as to make sure the devs are aware of this already existing alternative on the off-chance they don't know about it.

I'm developing freestanding code with zig cc and wanted the benefits of getting more information than a trap allows. I've just discovered that passing -fno-sanitize-trap=undefined is sufficient to undo the -fsanitize-trap that zig does internally. This generates calls to the ubsan runtime instead of traps that can then be implemented by those using the compiler. Likewise, if users then pass -fsanitize-minimal-runtime, the calls are replaced with calls to the minimal runtime, which look much quicker to implement. Freestanding users considering this can see the "Port UBSan mini-runtime" link above to see how little is needed for the minimal case. (Edit: -fno-sanitize-recover=undefined should be added to any case above if you want to retain zig's crash-on-UB policy, otherwise it calls the recover versions of the runtime functions.)

I assume but have not tested that linking the ubsan library (like LakeByTheWoods does above) would allow non-freestanding users to get full functionality without implementing anything.

In any case, it's good to know that it appears that full UBSan functionality is available to zig and zig cc users without having to patch the compiler.

Tom Read Cutting · Answer 4 · Sun Jun 27 2021 19:49:38 GMT+0800 (China Standard Time)

Hi - I'm keen to have a proper crack at this issue, in particular doing this comprehensively & solving for the different cases people might have. This includes supporting both the "full" & "minimal" ubsan functions.

So first thing I want to establish, is that if we aren't just going to be using LLVM's ubsan handlers (see https://github.com/llvm-mirror/compiler-rt/blob/master/lib/ubsan), which benefits are we hoping to see in having custom zig-implemented versions of these functions?

I've had a look at #5165 and the comments people had there - and will try and take those all into account, but basic rough ideas include:

Providing a better experience for zig user that are using zig cc for zig libraries & want ubsan error handling to well for that case.
Providing a better experience for C/C++ users using zig as a C/C++ compiler.

I think ideally we would also provide options for making it easier for end-users to deal with ubsan traps (minimal or full-fat) with their own functions in some form if they want to as well - so that can better slot-into existing assertion/error-loggin or other debugging systems, whilst still providing sensible default behaviour.

Should this be implemented incrementall (maybe addressing the minimal runtime first) with multiple PRs? Or should I look into just doing this as a full-fat feature with all bells & whistles up front. I was also lookig at using as a starting point #5165 if that was OK with @LemonBoy .

Finally - this feels like it would be a fairly big feature - so are there any requirements for regression tests or similar to ensure this keeps working as intended?

EDIT: Update 2021-07-32, nothing interesting to report, just that I am working on it bit by bit

Tom Read Cutting · Answer 5 · Mon Jul 26 2021 17:30:53 GMT+0800 (China Standard Time)

My intent is currently to create a PR that will have implementations for the full ubsan runtime & the minimal ubsan runtime.

However - there's quite a lot that needs implementing for the full ubsan runtime - so I'm currently focusing on handling integer overflows specifically. If I were to create a PR (not ready for merge) sometime soon that just has that one specific case being handled. Would anyone please be able to give feedback on the overall approach?

I think that would probably be better from my perspective to make sure that I'm generally heading in the right direction - as opposed to creating a full-on ubsan implementation that then does not really align with the goals of the zig project & therefore would not be merged or require significant refactoring to be merged.

If so - I should be able to get something together for next weekend.

Tom Read Cutting · Answer 6 · Tue Jul 27 2021 23:09:04 GMT+0800 (China Standard Time)

In lieue of creating a full-on PR (because it is relatively minimal) - I've uploaded a gist containing an implementation of handling various integer overflows & division by 0 (alongisde minimal code copied from #5165 which I haven't really touched - I want to get "full" ubsan working first and then will try and come back and improve the minimal implementation accordingly).

@LemonBoy some feedback on what I have so far would be great if you would please be able to provide some! Just because there are a decent number of handlers to implement so having feedback on the first few so I can apply that first before fleshing out further handlers would be great.

https://gist.github.com/moosichu/ee3dd8407653640a36fb9625ac586082

Tom Read Cutting · Answer 7 · Wed Aug 11 2021 15:44:28 GMT+0800 (China Standard Time)

Progress is still being made on this slowly but surely, feedback on the gist would still be great if anyone could take the time (or I'm happy to re-organise it into another format for those purposes).

One of the questions I have is with regards to the following

thread 31003 panic: ubsan: Shift exponent 10000 is too large for 64-bit type 'uint64_t' (aka 'unsigned long')
src/Game/GameBoostrap/GameBootstrap.cpp:48:28

src/ThirdParty/ZigUbsan/ubsan.zig:178:23: 0x2884c7 in handleShiftOutOfBounds (ubsan)
            report_log("ubsan: Shift exponent {} is too large for {}-bit type {s}\n{s}:{}:{}\n", .{ exponent, base_integer_size, lhs_type_descriptor.getNameAsString(), source_location.file_name, source_location.line, source_location.column });
                      ^
src/ThirdParty/ZigUbsan/ubsan.zig:198:35: 0x288380 in handlers.abort_handler (ubsan)
            handleShiftOutOfBounds(abort_log, shift_out_of_bounds_data, lhs, rhs);
                                  ^
src/Game/GameBoostrap/GameBootstrap.cpp:48:28: 0x2202d3 in GameUpdate (src/Platform/main_sdl.cpp)
    auto result = someTest << bitShift;
                           ^
src/Game/GameMain.cpp:34:23: 0x21f30f in GameUpdateAndRender (src/Platform/main_sdl.cpp)
        gameBoostrap->GameUpdate(platformServices, hasReloaded, PROJECT_NAME);
                      ^
src/Platform/main_sdl.cpp:696:34: 0x21ebb7 in main (src/Platform/main_sdl.cpp)
                    shouldQuit = game_code.GameUpdateAndRender(&platformServices, hasReloaded) || shouldQuit;

So the panic system works quite well in showing where the error is occurring - however it slightly unfortunate that the error-handling code itself (the ubsan implementation) generates a couple of lines of noise (that maybe should be hidden?) before getting to the point of interest.

I feel like you might ideally want to get error reports that look like this:

thread 31003 panic: ubsan: Shift exponent 10000 is too large for 64-bit type 'uint64_t' (aka 'unsigned long')
src/Game/GameBoostrap/GameBootstrap.cpp:48:28: 0x2202d3 in GameUpdate (src/Platform/main_sdl.cpp)
    auto result = someTest << bitShift;
                           ^
src/Game/GameMain.cpp:34:23: 0x21f30f in GameUpdateAndRender (src/Platform/main_sdl.cpp)
        gameBoostrap->GameUpdate(platformServices, hasReloaded, PROJECT_NAME);
                      ^
src/Platform/main_sdl.cpp:696:34: 0x21ebb7 in main (src/Platform/main_sdl.cpp)
                    shouldQuit = game_code.GameUpdateAndRender(&platformServices, hasReloaded) || shouldQuit;

Is this something that would be possible to do?

Tom Read Cutting · Answer 8 · Fri Aug 20 2021 15:43:21 GMT+0800 (China Standard Time)

As a heads-up, one of the problems I'm running into is that some parts of ubsan seem to be only implementable in C++.

https://github.com/llvm-mirror/compiler-rt/blob/69445f095c22aac2388f939bedebf224a6efcdaf/lib/ubsan/ubsan_type_hash_itanium.cpp

A good example is the implementation of _ubsan::checkTypeInfoEquality which does some reflection on types by through the std::type_info type. So to perform the operations that requires I might have to call into some C++? Although it's not something I'm delighted about.

Tom Read Cutting · Answer 9 · Thu Sep 02 2021 22:16:24 GMT+0800 (China Standard Time)

Becuase it looks simpler to implement and could get me a PR together that I can actually use as a precedent to understand what might be needed for the ubsan PR - would an implementation of ASAN traps be something that might be merged upstream for the C/C++ clang compiler? Similar to ubsan - you can provide the asan flag but the lack of exposed trapping procedures leads to a failed compile.

Tom Read Cutting · Answer 10 · Tue Aug 08 2023 14:04:09 GMT+0800 (China Standard Time)

Just realised I forgot to mentioned that I have picked this up again! See: https://github.com/moosichu/zigsan.git

rdebath · Answer 11 · Mon Sep 04 2023 15:27:01 GMT+0800 (China Standard Time)

Just to NOTE: Luckily I decided to raise a bug when zig cc's code generator "created broken code".

Oh, and the idea: "No Optimise" ==== "Debug mode" ... is not even close to reality.
Most languages don't even have the concept of turning optimisation off (so it's obviously not generally useful) and the problem of bugs disappearing when the compile changes even has it's own name.

Claudio Bley · Answer 12 · Tue May 07 2024 02:39:39 GMT+0800 (China Standard Time)

I tried to pass -fno-sanitize-trap=undefined as described by theinternetftw, but that requires the ubsan library.

Since I am cross compiling to Linux arm64 / Darwin x86_64 + arm64, and I don't have that library at hand (for all systems), the only way to make it work was to pass -fno-sanitize=all to the compiler and linker in order to disable the UB sanitizer completely. Hope that helps someone with the same problem.

BTW, I recompiled my app with clang and the component that was using UB turned out to be libunwind from llvm (llvm/llvm-project#91144)