Example of a Crashing App in Rust

This repository is a how-to guide on debugging a crashing app using lldb and core-dump-handler.

Prerequisites

This example assumes you have installed the core-dump-handler into your Kubernetes cluster.

Install the cdcli client on your machine. Download the latest build from the releases page at https://github.com/IBM/core-dump-handler/releases. Extract cdcli from the zip file and place it in a folder that is on your $PATH.

Creating a core dump

To start with you need to generate a core dump. The code in the example-crashing-rust-app project takes care of that.

The project code has three nested calls inside a main function, with the final call raising an explicit panic!.

This is just enough to show how the call stack of an application is laid out and to do some investigation around it.
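
As a rough sketch of that shape, the following is reconstructed from the function names in the backtrace and the do_test listing shown later in this guide. The bodies of foo and bar and the panic message are assumptions, not the project's actual source, and the main entry point (which simply calls do_test) is left out so the snippet can be called from a test.

```rust
use std::error::Error;

// First nested call: builds the string that is passed down the stack.
pub fn do_test() -> Result<(), Box<dyn Error>> {
    let text = format!("hello {}", "world");
    foo(text.as_str());
    Ok(())
}

// Second nested call: forwards its input unchanged (assumed body).
pub fn foo(input: &str) {
    bar(input);
}

// Third nested call: raises the explicit panic (assumed body and message).
pub fn bar(input: &str) {
    let _ = input; // the argument is unused; it only needs to appear in the stack frame
    panic!("Boom");
}
```

With panic = "abort" set in the release profile, calling do_test from main ends the process at the panic! in bar instead of unwinding the stack.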

example-crashing-rust-app is a normal Rust project with the following release build configuration in the Cargo.toml.

[profile.release]
debug = true
panic = "abort"

The debug = true line is the equivalent of passing the -g flag to the compiler, so the executable will contain symbols to assist with debugging.

The panic = "abort" line makes a panic abort the process instead of unwinding, which produces a core dump. This means core dumps are captured not only for system errors but also for errors raised from application logic.

Log into your kubernetes cluster and run the prebuilt image in a pod on the server. This will fail automatically and cause a core dump to be created.

kubectl run -i -t crasher --image=quay.io/icdh/example-crashing-rust-app --restart=Never

Locate the image

Now look in your object storage and find the name of the zip file that was created.

e.g. d19ef2ef-35d3-4224-8293-f4f9509868f8-dump-1634327833-crasher-example-crashin-1-11.zip

Each item in the name breaks down as follows:

d19ef2ef-35d3-4224-8293-f4f9509868f8 - the GUID that ensures the name is unique
dump - the type of zip
1634327833 - the time the dump occurred
crasher-example-crashin - the name of the application (N.B. this is truncated)
1 - the PID of the process
11 - the signal that was sent to the process
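
The breakdown above can be sketched as a small parser. This is only an illustration based on the single example file name in this guide: the DumpName struct and parse_dump_name function are hypothetical names, not part of core-dump-handler or cdcli, and the parser assumes the GUID is always in its 36-character text form.

```rust
/// The parts of a core dump zip name, as described in the list above.
#[derive(Debug, PartialEq)]
pub struct DumpName {
    pub guid: String,
    pub kind: String,
    pub timestamp: u64,
    pub app: String,
    pub pid: u32,
    pub signal: u32,
}

/// Splits a name such as `<guid>-dump-<time>-<app>-<pid>-<signal>.zip`.
/// Returns None when the name does not follow that layout.
pub fn parse_dump_name(file_name: &str) -> Option<DumpName> {
    let stem = file_name.strip_suffix(".zip").unwrap_or(file_name);

    // The GUID contains hyphens itself, so take it by length rather than by splitting.
    if stem.len() < 37 || !stem.is_char_boundary(36) {
        return None;
    }
    let (guid, rest) = stem.split_at(36);
    let rest = rest.strip_prefix('-')?;

    // The type and the timestamp are read from the left.
    let mut left = rest.splitn(3, '-');
    let kind = left.next()?;
    let timestamp = left.next()?.parse().ok()?;
    let tail = left.next()?;

    // The application name may contain hyphens, so the signal and pid
    // are read from the right and whatever remains is the name.
    let mut right = tail.rsplitn(3, '-');
    let signal = right.next()?.parse().ok()?;
    let pid = right.next()?.parse().ok()?;
    let app = right.next()?;

    Some(DumpName {
        guid: guid.to_string(),
        kind: kind.to_string(),
        timestamp,
        app: app.to_string(),
        pid,
        signal,
    })
}
```

Reading the pid and signal from the right-hand side is a deliberate choice here, because the application name in the example contains hyphens of its own.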

Start Debugging

Now you can run the cdcli command to start a debugging session.

An example command is:

cdcli -c d19ef2ef-35d3-4224-8293-f4f9509868f8-dump-1634327833-crasher-example-crashin-1-11.zip -i quay.io/icdh/example-crashing-rust-app -e example-crashing-rust-app

The -c option is the core zip file in the bucket, and the -i option references the original image. Because the name of the executable is longer than the process name the OS records (which is why it appears truncated in the zip file name), you also need to supply the full name of the executable with the -e parameter.
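
To illustrate why the -e option is needed, here is a minimal sketch of the truncation. It assumes the shortened name comes from the Linux kernel's limit on a process name (TASK_COMM_LEN is 16 bytes including the trailing NUL, so 15 usable bytes). The function name is hypothetical.

```rust
/// Longest process name the Linux kernel keeps: TASK_COMM_LEN (16) minus the trailing NUL.
pub const MAX_COMM_LEN: usize = 15;

/// Returns the name as it would appear after being cut to the kernel limit.
pub fn truncated_process_name(exe_name: &str) -> &str {
    if exe_name.len() <= MAX_COMM_LEN {
        return exe_name;
    }
    // Step back to a character boundary so a multi-byte character is never split.
    let mut end = MAX_COMM_LEN;
    while !exe_name.is_char_boundary(end) {
        end -= 1;
    }
    &exe_name[..end]
}
```

Applied to example-crashing-rust-app this gives example-crashin, which matches the thread name shown in the backtrace later in this guide.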

A full list of config options can be seen by running

cdcli --help

Once you have run the cdcli command, you will be presented with the following output.

Debugging: example-crashing-rust-app 
Runtime: default 
Namespace: observe
Debug Image: quay.io/icdh/default 
App Image: quay.io/icdh/example-crashing-rust-app
Sending pod config using kubectl
stdout: debugger-06e3166c-f113-4291-81f8-8cf2839942c1
Defaulted container "debug-container" out of: debug-container, core-container
error: unable to upgrade connection: container not found ("debug-container")

Retrying connection...
Defaulted container "debug-container" out of: debug-container, core-container

If for some reason the container fails to start, you can kill the session by pressing Ctrl-C.

Notice that cdcli will keep retrying the connection if the container hasn't started yet.

You are now logged into a container on the kubernetes cluster and will see a command prompt.

[debugger@debugger-06e3166c-f113-4291-81f8-8cf2839942c1 debug]$

Inspect the contents of the debug environment

Now run an ls command to see the content of the folder.

ls
d19ef2ef-35d3-4224-8293-f4f9509868f8-dump-1634327833-crasher-example-crashin-1-11
d19ef2ef-35d3-4224-8293-f4f9509868f8-dump-1634327833-crasher-example-crashin-1-11.zip  
init.sh  
rundebug.sh

You can see the folder containing the core dump and some helper scripts. The init.sh script is used by the system to lay out the folder structure and isn't needed for debugging.

Run the env command to see that the location of the core file and the executable are available as environment variables.

...
S3_BUCKET_NAME=cos-core-dump-store
EXE_LOCATION=/shared/example-crashing-rust-app
PWD=/debug
HOME=/home/debugger
CORE_LOCATION=d19ef2ef-35d3-4224-8293-f4f9509868f8-dump-1634327833-crasher-example-crashin-1-11/d19ef2ef-35d3-4224-8293-f4f9509868f8-dump-1634327833-crasher-example-crashin-1-11.core
...

Start a debugging session

You can now start a debug session by simply running the rundebug.sh script.

./rundebug.sh

You will see the command that is run and be given the lldb command prompt with the core and the executable preloaded.

(lldb) target create "/shared/example-crashing-rust-app" --core
"d19ef2ef-35d3-4224-8293-f4f9509868f8-dump-1634327833-crasher-example-crashin-1-11/
d19ef2ef-35d3-4224-8293-f4f9509868f8-dump-1634327833-crasher-example-crashin-1-11.core"
Core file '/debug/d19ef2ef-35d3-4224-8293-f4f9509868f8-dump-1634327833-crasher-example-crashin-1-11/
d19ef2ef-35d3-4224-8293-f4f9509868f8-dump-1634327833-crasher-example-crashin-1-11.core' 
(x86_64) was loaded.
(lldb)
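
The target create line above corresponds to starting lldb with the executable and a --core argument. As a minimal sketch, assuming rundebug.sh does something equivalent with the EXE_LOCATION and CORE_LOCATION environment variables (the script itself is not shown in this guide), the same command could be assembled like this. Both function names are hypothetical.

```rust
use std::env;
use std::process::Command;

/// Builds `lldb <exe> --core <core>` without running it.
pub fn lldb_command(exe: &str, core: &str) -> Command {
    let mut cmd = Command::new("lldb");
    cmd.arg(exe).arg("--core").arg(core);
    cmd
}

/// Reads the two locations from the environment of the debug container.
/// Returns None if either variable is missing.
pub fn lldb_command_from_env() -> Option<Command> {
    let exe = env::var("EXE_LOCATION").ok()?;
    let core = env::var("CORE_LOCATION").ok()?;
    Some(lldb_command(&exe, &core))
}
```

Calling status() on the returned Command inside the debug container would start the interactive session.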

Now you are ready to start inspecting the core dump.

First, look at the backtrace by running the bt command.

bt
thread #1, name = 'example-crashin', stop reason = signal SIGSEGV
    frame #0: 0x00007f29d6a66d39 example-crashing-rust-app`abort + 129
    frame #1: 0x00007f29d6a4fc47 example-crashing-rust-app`panic_abort::__rust_start_panic::abort::hc9ba977db9d5330c at lib.rs:43:17
    frame #2: 0x00007f29d6a4fc26 example-crashing-rust-app`__rust_start_panic at lib.rs:38:5
    frame #3: 0x00007f29d6a4532c example-crashing-rust-app`rust_panic at panicking.rs:670:9
    frame #4: 0x00007f29d6a452cb example-crashing-rust-app`std::panicking::rust_panic_with_hook::hca09fd4c19242a20 at panicking.rs:640:5
    frame #5: 0x00007f29d6a339f4 example-crashing-rust-app`std::panicking::begin_panic::_$u7b$$u7b$closure$u7d$$u7d$::hc735bce12f1e36d5 at panicking.rs:542:9
    frame #6: 0x00007f29d6a339bc example-crashing-rust-app`std::sys_common::backtrace::__rust_end_short_backtrace::ha173c0e3158c9985(f=<unavailable>) at backtrace.rs:141:18
    frame #7: 0x00007f29d6a3105c example-crashing-rust-app`std::panicking::begin_panic::h6197dbc48048c483(msg=(data_ptr = "", length = 4)) at panicking.rs:541:12
    frame #8: 0x00007f29d6a33c88 example-crashing-rust-app`example_crashing_rust_app::bar::h48db1e5d2e4e6220(input=(data_ptr = "hello world\xd6)\U0000007f", length = 11)) at main.rs:17:5
    frame #9: 0x00007f29d6a33c06 example-crashing-rust-app`example_crashing_rust_app::foo::h6f1c5c5323d069a1(input=<unavailable>) at main.rs:12:5
    frame #10: 0x00007f29d6a33bf9 example-crashing-rust-app`example_crashing_rust_app::do_test::ha134fca868990e15 at main.rs:7:5
    frame #11: 0x00007f29d6a33b86 example-crashing-rust-app`example_crashing_rust_app::main::h31e9353150d7f0d7 at main.rs:2:5

bt is shorthand for thread backtrace. To print the backtrace of every thread, you can use the longer form:

thread backtrace all

You can see at the start that the program exited with a SIGSEGV or segmentation fault raised by the panic in our code.
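
The signal name reported by lldb matches the trailing 11 in the zip file name. As a small sketch, the function below (a hypothetical helper) maps a few of the signal numbers that commonly produce core dumps to their names, using the numbering on x86_64 Linux.

```rust
/// Maps a signal number (x86_64 Linux numbering) to its name.
/// Only a few signals that produce core dumps by default are listed.
pub fn signal_name(signal: u32) -> Option<&'static str> {
    match signal {
        4 => Some("SIGILL"),
        6 => Some("SIGABRT"),
        7 => Some("SIGBUS"),
        8 => Some("SIGFPE"),
        11 => Some("SIGSEGV"),
        _ => None,
    }
}
```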

The call stack represents the order of calls as they were executed before the panic. Let's select the last call in our logic before the panic was raised. In the example output that is frame #8, the line that corresponds to example_crashing_rust_app::bar. Type the following command:

f 8

This is shorthand for the following, which can also be typed.

frame select 8

The output of either command will be

frame #8: 0x00007f29d6a33c88 example-crashing-rust-app`example_crashing_rust_app::bar
::h48db1e5d2e4e6220(input=(data_ptr = "hello world\xd6)\U0000007f", length = 11)) 
at main.rs:17:5

The output represents the currently selected frame and also shows the values passed. In this case the value was input=(data_ptr = "hello world\xd6)\U0000007f", length = 11)

As the function bar doesn't do much, let's look at the frame where the string was created.

f 10

Now you can inspect the variables to see what the values inside the function were.

frame variable

This command shows the value of the text variable before it was passed to the function.

(alloc::string::String) text = {
  vec = {
    buf = {
      ptr = (pointer = "hello world\xd6)\U0000007f", _marker = 
        core::marker::PhantomData<unsigned char> @ 0x00007ffd76719e00)
      cap = 12
      alloc = {}
    }
    len = 11
  }
}
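
The output shows a pointer, a capacity and a length rather than a plain string. A Rust String is not NUL-terminated, so the bytes printed after "hello world" are most likely whatever follows the buffer in memory; only the first len bytes belong to the value. The sketch below (string_parts is a hypothetical helper) reads a String the same way, using the pointer together with the length.

```rust
use std::{slice, str};

/// Returns the text read via pointer + length, plus the length and capacity,
/// mirroring the ptr, len and cap fields shown by the debugger.
pub fn string_parts(text: &String) -> (&str, usize, usize) {
    // SAFETY: as_ptr() and len() describe the initialized bytes owned by `text`.
    let bytes = unsafe { slice::from_raw_parts(text.as_ptr(), text.len()) };
    let value = str::from_utf8(bytes).expect("a String is always valid UTF-8");
    (value, text.len(), text.capacity())
}
```

For the string built with format!("hello {}", "world") this yields the text "hello world" and a length of 11, while the capacity is only guaranteed to be at least the length.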

Integrating the source code

That's great if the code is small and easy to follow but what about more complex scenarios? If you have access to the code you can configure the debugger to use it and print out the code when you select the frame.

Exit the debugger by typing quit

quit

Now check out the source code repository.

git clone https://github.com/No9/example-crashing-rust-app.git

Now start the debugger

./rundebug.sh

And set the source code to your downloaded location

settings set -- target.source-map "/app" "/debug/example-crashing-rust-app"

N.B. /app relates to the WORKDIR location in the Dockerfile
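
Conceptually, the source map replaces the build-time path prefix recorded in the debug information with the location of your checkout. The sketch below shows that idea only; it is not how lldb implements it, remap_source is a hypothetical helper, and the /app/src/main.rs path used in the note after it is an assumed example.

```rust
use std::path::{Path, PathBuf};

/// Replaces the `from` prefix of a recorded source path with `to`.
/// Paths that do not start with `from` are returned unchanged.
pub fn remap_source(recorded: &Path, from: &Path, to: &Path) -> PathBuf {
    match recorded.strip_prefix(from) {
        Ok(relative) => to.join(relative),
        Err(_) => recorded.to_path_buf(),
    }
}
```

With "/app" mapped to "/debug/example-crashing-rust-app", a recorded path of /app/src/main.rs would resolve to /debug/example-crashing-rust-app/src/main.rs.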

Now when you move to a frame you also get the related source code.

f 10

Returns

frame #10: 0x00007f29d6a33bf9 example-crashing-rust-app`example_crashing_rust_app::do_test::
   ha134fca868990e15 at main.rs:7:5
   4   	
   5   	pub fn do_test() -> Result<(), Box<dyn std::error::Error>> {
   6   	    let text = format!("hello {}", "world"); 
-> 7   	    foo(&text.as_str());
   8   	    Ok(())
   9   	}
   10

An arrow (->) marks the line where the next call on the stack was made: line 7 in this example.

Clean up

Now quit the debugger.

quit

And exit the pod

exit

The debugging pod should now be deleted

pod "debugger-e2775f05-a5ff-4023-80fc-a14180c3b9e6" deleted

Summary

Well done, you've just performed a core dump analysis on a Rust application! You should now understand the benefits of capturing cores: they provide a very easy way to capture issues in environments that aren't easy to access. This should also give you the confidence to panic applications when they reach an unknown state rather than letting them carry on with erroneous computations.
