Hugal31 / yara-rust

Rust bindings for VirusTotal/Yara

bindgen pre-generated size_t has the wrong type on x86 and causes UB

Orycterope opened this issue · comments

When I compile for Windows x86, the following trivial test fails:

    #[test]
    fn check_yara_scan_mem() {
        const RULES: &str = r#"
            rule test {
                strings:
                    $a = { 42 }
                condition:
                    $a
            }
        "#;

        let rules = Compiler::new()
            .unwrap()
            .add_rules_str(RULES)
            .unwrap()
            .compile_rules()
            .unwrap();
        let matching = rules.scan_mem(&[0x00, 0x01, 0x42, 0x03], 10)
            .unwrap();
        assert_eq!(matching.len(), 1, "byte not found");
    }

The test panics with:

    check_yara_scan_mem panicked at 'called `Result::unwrap()` on an `Err` value: YaraError { kind: Unknown(53) }

The panic comes from the line containing rules.scan_mem(...).unwrap(). Error 53 corresponds to ERROR_CALLBACK_REQUIRED.

In yara-rust, the call boils down to this code in internals::scan:

    let result = unsafe {
        yara_sys::yr_rules_scan_mem(
            rules,
            mem.as_ptr(),
            mem.len().try_into().unwrap(),
            flags,
            Some(scan_callback),
            &p_callback as *const Box<dyn FnMut(CallbackMsg<'a>) -> CallbackReturn> as *mut _,
            timeout,
        )
    };

Some(scan_callback) is the function pointer to the callback, and p_callback is the user data that libyara passes as an argument to scan_callback. (It's a pointer to the actual callback closure, which our scan_callback reconstructs and then calls, but this is not relevant here.)

The thing is: Some(scan_callback) is definitely not a function pointer, and does not have the same size as a function pointer.

Here are the arguments that rust pushes on the stack before doing the FFI call:

    [esp + 0x1c] = 0xa (timeout)
    [esp + 0x18] = 0x1c23efc0 (&p_callback)
    [esp + 0x14] = 0xebd820 (fn scan_callback)
    [esp + 0x10] = 0x0 (tag for variant "Some")
    [esp + 0x0c] = 0x0 (flags)
    [esp + 0x08] = 0x4 (mem len)
    [esp + 0x04] = 0x019b57d4 (mem addr)
    [esp + 0x00] = 0x143b04c8 (rules)

This is one stack slot more than expected by the C function yr_rules_scan_mem, because of the space needed for the Option's variant tag. Because of this, yr_rules_scan_mem sees all variables shifted by one slot:

[Image: Screenshot_20211001_231855]

callback is now 0x0 (the tag for variant Some), user_data is now callback, and timeout amounts to the address of user_data as seconds.

Officially, yr_rules_scan_mem is defined in rules.h as

    YR_API int yr_rules_scan_mem(
        YR_RULES* rules,
        const uint8_t* buffer,
        size_t buffer_size,
        int flags,
        YR_CALLBACK_FUNC callback,
        void* user_data,
        int timeout);

where YR_CALLBACK_FUNC is

    typedef int (*YR_CALLBACK_FUNC)(
        YR_SCAN_CONTEXT* context,
        int message,
        void* message_data,
        void* user_data);

Definitely no Option here. However, bindgen (I'm using the bundled bindings) has generated:

    pub type YR_CALLBACK_FUNC = ::std::option::Option<
        unsafe extern "C" fn(
            context: *mut YR_SCAN_CONTEXT,
            message: ::std::os::raw::c_int,
            message_data: *mut ::std::os::raw::c_void,
            user_data: *mut ::std::os::raw::c_void,
        ) -> ::std::os::raw::c_int,
    >;

No idea where this Option comes from.

Somehow this bug is miraculously not triggered on x64, either because the ABI is different (no stack slots, args are passed via registers), or because Option<unsafe extern "C" fn(...)> is properly optimized to be the same size as the function pointer, just like Option<NonNull<T>> is.

For now my conclusion is that bindgen is at fault here. I'll try to pinpoint the reason, and when I find it I'll open an issue on their repo linking to this one.

> because Option<unsafe extern "C" fn(...)> is properly optimized to be the same size as the function pointer

This is correct, and is guaranteed by rustc. Here's one instance of the documentation guaranteeing it:

> The most common type that takes advantage of the nullable pointer optimization is Option<T>, where None corresponds to null. So Option<extern "C" fn(c_int) -> c_int> is a correct way to represent a nullable function pointer using the C ABI (corresponding to the C type int (*)(int)).
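The guarantee quoted above can be checked with a few lines of Rust. This is only a minimal sketch: the `Callback` alias is a hypothetical stand-in for YR_CALLBACK_FUNC, and the helper names are made up for illustration.

```rust
use std::mem::size_of;
use std::os::raw::c_int;

// Hypothetical stand-in for YR_CALLBACK_FUNC (not the real yara-sys type).
type Callback = unsafe extern "C" fn(c_int) -> c_int;

/// Returns true when wrapping the function pointer in Option adds no tag,
/// i.e. Option<Callback> has the same size as Callback itself.
fn option_fn_is_pointer_sized() -> bool {
    size_of::<Option<Callback>>() == size_of::<Callback>()
}

/// Returns the raw bits of `None::<Callback>`.
/// The Option documentation guarantees that None is represented as null here.
fn none_as_raw() -> usize {
    let none: Option<Callback> = None;
    // SAFETY: transmute checks at compile time that both types have the same
    // size, and None is guaranteed to be the all-zero (null) bit pattern.
    unsafe { std::mem::transmute::<Option<Callback>, usize>(none) }
}

fn main() {
    println!("same size: {}", option_fn_is_pointer_sized());
    println!("None as raw bits: {:#x}", none_as_raw());
}
```

If both checks hold on the target, Some(scan_callback) occupies exactly one pointer-sized argument, so the Option cannot be what adds the extra stack slot.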

I am very confident we're looking at a rustc compiler bug.

The issue is this typedef, in yara-sys/bindings/yara-4.1.2-windows.rs:

    pub type size_t = ::std::os::raw::c_ulonglong;

This typedef is valid on x64, but not on x86, where size_t is 4 bytes.
The issue is that, with bundled-4_1_2, we are using pre-generated bindings that were generated on x64 and are not valid on x86.
This actually only impacts the Windows version. On Linux, size_t is bound to long, which is fine on both 32-bit and 64-bit targets.
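To make the effect of the 8-byte size_t concrete, here is a small simulation of the stack slots shown earlier in the thread. It is only a sketch: it assumes a 32-bit calling convention where every argument lives in 4-byte stack slots, the values are copied from the dump above, and `push_args` and `callee_view` are hypothetical helpers, not part of yara-rust.

```rust
/// Arguments as a 32-bit caller would push them, one u32 per stack slot.
/// `len_is_64_bit` models bindings where size_t was generated as c_ulonglong.
fn push_args(len_is_64_bit: bool) -> Vec<u32> {
    let mut slots: Vec<u32> = vec![
        0x143b_04c8, // rules
        0x019b_57d4, // mem addr
        0x4,         // mem len (low 32 bits)
    ];
    if len_is_64_bit {
        slots.push(0x0); // mem len (high 32 bits): the extra slot
    }
    slots.extend_from_slice(&[
        0x0,         // flags
        0x00eb_d820, // scan_callback
        0x1c23_efc0, // &p_callback
        0xa,         // timeout
    ]);
    slots
}

/// What the C callee reads when it expects a 4-byte size_t:
/// (flags, callback, user_data, timeout) at fixed slot positions.
fn callee_view(slots: &[u32]) -> (u32, u32, u32, u32) {
    (slots[3], slots[4], slots[5], slots[6])
}

fn main() {
    println!("4-byte size_t: {:x?}", callee_view(&push_args(false)));
    println!("8-byte size_t: {:x?}", callee_view(&push_args(true)));
}
```

With the 8-byte length, the callee reads a null callback, takes the callback address as user_data, and takes the address of p_callback as the timeout, which matches the observed ERROR_CALLBACK_REQUIRED.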

So right now, bundled-4_1_2 is not safe to use when compiling for 32-bit targets.
A possible solution (which should be safe, as far as I can tell) is to make those bindings depend on the target. If no pre-generated bindings exist for a given target, the build.rs should fail.
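A rough sketch of what such a target check could look like in a build script follows. The target list and file names are hypothetical and do not reflect the actual layout of yara-sys; the only thing relied on is that Cargo sets the TARGET environment variable for build scripts.

```rust
use std::env;

/// Maps a target triple to a pre-generated bindings file.
/// Both the supported targets and the paths are hypothetical examples.
fn bindings_for_target(target: &str) -> Result<&'static str, String> {
    match target {
        "x86_64-pc-windows-msvc" => Ok("bindings/yara-4.1.2-x86_64-pc-windows-msvc.rs"),
        "i686-pc-windows-msvc" => Ok("bindings/yara-4.1.2-i686-pc-windows-msvc.rs"),
        "x86_64-unknown-linux-gnu" => Ok("bindings/yara-4.1.2-x86_64-unknown-linux-gnu.rs"),
        other => Err(format!(
            "no pre-generated bindings for target `{}`; generate them with bindgen instead",
            other
        )),
    }
}

fn main() {
    // Cargo sets TARGET when running build.rs; the fallback only exists so
    // that this sketch can also be run as a standalone program.
    let target = env::var("TARGET").unwrap_or_else(|_| "x86_64-unknown-linux-gnu".to_string());
    match bindings_for_target(&target) {
        Ok(path) => println!("using bindings file {}", path),
        Err(msg) => {
            // Failing the build is preferable to silently using a wrong ABI.
            eprintln!("{}", msg);
            std::process::exit(1);
        }
    }
}
```

The point of the design is that an unknown target becomes a build error rather than a silent reuse of bindings generated for a different pointer width.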

There is also the size_t_is_usize flag that could be used: https://docs.rs/bindgen/0.59.1/bindgen/struct.Builder.html#method.size_t_is_usize
But I'm not sure it is safe to reuse a binding file generated for one target with another target, as we can right now.
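Independently of how the bindings are produced, a compile-time guard next to them could catch this class of mismatch. The following is a sketch under assumptions: the `size_t` alias here is a local stand-in defined as usize (roughly what bindings generated with size_t_is_usize would contain), and `size_t_matches_usize` is a hypothetical helper.

```rust
use std::mem::size_of;

// Local stand-in for the alias in a bindings file. With the alias from the
// Windows bindings quoted above (c_ulonglong), the guard below would fail
// to compile on a 32-bit target.
#[allow(non_camel_case_types)]
pub type size_t = usize;

// Compile-time guard: the build stops here if size_t and usize differ in size.
const _: () = assert!(size_of::<size_t>() == size_of::<usize>());

/// Runtime version of the same check, only used for demonstration.
fn size_t_matches_usize() -> bool {
    size_of::<size_t>() == size_of::<usize>()
}

fn main() {
    println!("size_t is {} bytes", size_of::<size_t>());
    println!("matches usize: {}", size_t_matches_usize());
}
```

This only covers size_t. Other target-dependent types in a reused bindings file would need their own checks, so it does not by itself make cross-target reuse safe.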

I think the bundled pre-generated bindings should really be target-specific.

Also, they are meant as an easy way for a developer to quickly start working with Yara without installing the headers, but IMHO, generating the bindings on the go should be the preferred way.

@Hugal31 I can make a PR with pre-generated bindings per target. I think it would be better to do that after #55 is merged.

> I think the bundled pre-generated bindings should really be target-specific.
>
> Also, they are meant as an easy way for a developer to quickly start working with Yara without installing the headers, but IMHO, generating the bindings on the go should be the preferred way.

@Hugal31 I can add generated bindings per target here: #78. What do you think?

> @Hugal31 I can add generated bindings per target here: #78. What do you think?

Yes, that sounds good, and is better than nothing. Once we have bindings generated for a few targets, we can restrict the use of those bindings to those specific targets and avoid UB like this one.