rand performs unaligned memory access (and invalid raw ptr usage)
RalfJung opened this issue
The following program fails to run in Miri with an unaligned memory access:
```rust
use rand::Rng;

fn main() {
    let mut rng = rand::thread_rng();
    let _val = rng.gen::<i32>();
    let _val = rng.gen::<usize>(); // causes unaligned memory access
}
```
This is caused by this line of code:
Line 186 in 852988b
which casts a 4-aligned pointer to a `*const i64` and then uses that pointer for a read. `i64` reads must be 8-aligned. There is a comment in the source indicating that this "requires little-endian CPU supporting unaligned reads", but that is not sufficient: the alignment requirement is imposed by LLVM, not by the CPU. In Rust, all accesses must be aligned on all platforms, no matter what the CPU permits; any unaligned access causes UB (on all platforms). Unaligned accesses are only allowed via `read_unaligned` and `write_unaligned`.
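For illustration, here is a minimal sketch (not rand's actual code; the buffer and values are made up) of a read that is legal on every platform, using `read_unaligned`:

```rust
use std::ptr;

fn main() {
    // A u32 buffer is only guaranteed 4-byte alignment, so reading a u64
    // through a pointer into it may be an unaligned access.
    let buf: [u32; 4] = [0x1111_1111, 0x2222_2222, 0x3333_3333, 0x4444_4444];
    let p = buf.as_ptr() as *const u64;
    // `*p` or `ptr::read(p)` would be UB whenever p is not 8-aligned;
    // read_unaligned has no alignment requirement and is fine everywhere.
    let v = unsafe { ptr::read_unaligned(p) };
    if cfg!(target_endian = "little") {
        // the first two u32s, combined little-endian
        assert_eq!(v, 0x2222_2222_1111_1111);
    }
}
```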
There is another problem with this code as well: it violates pointer provenance rules. `&results[index] as *const u32` creates a raw pointer that may be used to access that particular element, but it is not legal to use this pointer to access neighboring elements that were never cast to a raw pointer. You could try to fix this with something like `&results[index..index+1] as *const [u32] as *const u32 as *const u64`.
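To make the provenance fix concrete, here is a hedged sketch (with hypothetical `results`/`index` values; note that the slice here is taken over *both* `u32` elements that the eight-byte read touches, so the resulting pointer's provenance covers all eight bytes):

```rust
use std::ptr;

fn main() {
    let results: [u32; 4] = [1, 2, 3, 4];
    let index = 0;
    // Cast through a slice covering both u32s that the u64 read will
    // touch, so the raw pointer may legally access all eight bytes.
    let p = &results[index..index + 2] as *const [u32] as *const u32 as *const u64;
    // Still need read_unaligned: the pointer is only 4-aligned.
    let v = unsafe { ptr::read_unaligned(p) };
    if cfg!(target_endian = "little") {
        assert_eq!(v, (2u64 << 32) | 1);
    }
}
```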
Is the performance of the "fallback code path" really so much worse that all these hacks are worth it?
`fill_bytes` and `try_fill_bytes` in the same file also seem to be based on the incorrect assumption that unaligned accesses are okay in Rust when compiling for x86:

```rust
// As an optimization we try to write directly into the output buffer.
// This is only enabled for little-endian platforms where unaligned writes
// are known to be safe and fast.
```
Thanks @RalfJung; some good questions here.
This particular example is around 13% faster with the HC128 generator on my system (the `gen_u64_hc128` benchmark), but the difference is much smaller for other generators (1–4%). In my opinion, that's enough of a difference to be worth taking if it's free, but not if it causes UB.
Following your suggestions, I get around the same performance with this version of the code:
```rust
if cfg!(any(target_arch = "x86", target_arch = "x86_64")) {
    // requires little-endian CPU supporting unaligned reads:
    let ptr: *const u64 = &results[index..index+1] as *const [u32] as *const u32 as *const u64;
    unsafe { read_unaligned(ptr) }
}
```
Is this acceptable in your opinion? Knowing where to draw the line between `unsafe` platform-specific optimisations and safe, portable code isn't always easy.
The same performance observations appear to apply to the `fill_bytes` code: HC128 is around 10% faster, while other generators aren't affected much. Of note is that HC128 is currently the algorithm behind `thread_rng`, but it will likely be replaced in the future.
> Is this acceptable in your opinion?

This looks good; I'll try to run it in Miri as well. Do you have this in a branch somewhere?

> Of note is that HC128 is currently the algorithm behind thread_rng, but will likely be replaced in the future.

That's in fact how I noticed, as you can see from my example above. I should probably try to run the entire rand test suite in Miri... will it exercise all these code paths irrespective of what `thread_rng` chooses?
I can confirm that RalfJung@8303309 makes my small tests pass in Miri.
Running the entire test suite should probably cover most of the `unsafe` code, but it would be worth testing all the sub-crates in the repo (e.g. `BlockRng64` is only used by `Isaac64Rng`, which is not a dependency of the main crate). Thanks again for stepping in to fix the `unsafe` pointer access; as you can tell, we could do with a bit of help here.
I'm afraid the `fill_bytes` problem is much harder to fix. This creates an unaligned reference at

Lines 234 to 237 in 9828cdf

and then passes that to `generate`. References must always be aligned, even if they are never used. So to fix this, `generate` would have to take a raw pointer instead. But that is a trait method, so the change isn't easy...
Here, `filled` should always be a multiple of four, but of course that's not enough, since the `dest` argument may not be 32-bit aligned.
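Since only the alignment of `dest` is in question, one option (a sketch, not rand's code) would be a runtime check via `align_offset`, taking the direct-write path only when the buffer happens to start 4-byte aligned:

```rust
fn main() {
    let dest = [0u8; 16];
    // align_offset returns how many bytes to skip before the pointer is
    // aligned for u32; 0 means `dest` already starts 4-byte aligned.
    let misalign = dest.as_ptr().align_offset(std::mem::align_of::<u32>());
    if misalign == 0 {
        // fast path: u32-wise writes into dest would be aligned here
    } else {
        // fallback: byte-wise copy
    }
    let _ = misalign;
}
```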
I'm not sure how important the direct copy is to performance (it's only encountered with large fill requests), so we could refactor this to generate into the `self.results` buffer or a temporary and then copy from there.
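One possible shape for that refactor, as a sketch under assumptions (`generate_into` and the buffer size are hypothetical stand-ins, not rand's API): generate into an aligned temporary, then copy bytes out, which is correct for any alignment of `dest` and needs no raw-pointer writes:

```rust
// Hypothetical stand-in for a block RNG's generate step.
fn generate_into(results: &mut [u32; 16]) {
    for (i, w) in results.iter_mut().enumerate() {
        *w = i as u32;
    }
}

// Fill `dest` via an aligned temporary buffer; no unaligned or
// raw-pointer writes are needed, whatever the alignment of `dest`.
fn fill_via_temp(dest: &mut [u8]) {
    let mut results = [0u32; 16];
    let mut filled = 0;
    while filled < dest.len() {
        generate_into(&mut results);
        let bytes: Vec<u8> = results.iter().flat_map(|w| w.to_le_bytes()).collect();
        let n = (dest.len() - filled).min(bytes.len());
        dest[filled..filled + n].copy_from_slice(&bytes[..n]);
        filled += n;
    }
}

fn main() {
    let mut dest = [0u8; 10];
    fill_via_temp(&mut dest);
    // first words 0, 1, 2 serialized little-endian, truncated to 10 bytes
    assert_eq!(dest, [0, 0, 0, 0, 1, 0, 0, 0, 2, 0]);
}
```

The cost is one extra copy on the large-fill path, which may be acceptable given the benchmark numbers above.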
Refactoring the trait isn't completely out of the question (it doesn't have a massive number of users), but I don't wish to force block-RNG implementations to use ugly unsafe code, so I would prefer that `generate` not take a raw pointer.