RustCrypto / RSA

RSA implementation in pure Rust

modpow implementation is not constant-time

phayes opened this issue · comments

Hi there,

I'm the author of sidefuzz (https://github.com/phayes/sidefuzz) and I have found what appears to be variable-time behavior in the rsa::internals::encrypt() function. Specifically, rsa::internals::encrypt() appears to be variable-time with respect to the message. Note that I haven't worked this up into an actual exploit, but merely demonstrated that this function isn't constant-time with respect to the input message.

Specifically, the message 20d90c8af42aac9b1ee53dc9a0187201 takes 549894 instructions to encrypt, while the message 5a28cec68d47f6fe3b1df54c9f320f6d takes 552427 instructions to encrypt. This is a difference of 2533 instructions, or about 0.5%. So it's a slight difference, but probably exploitable with sufficient sampling.

I have created fuzzing targets here: https://github.com/phayes/sidefuzz-targets

You can confirm this difference with the sidefuzz tool like so:

sidefuzz check ./target/wasm32-unknown-unknown/release/rsa_encrypt_message.wasm 5a28cec68d47f6fe3b1df54c9f320f6d 20d90c8af42aac9b1ee53dc9a0187201
samples: 20000, t-value: 219771.0790572351, confidence: 100%
Found timing difference of 2533 instructions between these two inputs with 100% confidence:
input 1: 5a28cec68d47f6fe3b1df54c9f320f6d (552427 instructions)
input 2: 20d90c8af42aac9b1ee53dc9a0187201 (549894 instructions)

My first suspicion was that this was due to num_bigint_dig::BigUint::from_bytes_be() being variable-time, but fuzzing that function specifically results in what appears to be constant-time behavior. So I'm not actually sure where the problem is.

IIRC RSA blinding (which I believe is used in this implementation) uses random numbers, so while execution time is still variable, it does not correlate with a protected secret.

Hi @newpavlov ,

I don't think that's the case here. The fuzzing target is directly testing this function:

/// Raw RSA encryption of m with the public key. No padding is performed.
#[inline]
pub fn encrypt<K: PublicKey>(key: &K, m: &BigUint) -> BigUint {
    m.modpow(key.e(), key.n())
}

So the problem is going to be in the implementation of modpow.

I've also got another fuzzing target that does full pkcs1v15 padding with a statically seeded CPRNG. It displays the same variable-time behavior (with a statically seeded deterministic PRNG, mind you), but the first fuzzing target was the more minimal case, so that's what I reported. (sidefuzz also accounts for RNGs by repeated sampling and taking a t-value, but this doesn't apply here)

Admittedly, this could also be an artifact of the fuzzer (which would be a bug in the fuzzer), but I don't think that's the case here either.

crypto-bigint now has a constant-time modpow implementation in the form of DynResidue::pow, however it uses Montgomery form internally so it may not be the fastest option for straight modpow since converting in/out of Montgomery form is costly, and it still monomorphizes around a fixed number of limbs.
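
For illustration, here is a hedged sketch (not the rsa crate's code) of how a raw RSA public-key operation could be expressed on top of that API. Type and method names follow crypto-bigint 0.5.x (DynResidueParams / DynResidue); exact constructor signatures may differ between point releases, and the fixed U2048 width shows the monomorphization around a set number of limbs:

    use crypto_bigint::{
        modular::runtime_mod::{DynResidue, DynResidueParams},
        U2048,
    };

    /// Raw RSA public-key operation m^e mod n via constant-time Montgomery exponentiation.
    fn raw_rsa_public(m: &U2048, e: &U2048, n: &U2048) -> U2048 {
        let params = DynResidueParams::new(n);   // precompute Montgomery constants for n
        let m_mont = DynResidue::new(m, params); // convert the message into Montgomery form
        m_mont.pow(e).retrieve()                 // constant-time pow, then convert back out
    }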

For the purposes of rsa (and dsa) we probably need to add a new Uint type like UintVec which uses a number of limbs chosen at runtime as a proper num_bigint_dig::BigUint replacement.

Edit: work on crypto_bigint::BoxedUint has started.

Until this is addressed it seems like something we should more prominently highlight in security-related documentation.

Edit: opened #373 (and merged!)

Hi! 👋

I've recently been working on RSA side-channel attacks, and found that many implementations are vulnerable. That's described in the Marvin Attack.

Thanks to help by @ueno, who contributed the test harness, I was able to run the test against rust-crypto (exact versions of all packages are in the PR).

Unfortunately I have to report that the side-channel leakage from the numerical library is very substantial and easily detectable over the network. As such, I consider RustCrypto RSA to be vulnerable and exploitable.

Test results from a run with 100k repeats per probe on an i9-12900KS @ 5.225GHz:

Sign test mean p-value: 0.2189, median p-value: 0.06354, min p-value: 3.946e-60
Friedman test (chisquare approximation) for all samples
p-value: 8.827139162990364e-108
Worst pair: 2(no_padding_48), 4(signature_padding_8)
Mean of differences: -8.07298e-07s, 95% CI: -9.06918e-07s, -7.019407e-07s (±1.025e-07s)
Median of differences: -8.44500e-07s, 95% CI: -9.78000e-07s, -7.200000e-07s (±1.290e-07s)
Trimmed mean (5%) of differences: -7.27307e-07s, 95% CI: -8.13648e-07s, -6.316737e-07s (±9.099e-08s)
Trimmed mean (25%) of differences: -8.48652e-07s, 95% CI: -9.17708e-07s, -7.769631e-07s (±7.037e-08s)
Trimmed mean (45%) of differences: -9.56301e-07s, 95% CI: -1.05768e-06s, -8.603802e-07s (±9.865e-08s)
Trimean of differences: -4.72000e-07s, 95% CI: -5.76750e-07s, -3.635625e-07s (±1.066e-07s)

pairwise results are in report.txt

confidence intervals for the measurements are as follows:
[figure: conf_interval_plot_trim_mean_25]

legend:

ID,Name
0,header_only
1,no_header_with_payload_48
2,no_padding_48
3,no_structure
4,signature_padding_8
5,valid_0
6,valid_48
7,valid_192
8,valid_246
9,valid_repeated_byte_payload_246_1
10,valid_repeated_byte_payload_246_255
11,zero_byte_in_padding_48_4

explanations of that are in the step2.py script in marvin-toolkit repo.

In other words, the protection from adding blinding is not sufficient, and RustCrypto has at least the same issue as CVE-2022-4304 in OpenSSL.

@tomato42 indeed that's using PKCS#1v1.5 without random blinding, so with modpow being non-constant-time, it's to be expected. We should definitely get rid of any APIs which permit private key use without random blinding.

Also just a note for the future, per our SECURITY.md we would've appreciated an advisory for this opened under a private security disclosure.

@tarcieri If the de-blinding isn't performed using constant-time code, then use of blinding won't remove the side-channel signal; that's what the bug in OpenSSL was about: removal of the blinder and conversion to constant-length bytes need to be side-channel free too.

Also just a note for the future, per our SECURITY.md we would've appreciated an advisory for this opened under a private security disclosure.

ah, sorry about that; since this was linked as a security issue, I assumed that its effect on the security of RSA decryption was considered public too.

Our answer until now has been that the application of random blinding prevented such sidechannels, so the fact that this isn't the case is news to us

sorry, I'm confused: on one hand you say that privkey.decrypt(Pkcs1v15Encrypt, &buffer); won't use blinding, yet that's the public API that you say is protected by the use of blinding in README.md...?

The README.md is wrong in that case.

(Also I'm just discovering this and haven't had time to look over any code yet to confirm specific details)

ok, I'm still running tests with a larger sample, to see if there aren't additional sources of leakage. If you have any additional questions, feel free to ask

I'd be curious if you saw similar sidechannels in non-PKCS#1v1.5 constructions like OAEP decryption or PSS signatures

if you could propose a change on top of that PR that adds an OAEP decryption, I can run tests for that too (there are scripts in marvin-toolkit for testing OAEP too)

but since the leak is clearly from the numerical library, that means OAEP is also vulnerable, as that leak happens before any OAEP checks

even if PSS leaks like that, it doesn't make the implementation as easily exploitable, as both the PKCS#1 v1.5 and OAEP attacks are chosen-ciphertext attacks; with PSS (or with PKCS#1 v1.5 signatures) the attacker can only reasonably affect the hash being signed, not the whole plaintext
(in other words, it's a fundamentally different attack, so the test for it needs to be different too)

Can you confirm you see the sidechannel against the raw blinded modpow operation (when used with e.g. rand_core::OsRng)? That's really what I'm curious about.

Sorry, I'm not a Rust developer; you will need to hand-hold me (provide the complete code to execute) if you want me to run any tests

The branch containing that code has disappeared: https://github.com/ueno/marvin-toolkit/tree/wip/rust-crypto

You can test the low-level "hazmat" API: https://docs.rs/rsa/latest/rsa/hazmat/fn.rsa_decrypt.html

It would look something like:

https://github.com/tomato42/marvin-toolkit/pull/4/files#diff-c0f6b76f1d52cf643fb2f30ef85b81b20ace1aa7577d1f4603db0e091e052d66R50-R63

...replaced with:

    loop {
        // Read the next fixed-size ciphertext from the input file.
        let mut buffer = vec![0; args.size];
        let n = infile.read(&mut buffer)?;
        if n == 0 {
            break;
        }

        assert!(n == buffer.len());

        // Time only the raw decryption, with blinding driven by OsRng.
        let now = Instant::now();
        let c = rsa::BigUint::from_bytes_be(&buffer);
        let _ = rsa::hazmat::rsa_decrypt(Some(&mut rand_core::OsRng), &privkey, &c);
        writeln!(outfile, "{}", now.elapsed().as_nanos())?;
    }

Note you'll need to add rand_core as a dependency and enable the hazmat feature of rsa:

https://github.com/tomato42/marvin-toolkit/pull/4/files#diff-76856434fe4ad8f1f778ba0452c8d8e49e764d25adf1374d385634cd251e1ca2R6-R9

[dependencies]
anyhow = "1"
clap = { version = "4", features = ["derive"] }
rand_core = { version = "0.6", features = ["getrandom"] }
rsa = { version = "0.9", features = ["hazmat"] }

(will test rsa_decrypt from hazmat later)

With 1 million observations per sample (same setup) I'm getting:

Sign test mean p-value: 0.09834, median p-value: 2.633e-08, min p-value: 0.0
Friedman test (chisquare approximation) for all samples
p-value: 0.0
Worst pair: 2(no_padding_48), 4(signature_padding_8)
Mean of differences: -2.06196e-06s, 95% CI: -2.09851e-06s, -2.022175e-06s (±3.817e-08s)
Median of differences: -3.22400e-06s, 95% CI: -3.24600e-06s, -3.205000e-06s (±2.050e-08s)
Trimmed mean (5%) of differences: -1.99461e-06s, 95% CI: -2.02364e-06s, -1.965729e-06s (±2.895e-08s)
Trimmed mean (25%) of differences: -2.08608e-06s, 95% CI: -2.10654e-06s, -2.067761e-06s (±1.939e-08s)
Trimmed mean (45%) of differences: -3.09282e-06s, 95% CI: -3.11547e-06s, -3.070944e-06s (±2.227e-08s)
Trimean of differences: -2.22769e-06s, 95% CI: -2.25800e-06s, -2.198494e-06s (±2.975e-08s)

[figure: conf_interval_plot_trim_mean_25]

and pairwise results:
report.txt

These show that the errors in padding checks also leak (compare valid_48 to valid_246: sign test p-value of 5.13e-7)

ran the tests against this:

        let c = rsa::BigUint::from_bytes_be(&buffer);
        let _ = rsa::hazmat::rsa_decrypt(Some(&mut rand_core::OsRng), &privkey, &c);

and have confirmed that it's leaking too.

Same overall configuration but with 2.5 million observations per probe:

Sign test mean p-value: 0.1915, median p-value: 0.008017, min p-value: 1.428e-15
Friedman test (chisquare approximation) for all samples
p-value: 6.50448690941627e-27
Worst pair: 0(no_padding_48), 3(too_short_payload_0_1)
Mean of differences: -1.11105e-07s, 95% CI: -1.42260e-07s, -7.775684e-08s (±3.225e-08s)
Median of differences: -9.50000e-08s, 95% CI: -1.19000e-07s, -7.300000e-08s (±2.300e-08s)
Trimmed mean (5%) of differences: -9.89875e-08s, 95% CI: -1.25208e-07s, -7.384255e-08s (±2.568e-08s)
Trimmed mean (25%) of differences: -9.23689e-08s, 95% CI: -1.18946e-07s, -6.803432e-08s (±2.546e-08s)
Trimmed mean (45%) of differences: -9.30858e-08s, 95% CI: -1.17708e-07s, -7.039496e-08s (±2.366e-08s)
Trimean of differences: -9.47500e-08s, 95% CI: -1.20250e-07s, -7.056250e-08s (±2.484e-08s)

[figure: conf_interval_plot_trim_mean_45]

Since padding doesn't matter for raw RSA, I've executed a slightly different set of tests:

ID,Name
0,no_padding_48
1,no_structure
2,signature_padding_8
3,too_short_payload_0_1
4,too_short_payload_0_3
5,too_short_payload_0_7
6,too_short_payload_0_15
7,valid_0
8,valid_repeated_byte_payload_245_0

and pairwise statistical test results:
report.txt

I'm not sure what solution we can provide here short of a fully constant-time implementation (which was the planned long-term solution for this problem, using e.g. a Barrett reduction as provided by crypto-bigint, but that will require a substantial amount of work)

I guess I need to figure out how to run your reproducer.

Can you explain a bit more what those charts are showing? Are you actually able to recover keys?

I'm not sure what solution we can provide here short of a fully constant time implementation

what's necessary is ensuring that the unblinding happens in constant time with respect to the output; that means you need to convert whatever arbitrary-precision format is in use for the modular exponentiation into constant-length integers, then multiply them using real constant-time code, and finally do a constant-time modular reduction. Something like what's in https://github.com/tomato42/ctmpi, but in Rust, not in C. Leakage in the conversion from one format to the other is not a problem, as that value is uncorrelated with the RSA plaintext, so leaking it doesn't provide useful information to the attacker (as long as the blinding/unblinding factors remain secret).

That being said, if your mod_exp is leaky, you really have to employ both base blinding and exponent blinding. Details of that, as well as links to papers showing attacks against implementations that used just one kind of blinding, are in https://datatracker.ietf.org/doc/html/draft-kario-rsa-guidance-02
But to fix a raw RSA implementation against Bleichenbacher/Marvin you just need to fix the last multiplication and modular reduction.
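
To make that division of labor concrete, here is a hedged sketch of base blinding around a leaky modpow, using num_bigint_dig::BigUint for the arithmetic. ct_mul_mod is a hypothetical helper (it does not exist in the rsa crate under that name), shown here with a deliberately non-constant-time placeholder body; it stands in for the constant-time multiply-and-reduce this comment is asking for:

    use num_bigint_dig::BigUint;

    /// Hypothetical placeholder: a real implementation must compute (a * b) mod n in
    /// constant time (cf. ctmpi); BigUint's `*` and `%` used here are NOT constant-time.
    fn ct_mul_mod(a: &BigUint, b: &BigUint, n: &BigUint) -> BigUint {
        (a * b) % n
    }

    /// Precomputed blinding pair for a key: r^e mod n and r^-1 mod n.
    struct Blinder {
        r_e: BigUint,   // applied before exponentiation
        r_inv: BigUint, // applied after exponentiation
    }

    fn blinded_decrypt(c: &BigUint, d: &BigUint, n: &BigUint, b: &Blinder) -> BigUint {
        // 1. Blind: c' = c * r^e mod n. A leak here only reveals masked values.
        let c_blinded = ct_mul_mod(c, &b.r_e, n);
        // 2. Exponentiate: a leaky modpow is tolerable only because its input and
        //    output are masked by the secret blinder (if modpow leaks, exponent
        //    blinding should be used as well, per the guidance draft).
        let m_blinded = c_blinded.modpow(d, n);
        // 3. Unblind: m = m' * r^-1 mod n. This multiplication, the reduction mod n,
        //    and the conversion to fixed-length bytes must be constant-time with
        //    respect to the result; otherwise the mask is stripped in variable time,
        //    which is exactly the CVE-2022-4304 pattern.
        ct_mul_mod(&m_blinded, &b.r_inv, n)
    }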

Can you explain a bit more what those charts are showing?

they show the estimated difference in timing for different inputs. Like in the last one, no_padding_48 (probe 0) is processed about 83 ns slower than no_structure (probe 1): that's why it's marked 1-0 (i.e. "probe 1 time minus probe 0 time")

Are you actually able to recover keys?

those graphs show that it's possible, using the time of the decryption, to execute one instance of the Bleichenbacher oracle. To decrypt an RSA ciphertext (or forge a signature) the attacker would need to run the test a good few thousand times. See the details on the Marvin vulnerability page and in the associated papers. Since my goal is proving that there is no side channel, not actually attacking implementations, I don't have good tooling to actually perform the attack (but then there's 25 years of literature on the topic, so once you have a working oracle, there are ready solutions for the rest).

Another run for rsa_decrypt, this time with 10.5 million observations per sample:

Sign test mean p-value: 0.1359, median p-value: 2.134e-07, min p-value: 1.433e-60
Friedman test (chisquare approximation) for all samples
p-value: 9.61782206869971e-128
Worst pair: 0(no_padding_48), 8(valid_repeated_byte_payload_245_0)
Mean of differences: -8.28395e-08s, 95% CI: -9.87044e-08s, -6.762672e-08s (±1.554e-08s)
Median of differences: -9.10000e-08s, 95% CI: -1.02000e-07s, -8.000000e-08s (±1.100e-08s)
Trimmed mean (5%) of differences: -8.62429e-08s, 95% CI: -9.87246e-08s, -7.492710e-08s (±1.190e-08s)
Trimmed mean (25%) of differences: -8.64124e-08s, 95% CI: -9.79358e-08s, -7.520178e-08s (±1.137e-08s)
Trimmed mean (45%) of differences: -9.07003e-08s, 95% CI: -1.00972e-07s, -7.931308e-08s (±1.083e-08s)
Trimean of differences: -9.07500e-08s, 95% CI: -1.02000e-07s, -8.025000e-08s (±1.087e-08s)

[figure: conf_interval_plot_trim_mean_45]

and pairwise statistical results: report.txt

This suggests that for the leakage to happen, the whole most significant word needs to equal 0. That will make attacks against OAEP with 2048-bit keys rather impractical on 64-bit architectures. But it does mean that both 32-bit architectures and less common key sizes, like 2049- or 2056-bit keys, are rather realistic to attack.

Note: I haven't excluded the possibility of leakage happening with probe 3 or probe 4 (16 and 32 most significant bits being zero, respectively), but it does look exactly the same as the leakage pattern for CVE-2022-4304, where I did do that.

what's necessary, is ensuring that the unblinding happens in constant time with respect to the output; that means you need to convert the whatever arbitrary precision format is in use for the modulo exponentiation into constant length integers, then multiply them using real constant-time code, and finally do constant time reduction modulo. Something like what's in https://github.com/tomato42/ctmpi but in rust, not in C.

The core modpow operation in num-bigint-dig should already be implemented in terms of Montgomery multiplication: https://github.com/dignifiedquire/num-bigint/blob/6f73f0a4025164325e85f8645dad6712b3110c51/src/monty.rs#L129

That alone is insufficient, however, per the OP. I'd generally worry that num-bigint is a general-purpose arbitrary precision bignum library that wasn't designed for constant-time operation.

I could potentially attempt to completely sidestep num-bigint for this, extracting the raw limbs out of the BigUints and performing modpow in code which has been carefully written to avoid any secret-dependent branching, similar to what we have already in crypto-bigint: https://github.com/RustCrypto/crypto-bigint/blob/a02c505/src/uint/modular/reduction.rs

Rather than try to hunt down where timing sidechannels are occurring in num-bigint, I've been attempting to build out BoxedUint in crypto-bigint, which now has partial support for sharing the Montgomery multiplication implementation we already provide for the stack-allocated Residue and DynResidue types.

It's designed around a fixed (albeit dynamically chosen) precision rather than arbitrary precision, and supports decoding BoxedUints with a precision specified a priori, allowing us to ensure that the number of limbs is constant and always equal to the limbs of the modulus. This should eliminate sidechannels caused by leading zeros being stripped (leaving fewer limbs), which I suspect is what's happening per the earlier measurements.
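
As a rough illustration of that decoding discipline (plain Rust, not the actual BoxedUint API), the limb count is fixed by the modulus up front, so leading zero limbs are preserved instead of stripped:

    /// Decode big-endian bytes into little-endian u64 limbs whose count is fixed by the
    /// modulus, not by the value itself. Leading zero limbs are kept, so subsequent
    /// arithmetic always sees the same length regardless of the input.
    fn limbs_from_be_bytes(bytes: &[u8], modulus_limbs: usize) -> Vec<u64> {
        let mut limbs = vec![0u64; modulus_limbs]; // width depends only on the modulus
        for (i, &byte) in bytes.iter().rev().enumerate() {
            let limb = i / 8;
            if limb < modulus_limbs {
                // Position of this byte within its limb; the branch above depends only
                // on public lengths, never on the value being decoded.
                limbs[limb] |= u64::from(byte) << (8 * (i % 8));
            }
        }
        limbs
    }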

If we go that route, it will be a somewhat invasive, breaking change, but also one we've been wanting to do for a while (see #51). I'm open to alternative options which work with num-bigint-dig, but I'm not sure we'll actually solve this problem without switching to code which has been carefully written from the ground up to operate in constant time.

I am fully in favor of finally switching 😅 thanks @tarcieri for pushing crypto-bigint forward!

FWIW it looks like Go used to have similar problems due to their use of big.Int and solved the issue in a similar manner, by ripping out big.Int and replacing it with a carefully written internal cryptographic integer library: golang/go@8a81fdf

I've opened a WIP @rustsec advisory here: rustsec/advisory-db#1825

@tarcieri I can probably spend some time helping on this, can you point me at the missing pieces/where you are currently at?

@dignifiedquire great! I've been grinding away at crypto_bigint::{BoxedUint, BoxedResidue} and they should have most of the functionality necessary at this point: just added Montgomery Ladder-based modpow and modular inversion support. There are probably a bunch of little gaps but they'll mostly be things like adding core::ops impls that are missing.

I can cut a prerelease of crypto-bigint and then I think it's ready to start integrating. The trickiest thing will be ensuring constant precision throughout (i.e. that all values match the modulus in precision), although for now several operations have no widening support and will panic on a precision mismatch, so hopefully that "helps".

I was thinking a first step might be to reimplement rsa_decrypt and then attempt to validate that the new implementation closes the sidechannel.

It's also using subtle now, which should assist in implementing constant-time operations, although it's stymied a bit by a Copy bound on ConditionallySelectable, which prevents the CtOption constant-time combinators from working: dalek-cryptography/subtle#94
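
For example (illustrative only, not crypto-bigint or rsa code), byte-wise constant-time selection with subtle looks like this; it is the kind of primitive needed to return either a real plaintext or a rejection value without branching on padding validity:

    use subtle::{Choice, ConditionallySelectable};

    /// Select `a` or `b` (equal, public lengths) without branching on `choose_a`.
    /// conditional_select(x, y, c) returns x when c == 0 and y when c == 1.
    fn ct_select_bytes(a: &[u8], b: &[u8], choose_a: Choice) -> Vec<u8> {
        assert_eq!(a.len(), b.len()); // lengths are public, so this branch is fine
        a.iter()
            .zip(b.iter())
            .map(|(&x, &y)| u8::conditional_select(&y, &x, choose_a))
            .collect()
    }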

I've filed RUSTSEC-2023-0071 for this vulnerability.

@tomato42 not sure if you want to add that to https://people.redhat.com/~hkario/marvin/ or wait for a CVE assignment

Thanks for the mention, yes, I've seen it. If you want, I can add it, but I planned to wait for CVE assignment.

To consumers of the rsa crate

While the attack requires the attacker to precisely measure the processing time of the ciphertext, one of the main contributions of the Marvin Attack paper is that it is possible to measure differences of just a few nanoseconds (billionths of a second) even over the network. The ability to do that depends only on the number of measurements the attacker is able to perform, not on the size of the side channel.

The old wisdom of "side channels smaller than 100ns are not attackable over the network" is incorrect, see the vulnerability page for details.

@tomato42 after we're able to mitigate the core rsa::hazmat::rsa_decrypt API, it seems like we'll need to start handling padding failures via implicit rejection (producing a pseudorandom output)? My only familiarity with that approach is from Kyber

we'll need to start handling padding failures via implicit rejection (producing a pseudorandom output)? My only familiarity with that approach is from Kyber

if you consider deprecation of PKCS#1 v1.5 encryption padding to be not a realistic option, then yes, that's the best way to provide an API that's about as safe to use as one for OAEP

@tomato42 we've been assigned CVE-2023-49092

@tarcieri Thanks! CVE added to the page.

it seems like we'll need to start handling padding failures via implicit rejection (producing a pseudorandom output)? My only familiarity with that approach is from Kyber

This same approach is also used in most TLS implementations that support RSA encryption. You generate a fake 48-byte pre-master secret, then attempt RSA decryption; if the padding is invalid, or if the padding is valid but the PMS is not 48 bytes, or if the embedded TLS version number in the PMS is incorrect, you return the pseudorandom PMS [*] and allow the Finished checks to fail, since at that point the attacker cannot distinguish whether the padding was correct or not.

[*] And of course all of this must be in constant time as well.

Yes, but that (generating a new random 48-byte pre-master secret to use in case of padding check errors) works in TLS only because the random PMS is later mixed with ServerHello.random before it's used to decrypt the Finished message.

only because the random PMS is later mixed with ServerHello.random

Unfortunate if true, since that suggests that for other protocols, which do not perform any mixing but only attempt to use the key directly, there is no possible fix: the only two possible options are returning some explicit indicator (which immediately leaks the padding validity, because the application performs a different action depending on whether the padding is valid) or returning a randomly generated string as a rejection symbol.

(TBH though, I don't at all see why this would be the case or how mixing in the server random would change the situation.)

due to the application performing a different action based on if the padding is valid or not) or returning a randomly generated string as a rejection symbol.

and that's why with implicit rejection at the PKCS#1 v1.5 level, the one described in https://datatracker.ietf.org/doc/draft-kario-rsa-guidance/ and in the Marvin paper, the API call always returns a message, and for every key-ciphertext pair it will return the same message over and over, even if the PKCS#1 v1.5 padding check actually failed. Thus the attacker has no oracle to tell which ciphertexts are good and which are bad, as all of them "look good".
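
A hedged, simplified sketch of the idea (the derivation in the draft is more involved, deriving a key-derivation key from the private key and the ciphertext and then both a message length and its content from it; kdk and implicit_reject here are illustrative names, not the rsa crate's API):

    use hmac::{Hmac, Mac};
    use sha2::Sha256;

    /// On padding failure, return a message derived deterministically from a secret
    /// key-derived key and the ciphertext: the same ciphertext always yields the same
    /// output, so the attacker cannot distinguish "good" from "bad" padding.
    fn implicit_reject(kdk: &[u8; 32], ciphertext: &[u8], max_len: usize) -> Vec<u8> {
        let mut prf = Hmac::<Sha256>::new_from_slice(kdk).expect("any key length is accepted");
        prf.update(ciphertext);
        let mut synthetic = prf.finalize().into_bytes().to_vec();
        synthetic.truncate(max_len.min(synthetic.len()));
        synthetic
    }

The selection between the real decrypted message and this synthetic one must itself be constant-time (e.g. with a subtle-based selection like the one sketched earlier), otherwise the branch reintroduces the oracle.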

(TBH though, I don't at all see why this would be the case or how mixing in the server random would change the situation.)

because we have two cases:

  1. good padding, decrypts to a message, unexpected value (e.g. not matching the AES ciphertext), but the same value every time
  2. bad padding, decrypts to a random message on every try

so the behaviour of the application in case 1 will be consistent for every try, while in case 2 it will be more random (in the case of AES-CBC there's a 1 in 256 chance that the padding check passes, which will produce a huge difference in behaviour)

in the case of TLS, those two different situations are also combined with ServerHello.random, which is different for every try, so the subsequent actions based on that random value will also be random, independent of whether the decryption returns the same value every time or a random value every time

how stable is the rfc draft? is it at a stage where it makes sense to start implementing?

The algorithm is already implemented and shipped in NSS and OpenSSL, so barring something catastrophic discovered during review, I consider the algorithm final.

FWIW ring, which is used by rustls, also implements constant-time bigint arithmetic for RSA purposes: https://github.com/briansmith/ring/blob/main/src/arithmetic/

This could be worth a look for potential similar vulnerabilities and/or code sharing.

@Shnatsel contributions to marvin-toolkit welcome (see example/ directory)

For posterity, here's one source of timing variability: calls to "normalize" within Karatsuba

The comment says "the adds go faster if we drop any unneeded 0s from the end".
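
The leaky pattern looks roughly like this (illustrative, not num-bigint's exact code): trailing zero limbs are dropped, so the lengths of later loops depend on the value being processed:

    /// Strip trailing (most-significant) zero limbs. The loop count, and therefore the
    /// size of every subsequent operation on `limbs`, becomes data-dependent.
    fn normalize(limbs: &mut Vec<u64>) {
        while limbs.last() == Some(&0) {
            limbs.pop();
        }
    }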

Similar problems exist within the implementation of Almost Montgomery Multiplication, which not only calls normalize repeatedly but also does some data-dependent branching/operations:

Any update on this?

#394 is where work on this is happening. You can see updates there.