Validate const value ranges in CLIF Validator

Question

Validate const value ranges in CLIF Validator

afonso360 opened this issue 3 years ago · comments

Feature

As discussed in #3056 (comment) we allow const instructions with values that are way too large for their type (i.e. 0xff00 for an i8). We should reject these modules in the CLIF validator.

Benefit

Rejecting these modules is a good way to prevent bugs in code generators using cranelift as a backend. As well as preventing issues where backends are not expecting constant values larger than the instruction type. Similar to #3056 (although this one is "legal") but with even more surprising values.

Implementation

The ideal place to implement this is in the CLIF validator, so that we can reject these CLIF modules.

Because CLIF values do not have signedness until time of use, we must allow both signed and unsigned ranges of values (-128..127 or 0..255 in the i8 case).

Alternatives

We can keep ignoring the issue (as we do now), or we can define some other behavior, such as masking the bottom bits depending on the type. But this has all the hallmarks of a bug in whatever code generator is producing this code, so we should probably not silently allow these modules.

Jamey Sharp · Answer 1 · Sat Feb 04 2023 03:16:56 GMT+0800 (China Standard Time)

I think we do need this verifier pass. I'm finding it's difficult to reason about high bits in mid-end optimization passes, so we're getting them wrong frequently.

bjorn3 · Answer 2 · Sat Feb 04 2023 04:07:13 GMT+0800 (China Standard Time)

Because CLIF values do not have signedness until time of use, we must allow both signed and unsigned ranges of values (-128..127 or 0..255 in the i8 case).

If we switch from Imm64 to a new Uimm64, it should be possible to limit it to just 0..255 and always zero-extend when creating the iconst from whichever frontend creates the clif ir. That would also remove a considerable amount of casts from cg_clif as rustc uses unsigned integers for immediates rather than signed integers.

Jamey Sharp · Answer 3 · Thu Mar 02 2023 02:59:21 GMT+0800 (China Standard Time)

Fixing this issue doesn't need us to replace Imm64 with Uimm64. We can certainly discuss doing that too—it's probably a good idea—but for this purpose, we can mask off the high bits even though the type is technically signed.

We also don't need to allow e.g. -128..127. I think the conclusion we've recently come to in #5700 is that the upper bits should all be zero, which means an i8 can only be 0..255. If it's used in a signed way, those 8 bits will be sign-extended as needed to reinterpret them as -128..127.

So fixing this issue only requires that cranelift/codegen/src/verifier/mod.rs check that, for example, an iconst.i8 only has bits set within a mask of 0xFF. If somebody wants to pick this up I'm happy to go into more detail about how to do that.

Timothée Jourde · Answer 4 · Tue Aug 15 2023 08:03:16 GMT+0800 (China Standard Time)

Hi! 🙂 If these checks are still needed I'd like to contribute!

Jamey Sharp · Answer 5 · Tue Aug 15 2023 08:22:31 GMT+0800 (China Standard Time)

Yes please, that would be great! Let us know if you have any questions.

Timothée Jourde · Answer 6 · Tue Aug 15 2023 22:36:57 GMT+0800 (China Standard Time)

@jameysharp in addition to Opcode::Iconst, should I also cover Opcode::Vconst? Also maybe Opcode::F32const and Opcode::F64const?

Jamey Sharp · Answer 7 · Wed Aug 16 2023 01:32:54 GMT+0800 (China Standard Time)

Good question! I don't think any of those have range limits outside of their representation. For example, the operand to f32const is represented as an Ieee32, which is just a 32-bit value that preserves exactly the provided bits. There are no values for Ieee32 that are not valid for use with f32const, so we don't need to check anything about that instruction in the verifier. The trouble with iconst is that we always store the value in a 64-bit integer but, depending on the type attached to the instruction, not all of the bits are valid, so that's why we need special checks for that instruction in particular.

Timothée Jourde · Answer 8 · Wed Aug 16 2023 02:27:47 GMT+0800 (China Standard Time)

Got it! Thanks for the explanation. 🙂 SIMD consts and their variable length representations were a bit scary. 😅 I have a patch that adds checks for iconst, now I need to add some tests and that should be good. 👍