GaloisInc / saw-script

The SAW scripting language.

Allow more MIR types to be "flexibly" embedded into Cryptol

RyanGlScott opened this issue

Currently, the SAW MIR backend suffers from some notable drawbacks:

  1. Both the MIR u32 and i32 types are mapped to [32] in Cryptol, but when mapping [32] from Cryptol back into MIR (e.g., through the mir_term command), we arbitrarily choose to convert it to u32, not i32. (And similarly for all other primitive integral types.) To prevent this from being overly burdensome, we relax the type equality judgment in the MIR backend such that u32 and i32 are deemed to be equal types (see the checkCompatibleTys function), but this runs the risk of accepting ill-typed SAW specifications.
  2. MIR struct and enum types cannot be mapped into Cryptol at all. In order to interface between MIR structs/enums and Cryptol, you have to build the struct/enum values out of Cryptol-compatible values, which can often be clunky and cumbersome.

Issue (1) is mildly annoying, but issue (2) is very annoying, as it prevents certain classes of Rust functions from having elegant SAW specifications. For instance, anything involving the Wrapping type (a struct) becomes very tedious to write, which is made worse by the fact that Wrapping is used a lot in the broader Rust ecosystem.
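To make the Wrapping point concrete, here is a minimal Rust sketch of the kind of function that is awkward to specify today. The function add_wrapping is a hypothetical example, not something taken from SAW or its test suite. It shows that Wrapping<u32> is a one-field struct around a plain u32, so the only Cryptol-representable part of it is the inner [32] value.

```rust
use std::num::Wrapping;

/// Hypothetical function under verification: adds two wrapped 32-bit values.
fn add_wrapping(a: Wrapping<u32>, b: Wrapping<u32>) -> Wrapping<u32> {
    // Wrapping<T> overloads `+` to wrap around on overflow instead of panicking.
    a + b
}

fn main() {
    let sum = add_wrapping(Wrapping(u32::MAX), Wrapping(1));
    // Wrapping is a newtype struct: the payload is the public field `.0`.
    println!("{}", sum.0);
}
```

A specification for add_wrapping has to wrap and unwrap that single field by hand on every argument and result, even though the interesting content is just 32-bit arithmetic.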

Both of these issues ultimately have the same root cause: MIR's type system is richer than Cryptol's type system, and as a result, there are some MIR types that cannot be represented in Cryptol without losing some information. For instance, given this code:

struct S1 {
    x: u32,
    y: u64,
}

struct S2 {
    x: u32,
    y: u64,
}

Then we could envision mapping the S1 struct to the Cryptol record type { x : [32], y : [64] }. However, we could just as well map S2 to the same type. Therefore, what MIR type should we get if we write mir_term {{ _ : { x : [32], y : [64] } }}? Should we get S1? S2? Another struct? The answer was unclear to me when I first designed the SAW MIR backend, so I ultimately excluded MIR structs from being mapped into Cryptol. However, feedback suggests that this restriction goes too far, so we should consider how to make something like this possible.

My proposal: we make SAW's dynamic typing more "flexible". That is, when you write mir_term {{ _ : { x : [32], y : [64] } }}, it should be able to represent S1, S2, or any other struct with compatible field names and types, depending on the context in which it is used. This would be a departure from established SAW conventions, as SAW currently expects all SAWScript expressions to have a single, unambiguous type. (See the typeOfSetupValue function.) But I think changing the conventions here would be worthwhile, as it would make the SAW<->Cryptol interoperability story much nicer.

To spell things out in a little more detail:

  1. Integral types such as u<N> and i<N> would continue to map to the Cryptol [<N>] type as they do currently. The Cryptol [<N>] type could map to u<N> or i<N> depending on the surrounding context in which it is used.
  2. Struct types would map to Cryptol record types (if the struct type has named fields) or Cryptol tuple types (if the struct type does not have named fields). For struct newtypes, e.g., Wrapping<ty>, they would map to the same Cryptol type that <ty> does. The resulting Cryptol type could then map back to any number of MIR types that have the same field names and types.
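The two rules above can be sketched as a small, self-contained Rust model. Everything here is hypothetical and for illustration only: MirTy, CryTy, to_cryptol, fits, and xy_struct are invented names, not SAW's actual data structures or functions, and treating every single-field tuple struct as a "newtype" is an assumption of this sketch. The idea is that a Cryptol type is accepted at any MIR type whose embedding into Cryptol equals it.

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified models of MIR and Cryptol types.
// None of these names come from SAW itself.
#[derive(Debug, Clone, PartialEq)]
enum MirTy {
    UInt(u32),                                             // u<N>
    Int(u32),                                              // i<N>
    Struct { name: String, fields: Vec<(String, MirTy)> }, // named fields
    TupleStruct { name: String, fields: Vec<MirTy> },      // unnamed fields
}

#[derive(Debug, Clone, PartialEq)]
enum CryTy {
    Bits(u32),                       // [<N>]
    Record(BTreeMap<String, CryTy>), // { x : ..., y : ... }, keyed by field name
    Tuple(Vec<CryTy>),               // (..., ...)
}

/// Embed a MIR type into Cryptol, deliberately forgetting signedness and struct names.
fn to_cryptol(ty: &MirTy) -> CryTy {
    match ty {
        MirTy::UInt(n) | MirTy::Int(n) => CryTy::Bits(*n),
        MirTy::Struct { fields, .. } => CryTy::Record(
            fields.iter().map(|(f, t)| (f.clone(), to_cryptol(t))).collect(),
        ),
        // Assumption: a single-field tuple struct (e.g. Wrapping<ty>) is a newtype
        // and maps to the same Cryptol type as its field.
        MirTy::TupleStruct { fields, .. } if fields.len() == 1 => to_cryptol(&fields[0]),
        MirTy::TupleStruct { fields, .. } => {
            CryTy::Tuple(fields.iter().map(to_cryptol).collect())
        }
    }
}

/// "Flexible" check: a Cryptol type may be used wherever the expected MIR type embeds to it.
fn fits(cry: &CryTy, expected: &MirTy) -> bool {
    to_cryptol(expected) == *cry
}

/// Helper that builds a struct with fields x: u32 and y: u64 under the given name.
fn xy_struct(name: &str) -> MirTy {
    MirTy::Struct {
        name: name.to_string(),
        fields: vec![
            ("x".to_string(), MirTy::UInt(32)),
            ("y".to_string(), MirTy::UInt(64)),
        ],
    }
}

/// Human-readable name of a MIR type, used only for printing.
fn mir_name(ty: &MirTy) -> String {
    match ty {
        MirTy::UInt(n) => format!("u{}", n),
        MirTy::Int(n) => format!("i{}", n),
        MirTy::Struct { name, .. } | MirTy::TupleStruct { name, .. } => name.clone(),
    }
}

fn main() {
    // The Cryptol record type { x : [32], y : [64] }.
    let record = to_cryptol(&xy_struct("S1"));
    for candidate in vec![xy_struct("S1"), xy_struct("S2"), MirTy::UInt(32)] {
        println!("{} accepts the record: {}", mir_name(&candidate), fits(&record, &candidate));
    }
}
```

In this model the expected MIR type always comes from the surrounding context, so the Cryptol term itself never has to pick between S1 and S2. The model compares against the expected type rather than computing a unique inverse of to_cryptol, which is what lets one Cryptol type stand for several MIR types.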

This proposed refactoring would primarily benefit the MIR backend, but it would also benefit the LLVM backend. For instance, LLVM has both unpacked struct types (e.g., { u32, u64 }) and packed struct types (e.g., <{ u32, u64 }>), but this distinction is lost when we map them into a Cryptol tuple in SAW (we map all Cryptol tuples back to unpacked structs in LLVM). Using the "flexible" typing discipline described above, we could allow SAW specifications involving both unpacked and packed structs alike.
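For readers less familiar with the packed/unpacked distinction, the following Rust sketch shows an analogous pair of layouts. It is only an analogy: the struct names are made up, and it says nothing about how SAW handles LLVM types. Both structs carry the same two fields, so both would correspond to the same Cryptol tuple ([32], [64]), yet their in-memory layouts can differ.

```rust
use std::mem::size_of;

// Default C layout: padding may be inserted so that `y` is suitably aligned.
#[allow(dead_code)]
#[repr(C)]
struct Unpacked {
    x: u32,
    y: u64,
}

// Packed layout: no padding between fields.
#[allow(dead_code)]
#[repr(C, packed)]
struct Packed {
    x: u32,
    y: u64,
}

fn main() {
    // The unpacked size depends on the target's alignment of u64;
    // the packed size is exactly 4 + 8 bytes.
    println!("unpacked: {} bytes", size_of::<Unpacked>());
    println!("packed:   {} bytes", size_of::<Packed>());
}
```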

Another use case for flexible Term typechecking is being able to map a Cryptol [32] value to a MIR char value. (Note that in Rust, each char is a single Unicode scalar value that occupies 32 bits.) Cryptol doesn't have a native char type, as typing a character literal like 'a' will desugar to the corresponding [8] value ('a' desugars to 0x61). As a result, it would be handy if a user could write mir_term {{ zext 'a' : [32] }} and have it treated like the Rust character 'a'.
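On the Rust side, the correspondence between char and 32-bit values can be seen with standard library conversions alone. The helper bits_to_char below is a hypothetical name used for illustration. One caveat the sketch makes visible: not every 32-bit value is a valid char (surrogates and values above 0x10FFFF are rejected), so a [32]-to-char mapping would presumably need to account for that in some way.

```rust
/// Hypothetical helper: interpret a 32-bit value as a Rust char, if it is a valid one.
fn bits_to_char(bits: u32) -> Option<char> {
    // char::from_u32 returns None for surrogates and out-of-range values.
    char::from_u32(bits)
}

fn main() {
    // A char converts losslessly to its 32-bit scalar value.
    println!("{:#x}", 'a' as u32);
    // The reverse direction is partial.
    println!("{:?}", bits_to_char(0x61));
    println!("{:?}", bits_to_char(0xD800));
}
```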

Note that before we can properly support the use of Cryptol enums in MIR specifications, we must first be able to import them as SAWCore. That part is blocked on #2052.