Allow more MIR types to be "flexibly" embedded into Cryptol
RyanGlScott opened this issue · comments
Currently, the SAW MIR backend suffers from some notable drawbacks:
- Both the MIR `u32` and `i32` types are mapped to `[32]` in Cryptol, but when mapping `[32]` from Cryptol back into MIR (e.g., through the `mir_term` command), we arbitrarily choose to convert it to `u32`, not `i32`. (And similarly for all other primitive integral types.) To prevent this from being overly burdensome, we relax the type equality judgment in the MIR backend such that `u32` and `i32` are deemed to be equal types (see the `checkCompatibleTys` function), but this runs the risk of accepting ill-typed SAW specifications.
- MIR struct and enum types cannot be mapped into Cryptol at all. In order to interface between MIR structs/enums and Cryptol, you have to build the struct/enum values out of Cryptol-compatible values, which can often be clunky and cumbersome.
Issue (1) is mildly annoying, but issue (2) is very annoying, as it prevents certain classes of Rust functions from having elegant SAW specifications. For instance, anything involving the `Wrapping` type (a struct) becomes very tedious to write, which is made worse by the fact that `Wrapping` is used a lot in the broader Rust ecosystem.
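For readers less familiar with it, `std::num::Wrapping<T>` is a tuple struct with a single public field of type `T`. The short Rust example below shows the kind of function in question. The function name `wrapping_inc` is made up for illustration.

```rust
use std::num::Wrapping;

// A small function over Wrapping<u32>. The argument and result are struct
// values, even though each one carries nothing but a single u32.
fn wrapping_inc(x: Wrapping<u32>) -> Wrapping<u32> {
    x + Wrapping(1)
}

fn main() {
    let max = Wrapping(u32::MAX);
    // Arithmetic on Wrapping wraps around on overflow instead of panicking.
    assert_eq!(wrapping_inc(max), Wrapping(0));
    // The underlying integer is the struct's only field.
    assert_eq!(wrapping_inc(max).0, 0u32);
}
```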
Both of these issues ultimately have the same root cause: MIR's type system is richer than Cryptol's type system, and as a result, there are some MIR types that cannot be represented in Cryptol without necessarily losing some information. For instance, given this code:

    struct S1 {
        x: u32,
        y: u64,
    }
    struct S2 {
        x: u32,
        y: u64,
    }

Then we could envision mapping the `S1` struct to the Cryptol record type `{ x : [32], y : [64] }`. However, we could just as well map `S2` to the same type. Therefore, what MIR type should we get if we write `mir_term {{ _ : { x : [32], y : [64] } }}`? Should we get `S1`? `S2`? Another struct? The answer was unclear to me when I first designed the SAW MIR backend, so I ultimately excluded MIR structs from being mapped into Cryptol. However, feedback suggests that this restriction goes too far, so we should consider how to make something like this possible.
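The ambiguity can be illustrated with a small Rust model. Everything here (`StructTy`, `RecordTy`, `to_record`, `candidates`, `known_structs`) is a hypothetical simplification written for this example, not part of SAW. Fields are reduced to a name plus a bit width, and field order is compared as written.

```rust
// Hypothetical model: a MIR struct type is a name plus named fields,
// where each field is described only by its bit width.
#[derive(Debug, Clone, PartialEq, Eq)]
struct StructTy {
    name: &'static str,
    fields: Vec<(&'static str, u32)>,
}

// Model of a Cryptol record type: field names and widths, no struct name.
#[derive(Debug, Clone, PartialEq, Eq)]
struct RecordTy(Vec<(&'static str, u32)>);

// MIR struct -> Cryptol record: the struct's name is dropped.
fn to_record(s: &StructTy) -> RecordTy {
    RecordTy(s.fields.clone())
}

// Cryptol record -> every known struct that erases to that record.
fn candidates(record: &RecordTy, known: &[StructTy]) -> Vec<&'static str> {
    known
        .iter()
        .filter(|s| to_record(s) == *record)
        .map(|s| s.name)
        .collect()
}

fn known_structs() -> Vec<StructTy> {
    vec![
        StructTy { name: "S1", fields: vec![("x", 32), ("y", 64)] },
        StructTy { name: "S2", fields: vec![("x", 32), ("y", 64)] },
    ]
}

fn main() {
    // Models the record type { x : [32], y : [64] }.
    let record = RecordTy(vec![("x", 32), ("y", 64)]);
    let matches = candidates(&record, &known_structs());
    // Both structs are equally valid answers, so there is no unique inverse.
    assert_eq!(matches, vec!["S1", "S2"]);
    println!("{:?}", matches);
}
```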
My proposal: we make SAW's dynamic typing more "flexible". That is, when you write `mir_term {{ _ : { x : [32], y : [64] } }}`, it should be able to represent `S1`, `S2`, or any other struct with compatible field types, depending on the context in which it is used. This would be a departure from established SAW conventions, as SAW currently expects all SAWScript expressions to have a single, unambiguous type. (See the `typeOfSetupValue` function.) But I think changing the conventions here would be worthwhile, as it would make the SAW<->Cryptol interoperability story much nicer.
To spell things out in a little more detail:
- Integral types such as `u<N>` and `i<N>` would continue to map to the Cryptol `[<N>]` type as they do currently. The Cryptol `[<N>]` type could map to `u<N>` or `i<N>` depending on the surrounding context in which it is used.
- Struct types would map to Cryptol record types (if the struct type has named fields) or Cryptol tuple types (if the struct type does not have named fields). Struct newtypes, e.g., `Wrapping<ty>`, would map to the same Cryptol type that `<ty>` does. The resulting Cryptol type could then map back to any number of MIR types that have the same field names and types.
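One way to picture the proposal is as a check of a Cryptol type against an *expected* MIR type supplied by the context, rather than a single inverse mapping. The Rust sketch below is a model under stated assumptions, not a design for the actual implementation. The `MirTy` and `CryptolTy` enums and the `to_cryptol` and `fits` functions are invented for illustration, and the sketch assumes that any struct with exactly one unnamed field counts as a newtype.

```rust
// Hypothetical model of the proposed mapping; not SAW's actual code.
#[derive(Debug, Clone, PartialEq, Eq)]
enum MirTy {
    UInt(u32), // u<N>
    SInt(u32), // i<N>
    // Struct with named fields.
    Struct { name: &'static str, fields: Vec<(&'static str, MirTy)> },
    // Struct with unnamed fields, e.g. Wrapping<ty>.
    TupleStruct { name: &'static str, fields: Vec<MirTy> },
}

#[derive(Debug, Clone, PartialEq, Eq)]
enum CryptolTy {
    Bits(u32),                              // [N]
    Record(Vec<(&'static str, CryptolTy)>), // { f : t, ... }
    Tuple(Vec<CryptolTy>),                  // (t1, t2, ...)
}

// MIR -> Cryptol, following the two bullet points above.
fn to_cryptol(ty: &MirTy) -> CryptolTy {
    match ty {
        MirTy::UInt(n) | MirTy::SInt(n) => CryptolTy::Bits(*n),
        MirTy::Struct { fields, .. } => {
            CryptolTy::Record(fields.iter().map(|(f, t)| (*f, to_cryptol(t))).collect())
        }
        // Assumption: one unnamed field means "newtype", which maps to the
        // same Cryptol type as its field.
        MirTy::TupleStruct { fields, .. } if fields.len() == 1 => to_cryptol(&fields[0]),
        MirTy::TupleStruct { fields, .. } => {
            CryptolTy::Tuple(fields.iter().map(to_cryptol).collect())
        }
    }
}

// Cryptol -> MIR becomes a check against the type the context expects.
fn fits(expected: &MirTy, actual: &CryptolTy) -> bool {
    to_cryptol(expected) == *actual
}

fn main() {
    let bits32 = CryptolTy::Bits(32);
    let wrapping_u32 = MirTy::TupleStruct { name: "Wrapping", fields: vec![MirTy::UInt(32)] };
    // The same [32] value is acceptable wherever u32, i32, or Wrapping<u32> is expected.
    assert!(fits(&MirTy::UInt(32), &bits32));
    assert!(fits(&MirTy::SInt(32), &bits32));
    assert!(fits(&wrapping_u32, &bits32));

    let fields = vec![("x", MirTy::UInt(32)), ("y", MirTy::UInt(64))];
    let s1 = MirTy::Struct { name: "S1", fields: fields.clone() };
    let s2 = MirTy::Struct { name: "S2", fields };
    let record = CryptolTy::Record(vec![("x", CryptolTy::Bits(32)), ("y", CryptolTy::Bits(64))]);
    // One record type fits both S1 and S2, but not an unrelated type.
    assert!(fits(&s1, &record) && fits(&s2, &record));
    assert!(!fits(&MirTy::UInt(32), &record));
}
```

In this model the expected type always comes from the context, so a term never needs a single canonical MIR type of its own.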
This proposed refactoring would primarily benefit the MIR backend, but it would also benefit the LLVM backend. For instance, LLVM has both unpacked struct types (e.g., `{ i32, i64 }`) and packed struct types (e.g., `<{ i32, i64 }>`), but this distinction is lost when we map them into Cryptol tuples in SAW (we map all Cryptol tuples back to unpacked structs in LLVM). Using the "flexible" typing discipline described above, we could allow SAW specifications involving both unpacked and packed structs alike.
Another use case for flexible `Term` typechecking is being able to map a Cryptol `[32]` value to a MIR `char` value. (Note that in Rust, each `char` is a single Unicode code point that requires 32 bits to represent.) Cryptol doesn't have a native `char` type, as typing a character literal like `'a'` will desugar to the corresponding `[8]` value (in `a`'s case, it desugars to `0x61`). As a result, it would be handy if a user could write `mir_term {{ zext 'a' : [32] }}` and have it treated like the Rust character `'a'`.
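On the Rust side, the correspondence between a 32-bit value and a `char` can be sketched with the standard library alone. The helper name `char_from_bits32` is made up for this example. Note also that `char::from_u32` rejects values that are not valid Unicode scalar values, so a flexible `[32]`-to-`char` mapping would presumably have to account for that as well.

```rust
// Hypothetical helper: interpret a 32-bit value as a Rust char, if possible.
fn char_from_bits32(bits: u32) -> Option<char> {
    char::from_u32(bits)
}

fn main() {
    // Cryptol's 'a' is the 8-bit value 0x61.
    let a8: u8 = b'a';
    // Zero-extend to 32 bits, analogous to `zext 'a' : [32]`.
    let a32: u32 = u32::from(a8);
    assert_eq!(a32, 0x61);
    assert_eq!(char_from_bits32(a32), Some('a'));

    // Not every 32-bit value is a char: surrogates and values above
    // 0x10FFFF are rejected.
    assert_eq!(char_from_bits32(0xD800), None);
    assert_eq!(char_from_bits32(0x11_0000), None);
}
```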
Note that before we can properly support the use of Cryptol enums in MIR specifications, we must first be able to import them as SAWCore. That part is blocked on #2052.