What happens to the validiy requirements of the return value on a tail call?
RalfJung opened this issue · comments
This came up here. @WaffleLapkin provided an example:
#![feature(explicit_tail_calls)]
use std::num::NonZeroU8;
use std::mem::transmute;
fn a() -> u8 { 0 }
fn b() -> NonZeroU8 { unsafe {
become transmute::<fn() -> u8, fn() -> NonZeroU8>(a)();
} }
fn c() {
let _x: u8 = unsafe { transmute::<fn() -> NonZeroU8, fn() -> u8>(b)() };
}
As a normal call, there would be UB when b
returns, since the return place at that moment contains something that is invalid for b
's return type (despite being valid for c
's return type).
However, with tail calls, b
is long gone by the time a
returns. a
returns directly to c
, and both caller and callee see a return type of u8
. So arguably this program should be allowed? More fundamentally, not allowing this program means that Miri / the spec would have to keep around the return type of all the stack frames that were "popped early" due to tail calls, and ensure that the eventual return type is valid according to all of them -- which seems to go against the very idea of a tail call.
Another idea could be that tail-calling a function with a wrong type is UB no matter if it returns a compatible value. That would allow us to check everything when "entering" the tail call and Miri won't need to keep the stack around. (I'm not sure if this is a good idea).
Optimization wise I'm not sure we need UB at all though. It seems like to do inlining or to replace tail calls with normal calls you need to know the function body anyway, at which point you can just check if it's called with the right type.
The way I've conceptualized become
so far is that, at the spec level, it only moves end of scope cleanup (drop glue, invalidate local places) to before the become
d call (instead of after the return
ed call), but otherwise does a standard function call. I understand this is more of a "requested" tail call than the "required" tail call that people would like to have.
Maybe it's due to having discussed function call compatibility recently, but this initially smells to me like it should use similar rules. To try to clarify, that the value is "encoded" onto the ABI when returned and "decoded" from the ABI by the call that receives the return; this interpretation would support become
being "transparent" and no UB occurring in this case.
this interpretation would support become being "transparent" and no UB occurring in this case.
You are contradicting your first paragraph here. With regular return
, my example above is definitely UB.
You are contradicting your first paragraph
Ah, sorry; this was actually intentional, but I failed to make it clear why. Obviously, if become
is a "requested" tail call, the validity requirements are no different than with return
.
However, if become
is a "required" tail call, then I suggest that this example should not be UB, and that this follows from an interpretation of become
as "transparent" to the return
ABI; that the return
ABI skips over any become
frames, moving via ABI encode/decode pair directly from the return
ed value to the non-become
d function call result.
This is technically still a distinct axis from become
providing a codegen guarantee of TCO. I think the idea underlying this is that become
transparency is what "required" tail calls would mean at the AM level.
A very ugly alternative which makes this example UB without requiring the AM to maintain stack context for arbitrary-many "become
frames" would be to only validate the "top" frame. The purpose would be to permit propagation of function result type annotation across transmute
(so _x
could be inferred nonzero since it was produced by fn b
).
I don't like it, but it would work. Having the AM maintain the stack frames and just optimizing out the frames for the concrete seems a much better approach, and Miri could optimize to not accumulate become
frames, only maintaining a smallvec of return types.
Another awkward-bad idea: forbid become
ing function pointers. (This would be required for versions of become
where the compiler would build a trampoline context to guarantee constant stack usage on targets where TCO can't be guaranteed.)
To extract an actual (but still partial) answer to the question of the OP:
If become
guarantees concrete TCO (i.e. causes a build error when the target codegen won't TCO the specific setup), then the "transparent" semantics are strongly desirable; it follows from ABI reasoning that seems reasonable given that low level ABI guarantee.
If become
will still compile when TCO can't be guaranteed, then I prefer the semantics where the AM stack frame remains and drop timing is the only opsem impact of become
. (This doesn't preclude a warning or even deny-by-default for non-TCO scenarios.)
Because I have a strong desire to avoid check
/build
error divergence and use cases that enjoy the drop timing impact independent of TCO, I have a preference for the latter, but see the benefit to the former.