What happens to the validiy requirements of the return value on a tail call?

Question

What happens to the validiy requirements of the return value on a tail call?

RalfJung opened this issue 5 months ago · comments

This came up here. @WaffleLapkin provided an example:

#![feature(explicit_tail_calls)]
use std::num::NonZeroU8;
use std::mem::transmute;

fn a() -> u8 { 0 }

fn b() -> NonZeroU8 { unsafe {
    become transmute::<fn() -> u8, fn() -> NonZeroU8>(a)();
} }

fn c() {
    let _x: u8 = unsafe { transmute::<fn() -> NonZeroU8, fn() -> u8>(b)() };
}

As a normal call, there would be UB when b returns, since the return place at that moment contains something that is invalid for b's return type (despite being valid for c's return type).

However, with tail calls, b is long gone by the time a returns. a returns directly to c, and both caller and callee see a return type of u8. So arguably this program should be allowed? More fundamentally, not allowing this program means that Miri / the spec would have to keep around the return type of all the stack frames that were "popped early" due to tail calls, and ensure that the eventual return type is valid according to all of them -- which seems to go against the very idea of a tail call.

Waffle Maybe · Answer 1 · Thu Feb 15 2024 20:22:04 GMT+0800 (China Standard Time)

Another idea could be that tail-calling a function with a wrong type is UB no matter if it returns a compatible value. That would allow us to check everything when "entering" the tail call and Miri won't need to keep the stack around. (I'm not sure if this is a good idea).

Optimization wise I'm not sure we need UB at all though. It seems like to do inlining or to replace tail calls with normal calls you need to know the function body anyway, at which point you can just check if it's called with the right type.

Christopher Durham · Answer 2 · Sun Feb 25 2024 05:22:13 GMT+0800 (China Standard Time)

The way I've conceptualized become so far is that, at the spec level, it only moves end of scope cleanup (drop glue, invalidate local places) to before the becomed call (instead of after the returned call), but otherwise does a standard function call. I understand this is more of a "requested" tail call than the "required" tail call that people would like to have.

Maybe it's due to having discussed function call compatibility recently, but this initially smells to me like it should use similar rules. To try to clarify, that the value is "encoded" onto the ABI when returned and "decoded" from the ABI by the call that receives the return; this interpretation would support become being "transparent" and no UB occurring in this case.

Ralf Jung · Answer 3 · Sun Feb 25 2024 19:24:35 GMT+0800 (China Standard Time)

this interpretation would support become being "transparent" and no UB occurring in this case.

You are contradicting your first paragraph here. With regular return, my example above is definitely UB.

Christopher Durham · Answer 4 · Mon Feb 26 2024 02:35:06 GMT+0800 (China Standard Time)

You are contradicting your first paragraph

Ah, sorry; this was actually intentional, but I failed to make it clear why. Obviously, if become is a "requested" tail call, the validity requirements are no different than with return.

However, if become is a "required" tail call, then I suggest that this example should not be UB, and that this follows from an interpretation of become as "transparent" to the return ABI; that the return ABI skips over any become frames, moving via ABI encode/decode pair directly from the returned value to the non-becomed function call result.

This is technically still a distinct axis from become providing a codegen guarantee of TCO. I think the idea underlying this is that become transparency is what "required" tail calls would mean at the AM level.

A very ugly alternative which makes this example UB without requiring the AM to maintain stack context for arbitrary-many "become frames" would be to only validate the "top" frame. The purpose would be to permit propagation of function result type annotation across transmute (so _x could be inferred nonzero since it was produced by fn b).

I don't like it, but it would work. Having the AM maintain the stack frames and just optimizing out the frames for the concrete seems a much better approach, and Miri could optimize to not accumulate become frames, only maintaining a smallvec of return types.

Another awkward-bad idea: forbid becomeing function pointers. (This would be required for versions of become where the compiler would build a trampoline context to guarantee constant stack usage on targets where TCO can't be guaranteed.)

To extract an actual (but still partial) answer to the question of the OP:

If become guarantees concrete TCO (i.e. causes a build error when the target codegen won't TCO the specific setup), then the "transparent" semantics are strongly desirable; it follows from ABI reasoning that seems reasonable given that low level ABI guarantee.

If become will still compile when TCO can't be guaranteed, then I prefer the semantics where the AM stack frame remains and drop timing is the only opsem impact of become. (This doesn't preclude a warning or even deny-by-default for non-TCO scenarios.)

Because I have a strong desire to avoid check/build error divergence and use cases that enjoy the drop timing impact independent of TCO, I have a preference for the latter, but see the benefit to the former.