AnyDSL / thorin

The Higher-Order Intermediate Representation

Home Page: https://anydsl.github.io

using conditionally chosen function in cuda()/nvvm() triggers assertion

michael-kenzel opened this issue

The following code reproduces the issue:

#[import(cc = "thorin")] fn nvvm(_dev: i32, _grid: (i32, i32, i32), _block: (i32, i32, i32), _body: fn() -> ()) -> ();
#[import(cc = "device")] fn threadfence() -> ();

#[export]
fn test(b: bool) -> () {
	let fun = if b { @|| { asm("nanosleep.u32 0;"); } } else { threadfence };

	nvvm(0, (1, 1, 1), (1, 1, 1), @|| {
		fun();
	});
}

Compiling this with artic using --emit-llvm results in the following assertion failure:

src/thorin/util/cast.h:42: L* thorin::scast(R*) [with L = thorin::Global; R = thorin::Def]: Assertion `(!r || dynamic_cast<L*>(r)) && "cast not possible"' failed.

Various seemingly irrelevant changes to the code, e.g., turning the else branch of the initialization into a lambda that simply forwards to the original function

	let fun = if b { @|| { asm("nanosleep.u32 0;"); } } else { @|| threadfence() };

seem to resolve the issue in some cases but not in others. None of these workarounds appears to be reliable in the context of a more complex codebase; something that works in one example won't work in another.

The runtime support code expects the body to be a global containing a continuation (lift_builtins.cpp is supposed to take care of that), but with the way your example is written, this doesn't happen properly. One of the problems seems to be that threadfence is not handled properly there, maybe because it's external?
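To make the failure mode concrete, here is a minimal sketch of how a checked cast like the one in the assertion message behaves when the body is not a global. The types Def, Continuation and Global below are simplified, hypothetical stand-ins and not thorin's actual classes, and is_lifted_body is a helper invented purely for illustration; only the assertion expression is taken from the error message above.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-ins for IR node types.
// These are NOT thorin's real class definitions.
struct Def {
    virtual ~Def() = default;
};

struct Continuation : Def {};

struct Global : Def {
    explicit Global(const Def* init) : init(init) {}
    const Def* init; // the value the global wraps
};

// Checked static cast using the same assertion expression as the error
// message: it aborts in debug builds if r is not actually an L.
template<class L, class R>
L* scast(R* r) {
    assert((!r || dynamic_cast<L*>(r)) && "cast not possible");
    return static_cast<L*>(r);
}

// Illustrative helper (an assumption, not thorin API): returns true only
// if body is a Global whose initializer is a Continuation, i.e. the shape
// the runtime support code is described as expecting.
bool is_lifted_body(const Def* body) {
    auto global = dynamic_cast<const Global*>(body);
    return global && dynamic_cast<const Continuation*>(global->init);
}
```

With these stand-ins, scast<Global>(def) succeeds when def points to a Global, but trips the "cast not possible" assertion when def is a bare Continuation (or any other Def). That would be consistent with a body reaching the runtime plumbing without having been wrapped in a global first. A query like is_lifted_body could be used to check the shape before casting.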

I originally thought the issue would be that your if gets turned into a select or a phi, but actually there is some magic that drops the nvvm() call inside the branches of the if (don't ask me how that works, I'm not sure myself!), so that's not it. This needs a lot more attention than I can spare right now.

IMO, the way the "runtime plumbing" part of thorin works is rather brittle and hard to understand; it would not hurt if someone rewrote it to be saner. I might do a pass over it when I wire shady's runtime into it, but in the meantime you're welcome to try to make sense of this, and I can help you out on Discord if you need.