Emulation backend
sampsyo opened this issue · comments
A new backend could generate "normal" C code that runs on any ordinary machine, without requiring the bsg_manycore headers, to emulate the target device. This would be nice for testing a program's functional correctness on a development machine before running it "for real" on the hardware.
Especially in an era where bugs in the hardware and runtime are likely to be commonplace, this tool could help narrow down where incorrectness comes from.
I think the plan is to migrate the current gcc backend to do this (per your proposal in #17). Currently the gcc backend doesn't really handle multiple group instantiations in the code section; it just tries to place everything in the main function of the generated C program. To support multiple groups and also behave more like the underlying hardware, we can use pthreads and generate a multithreaded C program to emulate the code.
I think the naive way is to always generate one thread per tile and assign code to each thread based on the code section (this is probably also closest to how things actually run on the hardware). We could probably also get away with generating one thread per group instantiation in the code section instead, but that needs more thought. I'll think about it some more, but this is how I'm currently planning to approach it.
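To make the one-thread-per-tile idea concrete, here is a minimal sketch of what the emulation side might look like. Every name in it (`tile_fn`, `tile_ctx`, `emu_run_grid`, `record_id`, `EMU_DEMO_COLS`) is hypothetical and made up for illustration; none of it comes from the existing gcc backend or from bsg_manycore.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-tile kernel signature: tile x, tile y, shared argument. */
typedef void (*tile_fn)(int x, int y, void *arg);

typedef struct {
    int x, y;
    tile_fn fn;
    void *arg;
} tile_ctx;

/* pthread entry point: unpack the context and run the tile's code. */
static void *tile_entry(void *p) {
    tile_ctx *ctx = p;
    ctx->fn(ctx->x, ctx->y, ctx->arg);
    return NULL;
}

/* Run fn once per tile on a cols x rows grid, one pthread per tile.
 * Returns 0 on success, -1 if allocation or thread creation fails. */
int emu_run_grid(int cols, int rows, tile_fn fn, void *arg) {
    assert(cols > 0 && rows > 0);
    int n = cols * rows;
    pthread_t *threads = malloc((size_t)n * sizeof *threads);
    tile_ctx *ctxs = malloc((size_t)n * sizeof *ctxs);
    if (!threads || !ctxs) {
        free(threads);
        free(ctxs);
        return -1;
    }
    int started = 0, status = 0;
    for (int i = 0; i < n; i++) {
        ctxs[i] = (tile_ctx){ i % cols, i / cols, fn, arg };
        if (pthread_create(&threads[i], NULL, tile_entry, &ctxs[i]) != 0) {
            status = -1;
            break;
        }
        started++;
    }
    /* Join only the threads that were actually started. */
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    free(threads);
    free(ctxs);
    return status;
}

/* Example kernel (assumes a grid EMU_DEMO_COLS wide): each tile writes
 * its linear id into its own slot of a shared array. */
enum { EMU_DEMO_COLS = 4 };
void record_id(int x, int y, void *arg) {
    int *out = arg;
    out[y * EMU_DEMO_COLS + x] = y * EMU_DEMO_COLS + x;
}
```

A generated program would call something like `emu_run_grid(4, 2, record_id, out)` from `main` and link with `-pthread`. Each tile writes a distinct array slot, so this particular kernel needs no locking.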
Yeah! I think pthreads would be a good idea—and one thread per tile is probably the right way to go. (If there were only one thread per group, we could get into trouble with deadlock when tiles within the same group need to communicate to make progress.)
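Here is a small sketch of the deadlock concern, assuming tiles in a group synchronize through something like a barrier (that is an assumption for illustration; the actual communication primitives aren't specified here). The barrier is built from a mutex and a condition variable, and all names (`emu_barrier`, `emu_barrier_wait`, `emu_barrier_demo`) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical group barrier built from a mutex and a condition variable. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cv;
    int expected;        /* number of tiles in the group */
    int waiting;         /* tiles that have arrived in this round */
    unsigned generation; /* bumped each time the barrier opens */
} emu_barrier;

void emu_barrier_init(emu_barrier *b, int expected) {
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->cv, NULL);
    b->expected = expected;
    b->waiting = 0;
    b->generation = 0;
}

/* Block until `expected` tiles have called this function. */
void emu_barrier_wait(emu_barrier *b) {
    pthread_mutex_lock(&b->lock);
    unsigned gen = b->generation;
    if (++b->waiting == b->expected) {
        /* Last arrival: open the barrier for everyone. */
        b->waiting = 0;
        b->generation++;
        pthread_cond_broadcast(&b->cv);
    } else {
        /* Loop guards against spurious wakeups. */
        while (gen == b->generation)
            pthread_cond_wait(&b->cv, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}

static emu_barrier group_barrier;
static int mailbox;  /* stands in for data one tile hands to another */
static int received;

static void *producer_tile(void *unused) {
    (void)unused;
    mailbox = 42;
    emu_barrier_wait(&group_barrier);
    return NULL;
}

static void *consumer_tile(void *unused) {
    (void)unused;
    emu_barrier_wait(&group_barrier);
    received = mailbox;
    return NULL;
}

/* Two tiles of one group, each on its own thread, meet at the barrier. */
int emu_barrier_demo(void) {
    pthread_t t0, t1;
    emu_barrier_init(&group_barrier, 2);
    int rc = pthread_create(&t0, NULL, producer_tile, NULL);
    assert(rc == 0);
    rc = pthread_create(&t1, NULL, consumer_tile, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    return received;
}
```

If both tile bodies ran back to back on a single per-group thread, the first call to `emu_barrier_wait` would block forever, because the second tile would never get a chance to arrive. With one thread per tile, both arrive and the barrier opens.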
It might make sense to provide a separate run-time library that you need to link against to make emulated code work. (That would be in contrast to just generating all the code for a complete, self-contained program inline.) In the first version, this could contain a little machinery for spinning up the pool of pthreads and dispatching work to them. In subsequent versions, this could contain replacements for the various HammerBlade address calculation functions to emulate local SRAM and global DRAM within the simulated tiles.
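As a rough sketch of what the later memory-emulation part of such a library could look like: the functions below are hypothetical stand-ins, not the real HammerBlade address calculation functions, and the grid and memory sizes are arbitrary placeholders rather than the actual hardware parameters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder sizes chosen for illustration only. */
enum {
    EMU_COLS = 4,
    EMU_ROWS = 4,
    EMU_SRAM_WORDS = 1024,     /* per-tile local memory, in 32-bit words */
    EMU_DRAM_WORDS = 1 << 18   /* shared global memory, in 32-bit words */
};

/* One private SRAM array per tile, plus one DRAM array shared by all. */
static uint32_t emu_sram[EMU_ROWS][EMU_COLS][EMU_SRAM_WORDS];
static uint32_t emu_dram[EMU_DRAM_WORDS];

/* Hypothetical replacement for a "local SRAM of tile (x, y)" address
 * calculation: returns a host pointer into that tile's array. */
uint32_t *emu_local_ptr(int x, int y, size_t word) {
    assert(x >= 0 && x < EMU_COLS && y >= 0 && y < EMU_ROWS);
    assert(word < EMU_SRAM_WORDS);
    return &emu_sram[y][x][word];
}

/* Hypothetical replacement for a "global DRAM" address calculation. */
uint32_t *emu_dram_ptr(size_t word) {
    assert(word < EMU_DRAM_WORDS);
    return &emu_dram[word];
}
```

Generated code would then read and write through these pointers instead of device addresses, for example `*emu_local_ptr(1, 2, 0) = 7;`. The bounds checks are a side benefit of emulation: an out-of-range access fails an assertion on the development machine instead of misbehaving silently.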