[Question] Why does this work ?

rcmz opened this issue 2 months ago

Not an issue, but I don't know where else to ask this.

By searching "static recompilation" online, you find a lot of people saying it does not work / it is a bad idea:
https://andrewkelley.me/post/jamulator.html#conclusion
https://stackoverflow.com/questions/11215689/why-does-emulation-have-to-be-done-in-real-time
https://cs.stackexchange.com/questions/155511/why-is-static-recompilation-not-possible

So I'm curious if you have insights into why this project worked so well ?

Sebastian Sarbora · Answer 1 · Wed May 15 2024 04:17:12 GMT+0800 (China Standard Time)

He built different.

Wiseguy · Answer 2 · Sat Jun 22 2024 11:40:35 GMT+0800 (China Standard Time)

Sorry for the delayed response, this was a bit of a difficult question to answer and it ended up slipping my mind. I think there are a few reasons why this project ended up being successful, so I'll try to cover a few that I think are the most relevant.

Games on systems in the N64 generation and later were developed very differently than NES games. N64 games were developed in C (or rarely C++), meaning the vast majority of code in a given game will have been generated by a compiler, aside from a small amount of handwritten assembly (which mostly gets replaced since it's part of the system library, as I'll cover later).

Compiler-generated code is fairly predictable and structured in terms of behavior, since it has to conform to an ABI. The handwritten assembly in a given game will generally conform closely enough to the ABI to allow the compiler-generated code to use it, so that tends to work out as well.

Self-modifying code is also something that comes up a lot when talking about static recompilation. Another benefit of these games having been written in C originally is that there are only a few types of self-modifying code that you'll encounter. The most common form is overlays, where multiple sections of code could potentially be loaded into the same region of memory. By hooking the parts of the game where code is loaded to maintain a table of function addresses, it becomes very easy to handle this by just using a function pointer lookup for indirect jumps (for example jalr in the context of MIPS).
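
To make the overlay idea a bit more concrete, here is a minimal sketch in C of a function address table that gets updated when an overlay is loaded and consulted on indirect jumps. Every name in it (recomp_context, register_func, lookup_func, call_indirect, the example address) is hypothetical and chosen for illustration; it is not taken from the actual recompiler, and a real implementation would likely use a faster lookup than a linear scan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU context: just the 32 general purpose registers. */
typedef struct { uint32_t gpr[32]; } recomp_context;

/* Every recompiled function takes the CPU context as its only argument. */
typedef void (*recomp_func)(recomp_context *ctx);

/* Maps an original (N64-side) code address to a recompiled function. */
typedef struct { uint32_t vram; recomp_func func; } func_entry;

#define MAX_FUNCS 64
static func_entry func_table[MAX_FUNCS];
static size_t func_count = 0;

/* Called from the hooked "load overlay" routine. If another overlay
 * already occupied this address, its entry is overwritten. */
void register_func(uint32_t vram, recomp_func func) {
    for (size_t i = 0; i < func_count; i++) {
        if (func_table[i].vram == vram) {
            func_table[i].func = func;
            return;
        }
    }
    assert(func_count < MAX_FUNCS);
    func_table[func_count].vram = vram;
    func_table[func_count].func = func;
    func_count++;
}

/* Returns the function currently loaded at vram, or NULL if none. */
recomp_func lookup_func(uint32_t vram) {
    for (size_t i = 0; i < func_count; i++) {
        if (func_table[i].vram == vram) {
            return func_table[i].func;
        }
    }
    return NULL;
}

/* What a recompiled indirect call (jalr) could turn into. */
void call_indirect(recomp_context *ctx, uint32_t target) {
    recomp_func func = lookup_func(target);
    assert(func != NULL);
    func(ctx);
}

/* Two recompiled functions from different overlays that share one address. */
void overlay_a_func(recomp_context *ctx) { ctx->gpr[2] = 1; } /* $v0 = 1 */
void overlay_b_func(recomp_context *ctx) { ctx->gpr[2] = 2; } /* $v0 = 2 */
```

Registering overlay_a_func at an address and later registering overlay_b_func at the same address mirrors one overlay replacing another in memory: the same jalr target then dispatches to whichever function was loaded last.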

Another common form of self-modifying code on N64 is relocatable overlays. These are like the overlays in the previous point, but set up such that the sections can be loaded into any point in memory rather than a fixed location. The function table address technique I mentioned will work fine here too. From there, the recompiler also has to emit a little bit of extra code in relocatable overlays to handle data accesses in order to account for them being loadable at any address. Zelda64Recomp deals with relocatable overlays, which should act as good evidence that this technique works well.

Other types of self-modifying code are pretty uncommon on N64, and they get even less common as you move forward in time from there. I've only encountered two cases where the techniques I mentioned wouldn't handle a piece of self-modifying code, and both were easily worked around by using the code replacement system I'll mention later.

Games of this era and beyond were also built with a system library which is used by game code to control the hardware, rather than games controlling the hardware directly. This allows a project using this tool to simply replace the functions provided by the system library with versions that were built from the ground up for modern systems.

Another side effect of games being built with a system library is that game code generally tends to not interact directly with memory-mapped registers since the system library handled that for developers. This allows you to avoid implementing costly address space lookup at runtime, and you can instead just convert load/store operations into normal memory accesses via simple pointer arithmetic. This makes performance significantly better than even the best dynamic recompilation implementations, since those still have to deal with the original hardware's memory mappings.
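
As a rough illustration of loads and stores becoming plain pointer arithmetic, here is a small C sketch. It assumes game code only touches RDRAM through KSEG0 addresses (0x80000000 and up), so translating an address is a single subtraction with no lookup. The names rdram, load_word and store_word are made up for this example, and the byte-by-byte big-endian access is just one simple way to stay host-independent, not necessarily what the project does.

```c
#include <stdint.h>

#define RDRAM_SIZE 0x800000u /* 8 MiB, the N64 maximum with the Expansion Pak */
static uint8_t rdram[RDRAM_SIZE];

/* KSEG0 maps directly onto physical memory, so dropping the base address
 * gives the offset into RDRAM. No bounds check: this sketch assumes the
 * game never touches memory-mapped registers directly. */
static inline uint32_t rdram_offset(uint32_t vaddr) {
    return vaddr - 0x80000000u;
}

/* lw rt, imm(base): read a big-endian 32-bit word. */
uint32_t load_word(uint32_t base, int16_t imm) {
    const uint8_t *p = &rdram[rdram_offset(base + (uint32_t)(int32_t)imm)];
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* sw rt, imm(base): write a big-endian 32-bit word. */
void store_word(uint32_t base, int16_t imm, uint32_t value) {
    uint8_t *p = &rdram[rdram_offset(base + (uint32_t)(int32_t)imm)];
    p[0] = (uint8_t)(value >> 24);
    p[1] = (uint8_t)(value >> 16);
    p[2] = (uint8_t)(value >> 8);
    p[3] = (uint8_t)value;
}
```

Compare this with an emulator, which typically has to check on each access whether the address hits RAM, a hardware register, or a TLB-mapped region before it can do the actual read or write.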

The functionality to replace code from the game with a new implementation is useful beyond just the system library as well. By writing new versions of specific functions in the game and then recompiling those, you're able to very easily make changes to the game that would be tough to do if you were editing the ROM directly. Most of the enhancements in Zelda64Recomp are provided through this approach, such as the gyro aim and high framerate fixups for cases that RT64's automatic detection messed up. This is something that I think is completely novel in this project, as I haven't seen any other static recompilation projects that use the recompilation process to replace the original binary's functions (but I could be wrong here).

You can also use code replacement to work around code that doesn't translate well to C. There are no cases like that in Zelda64Recomp, but I could imagine a case where you replace some form of self-modifying code that wasn't covered in the previous section with a new implementation of that code that doesn't need to modify itself.

Rather than trying to determine the structure of a ROM automatically, which is especially difficult for N64 ROMs as they have very little defined structure, this tool expects the user to provide that info. The layout of code in the ROM is provided as an input to this tool alongside the ROM, either via an ELF file or a symbol file like this one from Zelda64Recomp.

It's pretty easy to get that information by hand if you have experience with reverse engineering N64 ROMs (I did it in less than 2 days for another ROM I was testing), especially if you take advantage of the tools that have been made for doing this like splat. Not trying to automate that process reduces the complexity of a static recompiler pretty significantly and also removes a lot of potential errors during the process, at the expense of adding some upfront work when starting a project.

This one is more just an opinion of mine based on other projects I've seen in the past, but I think the approach of generating a very literal C translation of the CPU instructions (an idea that I got from ido static recomp) ends up working better than other approaches.

A lot of binary translation efforts use LLVM for the purpose of translating CPU instructions to other architectures, which has the benefit of being more direct than this approach. However, it adds a lot of complexity, and I think it's less flexible and harder to work with than the approach this project takes.

That C translation approach allows you to very easily mix and match generated code with handwritten code. Many instructions will just emit calls to macros that I handwrote, which simplifies recompiling those compared to having to generate the entire logic for a given instruction. It'd definitely be doable to represent that macro as LLVM IR and reuse it each time you encounter the instruction, but it's much simpler to just make a tool write "ADD32" to a file than to build up the IR for a 32-bit addition (with the proper casting and sign extension logic), as an example.
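
For readers who haven't seen this style of output, here is a guess at what such a macro and the generated code around it could look like. Only the name ADD32 comes from the text above; the macro body, the gpr and cpu_state types, and recompiled_addu are assumptions made for this sketch. It models MIPS addu on a 64-bit register file: add the low 32 bits, then sign-extend the result.

```c
#include <stdint.h>

/* Hypothetical 64-bit general purpose register type. */
typedef uint64_t gpr;

/* 32-bit addition with the result sign-extended to 64 bits. The narrowing
 * cast to int32_t is implementation-defined in C for out-of-range values,
 * but wraps as expected on mainstream compilers. */
#define ADD32(a, b) \
    ((gpr)(int64_t)(int32_t)((uint32_t)(a) + (uint32_t)(b)))

typedef struct { gpr r[32]; } cpu_state;

/* What the generated C for "addu $v0, $a0, $a1" might look like:
 * $v0 is register 2, $a0 is register 4, $a1 is register 5. */
void recompiled_addu(cpu_state *st) {
    st->r[2] = ADD32(st->r[4], st->r[5]);
}
```

The recompiler itself only ever has to print the one line inside recompiled_addu; the casting and sign extension live in the handwritten macro and can be fixed in one place.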

You can also very easily insert new code directly into the generated code during recompilation with this technique by having the recompiler add extra text into the output file. This is nice for very small patches where the function replacement system I mentioned earlier is overkill. Additionally, having C code (even if it's less readable than normal C code a person would write) makes debugging the output of this tool much easier (in my opinion) than having to debug the output of a direct binary translation process.
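
A tiny sketch of the "add extra text into the output" idea, written as a C function that emits the text for one instruction and prepends any patch text registered for that instruction's address. The text_hook struct, emit_addu, and the shape of the emitted line are all hypothetical; this only illustrates the technique and says nothing about how the real tool is configured.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical patch: extra C text to place before the instruction at vram. */
typedef struct {
    uint32_t vram;
    const char *text;
} text_hook;

/* Writes the C text for one "addu rd, rs, rt" instruction into out,
 * preceded by the hook text for this address if one exists.
 * Returns the character count, with the same meaning as snprintf. */
int emit_addu(char *out, size_t out_size, uint32_t vram,
              int rd, int rs, int rt,
              const text_hook *hooks, size_t hook_count) {
    const char *extra = "";
    for (size_t i = 0; i < hook_count; i++) {
        if (hooks[i].vram == vram) {
            extra = hooks[i].text;
        }
    }
    return snprintf(out, out_size,
                    "%s    ctx->r%d = ADD32(ctx->r%d, ctx->r%d);\n",
                    extra, rd, rs, rt);
}
```

Because the output is just text, the patch can be any valid C statement, and the C compiler takes care of fitting it into the surrounding generated function.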

This ended up being a much longer response than I was expecting, but hopefully you feel it answers your question. I guess if I were to TLDR it, it'd be something like:

Targeting 5th gen (or later) game consoles and requiring the user to provide some info about the layout of code in the ROM makes a lot of the problems normally associated with static recompilation go away. From there, being able to easily replace the system library with modern versions allows games to run without needing to replicate hardware behavior, since you're targeting an API instead. That same code replacement system also allows for manual fixups when needed and also allows adding changes and enhancements to games.

rcmz · Answer 3 · Sat Jun 22 2024 17:58:14 GMT+0800 (China Standard Time)

Thanks for such a detailed response !
From what I get of it, what also made this project successful was viewing static recompilation not as a fully automatic tool, but as a way to automate most of a recompilation effort. That way, like a recompilation, you have the convenience of working directly with source code (making it easier to patch bugs / add new features), but static recompilation did most of the tedious work of rewriting every function.
Thanks again for this answer and for this project :)