Support symbol name aliases
UsernameFodder opened this issue · comments
Is your feature request related to a problem? Please describe.
Various projects rely on pmdsky-debug symbols, including https://github.com/SkyTemple/pmdsky-debug-py and https://github.com/SkyTemple/c-of-time. When symbols are renamed, it can cause compatibility issues downstream.
Describe the solution you'd like
Add support for an optional aliases
field in the YAML files, like this:
- name: SomeFunction
aliases:
- SomeFunctionAlias1
- SomeFunctionAlias2
This will make things symbols searchable by more names (including old ones), while making it possible for tooling to decide whether or not to use the extra alias names.
resymgen gen
should include aliases in the.ghidra
(as extra symbols) and.json
(in a separate"aliases"
property) output formats, since Ghidra can support alternate symbol names. The.sym
output format should only use the primary name, since No$GBA only supports one symbol name per address.resymgen fmt
should order alias names alphabetically.resymgen check
:- Alias names should be subject to the same naming convention checks (
--function-names
and--data-names
) as primary symbol names. - Alias names should be included in the
--unique-symbols
check so they don't clash with any other symbol or alias names.
- Alias names should be subject to the same naming convention checks (
resymgen merge
should properly deduplicate aliases when merging alias lists.- The
resymgen
docs and unit tests should be updated accordingly.
Aliases should not be included in the header files, since this would lead to a lot of duplication. Instead, they should be auto-generated for projects that need them (like c-of-time). The headers-with-docstrings
artifact should include these generated aliases.
The following tooling should also support aliases:
- If possible, the GitHub Actions PR checks should compare the symbol tables in the base commit with the new changes, and fail if a symbol has been renamed without adding the old name as an alias. This is probably doable by seeing if there are any symbol names in the base commit that no longer exist in the new version (if an old symbol is being moved, there's nothing that can be done with aliases).
symdiff.py
should detect updates to alias lists.import_symbols_json.py
andimport_symbols_ntr_ghidra.py
should create additional labels for aliases.add_header_docstrings.py
should support adding aliases alongside header docstrings.
Describe alternatives you've considered
- Never rename symbols. Doesn't seem practical.
- Start using some sort of review procedure around renames, and scrutinize renames more carefully. We should probably do this anyway, but it would probably be more effective when coupled with aliases. Just scrutinizing things would still require downstream repos to make changes to ensure compatibility, which is still cumbersome.
- Duplicate symbols instead of renaming them. This is the same idea as aliases, but worse, leading to more duplication and potential issues with code that assumes function addresses are unique.
Additional context
This has been a long-running shortcoming of pmdsky-debug. It's probably about time to try to do something about it.
- @theCapypara and @End45 raised concerns around #236, see discussion on that PR.
- tech-ticks raised concerns around #230 and the effects of renaming compiler builtins on c-of-time.
RFC:
@theCapypara: Would this allow us to solve stability issues with SkyTemple? If we update pmdsky-debug-py to include alias names in its mapping, I think as long as pmdsky-debug always keeps old names as aliases, SkyTemple should never break?
@tech-ticks: I think this will require linker script updates in c-of-time to map the aliases properly? Is it possible to map multiple names to the same address? Assuming that works, I think if we generate the aliases in the headers, things should just work?
@End45: Any general thoughts? Does this seems reasonable? Regarding the issues you raised around No$GBA in #236 (review), if we kept the old names as the primary ones, but added the new ones as aliases, I think that should make things better?
@AnonymousRandomPerson: I think it would make sense to update the sync-to-pmdsky-debug script to add aliases instead of renaming symbols in the event of a conflict? Is there anything else we ought to change?
As long as the original symbol name is adequate at describing the symbol, is it worth the added complexity of these aliases to add a subjectively better name? I feel it would cause less confusion to have a single consistent name for symbols if we can.
The strict requirement to alias the previous name creates instances where an undesirable old name will be forced to stick around permanently, such as with temporary symbol names that need clarification (e.g., ARM9_UNKNOWN_TABLE__NA_2097FF8
) or with symbols that we discover are incorrectly labeled. Perhaps a missing alias can be surfaced as a warning without necessarily blocking the PR.
If the goal is to improve searchability, a lower complexity solution could be adding additional keywords to the description field. This could mean aliases or just other words that people commonly try and search for.
The strict requirement to alias the previous name creates instances where an undesirable old name will be forced to stick around permanently, such as with temporary symbol names that need clarification (e.g., ARM9_UNKNOWN_TABLE__NA_2097FF8) or with symbols that we discover are incorrectly labeled. Perhaps a missing alias can be surfaced as a warning without necessarily blocking the PR.
That's true, this should probably be a non-mandatory check.
As long as the original symbol name is adequate at describing the symbol, is it worth the added complexity of these aliases to add a subjectively better name? I feel it would cause less confusion to have a single consistent name for symbols if we can.
I think this would also address the issues around apps that depend on specific names. Having aliases means we wouldn't be forced to choose between keeping an old name around forever, or updating an old name and breaking the downstream app, provided it knows how to use aliases. The searchability part could be solved with the description field, but that doesn't solve the issues with SkyTemple and c-of-time.
This seems like a good compromise to me. It allows linking the symbols in this repo to the ones in the decomp, even if they use a different naming scheme there. Meanwhile we can keep using the simplified names, which avoids the problems that have already been explained.
@UsernameFodder Thanks, this sounds like a good approach!
I would propose to solve the ambiguity issue that the YAML contains comments explaining when aliases are expected to be removed, and remove any deprecated alias after a year or so. That way apps have ways and time to adapt but they are eventually forced to use the one canonical name.
From my standpoint more ideally the date could even be part of the YAML data itself instead of just a comment. This way I could auto generate warnings for SkyTemple that would be raised when people try to apply a patch using deprecated symbols.
re:
@AnonymousRandomPerson: As long as the original symbol name is adequate at describing the symbol, is it worth the added complexity of these aliases to add a subjectively better name? I feel it would cause less confusion to have a single consistent name for symbols if we can.
True, but as said existing code already uses the old symbols and it's not just SkyTemple itself, also third party patches and ROM Hack projects and setups that researchers already use. I think my solution above would solve this.
@AnonymousRandomPerson: The strict requirement to alias the previous name creates instances where an undesirable old name will be forced to stick around permanently, such as with temporary symbol names that need clarification (e.g., ARM9_UNKNOWN_TABLE__NA_2097FF8) or with symbols that we discover are incorrectly labeled. Perhaps a missing alias can be surfaced as a warning without necessarily blocking the PR.
I also think that making it a case-by-case judgement if the old alias is the same as the previous one is a good idea and maybe it's enough. But we should really only do this for temporary unknown aliases that are likely not used by patches or SkyTemple etc.
I would propose to solve the ambiguity issue that the YAML contains comments explaining when aliases are expected to be removed, and remove any deprecated alias after a year or so. That way apps have ways and time to adapt but they are eventually forced to use the one canonical name.
From my standpoint more ideally the date could even be part of the YAML data itself instead of just a comment. This way I could auto generate warnings for SkyTemple that would be raised when people try to apply a patch using deprecated symbols.
I unfortunately haven't figured out a good way to support YAML comments in resymgen
. We could just add a note to the description?
I actually wouldn't mind keeping aliases around forever if people use them. I don't expect them to add much overhead, so deprecation doesn't seem all that necessary? It seems more beneficial to keep old names around for posterity. If you want to add warnings, you could probably just warn whenever someone is using any alias instead of the primary name? Then if we really want to remove an alias, we can just wait a while and then drop it.
I think this will require linker script updates in c-of-time to map the aliases properly? Is it possible to map multiple names to the same address? Assuming that works, I think if we generate the aliases in the headers, things should just work?
Yes, generate_linkerscript.py
needs to be updated, but everything else should work without changes.