aarch64: incorrect register in regs_access() for bl instruction

Question

find0x90 opened this issue 5 months ago

The regs_access() function reports 'sp' as a register read by the bl instruction, even though bl does not read SP.

Below is a small script that reproduces the issue. It shows the difference in behavior between version 4.0.2 and the most recent commit as of this comment.

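#! /usr/bin/env python3
# test.py

from capstone import *

# Capstone v6 renamed CS_ARCH_ARM64 to CS_ARCH_AARCH64; fall back to the
# new name when the old one is not available.
try:
    md = Cs(CS_ARCH_ARM64, CS_MODE_ARM | CS_MODE_LITTLE_ENDIAN)
except (NameError, CsError):
    md = Cs(CS_ARCH_AARCH64, CS_MODE_ARM | CS_MODE_LITTLE_ENDIAN)
md.detail = True  # regs_access() requires detail mode

# Little-endian encoding of "bl #0x405abb0"
instruction_bytes = b"\xec\x6a\x01\x95"

inst = list(md.disasm(instruction_bytes, offset=0x0, count=1))[0]

print(inst)

# regs_access() returns register IDs; convert them to names for printing
regs_read, regs_written = inst.regs_access()
regs_read = [inst.reg_name(r) for r in regs_read]
regs_written = [inst.reg_name(r) for r in regs_written]

print(regs_read, regs_written)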

4.0.2:

$ ./test.py
<CsInsn 0x0 [ec6a0195]: bl #0x405abb0>
[] ['x30']

next branch b9c260e:

$ ./test.py
<CsInsn 0x0 [ec6a0195]: bl 0x405abb0>
['sp'] ['x30']

Rot127 · Answer 1 · Fri Jan 05 2024 19:55:10 GMT+0800 (China Standard Time)

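BL is incorrectly defined in LLVM. Capstone's decoder tables are generated from these definitions, and the Uses = [SP] below is why SP is reported as read: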

let isCall = 1, Defs = [LR], Uses = [SP] in {
    def BL : CallImm<1, "bl", [(AArch64call tglobaladdr:$addr)]>;
} // isCall
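To illustrate how one implicit Uses entry ends up in the Python output, here is a hypothetical, heavily simplified model. It assumes that the registers reported for an instruction are its explicit register operands plus the implicit Uses/Defs from its definition. InstrDef and model_regs_access are made-up names, not Capstone or LLVM code. The real generated tables look quite different.

```python
# Hypothetical model: implicit Uses/Defs from a TableGen-style definition
# are merged into the registers an instruction reports as read/written.
# None of these names exist in Capstone; they are for illustration only.
from dataclasses import dataclass


@dataclass(frozen=True)
class InstrDef:
    mnemonic: str
    implicit_uses: tuple = ()  # plays the role of `Uses = [...]` in a .td file
    implicit_defs: tuple = ()  # plays the role of `Defs = [...]` in a .td file


def model_regs_access(defn, explicit_read=(), explicit_written=()):
    """Return (read, written): explicit operands plus implicit registers,
    de-duplicated while preserving order."""
    read = list(dict.fromkeys([*explicit_read, *defn.implicit_uses]))
    written = list(dict.fromkeys([*explicit_written, *defn.implicit_defs]))
    return read, written


# On AArch64, LR is x30. `bl <imm>` has no explicit register operands.
BL_BUGGY = InstrDef("bl", implicit_uses=("sp",), implicit_defs=("x30",))
BL_FIXED = InstrDef("bl", implicit_defs=("x30",))

print(model_regs_access(BL_BUGGY))  # (['sp'], ['x30'])
print(model_regs_access(BL_FIXED))  # ([], ['x30'])
```

Under this model, dropping Uses = [SP] from the definition is enough to make the output match 4.0.2 again.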

find0x90 · Answer 2 · Sat Jan 06 2024 01:55:01 GMT+0800 (China Standard Time)

@Rot127 can you help me understand the change you made? I'd like to be able to contribute fixes in the future if I find more issues.

I understand removing Uses = [SP], but why change isCall to isBranch? Aren't bl and blr the procedure call instructions for AArch64?

find0x90 · Answer 3 · Sat Jan 06 2024 01:56:37 GMT+0800 (China Standard Time)

Actually, I just looked at blr in the llvm/lib/Target/AArch64/AArch64InstrInfo.td file. It is listed as isCall and also has Uses = [SP]. Is that wrong as well?

Rot127 · Answer 4 · Sat Jan 06 2024 19:26:10 GMT+0800 (China Standard Time)

> but why change isCall to isBranch?

You are right. I made the changes in a rush and was sloppy. BL and BLR are call instructions, so they should keep isCall. Thanks for pointing it out!

> and also has Uses = [SP]. Is that wrong as well?

Uses = [SP] is wrong. I can't find any mention of SP being used in the ISA. BL and BLR write the return address to X30 (LR) and branch. They do not touch SP.
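The ISA claim can be checked by decoding the two call encodings by hand. The following is a minimal pure-Python sketch, not Capstone code; decode_call is a hypothetical helper. It assumes the A64 encodings: BL has bits[31:26] = 100101 with a signed 26-bit word offset, and BLR Xn is 0xD63F0000 | (Rn << 5). For each instruction it lists the registers the ISA says are read and written. On the bytes from the issue it reproduces the target 0x405abb0, and no SP access appears.

```python
import struct


def decode_call(raw: bytes, address: int = 0):
    """Decode one little-endian A64 word as BL or BLR.

    Returns (mnemonic, operand, regs_read, regs_written), or None for any
    other instruction. The PC is read implicitly and is not listed.
    """
    (word,) = struct.unpack("<I", raw)

    if word >> 26 == 0b100101:  # BL <label>
        imm26 = word & 0x3FFFFFF
        if imm26 & (1 << 25):  # sign-extend the 26-bit offset
            imm26 -= 1 << 26
        target = (address + (imm26 << 2)) & 0xFFFFFFFFFFFFFFFF
        return "bl", hex(target), [], ["x30"]  # writes LR only

    if word & 0xFFFFFC1F == 0xD63F0000:  # BLR <Xn>
        rn = (word >> 5) & 0x1F
        name = "xzr" if rn == 31 else f"x{rn}"  # Rn=31 is XZR here, not SP
        return "blr", name, [name], ["x30"]  # reads Xn, writes LR

    return None


print(decode_call(b"\xec\x6a\x01\x95"))  # ('bl', '0x405abb0', [], ['x30'])
```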

> can you help me understand the change you made? I'd like to be able to contribute fixes in the future if I find more issues.

The changes are in our LLVM repo, which holds the definitions of the architecture (the .td files). Our disassembler logic is generated from those definitions.

If we discover a flaw in a definition, we need to fix it in the .td files first and then regenerate our decoding tables.
For details, see the documentation.
Please let me know which parts of the docs are unclear or badly written (if any). I haven't received any feedback on them yet, and I certainly had blind spots while writing them.

Rot127 · Answer 5 · Sat Jan 06 2024 19:54:20 GMT+0800 (China Standard Time)

The TLDR is:

1. Compile llvm-tblgen (see: https://github.com/Rot127/capstone/blob/as-docs/docs/AutoSync.md#update-an-architecture).
2. Run ./Updater/ASUpdater.py -a AArch64 -s IncGen --inc-list Mapping to generate new tables.
3. Copy the generated tables into Capstone. In this case I only copied the few changed lines, because the generated tables require #2231.

That said, if you can't spend the time to learn the quirks of updating, it is better to wait until v6 is released. The update system is new and still has unpolished corners that can be confusing.

find0x90 · Answer 6 · Mon Jan 08 2024 02:07:54 GMT+0800 (China Standard Time)

Cool, if I spot any other errors I'll report them and also give this process a shot to see if I can contribute. Thanks for all the hard work on this! The recent updates to Capstone are very much appreciated.

find0x90 · Answer 7 · Sun Jan 14 2024 13:54:19 GMT+0800 (China Standard Time)

Just tested and it's fixed for me, thanks!