aarch64: incorrect register in regs_access() for bl instruction
find0x90 opened this issue
The regs_access() function returns 'sp' as a read register for the bl instruction.
Below is a small script that shows the difference in behavior between version 4.0.2 and the most recent commit on the next branch as of this comment.
#! /usr/bin/env python3
# cs_test.py
from capstone import *

# The architecture constant is named differently across versions
# (CS_ARCH_ARM64 vs. CS_ARCH_AARCH64), so fall back if the first is missing.
try:
    md = Cs(CS_ARCH_ARM64, CS_MODE_ARM | CS_MODE_LITTLE_ENDIAN)
except NameError:
    md = Cs(CS_ARCH_AARCH64, CS_MODE_ARM | CS_MODE_LITTLE_ENDIAN)
md.detail = True  # required for regs_access()

instruction_bytes = b"\xec\x6a\x01\x95"
inst = list(md.disasm(instruction_bytes, offset=0x0, count=1))[0]
print(inst)

# Translate the register IDs into names for readability.
regs_read, regs_written = inst.regs_access()
regs_read = [inst.reg_name(r) for r in regs_read]
regs_written = [inst.reg_name(r) for r in regs_written]
print(regs_read, regs_written)
4.0.2:
$ ./test.py
<CsInsn 0x0 [ec6a0195]: bl #0x405abb0>
[] ['x30']
next branch b9c260e:
$ ./test.py
<CsInsn 0x0 [ec6a0195]: bl 0x405abb0>
['sp'] ['x30']
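The difference between the two runs can be written down as a small regression check. This is only a sketch: the helper name diff_reg_access is hypothetical (it is not part of Capstone), and the register lists are hard-coded from the two outputs above so that it runs without Capstone installed.

```python
def diff_reg_access(old, new):
    """Compare two (regs_read, regs_written) pairs of register names.

    Returns the names that appear only in the newer result.
    """
    old_read, old_written = old
    new_read, new_written = new
    return {
        "extra_reads": sorted(set(new_read) - set(old_read)),
        "extra_writes": sorted(set(new_written) - set(old_written)),
    }


# Values copied from the two outputs in this report.
v4_0_2 = ([], ["x30"])
next_b9c260e = (["sp"], ["x30"])

print(diff_reg_access(v4_0_2, next_b9c260e))
```

With a real Capstone build, the same helper could be fed the name lists produced by the script above instead of the hard-coded values.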
It is incorrectly defined in LLVM:
let isCall = 1, Defs = [LR], Uses = [SP] in {
def BL : CallImm<1, "bl", [(AArch64call tglobaladdr:$addr)]>;
} // isCall
@Rot127 can you help me understand the change you made? I'd like to be able to contribute fixes in the future if I find more issues.
I understand removing Uses = [SP] but why change isCall to isBranch? Aren't bl and blr the procedure call instructions for AArch64?
Actually, I just looked at blr in the llvm/lib/Target/AArch64/AArch64InstrInfo.td file. It is listed as isCall and also has Uses = [SP]. Is that wrong as well?
> but why change isCall to isBranch?
You are right. I did the changes in a rush and was sloppy. BL and BLR are considered calls. Thanks for pointing it out!
> and also has Uses = [SP]. Is that wrong as well?
Uses = [SP] is wrong. I can't see any mention of SP usage in the ISA.
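To illustrate the point about the ISA: the A64 encoding of BL consists only of a 6-bit opcode and a 26-bit immediate, so there is no register operand field at all, and the architectural side effect is the write of the return address to X30. The sketch below decodes the bytes from the report by hand; the function name decode_bl is made up for this example and has nothing to do with Capstone's API.

```python
import struct


def decode_bl(insn_bytes, pc=0):
    """Decode a little-endian A64 BL instruction and return the branch target."""
    (word,) = struct.unpack("<I", insn_bytes)
    # BL layout: bit 31 = 1 (link), bits 30..26 = 0b00101, bits 25..0 = imm26.
    if word >> 26 != 0b100101:
        raise ValueError("not a BL instruction")
    imm26 = word & 0x03FFFFFF
    # Sign-extend the 26-bit immediate, then scale it by 4 (instruction size).
    if imm26 & (1 << 25):
        imm26 -= 1 << 26
    return pc + (imm26 << 2)


# Same bytes as in the reproduction script, decoded at address 0x0.
target = decode_bl(b"\xec\x6a\x01\x95")
print(hex(target))
```

The result matches the target Capstone prints for this instruction (0x405abb0), and nothing in the decoded fields refers to SP.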
> can you help me understand the change you made? I'd like to be able to contribute fixes in the future if I find more issues.
The changes in our LLVM repo are the definitions of the architecture. From those definitions we generate our disassembler logic.
If we discover a flaw in a definition, we need to change it in the td files first and then generate our decoding tables again.
For details see the documentation.
Please let me know which parts of the docs are unclear or badly written (if any). I haven't received feedback on them yet, and I certainly had blind spots while writing them.
The TLDR is:
- Compile llvm-tblgen (see: https://github.com/Rot127/capstone/blob/as-docs/docs/AutoSync.md#update-an-architecture)
- Run ./Updater/ASUpdater.py -a AArch64 -s IncGen --inc-list Mapping to generate new tables.
- Copy tables (or, in this case, I only copied the few lines, because the generated tables require #2231).
Though, if you can't spend the time to get into the quirks of updating, it is better to wait until v6 is released. The update system is new and still has unpolished corners which can be confusing.
Cool, if I spot any other errors I'll report them and also give this process a shot to see if I can contribute. Thanks for all the hard work on this! The recent updates to Capstone are very much appreciated.
Just tested and it's fixed for me, thanks!