capstone-engine / capstone

Capstone disassembly/disassembler framework for ARM, ARM64 (ARMv8), Alpha, BPF, Ethereum VM, HPPA, M68K, M680X, Mips, MOS65XX, PPC, RISC-V(rv32G/rv64G), SH, Sparc, SystemZ, TMS320C64X, TriCore, Webassembly, XCore and X86.

Home Page:http://www.capstone-engine.org

aarch64: incorrect register in regs_access() for bl instruction

find0x90 opened this issue

The regs_access() function returns 'sp' as a read register for the bl instruction.

Below is a small script that shows the difference in behavior between version 4.0.2 and the most recent commit as of this comment.

#!/usr/bin/env python3
# test.py

from capstone import *

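# Older releases name the architecture CS_ARCH_ARM64; newer builds use
# CS_ARCH_AARCH64. Fall back only on the errors a missing or unsupported
# constant would raise, instead of swallowing every exception.
try:
    md = Cs(CS_ARCH_ARM64, CS_MODE_ARM | CS_MODE_LITTLE_ENDIAN)
except (NameError, CsError):
    md = Cs(CS_ARCH_AARCH64, CS_MODE_ARM | CS_MODE_LITTLE_ENDIAN)
# regs_access() requires detail mode.
md.detail = True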

instruction_bytes = b"\xec\x6a\x01\x95"

inst = list(md.disasm(instruction_bytes, offset=0x0, count=1))[0]

print(inst)

regs_read, regs_written = inst.regs_access()
regs_read = [inst.reg_name(r) for r in regs_read]
regs_written = [inst.reg_name(r) for r in regs_written]

print(regs_read, regs_written)

4.0.2:

$ ./test.py
<CsInsn 0x0 [ec6a0195]: bl #0x405abb0>
[] ['x30']

next branch b9c260e:

$ ./test.py
<CsInsn 0x0 [ec6a0195]: bl 0x405abb0>
['sp'] ['x30']
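
The difference between the two outputs above can be written down as a small check. This is only a sketch: `check_bl_access` is a hypothetical helper, not part of Capstone, and it takes the two name lists exactly as the script prints them, so it runs without Capstone installed.

```python
def check_bl_access(regs_read, regs_written):
    """Return a list of problems with the register access reported for `bl`.

    Hypothetical helper for illustration; the arguments are lists of
    register names such as the ones printed by the script above.
    """
    problems = []
    # `bl` should not read the stack pointer.
    if "sp" in regs_read:
        problems.append("sp reported as read")
    # `bl` stores the return address in the link register (x30).
    if "x30" not in regs_written:
        problems.append("x30 not reported as written")
    return problems


print(check_bl_access([], ["x30"]))      # lists printed by 4.0.2
print(check_bl_access(["sp"], ["x30"]))  # lists printed by next branch b9c260e
```

Appending `assert not check_bl_access(regs_read, regs_written)` to the script would make it fail on the affected build.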

It is incorrectly defined in LLVM:

let isCall = 1, Defs = [LR], Uses = [SP] in {
    def BL : CallImm<1, "bl", [(AArch64call tglobaladdr:$addr)]>;
} // isCall

@Rot127 can you help me understand the change you made? I'd like to be able to contribute fixes in the future if I find more issues.

I understand removing Uses = [SP] but why change isCall to isBranch? Aren't bl and blr the procedure call instructions for AArch64?

Actually, I just looked at blr in the llvm/lib/Target/AArch64/AArch64InstrInfo.td file. It is listed as isCall and also has Uses = [SP]. Is that wrong as well?

> but why change isCall to isBranch?

You are right. I did the changes in a rush and was sloppy. BL and BLR are considered calls. Thanks for pointing it out!

> and also has Uses = [SP]. Is that wrong as well?

Uses = [SP] is wrong. I can't see any mention of SP usage in the ISA.
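
To illustrate the point about the ISA, here is a minimal sketch that decodes the bytes from the reproduction script by hand. It assumes the A64 unconditional immediate branch layout (bit 31 selects `bl`, bits 30..26 are `00101`, bits 25..0 are a signed word offset). `decode_branch_imm` is a hypothetical helper, not a Capstone function. The encoding has no register field at all: the only register involved is the implicit write to x30.

```python
import struct


def decode_branch_imm(raw, pc=0):
    """Decode an A64 B/BL (immediate) instruction from 4 little-endian bytes.

    Hypothetical helper for illustration. Returns (is_link, target).
    """
    (word,) = struct.unpack("<I", raw)
    # Bits 30..26 identify the unconditional immediate branch class.
    if (word >> 26) & 0x1F != 0b00101:
        raise ValueError("not a B/BL (immediate) encoding")
    is_link = bool(word >> 31)   # 1 = bl (writes x30), 0 = b
    imm26 = word & 0x03FFFFFF    # the remaining 26 bits are the offset
    if imm26 & (1 << 25):        # sign-extend the 26-bit field
        imm26 -= 1 << 26
    target = (pc + (imm26 << 2)) & 0xFFFFFFFFFFFFFFFF
    return is_link, target


is_link, target = decode_branch_imm(b"\xec\x6a\x01\x95")
print(is_link, hex(target))  # True 0x405abb0
```

The decoded target matches the `bl 0x405abb0` shown in both outputs of the script.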

> can you help me understand the change you made? I'd like to be able to contribute fixes in the future if I find more issues.

The changes in our LLVM repo are the definitions of the architecture. From those definitions we generate our disassembler logic.

If we discover a flaw in the definition, we need to change it in the td files first and then generate our decoding tables again.
For details see the documentation.
Please let me know which parts of the docs are unclear or badly written (if any). I haven't received feedback on them yet, and I certainly had blind spots while writing them.

The TLDR is:

Though, if you can't spend the time to get into the quirks of updating, it is better to wait until v6 is released. The update system is new and still has unpolished corners, which can be confusing.

Cool, if I spot any other errors I'll report them and also give this process a shot to see if I can contribute. Thanks for all the hard work on this! The recent updates to Capstone are very much appreciated.

Just tested and it's fixed for me, thanks!