[AArch64] Instr. with groups `HasNEON`, don't have `HasNEONorSME` and similar assigned.

Question

[AArch64] Instr. with groups `HasNEON`, don't have `HasNEONorSME` and similar assigned.

grahamwoodward opened this issue a month ago · comments

Running an AWS Ubuntu aarch64 VM
Linux ip-10-252-39-126 6.5.0-1018-aws #18~22.04.1-Ubuntu SMP Fri Apr 5 17:56:39 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

Questions	Answers
OS/arch/bits	Ubuntu AArch64.
Architecture	armv8
Source of Capstone	`git clone`
Version/git commit	`24d99a9`

I'm using Capstone in a performance tool I'm trying to write (relating to retired instructions). It's only for aarch64 and I have a simple test benchmark that I'm compiling like so

aarch64-linux-gnu-gcc vectorise2.c -o vectorise2 -O2 -ftree-vectorize -march=armv8.6-a

and

aarch64-linux-gnu-gcc vectorise2.c -o vectorise2 -O2 -ftree-vectorize -march=armv8.5-a+sve

When compiling with armv8.6-a and running the test benchmark I'm then running my tool and asking it to show HasNEON and HasNEONorSME and it appears to give me the details indicating that the retired/decoded instruction is HasNEON but not HasNEONorSME. If it's in the group/feature HasNEON, then why not HasNEONorSME? Both are those are true?

Similarly, when compiler with 8.5-a+sve, then I see a zero count for HasSVE however I see a >0 count for HasSVEorSME...again that doesn't make sense...if the retired instruction is in the group/feature HasSVEorSME, then surely I should see a count for HasSME or a count for HasSVE?

Rot127 · Answer 1 · Fri Apr 26 2024 19:46:26 GMT+0800 (China Standard Time)

Likely another faulty definition in the LLVM tablegen files.
I am currently updating to LLVM 18 (#2298) and there are plenty of changes to AArch64 and SME/SVE. So it might be fixed already there.

Graham Woodward · Answer 2 · Fri Apr 26 2024 19:53:40 GMT+0800 (China Standard Time)

@Rot127 Is that a commit I can pull down to my clone?

Rot127 · Answer 3 · Fri Apr 26 2024 19:57:03 GMT+0800 (China Standard Time)

Not yet. Some disassembly tests still fail. But if you provide me with some test cases I would be thankful. This way I can check them before for you in the LLVM code and report back if it is indeed a LLVM bug (will take a while until it is fixed I assume) or something in CS.

Graham Woodward · Answer 4 · Fri Apr 26 2024 20:10:38 GMT+0800 (China Standard Time)

Sure....my test case is simply

#define LEN 1024
unsigned long long a[LEN], b[LEN], c[LEN];

int main()
{
    while(1) {
        int i = 0;
        for (; i < LEN; i++)
            c[i] = a[i] + b[i];
    }
    return 0;
}

and then compile like
aarch64-linux-gnu-gcc vectorise2.c -o vectorise2 -O2 -ftree-vectorize -march=armv8.6-a
and
aarch64-linux-gnu-gcc vectorise2.c -o vectorise2 -O2 -ftree-vectorize -march=armv8.5-a+sve

An objdump shows instructions such as
628: 4ee18400 add v0.2d, v0.2d, v1.2d
and
638: 04e10000 add z0.d, z0.d, z1.d
respectively for the two compilations.

My perf tool is configuring Capstone like so

  122   /* Initialise Capstone, which we use to disassemble the instruction */
  123   if (cs_open(CS_ARCH_AARCH64, CS_MODE_ARM, &handle) != CS_ERR_OK) {
  124       fprintf(stderr, "Failed to initialise Capstone! Aborting.\n");
  125       exit(1);
  126   }
  127   cs_option(handle, CS_OPT_DETAIL, CS_OPT_ON);
  128   cs_option(handle, CS_OPT_SKIPDATA, CS_OPT_ON);

My tool is using perf events to sample retired instructions and I use Capstone to disassemble the instructions and tell me about the groups/features.

In the first case for instruction
4ee18400 add v0.2d, v0.2d, v1.2d

it appears to have a group count of 1 and that group is HasNEON...my first question is why isn't it also in the HasNEONorSME group?

In the second case for instruction
04e10000 add z0.d, z0.d, z1.d

It appears to have a group count of 1 but not HasSME...it is instead HasSMEorSVE...so why this way and why only 1 group again?

Thanks, hope that helps

Graham Woodward · Answer 5 · Fri Apr 26 2024 21:27:41 GMT+0800 (China Standard Time)

Not yet. Some disassembly tests still fail. But if you provide me with some test cases I would be thankful. This way I can check them before for you in the LLVM code and report back if it is indeed a LLVM bug (will take a while until it is fixed I assume) or something in CS.

what version is being used at present? I have llvm-14 installed and also a build of llvm-18 and the -14 version struggles with this

echo 0x00, 0x0c, 0xe2, 0x25, | llvm-mc -triple=aarch64 -disassemble -mattr=+all
'+all' is not a recognized feature for this target (ignoring feature)
        .text
<stdin>:1:1: warning: invalid instruction encoding
0x00, 0x0c, 0xe2, 0x25,

(note even without -mattr=+all it fails with the encoding) but my local build of 18 is OK

echo 0x00, 0x0c, 0xe2, 0x25, | ./llvm-mc -triple=aarch64 -disassemble -mattr=+all
        .text
        whilelo p0.d, w0, w2

Rot127 · Answer 6 · Sat Apr 27 2024 15:15:44 GMT+0800 (China Standard Time)

The v5 branch is based on LLVM-7. Current next branch uses LLVM-16 for ARM, PPC, AArch64 (Alpha: LLVM-3) and the others still LLVM-7. I update ARM, PPC, AArch64 over the next weeks to LLVM-18.

it appears to have a group count of 1 and that group is HasNEON...my first question is why isn't it also in the HasNEONorSME group?

The add instructions are defined here if I am not mistaken. And as you can see, it has only the HasNEON group assigned. Not the HasNEONorSVE. Because we generate the Capstone code from these definitions in the td files we also inherit all the flaws which come with hand-written things.

The underlying problem is, that the current AArch64 definitions need to assign groups like HasNEONorSME manually (just as it has been done in the file above). It is not derived dynamically from the fact, that it already has HasNEON assigned. It is common to find these kind of issues in hand-written td files, unfortunately.

I'll fix it with #2298 in our LLVM fork by checking during generation.

Graham Woodward · Answer 7 · Sat Apr 27 2024 15:21:56 GMT+0800 (China Standard Time)

This could be true for other aarch64 instructions?

Rot127 · Answer 8 · Sat Apr 27 2024 15:29:57 GMT+0800 (China Standard Time)

Yes, there is a good chance that every instruction which should be of one of the following groups, isn't assigned to them:
HasSVEorSME, HasSVE2orSME, HasSVE2orSME2, HasSVE2p1_or_HasSME, HasSVE2p1_or_HasSME2, HasSVE2p1_or_HasSME2p1, HasNEONorSME

Rot127 · Answer 9 · Sat Apr 27 2024 15:30:46 GMT+0800 (China Standard Time)

But we can fix this in our LLVM fork pretty easily. If we detect during code generation that a instruction is part of HasSVE, we can just emit HasSVEorSME as well.

Graham Woodward · Answer 10 · Sat Apr 27 2024 15:51:51 GMT+0800 (China Standard Time)

But we can fix this in our LLVM fork pretty easily. If we detect during code generation that a instruction is part of HasSVE, we can just emit HasSVEorSME as well.

do you have an ETA for v6?

Rot127 · Answer 11 · Sat Apr 27 2024 16:06:23 GMT+0800 (China Standard Time)

#2298 though should be done in two or three weeks. If you need to use it before, check the issues listed in the PR and if they affect you.

An ETA for v6 not really. If you want to use Capstone for AArch64 definitely use the next branch.
The todo list for v6 release is still quite long, I doubt it can be released within 6 months.

Graham Woodward · Answer 12 · Sat Apr 27 2024 16:07:13 GMT+0800 (China Standard Time)

#2298 though should be done in two or three weeks. If you need to use it before, check the issues listed in the PR and if they affect you.

An ETA for v6 not really. If you want to use Capstone for AArch64 definitely use the next branch. The todo list for v6 release is still quite long, I doubt it can be released within 6 months.

"If you want to use Capstone for AArch64 definitely use the next branch." pretty sure I'm using the next branch

Rot127 · Answer 13 · Sat Apr 27 2024 16:09:45 GMT+0800 (China Standard Time)

Yes, I just wanted to point it out again, the difference between current next and the v5 release is huge.
Also, the work towards v6 is mostly sponsored by RizinOrg, so I won't "just stop".

Graham Woodward · Answer 14 · Sat Apr 27 2024 16:17:23 GMT+0800 (China Standard Time)

Yes, I just wanted to point it out again, the difference between current next and the v5 release is huge. Also, the work towards v6 is mostly sponsored by RizinOrg, so I won't "just stop".

right so even though I'm using next, I don't have the above commit? Can I pull/check that commit out or in my tool link to it as the git submodule?

Rot127 · Answer 15 · Sat Apr 27 2024 16:21:49 GMT+0800 (China Standard Time)

#2298 is the PR which updates the AArch64 module to the state of LLVM-18. When it is done it will be merged into next.

But it isn't done yet. Some instructions don't get disassembled correctly and need to be fixed. So it won't be useful to you in the current state. I can let you know, when it is in a usable state. Than you can use the branch of the PR.

Graham Woodward · Answer 16 · Sat Apr 27 2024 20:20:09 GMT+0800 (China Standard Time)

@Rot127 excellent thanks very much

Graham Woodward · Answer 17 · Sun Apr 28 2024 01:55:07 GMT+0800 (China Standard Time)

The v5 branch is based on LLVM-7. Current next branch uses LLVM-16 for ARM, PPC, AArch64 (Alpha: LLVM-3) and the others still LLVM-7. I update ARM, PPC, AArch64 over the next weeks to LLVM-18.

it appears to have a group count of 1 and that group is HasNEON...my first question is why isn't it also in the HasNEONorSME group?

The add instructions are defined here if I am not mistaken. And as you can see, it has only the HasNEON group assigned. Not the HasNEONorSVE. Because we generate the Capstone code from these definitions in the td files we also inherit all the flaws which come with hand-written things.

How did you work out that that's the add?

The underlying problem is, that the current AArch64 definitions need to assign groups like HasNEONorSME manually (just as it has been done in the file above). It is not derived dynamically from the fact, that it already has HasNEON assigned. It is common to find these kind of issues in hand-written td files, unfortunately.

I'll fix it with #2298 in our LLVM fork by checking during generation.

Can you explain more about the generation and how it works? Is that a capstone thing?

Rot127 · Answer 18 · Sun Apr 28 2024 14:12:11 GMT+0800 (China Standard Time)

How did you work out that that's the add?

In AArch64InstrFormats.td the basic instructions formats are defined. In this case all vector instructions with a common encoding which have three vector registers (BaseSIMDThreeSameVector).

With a debugger I checked in Capstone what the LLVM internal ID is which it gave (one of the) the instructions from above: add v0.2d, v0.2d, v1.2d = ADDv2i64.

Guessing that it inherits it, I checked where this class is inherited and contains a portion of the LLVM name:

> grep -rnI "BaseSIMDThreeSameVector" llvm/lib/Target/AArch64/ | grep v2i64
llvm/lib/Target/AArch64/AArch64InstrFormats.td:5878:  def v2i64 : BaseSIMDThreeSameVector<1, U, 0b111, opc, V128,

Checking AArch64InstrFormats.td:5878 it shows that it defines a multiclass (SIMDThreeSameVector) there, which defines multiple instructions at once (one of them the v2i64 variant). Grepping for this aggin shows us that indeed an add instruction of this type wass defined.

> grep -rnI " SIMDThreeSameVector" llvm/lib/Target/AArch64/ | grep ADD
llvm/lib/Target/AArch64/AArch64InstrInfo.td:1452:defm FCADD : SIMDThreeSameVectorComplexHSD<1, 0b111, complexrotateopodd,
llvm/lib/Target/AArch64/AArch64InstrInfo.td:5170:defm ADD     : SIMDThreeSameVector<0, 0b10000, "add", add>;
...

In short, you have to manually the inheritance paths and have a feeling how these definitions are commonly written. SO you can make good guesses where to start looking.

Can you explain more about the generation and how it works? Is that a capstone thing?

This is documented here and in the README.md of our LLVM fork.

Graham Woodward · Answer 19 · Mon Apr 29 2024 15:21:11 GMT+0800 (China Standard Time)

@Rot127 this might be a daft question but why isn't LLVM included as a git submodule that's then cloned/checked out when you initially clone Capstone? Wouldn't that mean Capstone always uses the latest LLVM?

Rot127 · Answer 20 · Mon Apr 29 2024 15:50:36 GMT+0800 (China Standard Time)

The TableGen backends are heavily modified to emit C code. LLVM backends only emit C++ code. So having our fork and rebasing in on top of LLVM is easier and gives the possibility of gettings PRs etc.

Graham Woodward · Answer 21 · Mon Apr 29 2024 16:17:25 GMT+0800 (China Standard Time)

The TableGen backends are heavily modified to emit C code. LLVM backends only emit C++ code. So having our fork and rebasing in on top of LLVM is easier and gives the possibility of gettings PRs etc.

would it not be better if LLVM themselves maintained the TG to omit C code? presumably they don't want to?

Rot127 · Answer 22 · Mon Apr 29 2024 16:49:21 GMT+0800 (China Standard Time)

We proposed this idea, but there was no real feedback or interest. At least none which translated to actions.
Also because it would require to redesign and rewrite a lot of the backends (see the discussion here: https://reviews.llvm.org/D138323). And because for LLVM internal stuff they work just fine, the priority is understandably very low.
I will push the topic again when I find time to build an argument. Because the problem is more general than just "emit C code".