ARM-software / LLVM-embedded-toolchain-for-Arm

A project dedicated to building LLVM toolchain for 32-bit Arm embedded targets.

Support for MVE and float-abi=hard for Cortex M85

renesas-kyle-finch opened this issue · comments

When compiling with -mcpu=cortex-m85 -mfloat-abi=hard, I get an error that the instructions vsub.f32 and vfma.f32 require mve.fp. Passing the -v flag to clang shows that the selected triple is -triple thumbv8.1m.main-none-unknown-eabihf.

I haven't been able to figure out the right combination of arguments to enable FPU and MVE.

If I run clang --target=arm-none-eabi -mcpu=cortex-m85 -mfloat-abi=hard -print-multi-lib | grep mve, I get the following results:

arm-none-eabi/armv8.1m.main_soft_nofp_nomve;@-target=thumbv8.1m.main-none-unknown-eabi@mfpu=none
arm-none-eabi/armv8.1m.main_hard_nofp_mve;@-target=thumbv8.1m.main-none-unknown-eabihf@march=thumbv8.1m.main+dsp+mve@mfpu=none

But if I run arm-none-eabi-gcc -mcpu=cortex-m85 -mfloat-abi=hard -print-multi-lib | grep mve, I get the following results, which seem to indicate that MVE and float-abi=hard should be compatible:

thumb/v8.1-m.main+mve/hard;@mthumb@march=armv8.1-m.main+mve@mfloat-abi=hard
thumb/v8.1-m.main+pacbti+mve/bp/hard;@mthumb@march=armv8.1-m.main+pacbti+mve@mbranch-protection=standard@mfloat-abi=hard
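The two listings above are easier to compare once decoded. A minimal sketch, assuming the usual -print-multi-lib layout of `<dir>;@flag@flag...`, in which each `@` stands in for the leading `-` of a flag; the function name `decode_multilibs` is made up for illustration:

```shell
# Decode -print-multi-lib output read from stdin.
# Assumed line layout: "<dir>;@flag@flag...", where '@' replaces the
# leading '-' of each flag.
decode_multilibs() {
    sed -e 's/;@/: -/' -e 's/@/ -/g'
}

# Hypothetical usage with the command from this issue:
#   clang --target=arm-none-eabi -mcpu=cortex-m85 -mfloat-abi=hard \
#       -print-multi-lib | decode_multilibs | grep mve
```

This only changes how the list is displayed. It does not change which variant the driver selects.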

Is what I am doing not supported or am I not doing it correctly?

Would it be possible to get a small reproducer? With a simple test file

        .text
        vfma.f32 q0, q3, q7
        vcvt.f16.s16 q1, q7, #1
        vsub.f32        q0, q2, q1

I can assemble this with clang (16) --target=arm-none-eabi -mcpu=cortex-m85 -c

Looking at the definition of the cortex-m85, it should enable MVE and floating point by default. If you are using -march then you'll need -march=armv8.1-m.main+mve.fp. The extension can be used with -mcpu as well (-mcpu=cortex-m85+mve.fp), although it shouldn't be necessary.

The Arm Compiler 6 documentation for -mcpu may help (Arm Compiler 6 is a commercial compiler based on LLVM, but this is true for clang in general): https://developer.arm.com/documentation/101754/0620/armclang-Reference/armclang-Command-line-Options/-mcpu?lang=en

With the contents you provided in test.s, I got the following output with clang (17) from LLVM 17:

clang --target=arm-none-eabi -mcpu=cortex-m85 -c -o test.o test.s 
clang: warning: no multilib found matching flags: --target=thumbv8.1m.main-none-unknown-eabi -march=thumbv8.1m.main+dsp+mve+mve.fp+fp16+ras+lob+pacbti+nocrc+nocrypto+nosha2+noaes+nodotprod+nofp16fml+nobf16+nosb+noi8mm+nocdecp0+nocdecp1+nocdecp2+nocdecp3+nocdecp4+nocdecp5+nocdecp6+nocdecp7 -mfloat-abi=softfp -mfpu=fp-armv8-fullfp16-d16 [-Wmissing-multilib]
clang: note: available multilibs are:
--target=aarch64-none-unknown-elf
--target=armv4t-none-unknown-eabi -mfpu=none
--target=armv5e-none-unknown-eabi -mfpu=none
--target=thumbv6m-none-unknown-eabi -mfpu=none
--target=armv7-none-unknown-eabi -mfpu=none
--target=armv7-none-unknown-eabihf -mfpu=vfpv3-d16
--target=armv7r-none-unknown-eabi -mfpu=none
--target=armv7r-none-unknown-eabihf -mfpu=vfpv3-d16
--target=thumbv7m-none-unknown-eabi -mfpu=none
--target=thumbv7em-none-unknown-eabi -mfpu=none
--target=thumbv7em-none-unknown-eabihf -mfpu=fpv4-sp-d16
--target=thumbv7em-none-unknown-eabihf -mfpu=fpv5-d16
--target=thumbv8m.main-none-unknown-eabi -mfpu=none
--target=thumbv8m.main-none-unknown-eabihf -mfpu=fpv5-d16
--target=thumbv8.1m.main-none-unknown-eabi -mfpu=none
--target=thumbv8.1m.main-none-unknown-eabihf -march=thumbv8.1m.main+fp16 -mfpu=fp-armv8-fullfp16-sp-d16
--target=thumbv8.1m.main-none-unknown-eabihf -march=thumbv8.1m.main+dsp+mve -mfpu=none

If I add -mfloat-abi=hard to the above, then it seems to assemble okay with the test.

With my own code, if I specify --target=arm-none-eabi -mcpu=cortex-m85 I get an error elsewhere indicating math.h could not be found. But when I add -mfloat-abi=hard, I get the originally reported error:

source.s:386:2: error: invalid instruction, any one of the following would fix this:
        vsub.f32        q0, q4, q1
        ^
source.s:386:6: note: invalid operand for instruction
        vsub.f32        q0, q4, q1
            ^
source.s:386:2: note: instruction requires: mve.fp
        vsub.f32        q0, q4, q1
        ^
source.s:388:2: error: instruction requires: mve.fp
        vfma.f32        q1, q0, q2

There is at least one problem here, which is that we don't have an armv8.1m.main_hard_fp_mve library variant, only armv8.1m.main_hard_nofp_mve. This means that multilib selection will fail and you'll get strange errors like math.h not being found. I'll mention that to the team next week.

One workaround for that is to manually set the include and library directories to the armv8.1m.main_hard_fp directory. This will be compatible with hardfp and mve+fp. As far as I know the math library when compiled in mve+fp configuration won't use MVE instructions anyway.

For example:

-isystem /path/to/LLVMEmbeddedToolchainForArm-16.0.0-Linux-x86_64/bin/../lib/clang-runtimes/arm-none-eabi/armv8.1m.main_hard_fp/include -L/path/to/LLVMEmbeddedToolchainForArm-16.0.0-Linux-x86_64/bin/../lib/clang-runtimes/arm-none-eabi/armv8.1m.main_hard_fp/lib
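Since the build system in question covers several architectures, the same pair of flags could be generated rather than hard-coded. A rough sketch only: `variant_flags` is a made-up helper name, and the directory layout is taken from the example path above.

```shell
# Print the -isystem/-L pair for one library variant.
# $1: toolchain install root (placeholder), $2: variant directory name.
# Layout assumed from the example above:
#   <root>/lib/clang-runtimes/arm-none-eabi/<variant>/{include,lib}
variant_flags() {
    dir="$1/lib/clang-runtimes/arm-none-eabi/$2"
    printf '%s %s\n' "-isystem $dir/include" "-L$dir/lib"
}

# Hypothetical usage (breaks if the install root contains spaces):
#   clang --target=arm-none-eabi -mcpu=cortex-m85 -mfloat-abi=hard \
#       $(variant_flags /path/to/toolchain armv8.1m.main_hard_fp) -c source.c
```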

There is probably a multilib.yaml flag mapping that could map +mve+fp to armv8.1m.main_hard_fp, but I can't think of it off the top of my head.

I'm at a loss to explain why source.s isn't working. That is assuming the build system/make is using the same --target and -mcpu as the test.s file.

I will try setting the include and library directories manually. But adding a flag mapping to multilib.yaml would be ideal, since our build system is also used for architectures other than armv8.1.

Regarding source.s vs test.s: in this case, source.s is generated by LLVM from our source.c file. The --target and -mcpu are the same between source.s and test.s, though. I can try to come up with a simple project that reproduces the issue by stripping out a bunch of our other proprietary stuff. There seems to be good information when I use clang -v.

We've continued to look into this.

I think there are two parts. The first is assembling instructions with -mcpu=cortex-m85 (or -mcpu=cortex-m85+fp+mve); we've not been able to reproduce this. -mfloat-abi=hard selects a calling convention, so it doesn't affect which instructions are available. It is possible that an .arch directive has overridden the command-line options locally (docs: https://sourceware.org/binutils/docs/as/ARM-Directives.html). We would probably need an example source file and a command line.
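One quick way to check for such an override is to list the target-selecting directives in the assembly file. This is only a sketch: `find_target_directives` is a made-up name, and the directive list (.arch, .arch_extension, .cpu, .fpu) is a guess at the ones most likely to matter here.

```shell
# List directives that can change the assembler's target settings.
# Reads the named files, or stdin when no file is given.
# Prints matching lines prefixed with their line numbers.
find_target_directives() {
    grep -nE '^[[:space:]]*\.(arch|arch_extension|cpu|fpu)[[:space:]]' "$@"
}

# Hypothetical usage:
#   find_target_directives source.s
```

Any hit is worth comparing against the -mcpu and -march values on the command line.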

The second is the available multilibs. On investigation there is a difference between LLVM 16 (the latest official release, which had a downstream multilib patch) and LLVM 17, which has the upstream multilib patch. There is an LLVM 17 preview release here: https://github.com/ARM-software/LLVM-embedded-toolchain-for-Arm/releases/tag/preview-17.0.0-devdrop0

In summary:
LLVM16 only has a viable multilib for -mcpu=cortex-m85+nomve+nofp -> armv8.1m.main_soft_nofp_nomve
The closest workarounds I can find are to add an explicit fpu or use march.

  • -mcpu=cortex-m85+fp+mve -mfpu=fp-armv8-fullfp16-sp-d16 -> armv8.1m.main_hard_fp
  • -march=armv8.1-m.main+fp+mve -mfloat-abi=hard -> armv8.1m.main_hard_fp

LLVM17 has a viable multilib for -mcpu=cortex-m85 -mfloat-abi=hard -> armv8.1m.main_hard_fp
LLVM17 is missing a viable multilib for -mcpu=cortex-m85 -mfloat-abi=softfp (sadly the default), as we don't currently have a softfp variant. There is a soft floating point variant, but that won't use the floating point hardware at all: -mcpu=cortex-m85 -mfloat-abi=soft -> armv8.1m.main_soft_nofp_nomve

If you are able to use the LLVM 17 preview this should fix the multilib problem for -mfloat-abi=hard. We are intending to have a compatible softfp library variant for the final release.
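To see which variant a given set of flags lands on for a particular toolchain, something like the following may help. It is a sketch under assumptions: `show_multilib` is a made-up helper, and it relies on the driver's --print-multi-directory option printing the selected directory.

```shell
# Print "<flags> -> <multilib directory>" for one flag combination.
# $1: clang binary to query; remaining arguments: target flags.
show_multilib() {
    cc=$1
    shift
    printf '%s -> ' "$*"
    "$cc" --target=arm-none-eabi "$@" --print-multi-directory
}

# Hypothetical usage:
#   show_multilib /path/to/bin/clang -mcpu=cortex-m85 -mfloat-abi=hard
#   show_multilib /path/to/bin/clang -mcpu=cortex-m85 -mfloat-abi=soft
```

Any missing-multilib warning is printed separately on stderr, so the two outputs can be told apart.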

Tests (adding -Wl,--verbose) to see the libraries used by the linker
LLVM17 -mcpu=cortex-m85+fp+mve -mfloat-abi=hard

LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/bin/clang --target=arm-none-eabi -mcpu=cortex-m85+fp+mve mve.c   -Wl,--verbose -mfloat-abi=hard
ld.lld: /tmp/mve-643fe0.o
ld.lld: LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/bin/../lib/clang-runtimes/arm-none-eabi/armv8.1m.main_hard_fp/lib/libc.a
ld.lld: LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/bin/../lib/clang-runtimes/arm-none-eabi/armv8.1m.main_hard_fp/lib/libm.a
ld.lld: LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/bin/../lib/clang-runtimes/arm-none-eabi/armv8.1m.main_hard_fp/lib/libclang_rt.builtins.a

LLVM16 -mcpu=cortex-m85+fp+mve -mfloat-abi=hard

LLVMEmbeddedToolchainForArm-16.0.0-Linux-x86_64/bin/clang --target=arm-none-eabi -mcpu=cortex-m85+fp+mve mve.c   -Wl,--verbose -mfloat-abi=hard
ld.lld: /tmp/mve-5f8036.o
ld.lld: error: unable to find library -lc
ld.lld: error: unable to find library -lm
ld.lld: error: unable to find library -lclang_rt.builtins-arm

LLVM16 using -march=armv8.1-m.main+fp+mve

LLVMEmbeddedToolchainForArm-16.0.0-Linux-x86_64/bin/clang --target=arm-none-eabi -march=armv8.1-m.main+fp+mve mve.c   -Wl,--verbose -mfloat-abi=hard
ld.lld: /tmp/mve-d65dda.o
ld.lld: LLVMEmbeddedToolchainForArm-16.0.0-Linux-x86_64/bin/../lib/clang-runtimes/arm-none-eabi/armv8.1m.main_hard_fp/lib/libc.a
ld.lld: LLVMEmbeddedToolchainForArm-16.0.0-Linux-x86_64/bin/../lib/clang-runtimes/arm-none-eabi/armv8.1m.main_hard_fp/lib/libm.a
ld.lld: LLVMEmbeddedToolchainForArm-16.0.0-Linux-x86_64/bin/../lib/clang-runtimes/arm-none-eabi/armv8.1m.main_hard_fp/lib/libclang_rt.builtins.a

We are planning to add at least one softfp variant for the full release.

I have already been using the preview release of LLVM 17. I have been working on making a simplified example to send over and have made two observations.

Example source code:

#include <stdint.h>
#include <math.h>

#define DUMMY_CONST_1 (0.0012345F)
#define DUMMY_CONST_2 (0.01F)
#define DUMMY_CONST_3 (0.02F)
#define DUMMY_CONST_4 (0.03F)
#define DUMMY_CONST_5 (0.04F)

typedef struct
{
    float a;
    float b;
    float c;
    float d;
} dummy_t;

int8_t foo(dummy_t *handle)
{
    handle->a += DUMMY_CONST_2 * (DUMMY_CONST_1 - handle->a);
    handle->b += DUMMY_CONST_3 * (DUMMY_CONST_1 - handle->b);
    handle->c += DUMMY_CONST_4 * (DUMMY_CONST_1 - handle->c);
    handle->d += DUMMY_CONST_5 * (DUMMY_CONST_1 - handle->d);
    return 0;
}

  1. With the provided example source code and the command below, without specifying -mfloat-abi=hard, I get an error that math.h cannot be found. With the -v flag passed to clang, it appears to be using an invalid -internal-isystem /opt/LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/bin/../lib/clang-runtimes/include. I think this could be related to the missing softfp variant.

Here is the full output:

clang -v -std=c99 -x c -O2 -mcpu=cortex-m85+fp+mve --target=arm-none-eabi -mthumb -save-temps=obj -c -o mve_compile_test.o mve_compile_test.c 
clang: warning: no multilib found matching flags: --target=thumbv8.1m.main-none-unknown-eabi -march=thumbv8.1m.main+dsp+mve+mve.fp+fp16+ras+lob+pacbti+nocrc+nocrypto+nosha2+noaes+nodotprod+nofp16fml+nobf16+nosb+noi8mm+nocdecp0+nocdecp1+nocdecp2+nocdecp3+nocdecp4+nocdecp5+nocdecp6+nocdecp7 -mfloat-abi=softfp -mfpu=fp-armv8-fullfp16-d16 [-Wmissing-multilib]
clang: note: available multilibs are:
--target=aarch64-none-unknown-elf
--target=armv4t-none-unknown-eabi -mfpu=none
--target=armv5e-none-unknown-eabi -mfpu=none
--target=thumbv6m-none-unknown-eabi -mfpu=none
--target=armv7-none-unknown-eabi -mfpu=none
--target=armv7-none-unknown-eabihf -mfpu=vfpv3-d16
--target=armv7r-none-unknown-eabi -mfpu=none
--target=armv7r-none-unknown-eabihf -mfpu=vfpv3-d16
--target=thumbv7m-none-unknown-eabi -mfpu=none
--target=thumbv7em-none-unknown-eabi -mfpu=none
--target=thumbv7em-none-unknown-eabihf -mfpu=fpv4-sp-d16
--target=thumbv7em-none-unknown-eabihf -mfpu=fpv5-d16
--target=thumbv8m.main-none-unknown-eabi -mfpu=none
--target=thumbv8m.main-none-unknown-eabihf -mfpu=fpv5-d16
--target=thumbv8.1m.main-none-unknown-eabi -mfpu=none
--target=thumbv8.1m.main-none-unknown-eabihf -march=thumbv8.1m.main+fp16 -mfpu=fp-armv8-fullfp16-sp-d16
--target=thumbv8.1m.main-none-unknown-eabihf -march=thumbv8.1m.main+dsp+mve -mfpu=none
clang version 17.0.0
Target: arm-none-unknown-eabi
Thread model: posix
InstalledDir: /opt/LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/bin
 "/opt/LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/bin/clang-17" -cc1 -triple thumbv8.1m.main-none-unknown-eabi -E -save-temps=obj -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name mve_compile_test.c -mrelocation-model static -mframe-pointer=all -fmath-errno -ffp-contract=on -fno-rounding-math -mconstructor-aliases -nostdsysteminc -target-cpu cortex-m85 -target-feature +soft-float-abi -target-feature -crc -target-feature -dotprod -target-feature +mve.fp -target-feature +ras -target-feature -fp16fml -target-feature -bf16 -target-feature -sb -target-feature -i8mm -target-feature +lob -target-feature -cdecp0 -target-feature -cdecp1 -target-feature -cdecp2 -target-feature -cdecp3 -target-feature -cdecp4 -target-feature -cdecp5 -target-feature -cdecp6 -target-feature -cdecp7 -target-feature +pacbti -target-feature -hwdiv-arm -target-feature +hwdiv -target-feature +vfp2 -target-feature +vfp2sp -target-feature -vfp3 -target-feature +vfp3d16 -target-feature +vfp3d16sp -target-feature -vfp3sp -target-feature +fp16 -target-feature -vfp4 -target-feature +vfp4d16 -target-feature +vfp4d16sp -target-feature -vfp4sp -target-feature -fp-armv8 -target-feature +fp-armv8d16 -target-feature +fp-armv8d16sp -target-feature -fp-armv8sp -target-feature +fullfp16 -target-feature +fp64 -target-feature -d32 -target-feature -neon -target-feature +dsp -target-feature +mve -target-feature -crypto -target-feature -sha2 -target-feature -aes -target-feature +strict-align -target-abi aapcs -mfloat-abi soft -Wunaligned-access -debugger-tuning=gdb -v -fcoverage-compilation-dir=/home/coder/workspace/peaks -resource-dir /opt/LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/lib/clang/17 -internal-isystem /opt/LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/lib/clang/17/include -internal-isystem /opt/LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/bin/../lib/clang-runtimes/include -O0 -std=c99 -fdebug-compilation-dir=/home/coder/workspace/peaks 
-ferror-limit 19 -fno-signed-char -fgnuc-version=4.2.1 -fcolor-diagnostics -faddrsig -o mve_compile_test.i -x c mve_compile_test.c
clang -cc1 version 17.0.0 based upon LLVM 17.0.0-rc1 default target aarch64-linux-gnu
ignoring nonexistent directory "/opt/LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/bin/../lib/clang-runtimes/include"
ignoring duplicate directory "/opt/LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/lib/clang/17/include"
#include "..." search starts here:
#include <...> search starts here:
 /opt/LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/lib/clang/17/include
End of search list.
mve_compile_test.c:2:10: fatal error: 'math.h' file not found
    2 | #include <math.h>
      |          ^~~~~~~~
1 error generated.
  2. When I do specify -mfloat-abi=hard, I get the originally reported errors. However, I don't think these are related to multilib. I think they are related to optimization. With -O0 or -O1, this compiles fine. But with -O2, -O3, -Ofast, -Os, or -Oz, I get the following errors and output:

clang -v -std=c99 -x c -mfloat-abi=hard -O2 -mcpu=cortex-m85+fp+mve --target=arm-none-eabi -mthumb -save-temps=obj -c -o mve_compile_test.o mve_compile_test.c 
clang version 17.0.0
Target: arm-none-unknown-eabi
Thread model: posix
InstalledDir: /opt/LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/bin
 "/opt/LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/bin/clang-17" -cc1 -triple thumbv8.1m.main-none-unknown-eabihf -E -save-temps=obj -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name mve_compile_test.c -mrelocation-model static -mframe-pointer=all -fmath-errno -ffp-contract=on -fno-rounding-math -mconstructor-aliases -nostdsysteminc -target-cpu cortex-m85 -target-feature -crc -target-feature -dotprod -target-feature +mve.fp -target-feature +ras -target-feature -fp16fml -target-feature -bf16 -target-feature -sb -target-feature -i8mm -target-feature +lob -target-feature -cdecp0 -target-feature -cdecp1 -target-feature -cdecp2 -target-feature -cdecp3 -target-feature -cdecp4 -target-feature -cdecp5 -target-feature -cdecp6 -target-feature -cdecp7 -target-feature +pacbti -target-feature -hwdiv-arm -target-feature +hwdiv -target-feature +vfp2 -target-feature +vfp2sp -target-feature -vfp3 -target-feature +vfp3d16 -target-feature +vfp3d16sp -target-feature -vfp3sp -target-feature +fp16 -target-feature -vfp4 -target-feature +vfp4d16 -target-feature +vfp4d16sp -target-feature -vfp4sp -target-feature -fp-armv8 -target-feature +fp-armv8d16 -target-feature +fp-armv8d16sp -target-feature -fp-armv8sp -target-feature +fullfp16 -target-feature +fp64 -target-feature -d32 -target-feature -neon -target-feature +dsp -target-feature +mve -target-feature -crypto -target-feature -sha2 -target-feature -aes -target-feature +strict-align -target-abi aapcs -mfloat-abi hard -Wunaligned-access -debugger-tuning=gdb -v -fcoverage-compilation-dir=/home/coder/workspace/peaks -resource-dir /opt/LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/lib/clang/17 -internal-isystem /opt/LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/lib/clang/17/include -internal-isystem /opt/LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/bin/../lib/clang-runtimes/arm-none-eabi/armv8.1m.main_hard_fp/include -internal-isystem 
/opt/LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/bin/../lib/clang-runtimes/arm-none-eabi/armv8m.main_hard_fp/include -internal-isystem /opt/LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/bin/../lib/clang-runtimes/arm-none-eabi/armv7em_hard_fpv5_d16/include -internal-isystem /opt/LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/bin/../lib/clang-runtimes/arm-none-eabi/armv7em_hard_fpv4_sp_d16/include -O2 -std=c99 -fdebug-compilation-dir=/home/coder/workspace/peaks -ferror-limit 19 -fno-signed-char -fgnuc-version=4.2.1 -fcolor-diagnostics -vectorize-loops -vectorize-slp -faddrsig -o mve_compile_test.i -x c mve_compile_test.c
clang -cc1 version 17.0.0 based upon LLVM 17.0.0-rc1 default target aarch64-linux-gnu
ignoring duplicate directory "/opt/LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/lib/clang/17/include"
#include "..." search starts here:
#include <...> search starts here:
 /opt/LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/lib/clang/17/include
 /opt/LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/bin/../lib/clang-runtimes/arm-none-eabi/armv8.1m.main_hard_fp/include
 /opt/LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/bin/../lib/clang-runtimes/arm-none-eabi/armv8m.main_hard_fp/include
 /opt/LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/bin/../lib/clang-runtimes/arm-none-eabi/armv7em_hard_fpv5_d16/include
 /opt/LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/bin/../lib/clang-runtimes/arm-none-eabi/armv7em_hard_fpv4_sp_d16/include
End of search list.
 "/opt/LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/bin/clang-17" -cc1 -triple thumbv8.1m.main-none-unknown-eabihf -emit-llvm-bc -emit-llvm-uselists -save-temps=obj -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name mve_compile_test.c -mrelocation-model static -mframe-pointer=all -fmath-errno -ffp-contract=on -fno-rounding-math -mconstructor-aliases -nostdsysteminc -target-cpu cortex-m85 -target-feature -crc -target-feature -dotprod -target-feature +mve.fp -target-feature +ras -target-feature -fp16fml -target-feature -bf16 -target-feature -sb -target-feature -i8mm -target-feature +lob -target-feature -cdecp0 -target-feature -cdecp1 -target-feature -cdecp2 -target-feature -cdecp3 -target-feature -cdecp4 -target-feature -cdecp5 -target-feature -cdecp6 -target-feature -cdecp7 -target-feature +pacbti -target-feature -hwdiv-arm -target-feature +hwdiv -target-feature +vfp2 -target-feature +vfp2sp -target-feature -vfp3 -target-feature +vfp3d16 -target-feature +vfp3d16sp -target-feature -vfp3sp -target-feature +fp16 -target-feature -vfp4 -target-feature +vfp4d16 -target-feature +vfp4d16sp -target-feature -vfp4sp -target-feature -fp-armv8 -target-feature +fp-armv8d16 -target-feature +fp-armv8d16sp -target-feature -fp-armv8sp -target-feature +fullfp16 -target-feature +fp64 -target-feature -d32 -target-feature -neon -target-feature +dsp -target-feature +mve -target-feature -crypto -target-feature -sha2 -target-feature -aes -target-feature +strict-align -target-abi aapcs -mfloat-abi hard -Wunaligned-access -debugger-tuning=gdb -v -fcoverage-compilation-dir=/home/coder/workspace/peaks -resource-dir /opt/LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/lib/clang/17 -O2 -std=c99 -fdebug-compilation-dir=/home/coder/workspace/peaks -ferror-limit 19 -fno-signed-char -fgnuc-version=4.2.1 -fcolor-diagnostics -vectorize-loops -vectorize-slp -disable-llvm-passes -faddrsig -o mve_compile_test.bc -x cpp-output mve_compile_test.i
clang -cc1 version 17.0.0 based upon LLVM 17.0.0-rc1 default target aarch64-linux-gnu
#include "..." search starts here:
#include <...> search starts here:
 /opt/LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/lib/clang/17/include
End of search list.
 "/opt/LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/bin/clang-17" -cc1 -triple thumbv8.1m.main-none-unknown-eabihf -S -save-temps=obj -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name mve_compile_test.c -mrelocation-model static -mframe-pointer=all -fmath-errno -ffp-contract=on -fno-rounding-math -mconstructor-aliases -nostdsysteminc -target-cpu cortex-m85 -target-feature -crc -target-feature -dotprod -target-feature +mve.fp -target-feature +ras -target-feature -fp16fml -target-feature -bf16 -target-feature -sb -target-feature -i8mm -target-feature +lob -target-feature -cdecp0 -target-feature -cdecp1 -target-feature -cdecp2 -target-feature -cdecp3 -target-feature -cdecp4 -target-feature -cdecp5 -target-feature -cdecp6 -target-feature -cdecp7 -target-feature +pacbti -target-feature -hwdiv-arm -target-feature +hwdiv -target-feature +vfp2 -target-feature +vfp2sp -target-feature -vfp3 -target-feature +vfp3d16 -target-feature +vfp3d16sp -target-feature -vfp3sp -target-feature +fp16 -target-feature -vfp4 -target-feature +vfp4d16 -target-feature +vfp4d16sp -target-feature -vfp4sp -target-feature -fp-armv8 -target-feature +fp-armv8d16 -target-feature +fp-armv8d16sp -target-feature -fp-armv8sp -target-feature +fullfp16 -target-feature +fp64 -target-feature -d32 -target-feature -neon -target-feature +dsp -target-feature +mve -target-feature -crypto -target-feature -sha2 -target-feature -aes -target-feature +strict-align -target-abi aapcs -mfloat-abi hard -Wunaligned-access -debugger-tuning=gdb -v -fcoverage-compilation-dir=/home/coder/workspace/peaks -resource-dir /opt/LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/lib/clang/17 -O2 -std=c99 -fdebug-compilation-dir=/home/coder/workspace/peaks -ferror-limit 19 -fno-signed-char -fgnuc-version=4.2.1 -fcolor-diagnostics -vectorize-loops -vectorize-slp -faddrsig -o mve_compile_test.s -x ir mve_compile_test.bc
clang -cc1 version 17.0.0 based upon LLVM 17.0.0-rc1 default target aarch64-linux-gnu
 "/opt/LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/bin/clang-17" -cc1as -triple thumbv8.1m.main-none-unknown-eabihf -filetype obj -main-file-name mve_compile_test.c -target-cpu cortex-m85 -target-feature -crc -target-feature -dotprod -target-feature +mve.fp -target-feature +ras -target-feature -fp16fml -target-feature -bf16 -target-feature -sb -target-feature -i8mm -target-feature +lob -target-feature -cdecp0 -target-feature -cdecp1 -target-feature -cdecp2 -target-feature -cdecp3 -target-feature -cdecp4 -target-feature -cdecp5 -target-feature -cdecp6 -target-feature -cdecp7 -target-feature +pacbti -target-feature -hwdiv-arm -target-feature +hwdiv -target-feature +vfp2 -target-feature +vfp2sp -target-feature -vfp3 -target-feature +vfp3d16 -target-feature +vfp3d16sp -target-feature -vfp3sp -target-feature +fp16 -target-feature -vfp4 -target-feature +vfp4d16 -target-feature +vfp4d16sp -target-feature -vfp4sp -target-feature -fp-armv8 -target-feature +fp-armv8d16 -target-feature +fp-armv8d16sp -target-feature -fp-armv8sp -target-feature +fullfp16 -target-feature +fp64 -target-feature -d32 -target-feature -neon -target-feature +dsp -target-feature +mve -target-feature -crypto -target-feature -sha2 -target-feature -aes -target-feature +strict-align -fdebug-compilation-dir=/home/coder/workspace/peaks -dwarf-version=5 -mrelocation-model static -mllvm -arm-add-build-attributes -o mve_compile_test.o mve_compile_test.s
mve_compile_test.s:44:2: error: invalid instruction, any one of the following would fix this:
        vsub.f32        q1, q1, q0
        ^
mve_compile_test.s:44:6: note: invalid operand for instruction
        vsub.f32        q1, q1, q0
            ^
mve_compile_test.s:44:2: note: instruction requires: mve.fp
        vsub.f32        q1, q1, q0
        ^
mve_compile_test.s:45:2: error: instruction requires: mve.fp
        vfma.f32        q0, q1, q2

Thanks for the example. The missing header file is definitely part of the missing softfp multilib variant.

The second error looks like it is related to the compiler's assembly output. With -save-temps=obj, the compiler writes out an assembly file and then reassembles it.

Looking at the assembly

        .text
        .syntax unified
        .eabi_attribute 67, "2.09"      @ Tag_conformance
        .cpu    cortex-m85
        .eabi_attribute 6, 21   @ Tag_CPU_arch
        .eabi_attribute 7, 77   @ Tag_CPU_arch_profile
        .eabi_attribute 8, 0    @ Tag_ARM_ISA_use
        .eabi_attribute 9, 3    @ Tag_THUMB_ISA_use
        .fpu    fpv5-d16
        ...

It looks like the .cpu and .fpu directives here are overriding the command line -mcpu option and are losing the MVE. This can be reproduced separately with:

LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/bin/clang --target=arm-none-eabi -mcpu=cortex-m85+fp+mve -mfloat-abi=hard -O2 -S t.c -o t.s
LLVMEmbeddedToolchainForArm-17.0.0-Linux-x86_64/bin/clang --target=arm-none-eabi -mcpu=cortex-m85+fp+mve -mfloat-abi=hard -O2 -c t.s
t.s:44:2: error: invalid instruction, any one of the following would fix this:
        vsub.f32        q1, q1, q0
        ^
t.s:44:6: note: invalid operand for instruction
        vsub.f32        q1, q1, q0
            ^
t.s:44:2: note: instruction requires: mve.fp
        vsub.f32        q1, q1, q0
        ^
t.s:45:2: error: instruction requires: mve.fp
        vfma.f32        q0, q1, q2
        ^

When editing t.s, I can remove or comment out the .cpu and .fpu directives and the file will assemble correctly.

So it looks like the compiler's assembly output is at fault here.
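That manual edit can be scripted as a stopgap. This is a sketch of the workaround only, not a fix. `comment_out_cpu_fpu` and `t_fixed.s` are made-up names, and it relies on `@` starting a comment, as in the listing above.

```shell
# Comment out .cpu and .fpu directives in ARM assembly read from stdin.
# The directive is kept in the output, prefixed with '@', so the change
# stays visible in the edited file.
comment_out_cpu_fpu() {
    sed -E 's/^([[:space:]]*)(\.(cpu|fpu)[[:space:]].*)$/\1@ \2/'
}

# Hypothetical usage:
#   comment_out_cpu_fpu < t.s > t_fixed.s
#   clang --target=arm-none-eabi -mcpu=cortex-m85+fp+mve -mfloat-abi=hard -c t_fixed.s
```

With the directives commented out, the assembler falls back on the -mcpu given on the command line. The filtered file is therefore only meaningful when assembled with the same flags that generated it.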

Thanks for the confirmation on both issues.

For the first, I would guess that this issue lies in "LLVM-embedded-toolchain-for-Arm"?

What about the second? Do I need to submit a ticket with the LLVM project?

The first is definitely within the scope of this project. We're going to make at least one softfp variant to start with so that there is at least a compatible softfp multilib.

I have an internal ticket that I raised for the code-generation problem. I can submit it to the llvm-project for more visibility. Will do that tomorrow.

With change #302, which added a softfp library variant, the default for Cortex-M85 is now satisfied:

./LLVMEmbeddedToolchainForArm-18.0.0-Linux-x86_64/bin/clang --target=arm-none-eabi -mcpu=cortex-m85 -c -o test.o test.c --print-multi-directory
# arm-none-eabi/armv7m_soft_fpv4_sp_d16

./LLVMEmbeddedToolchainForArm-18.0.0-Linux-x86_64/bin/clang --target=arm-none-eabi -mcpu=cortex-m85+fp -c -o test.o test.c --print-multi-directory
# arm-none-eabi/armv7m_soft_fpv4_sp_d16

./LLVMEmbeddedToolchainForArm-18.0.0-Linux-x86_64/bin/clang --target=arm-none-eabi -mcpu=cortex-m85+fp+mve -c -o test.o test.c --print-multi-directory
# arm-none-eabi/armv7m_soft_fpv4_sp_d16

https://github.com/ARM-software/LLVM-embedded-toolchain-for-Arm/releases/tag/release-17.0.1 has fixes for both multilib and MVE reassembling issues, however we will need to upstream the latter into LLVM for future releases.

Both changes were upstreamed, closing.