ARM-software / LLVM-embedded-toolchain-for-Arm

A project dedicated to building LLVM toolchain for 32-bit Arm embedded targets.

Include profiling lib?

rgrr opened this issue · comments

I'm trying to do profiling on the target with clang. Unfortunately the corresponding symbols __llvm_profile* are not included in clang_rt.

I have tried to include it via -DCOMPILER_RT_BUILD_PROFILE=ON in CMakeLists.txt:~600 but had no success. On the internet there is almost no information about how to build this part correctly.

Do you have any pointers or, better, could you please include it in the toolchain package?

Hi Hardy,

Indeed, the profiling runtime is not included since it currently does not support building for bare-metal targets. We are interested in enabling it, but we only expect to be able to look into it at some later stage.

As a workaround, it should be possible to provide your own implementation of the profiling runtime functions required in a particular use case. However, this would involve reverse engineering what these runtime functions are expected to do and re-implementing them.

I have been able to write an implementation of the profiling functions for code coverage in LLVM 13.0.0. Sadly, the profiling format changes between LLVM releases, and my runtime was written for a different linker and C library, so it would not work without modifications.

The bare-metal runtime for coverage is based on the code in compiler-rt/lib/profile/InstrProfilingPlatformOther.c. I added my own __llvm_profile_dump() function, which __llvm_profile_register_function() registers as an atexit handler. It essentially writes the profiling header and then the various counter sections to a file. An alternative would be to call __llvm_profile_dump() directly, either at the end of main or immediately after it returns.
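
A minimal sketch of the one-time atexit registration described above. The names profile_register_function and profile_dump_at_exit are hypothetical stand-ins, not the compiler-rt symbols, and the dump body is left as a placeholder because the raw profile layout depends on the LLVM version:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping, for illustration only. */
static int handler_registered;   /* 1 once the atexit hook is installed */
static int registered_functions; /* number of records announced so far */

/* Placeholder for the dump routine. A real runtime would write the
 * profile header followed by the data, counter and name sections. */
static void profile_dump_at_exit(void) {
}

/* Assumed to be called once per instrumented function at startup.
 * The first call installs the dump routine as an atexit handler;
 * later calls only update the bookkeeping. */
void profile_register_function(const void *record) {
    assert(record != NULL);
    registered_functions++;
    if (!handler_registered) {
        handler_registered = (atexit(profile_dump_at_exit) == 0);
    }
}
```

Installing the handler from the registration function means the application code does not have to call the dump routine itself, at the cost of relying on the C library running atexit handlers when main returns.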

If I get the time I'll try and adapt this for LLVM 16. May take me some time to do this though.

The main difficulty for bare-metal profiling upstream is that there is no universal way of extracting the counters.

Hello Volodymyr & Peter,

some time ago I also implemented profiling output for clang13, via the debug console. But as Peter wrote, such implementations are compiler dependent, and I'm stuck at clang13 as well. I also separated the constant and variable data, because the profiling data contains a lot of constant data (sections __llvm_covmap, __llvm_covfun) which should not go onto the target; perhaps this has changed in the meantime.

I'm wondering if there could be a more generic approach via semihosting. Semihosting just for transmission of the profiling data after execution, for nothing else. This way, the regular profiling dump code should be usable.
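
A rough sketch of what "semihosting only for transmission" could look like. It assumes the standard Arm semihosting operations SYS_OPEN (0x01), SYS_CLOSE (0x02) and SYS_WRITE (0x05), invoked with BKPT 0xAB on M-profile targets. The function prof_send and the host stand-in (host_file, host_len) are hypothetical and exist only so the example can be run off-target:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SYS_OPEN = 0x01, SYS_CLOSE = 0x02, SYS_WRITE = 0x05 };
enum { SH_MODE_WB = 5 }; /* semihosting open mode for "wb" */

#if defined(__arm__) && defined(__thumb__)
/* M-profile trap: r0 = operation, r1 = parameter block, result in r0. */
static intptr_t semihost_call(uintptr_t op, const void *block) {
    register uintptr_t r0 __asm__("r0") = op;
    register const void *r1 __asm__("r1") = block;
    __asm__ volatile("bkpt 0xAB" : "+r"(r0) : "r"(r1) : "memory");
    return (intptr_t)r0;
}
#else
/* Host stand-in (assumption for illustration): captures the bytes in memory. */
static unsigned char host_file[256];
static size_t host_len;
static intptr_t semihost_call(uintptr_t op, const void *block) {
    const uintptr_t *p = block;
    switch (op) {
    case SYS_OPEN:
        host_len = 0;
        return 3; /* arbitrary non-negative handle */
    case SYS_WRITE: {
        size_t n = (size_t)p[2];
        if (host_len + n > sizeof host_file)
            return (intptr_t)n; /* number of bytes NOT written */
        memcpy(host_file + host_len, (const void *)p[1], n);
        host_len += n;
        return 0;
    }
    case SYS_CLOSE:
        return 0;
    default:
        return -1;
    }
}
#endif

/* Send one buffer to a host file; returns 0 on success, -1 on failure. */
int prof_send(const char *name, const void *data, size_t len) {
    assert(name != NULL && data != NULL);
    uintptr_t open_blk[3] = {(uintptr_t)name, SH_MODE_WB, strlen(name)};
    intptr_t fd = semihost_call(SYS_OPEN, open_blk);
    if (fd < 0)
        return -1;
    uintptr_t write_blk[3] = {(uintptr_t)fd, (uintptr_t)data, len};
    intptr_t not_written = semihost_call(SYS_WRITE, write_blk);
    uintptr_t close_blk[1] = {(uintptr_t)fd};
    semihost_call(SYS_CLOSE, close_blk);
    return not_written == 0 ? 0 : -1;
}
```

With this shape the application itself never needs a C library with file I/O; only the dump path touches the semihosting trap, so the rest of the firmware can be built without semihosting support.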

Currently I'm trying to change the build of this project so that it outputs the profiling library (libclang_rt.profile*.a ?) for further experiments. So far without luck.

Any pointers for doing this?

Ok, it seems that I have been a little bit naive. InstrProfilingFile is not suitable for bare metal. I wonder why they made it so complicated and non-portable.
So I'm currently checking my own implementation of InstrProfilingFileOther or something similar.

@peter: is it possible to get your InstrProfilingPlatformOther.c as a starter? My own old implementation does not make me very happy because it changed InstrProfilingWriter heavily and is less than portable.

Yes, although I would like to spend some time to adapt it for the toolchain and make sure it works first. The version I have is written and tested for Arm Compiler 6, which has a different linker. I'll aim to get it done this week. If I don't get started in time I'll post what I have, although it will need adapting.

proflib.zip
I've attached a zip file of a basic implementation of the profiling runtime. I can't attach .c files directly unfortunately. Some notes:

  • Given that so much of the code is adapted from compiler-rt I've used the LLVM Apache 2.0 license with LLVM exceptions and referenced compiler-rt profile.
  • The version of the runtime is 8 (LLVM 13.0.0 was 5); there were a few changes in the meanings of fields (absolute to relative). I would expect this to work for a source build from main, or the LLVM 15 release of the toolchain.
  • I've only tested this on one example (samples/src/baremetal-semihosting) using llvm-cov to produce code-coverage. I've not tried profile guided optimisation.
  • The runtime writes a file called default.profraw
    To use it on the example:
clang -c -O1 -g --config armv6m_soft_nofp_semihost.cfg proflib.c
clang -O1 -g --config armv6m_soft_nofp_semihost.cfg proflib.o hello.c -fprofile-instr-generate -fcoverage-mapping -T ../../ldscripts/microbit.ld -o hello.elf
llvm-objcopy -O ihex hello.elf hello.hex
qemu-system-arm -M microbit -semihosting -nographic -device loader,file=hello.hex
llvm-profdata merge -sparse default.profraw -o hello.profdata
llvm-cov show hello.elf -instr-profile=hello.profdata
    1|       |#include <stdio.h>
    2|       |
    3|      1|int main(void) {
    4|      1|  printf("Hello World!\n");
    5|      1|  return 0;
    6|      1|}

Hope that is enough to get you started. The implementation tries to stay as close as possible to the code in compiler-rt, so it may not be the most efficient.

We hope to add a sample with profiling that we can update when clang updates the profile version.

Hello Peter,

thanks for the file. It is really stripped down and straightforward. Your detection of INSTR_PROF_PROFILE_RUNTIME_VAR in __llvm_profile_register_function() helped a lot. I wasn't aware of that problem (bug?) and had worked around it with a special linker file.

Now I will try to integrate it into the toolchain build.

PS: what's really a waste are the 64-bit counters. TI seems to have patched clang so that it can also do it with 32-bit counters. There is nothing in this direction available for the original clang, right?

PPS: and actually __llvm_prf_names and __llvm_prf_data could be dropped from the output image and merged later into profdata. Some time ago I wrote a script to accomplish this. Anyone aware if there is a standard procedure for doing this?

I think several teams have made their own modification to the profiling, but as yet no-one has upstreamed the changes so there isn't anything like that available in clang. We are hoping to work with the community to persuade someone to do this, but it suffers from being an area where it is fairly simple to make a downstream implementation that works for your toolchain, but hard to make a general implementation that works well enough for everyone.

One example of a discussion that didn't go anywhere: https://lists.llvm.org/pipermail/llvm-dev/2017-September/117156.html

I'm not aware of a standard way of extracting the __llvm_prf_names and __llvm_prf_data. I guess that these could be extracted from the executable with objcopy. The runtime would need to calculate the number of counters and number of data for the header file differently, probably via linker defined symbols instead.
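
As a sketch of the "linker defined symbols" idea: ELF linkers such as GNU ld and lld define __start_<section> and __stop_<section> symbols for sections whose names are valid C identifiers, so the entry counts for the header could be derived from section bounds rather than from records seen at registration time. The helper below is hypothetical; it takes the bounds as plain pointers so it can be tried on a host, and the entry size is a parameter because the data record layout depends on the profile version:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* On the target, the bounds would be expected to come from linker symbols,
 * for example (assumption, depends on the linker and linker script):
 *   extern char __start___llvm_prf_cnts[], __stop___llvm_prf_cnts[];
 */
typedef struct {
    const char *begin; /* first byte of the section */
    const char *end;   /* one past the last byte of the section */
} section_bounds;

/* Number of fixed-size entries between the section bounds. */
size_t section_entries(section_bounds s, size_t entry_size) {
    assert(entry_size != 0 && s.end >= s.begin);
    return (size_t)(s.end - s.begin) / entry_size;
}

/* Stand-in for a counter section holding five 64-bit counters. */
static uint64_t demo_counters[5];

size_t demo_counter_count(void) {
    section_bounds s = {(const char *)demo_counters,
                        (const char *)(demo_counters + 5)};
    return section_entries(s, sizeof(uint64_t));
}
```

If a section is left out of the loaded image, its size would still have to be made visible to the runtime, for instance through symbols defined in the linker script; how to do that depends on the linker in use.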

I doubt that I will have enough energy and patience to get a generic profiling approach upstream. Looking at their backlog, I'm not even confident that my report about a wrong optimization will ever make it. For new or changed concepts I'd see the chances as below zero.

Nevertheless, I'd be happy to make a contribution to this project. My idea is to create something like a contrib folder which holds features that can be compiled in by selection.

More specifically, this could be something like contrib/profiling with the option LLVM_TOOLCHAIN_CONTRIB_BUILD_LIBCLANG_RT_PROFILE (or something shorter ;-)).

Any pointers on how to integrate this for all libraries?

The way the libraries are structured right now is that we have a separate sysroot for each supported target (lib/clang-runtimes). In theory adding a contrib sub-directory containing a proflib library could work. It may need a --config=contrib.cfg as a shortcut to include the contrib library directory on the linker path.

I'm more of a user of the toolchain than someone that knows how best to build it. I'll refer this to my colleagues to see if they have any suggestions.

No, I meant it differently: the sources of the optional feature go into a contrib directory structure, the generated library goes to the standard place lib/clang-runtimes/arm-none-eabi/*/lib which means that it has to be generated for each target.

I've implemented it already and will open a PR, hopefully today.

One point: if -fprofile-instrument... is passed to the clang driver, it automatically adds -lclang_rt.profile as a linker option, at least as far as I know. The generated cross compiler does not do so.
Does anyone have an idea how to add this behavior to the embedded version?

In https://github.com/llvm/llvm-project/tree/main/clang/lib/Driver you'll notice a directory called ToolChains. Each --target=-- will map to one of those ToolChains; in the case of arm-none-eabi and aarch64-none-eabi this is the BareMetal driver https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChains/BareMetal.cpp

I think that the omission for the BareMetal driver in this case is intentional. Until there is an upstream implementation of clang_rt.profile for BareMetal (like there is for the builtins), this is not a universal positive, i.e. it will break everyone that has implemented their own profiling runtime with a linker error. This is resolvable, perhaps with an option to suppress it, or perhaps with an agreement upstream to do it on the grounds that people's downstream implementations can be made into such a library. I should warn you that this toolchain does not have its own fork of the LLVM repo; this is a design decision to discourage downstream changes.

We do have a call (LLVM Embedded Toolchains with details in https://llvm.org/docs/GettingInvolved.html#online-sync-ups) that we take part in, alongside other interested parties. May be a useful place to discuss.

Hi,

An update here: the topic was discussed in the LLVM WG sync https://discourse.llvm.org/t/llvm-embedded-toolchains-working-group-sync-up/63270/30 and the consensus was that the profiling runtime needed to be refactored before it could be ported to bare-metal. Progress is expected here, but it will take time.

So as an intermediate solution, we agreed internally to create a sample based on the standalone minimal runtime example provided here so that people have something to start with, see #249. Eventually, bare-metal runtime support will be provided upstream.

I suggest closing this issue and PR #204 for now, since it does not make sense to accept the PR in this form/approach.

Any objections?

Hello Volodymyr

thanks for your efforts. And no objections from my side.

In the meantime I will insert my personal contribs into my personal fork of this repo ;-) (https://github.com/rgrr/LLVM-embedded-toolchain-for-Arm/tree/feature/contrib)