google / ml-compiler-opt

Infrastructure for Machine Learning Guided Optimization (MLGO) in LLVM.

error: Could not setup Inlining Advisor for the requested mode and/or options

prasitaGit opened this issue

I'm trying to run the demo following this link: https://github.com/google/ml-compiler-opt/blob/main/docs/demo/demo.md
Everything goes fine until I run fx build (I run the commands in the order provided in the link, i.e. fx set core.x64 with the given arguments first). fx set runs successfully. However, fx build gives me this error:

[2754/67050] CXX host_x64/obj/sdk/lib/syslog/cpp/backend_host.logging_backend_shared.cc.o
FAILED: host_x64/obj/sdk/lib/syslog/cpp/backend_host.logging_backend_shared.cc.o
../../../llvm-install/bin/clang++ -MD -MF host_x64/obj/sdk/lib/syslog/cpp/backend_host.logging_backend_shared.cc.o.d -D_LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS -I../.. -Ihost_x64/gen -I../../sdk -Ihost_x64/gen/sdk -I../../sdk/lib/fit-promise/include -I../../sdk/lib/fit/include -I../../sdk/lib/stdcompat/include -I../../zircon/system/public -fclang-abi-compat=13.0 -fcolor-diagnostics -fcrash-diagnostics-dir=clang-crashreports -Xclang -fembed-bitcode=all -ffp-contract=off --sysroot=../../prebuilt/third_party/sysroot/linux --target=x86_64-unknown-linux-gnu -ffile-compilation-dir=. -no-canonical-prefixes -fomit-frame-pointer -fdata-sections -ffunction-sections -Os -mllvm -enable-ml-inliner=release -gdwarf-5 -Xclang -debug-info-kind=constructor -g3 -Wall -Wextra -Wconversion -Wextra-semi -Wimplicit-fallthrough -Wnewline-eof -Wstrict-prototypes -Wwrite-strings -Wno-sign-conversion -Wno-unused-parameter -Wnonportable-system-include-path -fvisibility=hidden -Werror -Wno-error=deprecated-declarations -Wa,--fatal-warnings --sysroot=../../prebuilt/third_party/sysroot/linux --target=x86_64-unknown-linux-gnu -fPIE -fvisibility-inlines-hidden -stdlib=libc++ -std=c++17 -fno-exceptions -fno-rtti -stdlib=libc++ -c ../../sdk/lib/syslog/cpp/logging_backend_shared.cc -o host_x64/obj/sdk/lib/syslog/cpp/backend_host.logging_backend_shared.cc.o
error: Could not setup Inlining Advisor for the requested mode and/or options
1 error generated.

[2755/67050] CXX host_x64/obj/sdk/lib/syslog/cpp/cpp.log_settings.cc.o
FAILED: host_x64/obj/sdk/lib/syslog/cpp/cpp.log_settings.cc.o
../../../llvm-install/bin/clang++ -MD -MF host_x64/obj/sdk/lib/syslog/cpp/cpp.log_settings.cc.o.d -D_LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS -I../.. -Ihost_x64/gen -I../../sdk -Ihost_x64/gen/sdk -I../../sdk/lib/fit-promise/include -I../../sdk/lib/fit/include -I../../sdk/lib/stdcompat/include -I../../zircon/system/public -fclang-abi-compat=13.0 -fcolor-diagnostics -fcrash-diagnostics-dir=clang-crashreports -Xclang -fembed-bitcode=all -ffp-contract=off --sysroot=../../prebuilt/third_party/sysroot/linux --target=x86_64-unknown-linux-gnu -ffile-compilation-dir=. -no-canonical-prefixes -fomit-frame-pointer -fdata-sections -ffunction-sections -Os -mllvm -enable-ml-inliner=release -gdwarf-5 -Xclang -debug-info-kind=constructor -g3 -Wall -Wextra -Wconversion -Wextra-semi -Wimplicit-fallthrough -Wnewline-eof -Wstrict-prototypes -Wwrite-strings -Wno-sign-conversion -Wno-unused-parameter -Wnonportable-system-include-path -fvisibility=hidden -Werror -Wno-error=deprecated-declarations -Wa,--fatal-warnings --sysroot=../../prebuilt/third_party/sysroot/linux --target=x86_64-unknown-linux-gnu -fPIE -fvisibility-inlines-hidden -stdlib=libc++ -std=c++17 -fno-exceptions -fno-rtti -stdlib=libc++ -c ../../sdk/lib/syslog/cpp/log_settings.cc -o host_x64/obj/sdk/lib/syslog/cpp/cpp.log_settings.cc.o
error: Could not setup Inlining Advisor for the requested mode and/or options
1 error generated.
[2757/67050] CXX host_x64/obj/sdk/lib/syslog/cpp/cpp.macros.cc.o
FAILED: host_x64/obj/sdk/lib/syslog/cpp/cpp.macros.cc.o
../../../llvm-install/bin/clang++ -MD -MF host_x64/obj/sdk/lib/syslog/cpp/cpp.macros.cc.o.d -D_LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS -I../.. -Ihost_x64/gen -I../../sdk -Ihost_x64/gen/sdk -I../../sdk/lib/fit-promise/include -I../../sdk/lib/fit/include -I../../sdk/lib/stdcompat/include -I../../zircon/system/public -fclang-abi-compat=13.0 -fcolor-diagnostics -fcrash-diagnostics-dir=clang-crashreports -Xclang -fembed-bitcode=all -ffp-contract=off --sysroot=../../prebuilt/third_party/sysroot/linux --target=x86_64-unknown-linux-gnu -ffile-compilation-dir=. -no-canonical-prefixes -fomit-frame-pointer -fdata-sections -ffunction-sections -Os -mllvm -enable-ml-inliner=release -gdwarf-5 -Xclang -debug-info-kind=constructor -g3 -Wall -Wextra -Wconversion -Wextra-semi -Wimplicit-fallthrough -Wnewline-eof -Wstrict-prototypes -Wwrite-strings -Wno-sign-conversion -Wno-unused-parameter -Wnonportable-system-include-path -fvisibility=hidden -Werror -Wno-error=deprecated-declarations -Wa,--fatal-warnings --sysroot=../../prebuilt/third_party/sysroot/linux --target=x86_64-unknown-linux-gnu -fPIE -fvisibility-inlines-hidden -stdlib=libc++ -std=c++17 -fno-exceptions -fno-rtti -stdlib=libc++ -c ../../sdk/lib/syslog/cpp/macros.cc -o host_x64/obj/sdk/lib/syslog/cpp/cpp.macros.cc.o
error: Could not setup Inlining Advisor for the requested mode and/or options
1 error generated.

[2758/67050] CXX host_x64/obj/sdk/lib/syslog/cpp/backend_host.logging_backend_host.cc.o
FAILED: host_x64/obj/sdk/lib/syslog/cpp/backend_host.logging_backend_host.cc.o
../../../llvm-install/bin/clang++ -MD -MF host_x64/obj/sdk/lib/syslog/cpp/backend_host.logging_backend_host.cc.o.d -D_LIBCPP_DISABLE_VISIBILITY_ANNOTATIONS -I../.. -Ihost_x64/gen -I../../sdk -Ihost_x64/gen/sdk -I../../sdk/lib/fit-promise/include -I../../sdk/lib/fit/include -I../../sdk/lib/stdcompat/include -I../../zircon/system/public -fclang-abi-compat=13.0 -fcolor-diagnostics -fcrash-diagnostics-dir=clang-crashreports -Xclang -fembed-bitcode=all -ffp-contract=off --sysroot=../../prebuilt/third_party/sysroot/linux --target=x86_64-unknown-linux-gnu -ffile-compilation-dir=. -no-canonical-prefixes -fomit-frame-pointer -fdata-sections -ffunction-sections -Os -mllvm -enable-ml-inliner=release -gdwarf-5 -Xclang -debug-info-kind=constructor -g3 -Wall -Wextra -Wconversion -Wextra-semi -Wimplicit-fallthrough -Wnewline-eof -Wstrict-prototypes -Wwrite-strings -Wno-sign-conversion -Wno-unused-parameter -Wnonportable-system-include-path -fvisibility=hidden -Werror -Wno-error=deprecated-declarations -Wa,--fatal-warnings --sysroot=../../prebuilt/third_party/sysroot/linux --target=x86_64-unknown-linux-gnu -fPIE -fvisibility-inlines-hidden -stdlib=libc++ -std=c++17 -fno-exceptions -fno-rtti -stdlib=libc++ -c ../../sdk/lib/syslog/cpp/logging_backend_host.cc -o host_x64/obj/sdk/lib/syslog/cpp/backend_host.logging_backend_host.cc.o
error: Could not setup Inlining Advisor for the requested mode and/or options
1 error generated.

[2761/67050] CC efi_x64/obj/third_party/lz4/lib/liblz4.lz4hc.c.o
ninja: build stopped: subcommand failed.
Hint: run fx build with the option --log LOGFILE to generate a debug log if you are reporting a bug.

It'd be great to have some clarity on this.

That means you're running a clang that doesn't have an inlining policy embedded. Hmm... could you check that you did this step, that LLVM_INSTALLDIR_RELEASE is correct, and that the clang_prefix in the fx set command points to the same place LLVM_INSTALLDIR_RELEASE points to?
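As a quick sanity check (a minimal sketch; it assumes LLVM_INSTALLDIR_RELEASE points at the clang in question), you can compile a trivial file with the same -mllvm flag the failing commands above pass - a clang without an embedded policy reproduces the error immediately:

# exercises the same flag the Fuchsia build uses
echo 'int main() { return 0; }' > /tmp/check.c
${LLVM_INSTALLDIR_RELEASE}/bin/clang -Os -mllvm -enable-ml-inliner=release \
  -c /tmp/check.c -o /tmp/check.o
# a clang with the policy embedded succeeds silently; otherwise it prints:
#   error: Could not setup Inlining Advisor for the requested mode and/or options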

I hadn't gotten to that step yet. I'm stuck precisely here: https://github.com/google/ml-compiler-opt/blob/main/docs/demo/demo.md#build-fuchsia, i.e., before extracting the corpus.

Aaah... I think I see why. By default, Fuchsia now builds with the ML inliner enabled, but since the default clang doesn't have that built in, you need to disable it at this stage. Add --args=clang_ml_inliner=false to fx set to disable it.

When we wrote the demo, they hadn't yet flipped the default. Please let me know if this unblocks you, and I'll update the demo.

Thanks!

Thanks for the reply. I did what you said. The script looks like:
cd ${FUCHSIA_SRCDIR}
jiri update ~/ml-compiler-opt/docs/demo/fuchsia.xml
fx set core.x64 \
  --args='clang_prefix="/usr/local/google/home/mtrofin/llvm-install/bin"' \
  --args=clang_embed_bitcode=true \
  --args=clang_ml_inliner=false \
  --args='optimize="size"'
fx build

But now, fx set fails giving this error:
[17:07:30.823] Jiri packages are not fetched due to fatal errors when updating projects.
[17:07:30.823] Jiri hooks are not run due to fatal errors when updating projects or packages
ERROR: cannot move "/home/saikat/MLGO/fuchsia/third_party/pyyaml/src" to "/home/saikat/MLGO/fuchsia/third_party/pyyaml" as the destination already exists
ERROR at //BUILD.gn:201:5: Script returned non-zero exit code.
exec_script("//prebuilt/third_party/gn/${host_platform}/gn", gn_cmd)
^----------
Current dir: /home/saikat/MLGO/fuchsia/out/default/
Command: /usr/bin/env /home/saikat/MLGO/fuchsia/prebuilt/third_party/gn/linux-x64/gn gen -q --root=../../zircon --args=# THIS FILE IS CLOBBERED. DO NOT EDIT!
Instead, edit //out/default/args.gn to add
zircon_extra_args = { ... } to override settings below.
asan_default_options = ["detect_stack_use_after_return=1"] lsan_default_options = [] ubsan_default_options = ["print_stacktrace=1", "halt_on_error=1"] scudo_default_options = [] forward_variables_from({
clang_tool_dir = "/home/saikat/MLGO/llvm-install/bin"
default_deps = ["//:legacy_unification-x64"]
disable_kernel_pci = false
goma_dir = "/home/saikat/MLGO/fuchsia/prebuilt/third_party/goma/linux-x64"
optimize = "size"
output_gsym = false
rustc_version_string = "XJj9WwQ5ZIufh8195356YT-rfrxfxUL44aixzXq6oL8C"
use_ccache = false
use_goma = false
variants = []
zbi_compression = "zstd"
zx_fidl_trace_level = 0
}, "*") --export-compile-commands=default ../default.zircon
Returned 1 and printed out:

ERROR at //public/gn/BUILDCONFIG.gn:863:3: Dependency not allowed.
group("${_library_name}.headers") {
^----------------------------------
The item //system/utest/abi-type-validator:abi-type-validator.headers
can not depend on //system/utest/abi-type-validator:_library.config.abi-type-validator
because it is not in //system/utest/abi-type-validator:_library.config.abi-type-validator's visibility list: [
//system/utest/abi-type-validator/abi-type-validator.headers:abi-type-validator.headers
]

See //BUILD.gn:203:3: whence it was called.
_run_zircon_gn("zircon") {

When doing jiri update, pass -gc, like this: jiri update -gc ~/ml-compiler-opt/docs/demo/fuchsia.xml

Also make sure to use the llvm hash specified there.

Sorry about this - we should update the demo to a more recent fuchsia/llvm; they seem to get out of sync if you update llvm but not fuchsia. I'll chat with the Fuchsia folks on Tuesday and get it up to date.

Thanks for the reply. Where exactly should I use the llvm hash? Also, after including -gc as you suggested, I get a load error in fx set:

ERROR at //third_party/cobalt/src/lib/util/BUILD.gn:8:1: Can't load input file.
import("//third_party/protobuf/proto_library.gni")
^------------------------------------------------
Unable to load:
/home/saikat/MLGO/fuchsia/third_party/protobuf/proto_library.gni
I also checked in the secondary tree for:
/home/saikat/MLGO/fuchsia/build/secondary/third_party/protobuf/proto_library.gni
See //src/cobalt/bin/app/BUILD.gn:79:5: which caused the file to be included.
"//third_party/cobalt/src/lib/util:pem_util",
^-------------------------------------------
ERROR: error running gn gen: exit status 1
Re-running gn gen first (/home/saikat/MLGO/fuchsia/prebuilt/third_party/gn/linux-x64/gn changed)
ERROR at //third_party/cobalt/src/logger/BUILD.gn:5:1: Can't load input file.
import("//third_party/protobuf/proto_library.gni")
^------------------------------------------------
Unable to load:
/home/saikat/MLGO/fuchsia/third_party/protobuf/proto_library.gni
I also checked in the secondary tree for:
/home/saikat/MLGO/fuchsia/build/secondary/third_party/protobuf/proto_library.gni
See //src/cobalt/bin/app/BUILD.gn:81:5: which caused the file to be included.
"//third_party/cobalt/src/logger",

It may be worth deleting the fuchsia source dir and then fetching it again - I'm assuming you followed this and ended up doing curl -s "https://fuchsia.googlesource.com/fuchsia/+/HEAD/scripts/bootstrap?format=TEXT" | base64 --decode | bash, correct?

So after that, if you do cd ${FUCHSIA_SRCDIR} && jiri update -gc ~/ml-compiler-opt/docs/demo/fuchsia.xml, it should get you into a usable state.

The llvm hash is the git hash - i.e. after you clone llvm, cd llvm-project && git checkout fa4c3f70ff0768a270b0620dc6d158ed1205ec4e

Some background: Fuchsia is really a suite of repos, and the xml doc specifies a snapshot of all of them, which defines the product (similar to how chrome, for example, ingests a bunch of repos, like v8). Fuchsia also periodically fetches llvm from upstream and builds and tests its own compiler toolchain (hence the pretty custom cmake flags). As any of these repos evolve, they get out of sync (stuff breaks), hence our specifying the exact snapshot where we tried out the demo. That's also why we want to start updating the demo snapshot (i.e. the xml file and the llvm hash) more frequently, so folks can try things out with llvm close to HEAD (with a high probability that trying it exactly at HEAD would also work).

(The only nuance that may cause some friction, though, right now, is that the IDK and SYSROOT packages aren't captured in the xml file, but that shouldn't manifest itself during jiri or fx set.)

I did the above steps. I don't get the issues described above, and most of fx build runs, but I get this:
[17/3219] ACTION //sdk/lib/ui/scenic/cpp:gen_command_sizing(//build/toolchain/fuchsia:x64)
FAILED: obj/sdk/lib/ui/scenic/cpp/scenic-measure-tape.checked
/usr/bin/env ../../build/gn_run_binary.sh /homes/mukher39/scratch/llvm-install/bin host_x64/measure-tape --json fidling/gen/sdk/fidl/fuchsia.images/fuchsia.images.fidl.json --json fidling/gen/sdk/fidl/fuchsia.ui.gfx/fuchsia.ui.gfx.fidl.json --json fidling/gen/sdk/fidl/fuchsia.ui.input/fuchsia.ui.input.fidl.json --json fidling/gen/sdk/fidl/fuchsia.ui.scenic/fuchsia.ui.scenic.fidl.json --json fidling/gen/sdk/fidl/fuchsia.ui.views/fuchsia.ui.views.fidl.json --target-binding hlcpp --target-type fuchsia.ui.scenic/Command --out-h /u/scratch1/mukher39/fuchsia/sdk/lib/ui/scenic/cpp/commands_sizing.h --h-include-path lib/ui/scenic/cpp/commands_sizing.h --out-cc /u/scratch1/mukher39/fuchsia/sdk/lib/ui/scenic/cpp/commands_sizing.cc --only-check-to-file /u/scratch1/mukher39/fuchsia/out/default/obj/sdk/lib/ui/scenic/cpp/scenic-measure-tape.checked
/u/scratch1/mukher39/fuchsia/sdk/lib/ui/scenic/cpp/commands_sizing.h and/or /u/scratch1/mukher39/fuchsia/sdk/lib/ui/scenic/cpp/commands_sizing.cc is out of date! Please run the following

./host_x64/measure-tape
--json
fidling/gen/sdk/fidl/fuchsia.images/fuchsia.images.fidl.json
--json
fidling/gen/sdk/fidl/fuchsia.ui.gfx/fuchsia.ui.gfx.fidl.json
--json
fidling/gen/sdk/fidl/fuchsia.ui.input/fuchsia.ui.input.fidl.json
--json
fidling/gen/sdk/fidl/fuchsia.ui.scenic/fuchsia.ui.scenic.fidl.json
--json
fidling/gen/sdk/fidl/fuchsia.ui.views/fuchsia.ui.views.fidl.json
--target-binding
hlcpp
--target-type
fuchsia.ui.scenic/Command
--out-h
/u/scratch1/mukher39/fuchsia/sdk/lib/ui/scenic/cpp/commands_sizing.h
--h-include-path
lib/ui/scenic/cpp/commands_sizing.h
--out-cc
/u/scratch1/mukher39/fuchsia/sdk/lib/ui/scenic/cpp/commands_sizing.cc

[48/3219] CXX x64-shared/obj/zircon/system/ulib/zxio/remote_v2/libzxio.remote_v2.cc.o
ninja: build stopped: subcommand failed.
I downloaded and installed fuchsia just now and ran this. I can't comprehend how commands_sizing.h can be out of date. I'm not sure whether the issue is on Fuchsia's end or here.

I'm about to update the demo - it's very likely a bunch of instructions got old / packages got out of sync (i.e. the IDK relative to fuchsia and llvm). The update will let you 'be' at HEAD of fuchsia and a fuchsia-determined git hash of llvm, close to llvm HEAD. I'm trying it out locally first to make sure it all holds up; I should have it updated by EOD today. I'll let you know. Sorry this is causing trouble.

Updated. The TL;DR is that you can just jiri update -gc under ${FUCHSIA_SRCDIR}, without needing to worry about that fuchsia.xml file. This will sync the fuchsia repos at HEAD. The benefit is that we now know we don't have to worry about the IDK / SYSROOT packages being out of sync.

From there, you run jiri package 'fuchsia/third_party/clang/${platform}' (verbatim - i.e. don't substitute ${platform}), and that gives you the right llvm repo hash to use (see https://github.com/google/ml-compiler-opt/blob/main/docs/demo/demo.md#set-up-the-correct-package-versions). Normally you could try working at llvm HEAD, too, but right now there are some libc++ updates in the llvm repo that fuchsia hasn't yet picked up, so if you leave llvm at HEAD, you'll later see some errors about std::sort not being found when building fuchsia. The hash that the jiri package command above gives you is what the fuchsia build bots use, too.
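Putting that together, the sequence is roughly this (a sketch; <hash> stands for the git_revision printed by the jiri package command):

cd ${FUCHSIA_SRCDIR}
jiri update -gc                                        # sync all fuchsia repos at HEAD
jiri package 'fuchsia/third_party/clang/${platform}'   # note: ${platform} is literal
cd ${LLVM_SRCDIR}
git checkout <hash>                                    # the git_revision from above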

One thing that may have also caused you trouble: the build system (cmake/ninja) isn't quite hermetic, so the best thing to do after updating the repos is to delete the build dirs (i.e. cd ${FUCHSIA_SRCDIR} && rm -rf out/default and cd ${LLVM_SRCDIR} && rm -rf build). Then re-do creating the build dir for llvm, cmake... all that (and, when you get to it, same for fx set).
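In shell terms, the clean-up amounts to (a sketch of the steps just described):

cd ${FUCHSIA_SRCDIR} && rm -rf out/default
cd ${LLVM_SRCDIR} && rm -rf build && mkdir build
# then re-run the cmake configure from the demo's build-llvm section,
# ninja distribution, the install step, and eventually fx set / fx build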

Thanks for the update. I did what you said and followed the steps in the updated demo. Both hashes are in sync. When I run fx build, it fails at a later stage, giving me this error:

[44623/55563] ACTION //garnet/bin/sched:sched.verify(//build/toolchain/fuchsia:x64)
FAILED: gen/garnet/bin/sched/sched.verify
../../prebuilt/third_party/python3/linux-x64/bin/python3.8 -S ../../build/dist/verify_manifest_elf_binaries.py --check-stripped --depfile=gen/garnet/bin/sched/sched.verify.d --stamp=gen/garnet/bin/sched/sched.verify --fini-manifest=obj/garnet/bin/sched/sched_manifest --check-unstripped-files --toolchain-lib-dir=../../../../../../homes/mukher39/scratch/llvm-install/bin/../lib --toolchain-lib-dir=../../prebuilt/third_party/rust/linux-x64/bin/../lib
ERRORS FOUND IN obj/garnet/bin/sched/sched_manifest:
bin/sched missing dependency lib/libc++.so.2
bin/sched missing dependency lib/libc++abi.so.1
Binary bin/sched has interp ld.so.1, lib_dir lib/

I'm using python3.8.

The python version shouldn't matter (and 3.8 is good). The libc++ part is odd, though. Let's do a few things to try to narrow it down.

could you, in fuchsia:

rm -rf out/default
fx set core.x64 \
  --args=clang_embed_bitcode=true \
  --args='optimize="size"' \
  --args='clang_ml_inliner=false'
fx build

Let's see first if fuchsia, built with its toolchain, goes well.

Then, could you paste the output from jiri package 'fuchsia/third_party/clang/${platform}', and then cd ${LLVM_SRCDIR}, run git status, and paste the output there?

Btw, this may be better / easier to do interactively; if you want, ping me on chat (google meet) - same alias I use here, at google.com.

It worked! I think I was doing jiri update -gc before fx set, hence it was giving the error. I couldn't have managed to resolve this without your constant help. I'll do the rest of the procedure now. If I run into issues, I'll ping you on google meet, if that's okay.

Doing jiri update -gc before fx set makes sense, though - no need to do it again (it just updates the fuchsia repo, and now it's updated to HEAD, so let's keep that as-is).

If the above worked, it means the llvm repo is out of sync - all we did above was tell the fuchsia build system not to use the llvm you built, but that's not what (I assume) you want.

Can you paste those 2 bits of info - the jiri package output and the git status - and let's see if anything obvious pops out.

jiri package output:

  • package fuchsia/third_party/clang/${platform}
    Path: /homes/mukher39/scratch/fuchsia/prebuilt/third_party/clang/linux-x64
    Version: git_revision:1aa59ff2f789776ebfa2d4b315fd3ea589652b4a
    Manifest: /homes/mukher39/scratch/fuchsia/integration/toolchain
    Platforms: [linux-amd64 linux-arm64 mac-amd64 windows-amd64]

llvm:

HEAD detached at 1aa59ff2f789

Ah... do you see a "nothing to commit, working tree clean" line? Mine looks like this:

$ git status
HEAD detached at 1aa59ff2f789
nothing to commit, working tree clean

$

Yes, it does give that. This is the entire output:

HEAD detached at 1aa59ff2f789

It took 29.24 seconds to enumerate untracked files. 'status -uno'
may speed it up, but you have to be careful not to forget to add
new files yourself (see 'git help status').
nothing to commit, working tree clean

Hmm... could you do a git status -u? Mine does the same as before:

$ git status -u
HEAD detached at 1aa59ff2f789
nothing to commit, working tree clean

$

Now, if that starts spewing a ton of stuff, do a git clean -df to delete the untracked files.

It gives the same result as above.

OK, so that I understand where we are: did you re-do the demo?

Yes this is resolved. Hence, I deleted the comment.

But module_paths is empty in my case, hence it gives 0 of 0 modules succeeded. I'd like to point out that fx compdb doesn't work. It shows:
DEPRECATED: fx now automatically creates and tracks one's compilation database; this command will soon be deleted.
I don't believe this would cause module_paths to be empty, though.

Nvm, I think I missed the --clang-prefix while doing fx set. I'm redoing the build again. I was trying exactly this:

rm -rf out/default
fx set core.x64 \
  --args=clang_embed_bitcode=true \
  --args='optimize="size"' \
  --args='clang_ml_inliner=false'
fx build

OK, that is a separate problem. For the module_paths, replace the -Oz with -Os in the call to extract_ir.py (I'll fix this asap in the doc - forgot it, it seems).

(Background: that's just a filter, and Fuchsia recently moved to use -Os rather than -Oz.)
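For reference, a sketch of the adjusted extract_ir.py invocation (the flag spellings follow the demo; double-check them against your copy of the doc):

cd ~/ml-compiler-opt && PYTHONPATH=$PYTHONPATH:. python3 \
  compiler_opt/tools/extract_ir.py \
  --cmd_filter='-Os' \
  --input=$FUCHSIA_SRCDIR/out/default/compile_commands.json \
  --input_type=json \
  --llvm_objcopy_path=$LLVM_INSTALLDIR/bin/llvm-objcopy \
  --output_dir=$CORPUS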

But that aside, yes, we need to: wipe out the build dir under $LLVM_SRCDIR, and re-do https://github.com/google/ml-compiler-opt/blob/main/docs/demo/demo.md#build-llvm, then re-build fuchsia (delete out/default to be sure)

I did the steps, i.e., removed the build folder, built it again, then removed out/default in fuchsia and ran fx set with --clang-prefix (which I hadn't done previously, when it ran successfully). It gives me this:

FAILED: gen/garnet/bin/sched/sched.verify
../../prebuilt/third_party/python3/linux-x64/bin/python3.8 -S ../../build/dist/verify_manifest_elf_binaries.py --check-stripped --depfile=gen/garnet/bin/sched/sched.verify.d --stamp=gen/garnet/bin/sched/sched.verify --fini-manifest=obj/garnet/bin/sched/sched_manifest --check-unstripped-files --toolchain-lib-dir=../../../../../../homes/mukher39/scratch/llvm-install/bin/../lib --toolchain-lib-dir=../../prebuilt/third_party/rust/linux-x64/bin/../lib
ERRORS FOUND IN obj/garnet/bin/sched/sched_manifest:
bin/sched missing dependency lib/libc++.so.2
bin/sched missing dependency lib/libc++abi.so.1
Binary bin/sched has interp ld.so.1, lib_dir lib/
[23572/34490] RUST bt_gap exe.unstripped/bt_gap

I installed lld from the present git version of llvm-project and tried the same steps again. Now I get this in fx build:
[23541/34490] ACTION //garnet/bin/ktrace_provider:ktrace_provider.verify(//build/toolchain/fuchsia:x64)
FAILED: gen/garnet/bin/ktrace_provider/ktrace_provider.verify
../../prebuilt/third_party/python3/linux-x64/bin/python3.8 -S ../../build/dist/verify_manifest_elf_binaries.py --check-stripped --depfile=gen/garnet/bin/ktrace_provider/ktrace_provider.verify.d --stamp=gen/garnet/bin/ktrace_provider/ktrace_provider.verify --fini-manifest=obj/garnet/bin/ktrace_provider/ktrace_provider_manifest --check-unstripped-files --toolchain-lib-dir=../../../../../../homes/mukher39/scratch/llvm-install/bin/../lib --toolchain-lib-dir=../../prebuilt/third_party/rust/linux-x64/bin/../lib
ERRORS FOUND IN obj/garnet/bin/ktrace_provider/ktrace_provider_manifest:
bin/ktrace_provider missing dependency lib/libc++.so.2
bin/ktrace_provider missing dependency lib/libc++abi.so.1
Binary bin/ktrace_provider has interp ld.so.1, lib_dir lib/
[23547/34490] ACTION //garnet/bin/sched:sched.verify(//build/toolchain/fuchsia:x64)
FAILED: gen/garnet/bin/sched/sched.verify
../../prebuilt/third_party/python3/linux-x64/bin/python3.8 -S ../../build/dist/verify_manifest_elf_binaries.py --check-stripped --depfile=gen/garnet/bin/sched/sched.verify.d --stamp=gen/garnet/bin/sched/sched.verify --fini-manifest=obj/garnet/bin/sched/sched_manifest --check-unstripped-files --toolchain-lib-dir=../../../../../../homes/mukher39/scratch/llvm-install/bin/../lib --toolchain-lib-dir=../../prebuilt/third_party/rust/linux-x64/bin/../lib
ERRORS FOUND IN obj/garnet/bin/sched/sched_manifest:
bin/sched missing dependency lib/libc++.so.2
bin/sched missing dependency lib/libc++abi.so.1
Binary bin/sched has interp ld.so.1, lib_dir lib/
[23572/34490] RUST bt_gap exe.unstripped/bt_gap
ninja: build stopped: subcommand failed.

Hmm... try deleting your $LLVM_INSTALLDIR and re-doing the install step - i.e. (from https://github.com/google/ml-compiler-opt/blob/main/docs/demo/demo.md#build-llvm):

DESTDIR=${LLVM_INSTALLDIR} ninja install-distribution-stripped
cd ${FUCHSIA_SRCDIR}
python scripts/clang/generate_runtimes.py --clang-prefix=$LLVM_INSTALLDIR --sdk-dir=$IDK_DIR --build-id-dir=$LLVM_INSTALLDIR/lib/.build-id > $LLVM_INSTALLDIR/lib/runtime.json

It gives the same error:

[44474/55563] ACTION //garnet/bin/sched:sched.verify(//build/toolchain/fuchsia:x64)
FAILED: gen/garnet/bin/sched/sched.verify
../../prebuilt/third_party/python3/linux-x64/bin/python3.8 -S ../../build/dist/verify_manifest_elf_binaries.py --check-stripped --depfile=gen/garnet/bin/sched/sched.verify.d --stamp=gen/garnet/bin/sched/sched.verify --fini-manifest=obj/garnet/bin/sched/sched_manifest --check-unstripped-files --toolchain-lib-dir=../../../../../../homes/mukher39/scratch/llvm-install/bin/../lib --toolchain-lib-dir=../../prebuilt/third_party/rust/linux-x64/bin/../lib
ERRORS FOUND IN obj/garnet/bin/sched/sched_manifest:
bin/sched missing dependency lib/libc++.so.2
bin/sched missing dependency lib/libc++abi.so.1
Binary bin/sched has interp ld.so.1, lib_dir lib/
[44505/55563] RUST bt_gap exe.unstripped/bt_gap
ninja: build stopped: subcommand failed.

Huh. OK, can you do (from ${FUCHSIA_SRCDIR}) a jiri snapshot /tmp/snapshot.xml, then send me that xml file - maybe as a gist or something - and I'll try to repro your setup locally.

It is a very big file. I don't know what an apt gist may be. But this is the entire file:
snapshot.xml.zip

Thanks - just on the off chance, are you doing this on a linux box or a mac box?

Huh... ok, so that's not the problem then (i.e. we're also on linux - a debian distro, fwiw, but I doubt that's it)

Well, "it works on my machine"... let's look at something, what's your out/default/obj/garnet/bin/sched/sched_manifest look like?

Mine:

bin/sched=sched
lib/ld.so.1=user.libc_x64/libc.so
lib/libasync-default.so=x64-shared/libasync-default.so
lib/libbackend_fuchsia_globals.so=x64-shared/libbackend_fuchsia_globals.so
lib/libc++.so.2=../../../llvm-install/lib/.build-id/a6/330277eae2394d
lib/libc++abi.so.1=../../../llvm-install/lib/.build-id/fe/357bd2932b67f1
lib/libfdio.so=x64-shared/libfdio.so
lib/libsyslog.so=x64-shared/libsyslog.so
lib/libunwind.so.1=../../../llvm-install/lib/.build-id/a2/be6e87187bf4d1
meta/package=gen/garnet/bin/sched/sched_meta_package.txt

It gives me this:

bin/sched=sched
lib/ld.so.1=user.libc_x64/libc.so
lib/libasync-default.so=x64-shared/libasync-default.so
lib/libbackend_fuchsia_globals.so=x64-shared/libbackend_fuchsia_globals.so
lib/libfdio.so=x64-shared/libfdio.so
lib/libsyslog.so=x64-shared/libsyslog.so
meta/package=gen/garnet/bin/sched/sched_meta_package.txt

I see I don't have the entries for libc++ that are present in your output.

Ah! we're getting somewhere (hopefully) :)

OK, umm... you just git cloned llvm-project, right? Like so:

git clone git@github.com:llvm/llvm-project.git

Should I delete my present llvm-project repo then?

Well... so that we first understand where the difference comes from: did you clone it differently?

I just followed the link posted in the demo. Precisely this: git clone https://github.com/llvm/llvm-project.git. Then I checked out with the hash I get from jiri.

OK, so that should be fine. Then there must be something about the way this file is set up - let me get our Fuchsia colleagues to take a look.

and when you did this step: https://github.com/google/ml-compiler-opt/blob/main/docs/demo/demo.md#build-llvm

did you make sure to run the last line after the ninja install-distribution-stripped step?

You mean this step right: python scripts/clang/generate_runtimes.py --clang-prefix=$LLVM_INSTALLDIR --sdk-dir=$IDK_DIR --build-id-dir=$LLVM_INSTALLDIR/lib/.build-id > $LLVM_INSTALLDIR/lib/runtime.json ?
Yes, I did this.

OK... huh. Just in case, can you delete $LLVM_INSTALLDIR/lib/runtime.json and redo that step (just in case of readonly flags and quiet error messages)?

Redo from the python step? Right after ninja install-distribution-stripped?

Right. Maybe save the old one, redo it, then diff the two?

by "the old one" I mean the runtime.json file - i.e. if they are not different, no point in rebuilding fuchsia after that :)

They are not different. I checked using the diff command. This is the output:

[ { "cflags": [], "ldflags": [], "runtime": [], "target": [ "x86_64-unknown-fuchsia" ] }, { "cflags": [], "ldflags": [ "-static-libstdc++" ], "runtime": [], "target": [ "x86_64-unknown-fuchsia" ] }, { "cflags": [ "-fsanitize=address" ], "ldflags": [], "runtime": [], "target": [ "x86_64-unknown-fuchsia" ] }, { "cflags": [ "-fsanitize=address" ], "ldflags": [ "-static-libstdc++" ], "runtime": [], "target": [ "x86_64-unknown-fuchsia" ] }, { "cflags": [ "-fsanitize=undefined" ], "ldflags": [], "runtime": [], "target": [ "x86_64-unknown-fuchsia" ] }, { "cflags": [ "-fsanitize=undefined" ], "ldflags": [ "-static-libstdc++" ], "runtime": [], "target": [ "x86_64-unknown-fuchsia" ] }, { "cflags": [], "ldflags": [], "runtime": [], "target": [ "aarch64-unknown-fuchsia" ] }, { "cflags": [], "ldflags": [ "-static-libstdc++" ], "runtime": [], "target": [ "aarch64-unknown-fuchsia" ] }, { "cflags": [ "-fsanitize=address" ], "ldflags": [], "runtime": [], "target": [ "aarch64-unknown-fuchsia" ] }, { "cflags": [ "-fsanitize=address" ], "ldflags": [ "-static-libstdc++" ], "runtime": [], "target": [ "aarch64-unknown-fuchsia" ] }, { "cflags": [ "-fsanitize=undefined" ], "ldflags": [], "runtime": [], "target": [ "aarch64-unknown-fuchsia" ] }, { "cflags": [ "-fsanitize=undefined" ], "ldflags": [ "-static-libstdc++" ], "runtime": [], "target": [ "aarch64-unknown-fuchsia" ] } ]

Oh! OK. Maybe that's the problem - the runtime values shouldn't be empty. Mine looks like:


  {
    "cflags": [],
    "ldflags": [],
    "runtime": [
      {
        "debug": ".build-id/a6/330277eae2394d.debug",
        "dist": ".build-id/a6/330277eae2394d",
        "name": "libc++",
        "soname": "libc++.so.2"
      },
      {
        "debug": ".build-id/a2/be6e87187bf4d1.debug",
        "dist": ".build-id/a2/be6e87187bf4d1",
        "name": "libunwind",
        "soname": "libunwind.so.1"
      },
      {
        "debug": ".build-id/fe/357bd2932b67f1.debug",
        "dist": ".build-id/fe/357bd2932b67f1",
        "name": "libc++abi",
        "soname": "libc++abi.so.1"
      }
    ],
    "target": [
      "x86_64-unknown-fuchsia"
    ]
  },
  {
....

A couple of things come to mind: when you run the python script that should generate this, are you using python3 (I remember you said you do, but sometimes python points to 2.7 while python3 is v3)? Also, are all the env vars the demo talks about defined?

It is python3. I just checked python --version and it gave me Python 3.8.5. The variables are set. I believe these are the ones:
$LLVM_SRCDIR is set to /homes/mukher39/scratch/llvm-project
$LLVM_INSTALLDIR is set to /homes/mukher39/scratch/llvm-install
$IDK_DIR is set to /homes/mukher39/scratch/fuchsia-idk
$SYSROOT_DIR is set to /homes/mukher39/scratch/fuchsia-sysroot
$FUCHSIA_SRCDIR is set to /homes/mukher39/scratch/fuchsia

These are all absolute paths. So there shouldn't be a problem in locating them with cd or cp.

Would you mind debugging the python script? I'm thinking this would be the fastest thing, as this seems to be somehow (and we can't figure out how) tied to your environment.

Sure, I'll do that.

Thanks - btw, if you find a bug, you can (if you want) patch it in Fuchsia; at minimum, you get the credit in the patch for finding it, of course!

Debugging the python file helped. I looked at the json file; apparently nothing got added to runtime. As per my understanding, if you look at generate_runtimes, you'd see it generates absolute paths. In my case, I had already provided absolute paths, but the function trace_link generated the libraries, and the absolute paths then took the shape of /u/scratch1/...rest. This caused the library path to not match the clang path. I modified the code to match on llvm-install, and it generated the correct json file. I ran fx set and fx build as per the demo, and now it builds successfully!
I don't know if this is a system-specific issue or something else, as the code is logically correct. Anyway, thank you so much. I couldn't have figured this out without your constant support. I'll proceed to the next steps now.

Awesome! Happy it worked!

Could you share the diff to what you ended up doing, maybe it'd help the Fuchsia folks harden the script?

One other good outcome of this, they are looking to add some error messaging if the json ends up empty - as that would have helped us a ton in this case.

Thanks so much for your patience and diligence - let us know how things are going!

For generate_runtimes.py, I just changed line 186 to if not (clang_install in lib_path) instead of the startswith check, where clang_install is the last component of $LLVM_INSTALLDIR (/homes/mukher39/scratch/llvm-install), i.e., llvm-install in my case.

I did the extract-corpus step using the filter -Os (-Oz fails to generate any files). But even with -Os, it says Converted 11194 files out of 15864.

Next, I proceeded with the training step, i.e.,

rm -rf $DEFAULT_TRACE &&
  PYTHONPATH=$PYTHONPATH:. python3 \
    compiler_opt/tools/generate_default_trace.py \
    --data_path=$CORPUS \
    --output_path=$DEFAULT_TRACE \
    --compile_task=inlining \
    --clang_path=$LLVM_INSTALLDIR/bin/clang \
    --llvm_size_path=$LLVM_INSTALLDIR/bin/llvm-size \
    --sampling_rate=0.2

Unfortunately, none of the modules succeed. I get this:

clang (LLVM option parsing): Unknown command line argument '-training-log=/tmp/tmpd3opwr3m/log'.  Try: 'clang (LLVM   option parsing) --help'
clang (LLVM option parsing): Did you mean '--print-bfi=/tmp/tmpd3opwr3m/log'?
clang (LLVM option parsing): Unknown command line argument '-training-log=/tmp/tmpjjyuytpx/log'.  Try: 'clang (LLVM option parsing) --help'
clang (LLVM option parsing): Did you mean '--print-bfi=/tmp/tmpjjyuytpx/log'?
clang (LLVM option parsing): Unknown command line argument '-training-log=/tmp/tmpritwau25/log'.  Try: 'clang (LLVM option parsing) --help'
clang (LLVM option parsing): Did you mean '--print-bfi=/tmp/tmpritwau25/log'?
clang (LLVM option parsing): Unknown command line argument '-training-log=/tmp/tmpy60nt8g2/log'.  Try: 'clang (LLVM option parsing) --help'
clang (LLVM option parsing): Did you mean '--print-bfi=/tmp/tmpy60nt8g2/log'?
E0303 00:03:37.625085 140497752479552 generate_default_trace.py:73] Failed to compile ('/homes/mukher39/scratch/corpus/x64-shared/obj/third_party/curl/lib/libcurl.krb5.c.o.bc', '/homes/mukher39/scratch/corpus/x64-shared/obj/third_party/curl/lib/libcurl.krb5.c.o.cmd').
E0303 00:03:37.625708 140497752479552 generate_default_trace.py:73] Failed to compile ('/homes/mukher39/scratch/corpus/host_x64/obj/third_party/curl/lib/libcurl.strtok.c.o.bc', '/homes/mukher39/scratch/corpus/host_x64/obj/third_party/curl/lib/libcurl.strtok.c.o.cmd').
E0303 00:03:37.627145 140497752479552 generate_default_trace.py:73] Failed to compile ('/homes/mukher39/scratch/corpus/host_x64/obj/third_party/curl/lib/libcurl.altsvc.c.o.bc', '/homes/mukher39/scratch/corpus/host_x64/obj/third_party/curl/lib/libcurl.altsvc.c.o.cmd').
E0303 00:03:37.628262 140497752479552 generate_default_trace.py:73] Failed to compile ('/homes/mukher39/scratch/corpus/host_x64/obj/third_party/curl/lib/libcurl.idn_win32.c.o.bc', '/homes/mukher39/scratch/corpus/host_x64/obj/third_party/curl/lib/libcurl.idn_win32.c.o.cmd').
E0303 00:03:37.629735 140497752479552 generate_default_trace.py:73] Failed to compile ('/homes/mukher39/scratch/corpus/x64-shared/obj/third_party/curl/lib/libcurl.curl_sspi.c.o.bc', '/homes/mukher39/scratch/corpus/x64-shared/obj/third_party/curl/lib/libcurl.curl_sspi.c.o.cmd').
E0303 00:03:37.630831 140497752479552 generate_default_trace.py:73] Failed to compile ('/homes/mukher39/scratch/corpus/x64-shared/obj/third_party/curl/lib/libcurl.inet_ntop.c.o.bc', '/homes/mukher39/scratch/corpus/x64-shared/obj/third_party/curl/lib/libcurl.inet_ntop.c.o.cmd').
E0303 00:03:37.634116 140497752479552 generate_default_trace.py:73] Failed to compile ('/homes/mukher39/scratch/corpus/x64-shared/obj/third_party/curl/lib/vtls/libcurl.nss.c.o.bc', '/homes/mukher39/scratch/corpus/x64-shared/obj/third_party/curl/lib/vtls/libcurl.nss.c.o.cmd').
E0303 00:03:37.634639 140497752479552 generate_default_trace.py:73] Failed to compile ('/homes/mukher39/scratch/corpus/host_x64/obj/third_party/curl/lib/libcurl.hostip4.c.o.bc', '/homes/mukher39/scratch/corpus/host_x64/obj/third_party/curl/lib/libcurl.hostip4.c.o.cmd').
E0303 00:03:37.635804 140497752479552 generate_default_trace.py:73] Failed to compile ('/homes/mukher39/scratch/corpus/host_x64/obj/third_party/curl/lib/libcurl.x509asn1.c.o.bc', '/homes/mukher39/scratch/corpus/host_x64/obj/third_party/curl/lib/libcurl.x509asn1.c.o.cmd').
E0303 00:03:37.636985 140497752479552 generate_default_trace.py:73] Failed to compile ('/homes/mukher39/scratch/corpus/host_x64/obj/third_party/curl/lib/libcurl.curl_sspi.c.o.bc', '/homes/mukher39/scratch/corpus/host_x64/obj/third_party/curl/lib/libcurl.curl_sspi.c.o.cmd').
E0303 00:03:37.637018 140497752479552 generate_default_trace.py:73] Failed to compile ('/homes/mukher39/scratch/corpus/host_x64/obj/third_party/curl/lib/libcurl.hostasyn.c.o.bc', '/homes/mukher39/scratch/corpus/host_x64/obj/third_party/curl/lib/libcurl.hostasyn.c.o.cmd').
E0303 00:03:37.639245 140497752479552 generate_default_trace.py:73] Failed to compile ('/homes/mukher39/scratch/corpus/x64-shared/obj/third_party/curl/lib/libcurl.curl_des.c.o.bc', '/homes/mukher39/scratch/corpus/x64-shared/obj/third_party/curl/lib/libcurl.curl_des.c.o.cmd').
0 of 2238 modules succeeded.

I looked at generate_default_trace.py line 73; it is an exception raised in the worker function.

That's expected - not all Fuchsia modules are built -Oz. I updated the doc but forgot to push it, fixed now.

That looks like your clang doesn't understand the -training-log flag, which means it wasn't built with tensorflow support. Can you check you built llvm correctly (this step)? To be sure, delete the build and re-do the mkdir build && cd build && cmake... steps, then just ninja distribution and DESTDIR=$LLVM_INSTALLDIR ninja install-distribution-stripped - because the generate_runtimes.py step already placed a file (you can check at the end that your json file is still there).

BTW, if you get some linker error when building llvm, pass -DLLVM_ENABLE_LLD=Off right after the Fuchsia-stage2.cmake part.
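Sketched out (the elided cmake flags are the ones the demo's build-llvm section specifies, tensorflow ones included; the cache file path is the demo's):

cd ${LLVM_SRCDIR} && rm -rf build && mkdir build && cd build
# ...plus the other cmake flags exactly as in the demo's build-llvm section
cmake -G Ninja \
  -C ${LLVM_SRCDIR}/clang/cmake/caches/Fuchsia-stage2.cmake \
  -DLLVM_ENABLE_LLD=Off \
  ${LLVM_SRCDIR}/llvm
ninja distribution
DESTDIR=${LLVM_INSTALLDIR} ninja install-distribution-stripped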

I managed to resolve the issue. Apparently the TENSORFLOW_C_LIB path wasn't set correctly on my end. I also downloaded the latest fuchsia and checked out llvm with the new hash (not that it wouldn't work with the previous hash - I just deleted and re-did everything). I managed to extract the corpus and generate the default trace. The trace-generation step was also throwing some error at the writing step; I resolved that by cloning the latest commit of MLGO. Now, when I try to train a new model (this step: https://github.com/google/ml-compiler-opt/blob/main/docs/demo/demo.md#train-a-new-model), i.e., the warmstart model, I get this error:

Traceback (most recent call last):
  File "compiler_opt/rl/train_bc.py", line 102, in <module>
    app.run(main)
  File "/homes/mukher39/.local/lib/python3.8/site-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/homes/mukher39/.local/lib/python3.8/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "compiler_opt/rl/train_bc.py", line 98, in main
    train_eval()
  File "/homes/mukher39/.local/lib/python3.8/site-packages/gin/config.py", line 1605, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/homes/mukher39/.local/lib/python3.8/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/homes/mukher39/.local/lib/python3.8/site-packages/gin/config.py", line 1582, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "compiler_opt/rl/train_bc.py", line 64, in train_eval
    tf_agent = agent_creators.create_agent(agent_name, time_step_spec,
  File "/homes/mukher39/.local/lib/python3.8/site-packages/gin/config.py", line 1605, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/homes/mukher39/.local/lib/python3.8/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/homes/mukher39/.local/lib/python3.8/site-packages/gin/config.py", line 1582, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/u/scratch1/mukher39/ml-compiler-opt/compiler_opt/rl/agent_creators.py", line 112, in create_agent
    preprocessing_layers = tf.nest.map_structure(
  File "/homes/mukher39/.local/lib/python3.8/site-packages/tensorflow/python/util/nest.py", line 869, in map_structure
    structure[0], [func(*x) for x in entries],
  File "/homes/mukher39/.local/lib/python3.8/site-packages/tensorflow/python/util/nest.py", line 869, in <listcomp>
    structure[0], [func(*x) for x in entries],
  File "/u/scratch1/mukher39/ml-compiler-opt/compiler_opt/rl/inlining/config.py", line 95, in observation_processing_layer
    quantile = quantile_map[obs_spec.name]
KeyError: 'call_argument_setup'
  In call to configurable 'create_agent' (<function create_agent at 0x7fb513ddfc10>)
  In call to configurable 'train_eval' (<function train_eval at 0x7fb513b1cdc0>)

Awesome - glad the TENSORFLOW_C_LIB and corpus extraction issues are resolved!

As for the error when training the warmstart model: that seems to indicate the vocab is missing - call_argument_setup is a feature. Did you collect a vocab? Adding @kshiteejm @yundiqian.

I did the vocab step:

rm -rf $DEFAULT_VOCAB &&
  PYTHONPATH=$PYTHONPATH:. python3 \
    compiler_opt/tools/sparse_bucket_generator.py \
    --input=$DEFAULT_TRACE \
    --output_dir=$DEFAULT_VOCAB

but it seems that no directory was created. $DEFAULT_TRACE is a huge file that was generated by the generate_default_trace step. My DEFAULT_VOCAB is set correctly.

The result of having run sparse_bucket_generator.py is that it repopulates compiler_opt/rl/inlining/vocab. If you run git status in the ml-compiler-opt repo, does it report changes there?

Yes - just that it has deleted all the files under compiler_opt/rl/inlining/vocab. I guess this is the result of rm -rf $DEFAULT_VOCAB. This is the entire output:

        deleted:    compiler_opt/rl/inlining/vocab/call_argument_setup.buckets
        deleted:    compiler_opt/rl/inlining/vocab/call_penalty.buckets
        deleted:    compiler_opt/rl/inlining/vocab/callee_basic_block_count.buckets
        deleted:    compiler_opt/rl/inlining/vocab/callee_conditionally_executed_blocks.buckets
        deleted:    compiler_opt/rl/inlining/vocab/callee_users.buckets
        deleted:    compiler_opt/rl/inlining/vocab/caller_basic_block_count.buckets
        deleted:    compiler_opt/rl/inlining/vocab/caller_conditionally_executed_blocks.buckets
        deleted:    compiler_opt/rl/inlining/vocab/caller_users.buckets
        deleted:    compiler_opt/rl/inlining/vocab/callsite_cost.buckets
        deleted:    compiler_opt/rl/inlining/vocab/callsite_height.buckets
        deleted:    compiler_opt/rl/inlining/vocab/case_cluster_penalty.buckets
        deleted:    compiler_opt/rl/inlining/vocab/cold_cc_penalty.buckets
        deleted:    compiler_opt/rl/inlining/vocab/constant_args.buckets
        deleted:    compiler_opt/rl/inlining/vocab/constant_offset_ptr_args.buckets
        deleted:    compiler_opt/rl/inlining/vocab/cost_estimate.buckets
        deleted:    compiler_opt/rl/inlining/vocab/dead_blocks.buckets
        deleted:    compiler_opt/rl/inlining/vocab/edge_count.buckets
        deleted:    compiler_opt/rl/inlining/vocab/indirect_call_penalty.buckets
        deleted:    compiler_opt/rl/inlining/vocab/is_multiple_blocks.buckets
        deleted:    compiler_opt/rl/inlining/vocab/jump_table_penalty.buckets
        deleted:    compiler_opt/rl/inlining/vocab/last_call_to_static_bonus.buckets
        deleted:    compiler_opt/rl/inlining/vocab/load_elimination.buckets
        deleted:    compiler_opt/rl/inlining/vocab/load_relative_intrinsic.buckets
        deleted:    compiler_opt/rl/inlining/vocab/lowered_call_arg_setup.buckets
        deleted:    compiler_opt/rl/inlining/vocab/nested_inline_cost_estimate.buckets
        deleted:    compiler_opt/rl/inlining/vocab/nested_inlines.buckets
        deleted:    compiler_opt/rl/inlining/vocab/node_count.buckets
        deleted:    compiler_opt/rl/inlining/vocab/nr_ctant_params.buckets
        deleted:    compiler_opt/rl/inlining/vocab/num_loops.buckets
        deleted:    compiler_opt/rl/inlining/vocab/simplified_instructions.buckets
        deleted:    compiler_opt/rl/inlining/vocab/sroa_losses.buckets
        deleted:    compiler_opt/rl/inlining/vocab/sroa_savings.buckets
        deleted:    compiler_opt/rl/inlining/vocab/switch_penalty.buckets
        deleted:    compiler_opt/rl/inlining/vocab/threshold.buckets
        deleted:    compiler_opt/rl/inlining/vocab/unsimplified_common_instructions.buckets

@kshiteejm - it's as if the vocab step did nothing; any ideas?

Thank you for your patience @prasitaGit!

The vocab step is supposed to generate the .buckets files. So it seems that step failed.

To debug this, can you check if $DEFAULT_TRACE is non-empty? My $DEFAULT_TRACE has files of the form sequence_examples.recordio.* (* expands to a number).

If it does have those files, can you retry the vocab step with the following slightly updated command:

PYTHONPATH=$PYTHONPATH:. python3 \
    compiler_opt/tools/sparse_bucket_generator.py \
    --input=$DEFAULT_TRACE/sequence_examples.recordio.* \
    --output_dir=$DEFAULT_VOCAB

My default_trace is non-empty, but it is a huge file which I believe is not human readable. I don't see anything of the form sequence_examples.recordio.*. These are the first few lines:

§K^V^@^@^@^@^@^LÀ<93>{^R£<97>Y
¬<96>^B
^Vlowered_call_arg_setup^R<90><96>^B
^E^Z^C
^A^@
^E^Z^C
^A^@
^E^Z^C
^A^@
^E^Z^C
^A^@

Can you try this (slightly different from the demo) command:

python3 \
    compiler_opt/tools/sparse_bucket_generator.py \
    --input=$DEFAULT_TRACE/* \
    --output_dir=$DEFAULT_VOCAB

Please let me know if you see any errors and if it does not generate *.buckets files.

I tried. It says:
Expected 'tf.Tensor(False, shape=(), dtype=bool)' to be true. Summarized data: b'No files matched pattern: /homes/mukher39/scratch/default_trace/*'. This means that my default_trace wasn't generated correctly.

What happens if you try the default command from the demo, does it give any exception?

python3 \
    compiler_opt/tools/sparse_bucket_generator.py \
    --input=$DEFAULT_TRACE \
    --output_dir=$DEFAULT_VOCAB

If the above command doesn't generate *.buckets files, I'd strongly suggest temporary Option B: restore all the *.buckets files from git, skip the vocab step, and jump straight to the training step.
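The Option B restore is just this (a minimal sketch, assuming the vocab deletions above are still unstaged):

cd ~/ml-compiler-opt
git checkout -- compiler_opt/rl/inlining/vocab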

We do want to understand why this doesn't work, though - i.e. Option B is a temporary unblocker; stress on temporary.

I generated the $DEFAULT_TRACE again, and this time it is readable. Precisely, this was the output:

I0304 16:54:14.780992 140136115586880 generate_default_trace.py:143] 1 success, 0 failed out of 3041
I0304 16:54:25.301459 140136115586880 generate_default_trace.py:143] 16 success, 0 failed out of 3041
I0304 16:54:35.370017 140136115586880 generate_default_trace.py:143] 59 success, 0 failed out of 3041
I0304 16:54:45.531310 140136115586880 generate_default_trace.py:143] 129 success, 0 failed out of 3041
I0304 16:54:55.537342 140136115586880 generate_default_trace.py:143] 237 success, 0 failed out of 3041
I0304 16:55:05.597499 140136115586880 generate_default_trace.py:143] 393 success, 0 failed out of 3041
I0304 16:55:15.601158 140136115586880 generate_default_trace.py:143] 668 success, 0 failed out of 3041
I0304 16:55:25.601731 140136115586880 generate_default_trace.py:143] 1176 success, 0 failed out of 3041
I0304 16:55:35.604851 140136115586880 generate_default_trace.py:143] 2054 success, 7 failed out of 3041
2925 of 3041 modules succeeded.

Also, this is what the first few lines of the file look like:

^P
^Lsroa_savings^R^@
^X
^Tcase_cluster_penalty^R^@
^\
^Xcallee_basic_block_count^R^@
^V
^Ris_multiple_blocks^R^@
^Q
^Mcallsite_cost^R^@
^\
^Xcaller_basic_block_count^R^@
^]
^Ylast_call_to_static_bonus^R^@
^R
^Nswitch_penalty^R^@
^N

After that I ran the sparse_bucket_generator, and it gave me an error:

File "compiler_opt/tools/sparse_bucket_generator.py", line 63, in _get_feature_info
    feature = feature_list.feature[0]
IndexError: list index (0) out of range

I debugged this a little bit; I see that the feature_list is basically empty, hence the error is thrown. I printed out example.feature_lists, and it gives me this (I pasted just the first few lines):

feature_list {
  key: "call_argument_setup"
  value {
  }
}
feature_list {
  key: "call_penalty"
  value {
  }
}
feature_list {
  key: "callee_basic_block_count"
  value {
  }
}
feature_list {
  key: "callee_conditionally_executed_blocks"
  value {
  }
}

I can see that the value is empty. I also printed out example.feature_lists.feature_list.items(), and it gives me basically just the keys:

example.feature_lists.feature_list.items()
ItemsView({'cost_estimate': , 'is_multiple_blocks': , 'edge_count': , 'switch_penalty': , 'constant_args': , 'sroa_losses': , 'load_elimination': , 'simplified_instructions': , 'callee_users': , 'inlining_default': , 'node_count': , 'load_relative_intrinsic': , 'constant_offset_ptr_args': , 'inlining_decision': , 'indirect_call_penalty': , 'callee_conditionally_executed_blocks': , 'callsite_height': , 'nested_inline_cost_estimate': , 'reward': , 'dead_blocks': , 'case_cluster_penalty': , 'lowered_call_arg_setup': , 'nested_inlines': , 'sroa_savings': , 'last_call_to_static_bonus': , 'unsimplified_common_instructions': , 'callee_basic_block_count': , 'jump_table_penalty': , 'caller_basic_block_count': , 'threshold': , 'caller_users': , 'call_argument_setup': , 'num_loops': , 'caller_conditionally_executed_blocks': , 'cold_cc_penalty': , 'call_penalty': , 'callsite_cost': , 'nr_ctant_params': })

Any reason why the feature_list/values is empty?
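For reference, a minimal sketch of this kind of inspection (the trace path is mine; it assumes the trace is a TFRecord file of tf.train.SequenceExample protos):

python3 - <<'EOF'
import tensorflow as tf

# Decode the first record of the trace and print, per feature, how many
# values its feature_list holds (0 reproduces the IndexError above).
for record in tf.data.TFRecordDataset(['/homes/mukher39/scratch/default_trace']).take(1):
  example = tf.train.SequenceExample()
  example.ParseFromString(record.numpy())
  for key, feature_list in example.feature_lists.feature_list.items():
    print(key, len(feature_list.feature))
EOF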

I think the feature_list/values being empty is the root cause for the failure of vocab generation. @mtrofin any idea when this would happen in logging?

Are they all empty?

You mean the entire example.feature_lists? Yes, all are empty.

@prasitaGit We pushed a commit that might fix your issue (ef77833).

Please can you pull the latest commit and retry the vocab step?

Also - somewhat orthogonal - what's your cmd_filter: is it -Os, or the more encompassing "^-O2|-Os|-Oz$"? I'm just kind of surprised you only have ~3K modules total.

@kshiteejm I'd check that

@mtrofin It is "^-O2|-Os|-Oz$" as per the demo.
Let me point out, though: while running the extract_ir step, I see a lot of errors popping up too. I don't know if that is expected? Maybe that is the reason for the ~3K modules?

Hmm... some errors, yes; let me also do a collection run and see what my stats are (since I have your xml, I'll try exactly that).

Update:

After cloning the latest repo, I'm able to successfully generate vocabs (although the modules are around 3K). I am now training the optimized model, which is expected to take half a day. Precisely, I'm here:

rm -rf $OUTPUT_DIR && \
  PYTHONPATH=$PYTHONPATH:. python3 \
  compiler_opt/rl/train_locally.py \
  --root_dir=$OUTPUT_DIR \
  --data_path=$CORPUS \
  --clang_path=$LLVM_INSTALLDIR/bin/clang \
  --llvm_size_path=$LLVM_INSTALLDIR/bin/llvm-size \
  --num_modules=100 \
  --gin_files=compiler_opt/rl/inlining/gin_configs/ppo_nn_agent.gin \
  --gin_bindings=train_eval.warmstart_policy_dir=\"$WARMSTART_OUTPUT_DIR/saved_policy\"

Awesome!

So on the corpus extraction front, I'm getting 15196 out of 15864. Aah, I see what's happening: I got the 3K number from your output of generate_default_trace.py above, which has a sampling rate of 0.2, so everything makes sense. I had been stress-testing generate_default_trace.py at a 1:1 rate at one point, so I got myself confused.

BTW, an easy way to double-check your corpus is to wc -l $CORPUS - it should say something around this 15K number (15196, if we're at the exact fuchsia repo version).

Also, I looked at the errors during corpus collection; it seems the Fuchsia build generates some COFF objects (instead of ELF). The errors are safe to ignore.

May I know an estimate of the running time on CPU? I am presently running on CPU and it has been around 2 days so far. If it takes too long, I'll terminate and switch to GPU.

It takes our Fuchsia colleague 62 hours to train their model. You may estimate the remaining time by watching the logging.info('step = %d, loss = %g', global_step_val, loss) output.
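For example, assuming you captured the trainer's output to a file (train.log here is a hypothetical name):

grep 'step = ' train.log | tail -n 3

The absl log lines are timestamped, so the delta between consecutive step lines gives you a per-step rate to extrapolate from.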

@yundiqian btw, how come we originally had it at ~0.5-1 day - is the larger feature set the issue? (we should update the demo time estimate then)

It just got done. Mine was run on CPU; it took close to 48 hours.

Ack - out of curiosity, how many CPUs does lscpu say you have?

Multiple factors: 1) more features, a larger batch size, and more training steps, which naturally make it take longer; 2) somehow the runtime gap between python3 and blaze run has become much bigger now.

Ack - out of curiosity, how many CPUs does lscpu say you have?

32

My lscpu says 96, so that also explains what's up.

I get this error when I run fx build in this step:

cd ${FUCHSIA_SRCDIR}
fx set core.x64 \
  --args='clang_prefix="/usr/local/google/home/mtrofin/llvm-install-release/bin"' \
  --args='optimize="size"' \
  --args=clang_ml_inliner=true
fx build

Error:

[21776/34693] ACTION //src/connectivity/bluetooth/tools/bt-pairing-tool:bt-pairing-tool.verify(//build/toolchain/fuchsia:x64)
FAILED: gen/src/connectivity/bluetooth/tools/bt-pairing-tool/bt-pairing-tool.verify
../../prebuilt/third_party/python3/linux-x64/bin/python3.8 -S ../../build/dist/verify_manifest_elf_binaries.py --check-stripped --depfile=gen/src/connectivity/bluetooth/tools/bt-pairing-tool/bt-pairing-tool.verify.d --stamp=gen/src/connectivity/bluetooth/tools/bt-pairing-tool/bt-pairing-tool.verify --fini-manifest=obj/src/connectivity/bluetooth/tools/bt-pairing-tool/bt-pairing-tool_manifest --check-unstripped-files --toolchain-lib-dir=../../../../../../homes/mukher39/scratch/llvm-install-release/bin/../lib --toolchain-lib-dir=../../prebuilt/third_party/rust/linux-x64/bin/../lib
ERRORS FOUND IN obj/src/connectivity/bluetooth/tools/bt-pairing-tool/bt-pairing-tool_manifest:
No unstripped file found for ./../../../../../../homes/mukher39/scratch/llvm-install-release/lib/x86_64-unknown-fuchsia/libc++.so.2
No unstripped file found for ./../../../../../../homes/mukher39/scratch/llvm-install-release/lib/x86_64-unknown-fuchsia/libc++abi.so.1
No unstripped file found for ./../../../../../../homes/mukher39/scratch/llvm-install-release/lib/x86_64-unknown-fuchsia/libunwind.so.1
Binary bin/bt-pairing-tool has interp ld.so.1, lib_dir lib/
[21784/34693] ACTION //src/connectivity/network/net-cli:net-cli.verify(//build/toolchain/fuchsia:x64)
FAILED: gen/src/connectivity/network/net-cli/net-cli.verify
../../prebuilt/third_party/python3/linux-x64/bin/python3.8 -S ../../build/dist/verify_manifest_elf_binaries.py --check-stripped --depfile=gen/src/connectivity/network/net-cli/net-cli.verify.d --stamp=gen/src/connectivity/network/net-cli/net-cli.verify --fini-manifest=obj/src/connectivity/network/net-cli/net-cli_manifest --check-unstripped-files --toolchain-lib-dir=../../../../../../homes/mukher39/scratch/llvm-install-release/bin/../lib --toolchain-lib-dir=../../prebuilt/third_party/rust/linux-x64/bin/../lib
ERRORS FOUND IN obj/src/connectivity/network/net-cli/net-cli_manifest:
No unstripped file found for ./../../../../../../homes/mukher39/scratch/llvm-install-release/lib/x86_64-unknown-fuchsia/libc++.so.2
No unstripped file found for ./../../../../../../homes/mukher39/scratch/llvm-install-release/lib/x86_64-unknown-fuchsia/libc++abi.so.1
No unstripped file found for ./../../../../../../homes/mukher39/scratch/llvm-install-release/lib/x86_64-unknown-fuchsia/libunwind.so.1
Binary bin/net has interp ld.so.1, lib_dir lib/
[21785/34693] ACTION //src/connectivity/bluetooth/tools/bt-snoop-cli:bt-snoop-cli.verify(//build/toolchain/fuchsia:x64)
FAILED: gen/src/connectivity/bluetooth/tools/bt-snoop-cli/bt-snoop-cli.verify
../../prebuilt/third_party/python3/linux-x64/bin/python3.8 -S ../../build/dist/verify_manifest_elf_binaries.py --check-stripped --depfile=gen/src/connectivity/bluetooth/tools/bt-snoop-cli/bt-snoop-cli.verify.d --stamp=gen/src/connectivity/bluetooth/tools/bt-snoop-cli/bt-snoop-cli.verify --fini-manifest=obj/src/connectivity/bluetooth/tools/bt-snoop-cli/bt-snoop-cli_manifest --check-unstripped-files --toolchain-lib-dir=../../../../../../homes/mukher39/scratch/llvm-install-release/bin/../lib --toolchain-lib-dir=../../prebuilt/third_party/rust/linux-x64/bin/../lib
ERRORS FOUND IN obj/src/connectivity/bluetooth/tools/bt-snoop-cli/bt-snoop-cli_manifest:
No unstripped file found for ./../../../../../../homes/mukher39/scratch/llvm-install-release/lib/x86_64-unknown-fuchsia/libc++.so.2
No unstripped file found for ./../../../../../../homes/mukher39/scratch/llvm-install-release/lib/x86_64-unknown-fuchsia/libc++abi.so.1
No unstripped file found for ./../../../../../../homes/mukher39/scratch/llvm-install-release/lib/x86_64-unknown-fuchsia/libunwind.so.1
Binary bin/bt-snoop-cli has interp ld.so.1, lib_dir lib/
[21807/34693] CXX obj/src/cobalt/bin/app/lib.cobalt_app.cc.o
ninja: build stopped: subcommand failed.

fx set works fine. Before this, under build-release, I set -DLLVM_INLINER_MODEL_PATH=$LLVM_SRCDIR/llvm/lib/Analysis/models/inliner while doing cmake -G Ninja...
I checked the ${LLVM_INSTALLDIR_RELEASE}/lib/runtime.json file, and it contains entries in the runtime section.

I removed build-release and ${LLVM_INSTALLDIR_RELEASE} and did the steps again (from mkdir build-release), but the result is the same. I checked the environment variables; they are ok.

Does /usr/local/google/home/mtrofin/llvm-install-release/bin point to your llvm-install? Note the mtrofin in the path.

Yes. I have changed that.

Weird that the envvar got expanded on my end; I'll fix that. Thanks @jacob-hegna for noticing.

Ah, I see that in your error now. The error seems to boil down to the fact that the files:

./../../../../../../homes/mukher39/scratch/llvm-install-release/lib/x86_64-unknown-fuchsia/libc++abi.so.1
./../../../../../../homes/mukher39/scratch/llvm-install-release/lib/x86_64-unknown-fuchsia/libc++.so.2
./../../../../../../homes/mukher39/scratch/llvm-install-release/lib/x86_64-unknown-fuchsia/libunwind.so.1

cannot be found by fx. Can you confirm that these exist, relative to the directory where you ran fx set and fx build?

@prasitaGit if you wiped out your llvm install dir, did you run, after that, the python script that creates the .json file?

I think I wiped out LLVM_INSTALLDIR_RELEASE, not the llvm install dir. The demo does not mention executing the python code; I just did cp ${FUCHSIA_SRCDIR}/prebuilt/third_party/clang/linux-x64/lib/runtime.json ${LLVM_INSTALLDIR_RELEASE}/lib/runtime.json as per the demo.

@jacob-hegna They do. But /homes/mukher39/scratch/llvm-install-release/lib/x86_64-unknown-fuchsia/libc++abi.so.1 is an absolute path. I'm not sure why they are adding those dots; maybe that is causing the issue?

That is not the problem. I did ls -lrt for:

ls -lrt ./../../../../../../homes/mukher39/scratch/llvm-install-release/lib/x86_64-unknown-fuchsia/
total 10808
-rw-r--r-- 1 mukher39 mukher39      37 Mar  7 18:20 libc++.so
-rw-r--r-- 1 mukher39 mukher39   57758 Mar  7 18:20 libc++experimental.a
-rw-r--r-- 1 mukher39 mukher39 9739442 Mar  7 18:20 libc++.a
lrwxrwxrwx 1 mukher39 mukher39      13 Mar  7 18:32 libc++.so.2 -> libc++.so.2.0
-rwxr-xr-x 1 mukher39 mukher39  997112 Mar  7 18:32 libc++.so.2.0
lrwxrwxrwx 1 mukher39 mukher39      16 Mar  7 18:32 libc++abi.so.1 -> libc++abi.so.1.0
-rwxr-xr-x 1 mukher39 mukher39  198176 Mar  7 18:32 libc++abi.so.1.0
lrwxrwxrwx 1 mukher39 mukher39      14 Mar  7 18:32 libc++abi.so -> libc++abi.so.1
lrwxrwxrwx 1 mukher39 mukher39      16 Mar  7 18:32 libunwind.so.1 -> libunwind.so.1.0
-rwxr-xr-x 1 mukher39 mukher39   42368 Mar  7 18:32 libunwind.so.1.0
lrwxrwxrwx 1 mukher39 mukher39      14 Mar  7 18:32 libunwind.so -> libunwind.so.1
drwxr-x--- 2 mukher39 mukher39    4096 Mar  7 18:32 asan+noexcept
drwxr-x--- 2 mukher39 mukher39    4096 Mar  7 18:32 noexcept
drwxr-x--- 2 mukher39 mukher39    4096 Mar  7 18:32 asan
drwxr-x--- 2 mukher39 mukher39    4096 Mar  7 18:32 compat

The files are present.

Try deleting your ${LLVM_INSTALLDIR_RELEASE}/lib/runtime.json and re-generating it by running the python script instead - thanks for pointing that one out; fixed the demo, too.

fx builds successfully now. I'm not sure I understand what the purpose of compare_elf_sizes.py is, though.
When I run it, it is unable to find /tmp/orig_sizes.json and hence doesn't run.

compare_elf_sizes.py takes two .json files with the sizes of all the built objects in fuchsia and diffs them.

If you built fuchsia without the ml inliner at first, then you could copy the elf_sizes.json file to /tmp/orig_sizes.json. Then when you rebuilt with the ml inliner, you would get a new size file that you could use for the diff. This gives you a detailed breakdown of the size reduction achieved by the ml inliner.
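A typical invocation would look something like this (a sketch; the script's location in the Fuchsia tree and the elf_sizes.json path under your build output directory may differ):

python3 scripts/compare_elf_sizes.py \
  /tmp/orig_sizes.json \
  out/default/elf_sizes.json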

compare_elf_sizes.py is a tool the Fuchsia folks wrote that produces a nice A/B report based on these json files. The json files are produced as part of the build and contain a detailed size report - binary size, just .text size, etc. So earlier on, we copied the .json produced with the default policy to /tmp, and we'd use it here. If you rebooted your machine, you probably lost it (because /tmp usually gets wiped).

This isn't essential for training; also, there is a reporting 'bug', or nuance, in the tool - it reports all files, regardless of how they are built (some are -O3, and not affected by the ml inliner).

Also, if you generate it through the demo and run the comparison, just pay attention to the .text section. The reason is that the default sizes come from the build we used to create the corpus. That build embeds extra sections in the object files, which we then use to collect the corpus. The .text section isn't affected, and it's possible that the linker strips those corpus-specific sections - you can check which sections the linked binaries include, but the more expedient thing to do is to just look at .text.
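If you do want to check the sections, llvm-readelf can list them (the binary path below is just a placeholder):

llvm-readelf -S /path/to/linked/binary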

Okay, then I guess the demo is complete. I don't think there is anything more to discuss from my end. Thank you all for the constant and quick responses. Should I close the issue then?

Happy to help, and thanks for your patience - this exercise helped us improve the demo.

(fine to close the issue)

Thanks!