finalitylabs / bellman

zk-SNARK library

AMD compiler fails

nginnever opened this issue

This is a long-standing issue that arose when our kernel became too complex. The first place the compiler appeared to fail was the carry logic in the field library.

Other versions with less intensive carries compiled, but from my recollection they were too slow to use in practice. @keyvank, if you could shed some light on this as well, that would be great.
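For readers unfamiliar with what "carry logic" means here, the following is a minimal sketch of multi-limb addition with carry propagation in plain C. It is not the code from the bellman kernel: the function name `add_with_carry`, the limb count `LIMBS`, and the choice of 32-bit limbs are assumptions made only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limb count for illustration; the real kernel's layout may differ. */
#define LIMBS 4

/* Computes r = a + b over LIMBS little-endian 32-bit limbs.
 * Returns the carry out of the top limb (0 or 1). */
uint32_t add_with_carry(uint32_t r[LIMBS],
                        const uint32_t a[LIMBS],
                        const uint32_t b[LIMBS])
{
    uint64_t carry = 0;
    for (size_t i = 0; i < LIMBS; i++) {
        /* Widen to 64 bits so the sum of two limbs plus carry cannot overflow. */
        uint64_t sum = (uint64_t)a[i] + b[i] + carry;
        r[i] = (uint32_t)sum;   /* low 32 bits become the result limb */
        carry = sum >> 32;      /* high bit is carried into the next limb */
    }
    return (uint32_t)carry;
}
```

Each limb depends on the carry from the previous one, so the loop forms a serial dependency chain. Chains of this kind, once unrolled and inlined many times inside a large kernel, may be the sort of code the description above refers to.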

System information

Radeon RX480 GPU

Reproduce issue

git clone https://github.com/finalitylabs/bellman
cd bellman
cargo build --release --features gpu
RUST_LOG=info RUST_BACKTRACE=full cargo test --features gpu-test -- --exact multiexp::gpu_multiexp_consistency --nocapture

Or run the FFT tests:

RUST_LOG=info RUST_BACKTRACE=full cargo test --features gpu-test -- --exact domain::gpu_fft_consistency --nocapture

Error detail:

Rust Log err:
thread 'multiexp::gpu_multiexp_consistency' panicked at 'Cannot initialize kernel!', src/multiexp.rs:435:9
stack backtrace:
   0:     0x5646066107d4 - backtrace::backtrace::libunwind::trace::hfc3c8420b767c4a7
                               at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/libunwind.rs:88
   1:     0x5646066107d4 - backtrace::backtrace::trace_unsynchronized::h0f7f875ce1984f36
                               at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/mod.rs:66
   2:     0x5646066107d4 - std::sys_common::backtrace::_print_fmt::h2cb30e02f5c64651
                               at src/libstd/sys_common/backtrace.rs:77
   3:     0x5646066107d4 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hee8674bb55243b5b
                               at src/libstd/sys_common/backtrace.rs:61
   4:     0x564606634cdc - core::fmt::write::ha8f532d3fa63f63f
                               at src/libcore/fmt/mod.rs:1028
   5:     0x56460660d3f7 - std::io::Write::write_fmt::h07782cd7d34b2415
                               at src/libstd/io/mod.rs:1412
   6:     0x564606612e5e - std::sys_common::backtrace::_print::hf184caaf275c0426
                               at src/libstd/sys_common/backtrace.rs:65
   7:     0x564606612e5e - std::sys_common::backtrace::print::hdc94e7a72f3c0e60
                               at src/libstd/sys_common/backtrace.rs:50
   8:     0x564606612e5e - std::panicking::default_hook::{{closure}}::h22dc30a2fab88435
                               at src/libstd/panicking.rs:188
   9:     0x564606612b51 - std::panicking::default_hook::h1831f9daeca2fede
                               at src/libstd/panicking.rs:205
  10:     0x56460661355b - std::panicking::rust_panic_with_hook::hafa4d144cdeac0c6
                               at src/libstd/panicking.rs:464
  11:     0x5646065eabd3 - std::panicking::begin_panic::h42586111483ad450
                               at /rustc/1423bec54cf2db283b614e527cfd602b481485d1/src/libstd/panicking.rs:400
  12:     0x56460626d66c - bellperson::multiexp::gpu_multiexp_consistency::h647c585c33c67b18
                               at src/multiexp.rs:435
  13:     0x5646061477ea - bellperson::multiexp::gpu_multiexp_consistency::{{closure}}::h5fa2265cd1115261
                               at src/multiexp.rs:426
  14:     0x564606171e4e - core::ops::function::FnOnce::call_once::h8d06a1df972eb9f9
                               at /rustc/1423bec54cf2db283b614e527cfd602b481485d1/src/libcore/ops/function.rs:227
  15:     0x564606393d9f - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::heea8b6fddafd6feb
                               at /rustc/1423bec54cf2db283b614e527cfd602b481485d1/src/liballoc/boxed.rs:942
  16:     0x56460661a64a - __rust_maybe_catch_panic
                               at src/libpanic_unwind/lib.rs:78
  17:     0x5646063af37a - std::panicking::try::hd8eceefd23569e1d
                               at /rustc/1423bec54cf2db283b614e527cfd602b481485d1/src/libstd/panicking.rs:265
  18:     0x5646063af37a - std::panic::catch_unwind::he4644c7014645ca9
                               at /rustc/1423bec54cf2db283b614e527cfd602b481485d1/src/libstd/panic.rs:396
  19:     0x5646063af37a - test::run_test_in_process::hc52fdc1397f9d0e9
                               at src/libtest/lib.rs:570
  20:     0x5646063af37a - test::run_test::run_test_inner::{{closure}}::h3e83a0ab36573a3d
                               at src/libtest/lib.rs:473
  21:     0x564606388606 - std::sys_common::backtrace::__rust_begin_short_backtrace::hfee0690679641f81
                               at /rustc/1423bec54cf2db283b614e527cfd602b481485d1/src/libstd/sys_common/backtrace.rs:129
  22:     0x56460638c9d6 - std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}}::h98c621fa17c375b4
                               at /rustc/1423bec54cf2db283b614e527cfd602b481485d1/src/libstd/thread/mod.rs:469
  23:     0x56460638c9d6 - <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::hf2b903704fc31bc6
                               at /rustc/1423bec54cf2db283b614e527cfd602b481485d1/src/libstd/panic.rs:317
  24:     0x56460638c9d6 - std::panicking::try::do_call::h371d929c727f84be
                               at /rustc/1423bec54cf2db283b614e527cfd602b481485d1/src/libstd/panicking.rs:287
  25:     0x56460661a64a - __rust_maybe_catch_panic
                               at src/libpanic_unwind/lib.rs:78
  26:     0x56460638d3f6 - std::panicking::try::hafdaac1d743657ed
                               at /rustc/1423bec54cf2db283b614e527cfd602b481485d1/src/libstd/panicking.rs:265
  27:     0x56460638d3f6 - std::panic::catch_unwind::h768cb1b266e375eb
                               at /rustc/1423bec54cf2db283b614e527cfd602b481485d1/src/libstd/panic.rs:396
  28:     0x56460638d3f6 - std::thread::Builder::spawn_unchecked::{{closure}}::hbad959438c643900
                               at /rustc/1423bec54cf2db283b614e527cfd602b481485d1/src/libstd/thread/mod.rs:468
  29:     0x56460638d3f6 - core::ops::function::FnOnce::call_once{{vtable.shim}}::hae56b17b26e13155
                               at /rustc/1423bec54cf2db283b614e527cfd602b481485d1/src/libcore/ops/function.rs:227
  30:     0x56460660740f - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::hee4022d6a9be1599
                               at /rustc/1423bec54cf2db283b614e527cfd602b481485d1/src/liballoc/boxed.rs:942
  31:     0x564606619730 - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::hc9487a25e90496bc
                               at /rustc/1423bec54cf2db283b614e527cfd602b481485d1/src/liballoc/boxed.rs:942
  32:     0x564606619730 - std::sys_common::thread::start_thread::h74bf9040cbae9302
                               at src/libstd/sys_common/thread.rs:13
  33:     0x564606619730 - std::sys::unix::thread::Thread::new::thread_start::hc7c276ba81ec9914
                               at src/libstd/sys/unix/thread.rs:79
  34:     0x7f30298226ba - start_thread
  35:     0x7f302934041d - clone
  36:                0x0 - <unknown>

https://github.com/finalitylabs/FFT-Multiexp-AMD now contains a pure C OpenCL build that reproduces the error. Follow the build instructions in the repo, generate the inputs, and run the main program; it should fail on AMD cards with the following output:

building program
i768
i768
i768
i768
i768
Error in hsa_operand section, at offset 72708:
Address offset exceeds variable size
LLVM ERROR: 
 Brig container validation has failed in BRIGAsmPrinter.cpp

Note that disabling the compiler optimizer with -cl-opt-disable allows this earlier version of the kernel to compile correctly.
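As a rough sketch of how a host program could make this workaround switchable, the helper below returns the string to pass as the `options` argument of the standard OpenCL `clBuildProgram` call. The helper name `kernel_build_options` and its flag parameter are hypothetical and do not come from the repositories above. The sketch deliberately avoids OpenCL headers so it compiles anywhere.

```c
#include <stddef.h>

/* Hypothetical helper: selects the build options string for the kernel.
 * "-cl-opt-disable" is the standard OpenCL option that turns off
 * compiler optimizations. */
const char *kernel_build_options(int disable_optimizer)
{
    return disable_optimizer ? "-cl-opt-disable" : "";
}

/* Intended use in a host program (not compiled here), assuming `program`
 * and `device` were already created by the caller:
 *
 *   clBuildProgram(program, 1, &device, kernel_build_options(1), NULL, NULL);
 */
```

Keeping the flag behind a switch would let the same binary run with optimizations on drivers where the kernel compiles and without them on the affected AMD setup, at the cost of slower kernels in the second case.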