iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.

Home Page: http://iree.dev/

Heap buffer overflow in `tensor.pack` with `bf16` element type and transposition.

bjacob opened this issue

The reproducer is a RHS packing op for a bf16 matmul:

func.func @pack_RHS_for_bf16bf16f32_matmul(%input : tensor<1024x1024xbf16>) -> tensor<64x1024x16x2xbf16> {
    %cst = arith.constant 0.0 : bf16
    %empty = tensor.empty() : tensor<64x1024x16x2xbf16>
    %pack = tensor.pack %input padding_value(%cst : bf16) outer_dims_perm = [1, 0] inner_dims_pos = [1, 0] inner_tiles = [16, 2] into %empty : tensor<1024x1024xbf16> -> tensor<64x1024x16x2xbf16>
    return %pack : tensor<64x1024x16x2xbf16>
}

I handwrote this, so please check that I didn't get something wrong (though, since these are static shapes, a mistake should have been caught at compile time).

Compile with `--iree-hal-target-backends=llvm-cpu --iree-llvmcpu-target-cpu=znver4`.

I get the same crash without the ukernel (the default for this pack op) and with it (`--iree-llvmcpu-enable-ukernels=pack`). The ASan diagnosis below is from the run without the ukernel: it reports a read of size 32 located 23 bytes after the region. With the ukernel, it instead reports a read of size 64 located 55 bytes after the region. Since 32+23=55, IIUC this is exactly the same buffer overflow by the same offset; the only difference is that the ukernel uses 64-byte (AVX-512) loads. So something outside the codegen for the pack op itself (since the crash happens irrespective of the ukernel) gets some address wrong, and the number 23 arises consistently with or without the ukernel.

=================================================================
==346839==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7f8336c018df at pc 0x7f833b3df654 bp 0x7f833b639940 sp 0x7f833b639938
READ of size 32 at 0x7f8336c018df thread T2
    #0 0x7f833b3df653 in pack_RHS_for_bf16bf16f32_matmul_dispatch_0_pack_bf16 -:35:1
    #1 0x5a77a2b9c147 in iree_hal_system_executable_issue_call /home/benoit/iree/runtime/src/iree/hal/local/loaders/system_library_loader.c:331:13
    #2 0x5a77a2b28879 in iree_hal_cmd_dispatch_tile /home/benoit/iree/runtime/src/iree/hal/drivers/local_task/task_command_buffer.c:877:26
    #3 0x5a77a2b3792c in iree_task_dispatch_shard_execute /home/benoit/iree/runtime/src/iree/task/task.c:797:11
    #4 0x5a77a2b3a7aa in iree_task_worker_execute /home/benoit/iree/runtime/src/iree/task/worker.c:187:7
    #5 0x5a77a2b3a7aa in iree_task_worker_pump_once /home/benoit/iree/runtime/src/iree/task/worker.c:248:3
    #6 0x5a77a2b3a7aa in iree_task_worker_pump_until_exit /home/benoit/iree/runtime/src/iree/task/worker.c:307:12
    #7 0x5a77a2b3a7aa in iree_task_worker_main /home/benoit/iree/runtime/src/iree/task/worker.c:391:5
    #8 0x5a77a2b3cc05 in iree_thread_start_routine /home/benoit/iree/runtime/src/iree/base/internal/threading_pthreads.c:123:29
    #9 0x5a77a29d9b5c in asan_thread_start(void*) crtstuff.c

0x7f8336c018df is located 23 bytes after 2097352-byte region [0x7f8336a01800,0x7f8336c018c8)
allocated by thread T0 here:
    #0 0x5a77a29dc25d in calloc (/home/benoit/iree-build/tools/iree-benchmark-module+0x21825d) (BuildId: 55decc5b87a0a8f3)
    #1 0x5a77a2abfe7a in iree_allocator_system_alloc /home/benoit/iree/runtime/src/iree/base/allocator.c:105:17
    #2 0x5a77a2abfe7a in iree_allocator_system_ctl /home/benoit/iree/runtime/src/iree/base/allocator.c:145:14
    #3 0x5a77a2ac04b9 in iree_allocator_issue_alloc /home/benoit/iree/runtime/src/iree/base/allocator.c:27:10
    #4 0x5a77a2ac04b9 in iree_allocator_malloc /home/benoit/iree/runtime/src/iree/base/allocator.c:32:10
    #5 0x5a77a2ac04b9 in iree_allocator_malloc_aligned /home/benoit/iree/runtime/src/iree/base/allocator.c:284:3
    #6 0x5a77a2aec670 in iree_hal_heap_buffer_allocate_slab /home/benoit/iree/runtime/src/iree/hal/buffer_heap.c:98:3
    #7 0x5a77a2aec670 in iree_hal_heap_buffer_create /home/benoit/iree/runtime/src/iree/hal/buffer_heap.c:134:13
    #8 0x5a77a2aebee4 in iree_hal_heap_allocator_allocate_buffer /home/benoit/iree/runtime/src/iree/hal/allocator_heap.c:200:3
    #9 0x5a77a2ac9595 in iree_hal_allocator_allocate_buffer /home/benoit/iree/runtime/src/iree/hal/allocator.c:186:26
    #10 0x5a77a2ae0b83 in iree_hal_buffer_view_allocate_buffer_copy /home/benoit/iree/runtime/src/iree/hal/buffer_view_util.c:194:14
    #11 0x5a77a2ae13be in iree_hal_buffer_view_generate_buffer_in_situ /home/benoit/iree/runtime/src/iree/hal/buffer_view_util.c:227:3
    #12 0x5a77a2ae13be in iree_hal_buffer_view_generate_buffer /home/benoit/iree/runtime/src/iree/hal/buffer_view_util.c:330:14
    #13 0x5a77a2ae1cb2 in iree_hal_buffer_view_parse_impl /home/benoit/iree/runtime/src/iree/hal/buffer_view_util.c:431:10
    #14 0x5a77a2af4a00 in iree_tooling_parse_tensor /home/benoit/iree/runtime/src/iree/tooling/function_io.c:483:26
    #15 0x5a77a2af2c52 in iree_tooling_parse_tensor_into /home/benoit/iree/runtime/src/iree/tooling/function_io.c:514:3
    #16 0x5a77a2af2c52 in iree_tooling_parse_variant_into /home/benoit/iree/runtime/src/iree/tooling/function_io.c:661:10
    #17 0x5a77a2af2c52 in iree_tooling_parse_variants_into /home/benoit/iree/runtime/src/iree/tooling/function_io.c:686:9
    #18 0x5a77a2af2c52 in iree_tooling_parse_variants /home/benoit/iree/runtime/src/iree/tooling/function_io.c:713:26
    #19 0x5a77a2a1dff8 in iree::(anonymous namespace)::IREEBenchmark::RegisterSpecificFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&) /home/benoit/iree/tools/iree-benchmark-module-main.cc:507:5
    #20 0x5a77a2a1dff8 in iree::(anonymous namespace)::IREEBenchmark::Register() /home/benoit/iree/tools/iree-benchmark-module-main.cc:462:7
    #21 0x5a77a2a1dff8 in main /home/benoit/iree/tools/iree-benchmark-module-main.cc:625:41
    #22 0x7f833b42a1c9 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #23 0x7f833b42a28a in __libc_start_main csu/../csu/libc-start.c:360:3
    #24 0x5a77a2941224 in _start (/home/benoit/iree-build/tools/iree-benchmark-module+0x17d224) (BuildId: 55decc5b87a0a8f3)

Thread T2 created by T0 here:
    #0 0x5a77a29c19e5 in pthread_create (/home/benoit/iree-build/tools/iree-benchmark-module+0x1fd9e5) (BuildId: 55decc5b87a0a8f3)
    #1 0x5a77a2b3c2e7 in iree_thread_create /home/benoit/iree/runtime/src/iree/base/internal/threading_pthreads.c:173:10
    #2 0x5a77a2b3a130 in iree_task_worker_initialize /home/benoit/iree/runtime/src/iree/task/worker.c:66:26
    #3 0x5a77a2b2f19d in iree_task_executor_create /home/benoit/iree/runtime/src/iree/task/executor.c:183:16
    #4 0x5a77a2b2dd18 in iree_task_executors_create_from_flags /home/benoit/iree/runtime/src/iree/task/api.c:405:16
    #5 0x5a77a2b2282a in iree_hal_local_task_driver_factory_try_create /home/benoit/iree/runtime/src/iree/hal/drivers/local_task/registration/driver_module.c:59:3
    #6 0x5a77a2af003c in iree_hal_driver_registry_try_create /home/benoit/iree/runtime/src/iree/hal/driver_registry.c:314:14
    #7 0x5a77a2af0247 in iree_hal_create_device /home/benoit/iree/runtime/src/iree/hal/driver_registry.c:342:3
    #8 0x5a77a2aee1db in iree_hal_create_devices_from_flags /home/benoit/iree/runtime/src/iree/tooling/device_util.c:392:14
    #9 0x5a77a2aea796 in iree_tooling_load_hal_async_module /home/benoit/iree/runtime/src/iree/tooling/context_util.c:204:3
    #10 0x5a77a2aea796 in iree_tooling_resolve_module_dependency_callback /home/benoit/iree/runtime/src/iree/tooling/context_util.c:425:5
    #11 0x5a77a2bafd52 in iree_vm_bytecode_module_enumerate_dependencies /home/benoit/iree/runtime/src/iree/vm/bytecode/module.c:231:5
    #12 0x5a77a2ae9c19 in iree_tooling_resolve_modules /home/benoit/iree/runtime/src/iree/tooling/context_util.c:485:14
    #13 0x5a77a2aeb425 in iree_tooling_create_context_from_flags /home/benoit/iree/runtime/src/iree/tooling/context_util.c:610:3
    #14 0x5a77a2a1cdfe in iree::(anonymous namespace)::IREEBenchmark::Init() /home/benoit/iree/tools/iree-benchmark-module-main.cc:481:5
    #15 0x5a77a2a1cdfe in iree::(anonymous namespace)::IREEBenchmark::Register() /home/benoit/iree/tools/iree-benchmark-module-main.cc:457:7
    #16 0x5a77a2a1cdfe in main /home/benoit/iree/tools/iree-benchmark-module-main.cc:625:41
    #17 0x7f833b42a1c9 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #18 0x7f833b42a28a in __libc_start_main csu/../csu/libc-start.c:360:3
    #19 0x5a77a2941224 in _start (/home/benoit/iree-build/tools/iree-benchmark-module+0x17d224) (BuildId: 55decc5b87a0a8f3)

SUMMARY: AddressSanitizer: heap-buffer-overflow -:35:1 in pack_RHS_for_bf16bf16f32_matmul_dispatch_0_pack_bf16
Shadow bytes around the buggy address:
  0x7f8336c01600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7f8336c01680: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7f8336c01700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7f8336c01780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x7f8336c01800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x7f8336c01880: 00 00 00 00 00 00 00 00 00 fa fa[fa]fa fa fa fa
  0x7f8336c01900: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x7f8336c01980: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x7f8336c01a00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x7f8336c01a80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x7f8336c01b00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
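
The numbers in the report can be cross-checked with a few lines of arithmetic. The sketch below only uses the addresses printed by ASan and the declared input type `tensor<1024x1024xbf16>`. The remark about what the extra bytes in the region are is an assumption, not something the log states.

```python
# Values copied verbatim from the ASan report above.
fault_addr = 0x7F8336C018DF
region_start = 0x7F8336A01800
region_end = 0x7F8336C018C8  # exclusive upper bound, as printed by ASan

region_size = region_end - region_start
bytes_past_end = fault_addr - region_end

# Payload of the tensor<1024x1024xbf16> input: 2 bytes per bf16 element.
input_payload = 1024 * 1024 * 2

print(region_size)                  # 2097352, matches "2097352-byte region"
print(bytes_past_end)               # 23, matches "located 23 bytes after"
# Assumption: the remainder is buffer header and/or alignment overhead.
print(region_size - input_payload)  # 200
```

The allocation stack goes through `iree_tooling_parse_tensor`, so the overrun region appears to be the input buffer passed on the command line: the dispatch reads past the end of the 1024x1024 input.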

D'oh, I did get the shape wrong. The result type 64x1024x16x2 is wrong; it should be 64x512x16x2.
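
The corrected shape can be derived mechanically from the pack attributes. The following is a minimal sketch, assuming the usual `tensor.pack` shape rule (tile the dimensions listed in `inner_dims_pos`, permute the outer dimensions by `outer_dims_perm`, then append the inner tiles). The helper `packed_shape` is hypothetical and written only for illustration; it is not an IREE or MLIR API.

```python
from math import prod


def packed_shape(src_shape, inner_dims_pos, inner_tiles, outer_dims_perm=None):
    """Hypothetical helper: result shape of a pack op with static sizes."""
    outer = list(src_shape)
    # Each tiled source dimension shrinks by its tile size (ceil division,
    # since padding_value fills any partial tile).
    for pos, tile in zip(inner_dims_pos, inner_tiles):
        outer[pos] = -(-outer[pos] // tile)
    # Result outer dim i is taken from tiled source dim outer_dims_perm[i].
    if outer_dims_perm is not None:
        outer = [outer[i] for i in outer_dims_perm]
    # Inner tile sizes are appended in the order given by inner_tiles.
    return outer + list(inner_tiles)


shape = packed_shape([1024, 1024], inner_dims_pos=[1, 0],
                     inner_tiles=[16, 2], outer_dims_perm=[1, 0])
print(shape)  # [64, 512, 16, 2]

# The mistyped result type holds twice as many elements as the input.
print(prod([64, 1024, 16, 2]) // prod([1024, 1024]))  # 2
```

Since the declared 64x1024x16x2 result has twice as many elements as the 1024x1024 source, it is plausible that the generated loops walked past the end of the input, which would be consistent with the read overflow ASan reported.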