Crash / ASan issues using VmModule.wrap_buffer from Python
ScottTodd opened this issue · comments
(Lots of red herrings here, see #17635 (comment) for latest issue description)
Splitting off of llvm/torch-mlir#3433 to discuss IREE-specific details.
I wrote some test cases for https://pytorch.org/docs/stable/generated/torch.Tensor.index_put_.html starting from PyTorch and going through iree-turbine / torch-mlir to IREE: https://gist.github.com/ScottTodd/1e95795e79d17964078217ca98a3a398. Latest results:
test_single_value | PASS
test_multiple_values | PASS (then crash)
test_broadcast_value_along_axis | FAIL
test_broadcast_value_along_indices | FAIL
test_broadcast_values_along_axis | PASS (then crash)
The two "pass then crash" test cases used to pass without crashing and I found some interesting results bisecting through IREE releases.
release | test result |
---|---|
iree_compiler==20240514.893 | PASS --> CRASH |
iree_compiler==20240508.887 | |
iree_compiler==20240507.886 | Error invoking function ... |
iree_compiler==20240506.885 | |
iree_compiler==20240505.884 | |
iree_compiler==20240504.883 | |
iree_compiler==20240503.882 | |
iree_compiler==20240502.881 | |
iree_compiler==20240501.880 | Error invoking function ... |
iree_compiler==20240430.879 | |
iree_compiler==20240429.878 | |
iree_compiler==20240428.877 | |
iree_compiler==20240427.876 | Error invoking function ... |
iree_compiler==20240426.875 | |
iree_compiler==20240425.874 | Error invoking function ... |
iree_compiler==20240424.873 | Error invoking function ... |
iree_compiler==20240423.872 | PASS --> CRASH |
iree_compiler==20240422.871 | |
iree_compiler==20240421.870 | |
iree_compiler==20240420.869 | PASS --> CRASH |
iree_compiler==20240419.868 | PASS --> CRASH |
iree_compiler==20240418.867 | PASS |
iree_compiler==20240417.866 | PASS |
iree_compiler==20240416.865 | |
iree_compiler==20240415.864 | |
iree_compiler==20240414.863 | |
iree_compiler==20240412.861 | |
iree_compiler==20240411.860 | |
iree_compiler==20240410.859 | PASS |
Crash (during shutdown, maybe from writing out of bounds?):
======================================== 1 passed in 3.60s ======================================== Exception Code: 0xC0000005
#0 0x00007ffa6c89a495 PyInit__runtime (D:\dev\projects\SHARK-TestSuite\iree_tests\pytorch\operators\.venv\Lib\site-packages\iree\_runtime_libs\_runtime.cp311-win_amd64.pyd+0x4a495)
...
#5 0x00007ffa6c91bf1c nanobind::python_error::what(void) const (D:\dev\projects\SHARK-TestSuite\iree_tests\pytorch\operators\.venv\Lib\site-packages\iree\_runtime_libs\_runtime.cp311-win_amd64.pyd+0xcbf1c)
"Error invoking function" (fixed by #17339):
def _invoke(self, arg_list, ret_list):
> self._vm_context.invoke(self._vm_function, arg_list, ret_list)
E ValueError: Error invoking function: D:\a\iree\iree\c\runtime\src\iree\hal\command_buffer_validation.c:363: INVALID_ARGUMENT; source and target ranges overlap within the same buffer; while invoking native function hal.command_buffer.copy_buffer; while calling import;
E [ 2] native hal.command_buffer.copy_buffer:0 -
E [ 1] bytecode module.main$async:526 d:\dev\projects\SHARK-TestSuite\iree_tests\pytorch\operators\index_put_test.py:88:0
E [ 0] bytecode module.main:62 d:\dev\projects\SHARK-TestSuite\iree_tests\pytorch\operators\index_put_test.py:88:0
20240418.867 to 20240419.868, pass --> crash
Between 20240418.867 and 20240419.868, the test_multiple_values
test case started crashing.
before: https://github.com/iree-org/iree/releases/tag/candidate-20240418.867
after: https://github.com/iree-org/iree/releases/tag/candidate-20240419.868
diff (ff624dd...a2476ce):
λ git log --oneline candidate-20240418.867..candidate-20240419.868
a2476ceef5 [metal] Disable failing semaphore submission test until fixing (#17100)
0e1e6bfe7a Clarify fusion heuristic (#17098)
125f4200d8 [CodeGen] Add a pattern to fold extract_slice consumer into xfer.write. (#17067)
f755b42289 [Codegen] Add folding in createBoundedTileSize for partially dynamic wgSize. (#17089)
a0b4853dc7 Fixes to enable out-of-tree plugin builds. (#17095)
bd1b10626c [CodeGen] Drop encoding for HAL and Flow ops when DT is not supported. (#17081)
886c4169ac [Winograd] Use TilingInterface for all levels of winograd op tiling (#17061)
244195960f Add GPU dialect dependencies to C/Python bindings. (#17090)
ab4babee07 [ConstEval] Add flag to adjust tensor size limit for hoisting (#17064)
eec081c2c6 [LLVMGPU] Fallback if dynamic dim found on vector distribute. (#17085)
f86f21cd80 [python] Expose MLIR python bindings for gpu and transform (#17088)
4778f5f909 Categorize matmul/metvec like generic ops for dispatches (#17084)
f32a87ccd2 [Flow] Move elementwise op fusion and bubble up expand shapes patterns into their own pass. (#17068)
3677fbcce8 [runtime] Add semaphore test where 2 batches wait on a former batch amongst 2 (#17080)
20240423.872 to 20240424.873 crash --> error invoking function
before: https://github.com/iree-org/iree/releases/tag/candidate-20240423.872
after: https://github.com/iree-org/iree/releases/tag/candidate-20240424.873
diff (f5660ee...59532d3):
λ git log --oneline candidate-20240423.872..candidate-20240424.873
59532d30c0 Extend DecomposeConvolutionToLowerDimOpsPass (#17069)
1b4c76f931 Adding a hoistable attr interface to allow attaching attrs to hoists. (#17139)
729ebc642f [runtime][metal] exclude properly the failing semaphore test (#17151)
30acc53257 [runtime][cts] add test where a device batch signals another and the host (#17138)
94728e34c2 [LinalgExt][Winograd] Add winograd.filter_transform op (#17102)
4bc90e7431 Run pytorch/models tests on AMDGPU with Vulkan. (#17129)
330651efa6 [plugins][ROCM] Fix minor loc source resize bug (#17133)
f7098e3f66 Moving regression suite to azure (#17140)
290d812d96 [runtime][cts] add semaphore test where a batch waits on another and a host signal (#17130)
8736479f5d Prevent stream.async.update folding unless confirmed safe. (#17135)
7192e8c195 [Preprocessing] Add support for general ContractionOps and handle dynamic dimension using pad + expand. (#17123)
d7db353adf Add initial website documentation for ONNX frontend. (#17004)
568bb31780 [runtime][cts] Add test waiting on a semaphore for finite time and fix Vulkan driver (#17126)
655b71a282 Executable library call hooks system, and a sample Linux/CPU event implementation (#15803)
3dde925d11 [VectorDistribution] Add distribution pattern for vector::MultiDimReductionOp (#17076)
Oooh, the test crash is flaky. That makes bisecting trickier... would be easier on Linux with ASan.
I tried around 10 times at iree_compiler==20240418.867
and can't repro there.
Still verifying with ASan on Linux to avoid the flakes, but my bisect on Windows seems to be pointing to f32a87ccd2 for the test crash. That's "mostly" an NFC according to @MaheshRavishankar 🤔
Bah, having trouble pinning this down.
- Linux segfaults in my Python unit test on package versions as old as
20240410.859
(and possibly earlier). - Linux source builds with ASan at tip of tree report no issues with the test case when run through the
iree-compile
andiree-run-module
CLI tools.Trying to also run through the Python bindings but hitting(had built in one venv and then tried to use from another venv)ImportError: cannot import name 'ir' from 'iree.compiler._mlir_libs._mlir'
and other venv setup issues. - On Windows I diffed the
--mlir-print-ir-after-all
before and after f32a87ccd2 and there aren't many changes: codegen IR is identical beforeSerializeTargetExecutablesPass
and a few passes moved around but that's about it.
:/ yeah, .vmfb files produced before and after f32a87ccd2 are identical and that commit didn't modify runtime code or python bindings. That's what my bisect pointed to though.
Going to keep trying with ASan on Linux to see if I can get a 100% repro case instead of the flaky crashes on Windows.
Following https://iree.dev/developers/debugging/sanitizers/#asan-addresssanitizer on Linux got me a source Python build with ASan on Linux that reports errors in the test case I've been working with:
(.venv) scotttodd@scotttodd-cpu:~/scratch/tests$ LD_PRELOAD=/usr/lib/llvm-14/lib/clang/14.0.0/lib/linux/libclang_rt.asan-x86_64.so pytest --log-cli-level=info index_put_test.py::TestIndexPut::test_multiple_values
================================================================================================== test session starts ===================================================================================================
platform linux -- Python 3.11.0rc1, pytest-8.2.2, pluggy-1.5.0
rootdir: /home/scotttodd/scratch/tests
collected 1 item
index_put_test.py::TestIndexPut::test_multiple_values
----------------------------------------------------------------------------------------------------- live log call ------------------------------------------------------------------------------------------------------
INFO index_put_test:index_put_test.py:110 module @module {
func.func @main(%arg0: !torch.tensor<[3,6],f32>) -> !torch.vtensor<[3,6],f32> attributes {torch.assume_strict_symbolic_shapes} {
%0 = torch.vtensor.literal(dense_resource<torch_tensor_3_torch.int64> : tensor<3xsi64>) : !torch.vtensor<[3],si64>
%1 = torch.vtensor.literal(dense_resource<torch_tensor_3_torch.int64_1> : tensor<3xsi64>) : !torch.vtensor<[3],si64>
%2 = torch.vtensor.literal(dense_resource<torch_tensor_3_torch.float32> : tensor<3xf32>) : !torch.vtensor<[3],f32>
%3 = torch.copy.to_vtensor %arg0 : !torch.vtensor<[3,6],f32>
%none = torch.constant.none
%4 = torch.aten.clone %0, %none : !torch.vtensor<[3],si64>, !torch.none -> !torch.vtensor<[3],si64>
%none_0 = torch.constant.none
%5 = torch.aten.clone %1, %none_0 : !torch.vtensor<[3],si64>, !torch.none -> !torch.vtensor<[3],si64>
%none_1 = torch.constant.none
%6 = torch.aten.clone %2, %none_1 : !torch.vtensor<[3],f32>, !torch.none -> !torch.vtensor<[3],f32>
%7 = torch.prim.ListConstruct %4, %5 : (!torch.vtensor<[3],si64>, !torch.vtensor<[3],si64>) -> !torch.list<optional<vtensor>>
%false = torch.constant.bool false
%8 = torch.aten.index_put %3, %7, %6, %false : !torch.vtensor<[3,6],f32>, !torch.list<optional<vtensor>>, !torch.vtensor<[3],f32>, !torch.bool -> !torch.vtensor<[3,6],f32>
torch.overwrite.tensor.contents %8 overwrites %arg0 : !torch.vtensor<[3,6],f32>, !torch.tensor<[3,6],f32>
return %8 : !torch.vtensor<[3,6],f32>
}
}
{-#
dialect_resources: {
builtin: {
torch_tensor_3_torch.int64: "0x08000000000000000000000001000000000000000200000000000000",
torch_tensor_3_torch.int64_1: "0x08000000030000000000000004000000000000000500000000000000",
torch_tensor_3_torch.float32: "0x04000000CDCCCC3DCDCC4C3E9A99993E"
}
}
#-}
INFO index_put_test:index_put_test.py:128 [[0. 0. 0. 0.1 0. 0. ]
[0. 0. 0. 0. 0.2 0. ]
[0. 0. 0. 0. 0. 0.3]]
PASSED [100%]
=================================================================================================== 1 passed in 5.95s ====================================================================================================
AddressSanitizer:DEADLYSIGNAL
=================================================================
==78453==ERROR: AddressSanitizer: SEGV on unknown address 0x7f04a3c4d074 (pc 0x7f04a12ad2d8 bp 0x000000000000 sp 0x7ffe78c08730 T0)
==78453==The signal is caused by a READ memory access.
#0 0x7f04a12ad2d8 (/home/scotttodd/scratch/tests/.venv/lib/python3.11/site-packages/iree/_runtime_libs/_runtime.cpython-311-x86_64-linux-gnu.so+0xae2d8) (BuildId: 32e87a22f20d0241)
#1 0x7f04a1321d78 (/home/scotttodd/scratch/tests/.venv/lib/python3.11/site-packages/iree/_runtime_libs/_runtime.cpython-311-x86_64-linux-gnu.so+0x122d78) (BuildId: 32e87a22f20d0241)
#2 0x7f04a1321b86 (/home/scotttodd/scratch/tests/.venv/lib/python3.11/site-packages/iree/_runtime_libs/_runtime.cpython-311-x86_64-linux-gnu.so+0x122b86) (BuildId: 32e87a22f20d0241)
#3 0x7f04a125482d (/home/scotttodd/scratch/tests/.venv/lib/python3.11/site-packages/iree/_runtime_libs/_runtime.cpython-311-x86_64-linux-gnu.so+0x5582d) (BuildId: 32e87a22f20d0241)
#4 0x5af471 (/usr/bin/python3.11+0x5af471) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#5 0x4d8278 (/usr/bin/python3.11+0x4d8278) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#6 0x62a28d (/usr/bin/python3.11+0x62a28d) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#7 0x654986 in PyGC_Collect (/usr/bin/python3.11+0x654986) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#8 0x646d40 in Py_FinalizeEx (/usr/bin/python3.11+0x646d40) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#9 0x668cfb in Py_Exit (/usr/bin/python3.11+0x668cfb) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#10 0x6555de (/usr/bin/python3.11+0x6555de) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#11 0x6555a5 in PyErr_PrintEx (/usr/bin/python3.11+0x6555a5) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#12 0x650a91 in _PyRun_SimpleFileObject (/usr/bin/python3.11+0x650a91) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#13 0x650832 in _PyRun_AnyFileObject (/usr/bin/python3.11+0x650832) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#14 0x64f786 in Py_RunMain (/usr/bin/python3.11+0x64f786) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#15 0x61ee0c in Py_BytesMain (/usr/bin/python3.11+0x61ee0c) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#16 0x7f04a4429d8f (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f) (BuildId: c289da5071a3399de893d2af81d6a30c62646e1e)
#17 0x7f04a4429e3f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e3f) (BuildId: c289da5071a3399de893d2af81d6a30c62646e1e)
#18 0x61ec94 in _start (/usr/bin/python3.11+0x61ec94) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/home/scotttodd/scratch/tests/.venv/lib/python3.11/site-packages/iree/_runtime_libs/_runtime.cpython-311-x86_64-linux-gnu.so+0xae2d8) (BuildId: 32e87a22f20d0241)
==78453==ABORTING
That's at tip of tree. Going to sync back to the suspected commit ranges to see if ASan similarly complains there.
Thanks @ScottTodd . If it does get back to that commit then let me know.
Having a hard time running my python test from an older commit.
-
Had to downgrade nanobind to build the IREE Python bindings from 2 months ago
-
Seeing this error when used together with the latest (pip installable)
iree-turbine
package:Traceback: /usr/lib/python3.11/importlib/__init__.py:126: in import_module return _bootstrap._gcd_import(name[level:], package, level) index_put_test.py:12: in <module> import shark_turbine.aot as aot .venv/lib/python3.11/site-packages/shark_turbine/aot/__init__.py:7: in <module> from .builtins import * .venv/lib/python3.11/site-packages/shark_turbine/aot/builtins/__init__.py:7: in <module> from .globals import * .venv/lib/python3.11/site-packages/shark_turbine/aot/builtins/globals.py:12: in <module> from ..support.procedural import ( .venv/lib/python3.11/site-packages/shark_turbine/aot/support/procedural/__init__.py:13: in <module> from .base import * .venv/lib/python3.11/site-packages/shark_turbine/aot/support/procedural/base.py:22: in <module> from ....support.ir_imports import ( .venv/lib/python3.11/site-packages/shark_turbine/support/ir_imports.py:10: in <module> from iree.compiler.ir import ( ../../iree-build/compiler/bindings/python/iree/compiler/ir.py:5: in <module> from ._mlir_libs._mlir.ir import * ../../iree-build/compiler/bindings/python/iree/compiler/_mlir_libs/__init__.py:180: in <module> _site_initialize() ../../iree-build/compiler/bindings/python/iree/compiler/_mlir_libs/__init__.py:78: in _site_initialize from ._mlir import ir E ImportError: /home/scotttodd/iree-build/compiler/bindings/python/iree/compiler/_mlir_libs/_mlir.cpython-311-x86_64-linux-gnu.so: undefined symbol: mlirSetGlobalDebugTypes, version IREE_API_0.0
-
Seeing alignment issues after switching from
iree-turbine
to justfrom iree.compiler import compile_file
on an already exported .mlir file:compiled_module = compile_file("/home/scotttodd/scratch/tests/index_put_multiple_values.mlir", target_backends=["llvm-cpu"]) config = ireert.Config("local-sync") vm_module = ireert.load_vm_module( > ireert.VmModule.wrap_buffer( # config.vm_instance, compiled_module.map_memory() config.vm_instance, compiled_module ), config, ) E ValueError: VmModule.from_aligned_memory received an unaligned buffer. Got 0x0x625001860120, expected alignment 64
Argh, red herrings all around.
Running a trivial test case instead of index_put
class TorchModule(torch.nn.Module):
def forward(self, input):
return input + torch.ones(3, 4)
through Python in this setup with ASan on tip of tree also produces
==246621==ERROR: AddressSanitizer: SEGV on unknown address 0x7f96b787e050 (pc 0x7f96b43422d8 bp 0x000000000000 sp 0x7ffdeacb5240 T0)
==246621==The signal is caused by a READ memory access.
the problematic lines for the ASan error are
vm_module = ireert.load_vm_module(
ireert.VmModule.wrap_buffer(
config.vm_instance, compiled_module.map_memory()
),
config,
)
Soooo... why does my TestIndexPut::test_single_value
test case succeed on Windows while the TestIndexPut::test_multiple_values
test case succeeds then crashes? Maybe the test cases write to different elements, near the edges of the buffers:
# Passing
[ # col 0 col 1 col 2 col 3
[0.0000, 0.0000, 0.0000, 0.0000], # row 0
[0.0000, 0.0000, 0.5000, 0.0000], # row 1
[0.0000, 0.0000, 0.0000, 0.0000], # row 2
]
# Failing
[ # col 0 col 1 col 2 col 3 col 4 col 5
[0.0000, 0.0000, 0.0000, 0.1000, 0.0000, 0.0000], # row 0
[0.0000, 0.0000, 0.0000, 0.0000, 0.2000, 0.0000], # row 1
[0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.3000], # row 2
]
Then also why is this behavior so difficult to pin down? Is ASan missing something? Is Python a relevant detail? The index put operations are trying to do in-place processing that could be hitting a special path in Python vs through iree-run-module
. Could try building a .c
test file using the C API.
Maybe the test cases write to different elements, near the edges of the buffers:
Yep! This put of a single value also passes:
# indices=[torch.tensor([0]), torch.tensor([3])],
[ # col 0 col 1 col 2 col 3
[0.0000, 0.0000, 0.0000, 0.5000], # row 0
[0.0000, 0.0000, 0.0000, 0.0000], # row 1
[0.0000, 0.0000, 0.0000, 0.0000], # row 2
]
but these (writing a single value into the last row) crash:
# indices=[torch.tensor([2]), torch.tensor([3])],
[ # col 0 col 1 col 2 col 3
[0.0000, 0.0000, 0.0000, 0.0000], # row 0
[0.0000, 0.0000, 0.0000, 0.0000], # row 1
[0.0000, 0.0000, 0.0000, 0.5000], # row 2
]
# indices=[torch.tensor([2]), torch.tensor([0])],
[ # col 0 col 1 col 2 col 3
[0.0000, 0.0000, 0.0000, 0.0000], # row 0
[0.0000, 0.0000, 0.0000, 0.0000], # row 1
[0.5000, 0.0000, 0.0000, 0.0000], # row 2
]
Writing into the last row of a [3, 4] tensor crashes. Writing into the last row of a [4, 4] or [6, 6] tensor does not. The full model from nod-ai/sharktank#22 was using [?,16,32,100]
. I'm wondering if the dynamic dim was odd?
After restarting my machine, I'm only seeing the Python test crash in about 1/30 runs. That's going to make it hard to verify which test cases are definitely working and which aren't. Results were very consistent yesterday...
Bah, just saw a crash writing into the middle of a 4x4 tensor. The crash I'm seeing might be unique to the Python bindings and maybe not even unique to index_put_
. Logs: https://gist.github.com/ScottTodd/a0c0e68d1abeb3240f782045c4c70e80
Aw, saw the same Python crash performing an elementwise add on a 4x4 tensor...
Minimal repro for the Python AddressSanitizer report (note: this shows up when using iree-turbine with example code like https://github.com/iree-org/iree-turbine/blob/4b451f84b03f87af21a9b785b0ddd68094f43ed8/examples/aot_mlp/mlp_export_simple.py#L45-L49)
Seems to be related to VmModule.wrap_buffer()
and map_memory()
on a compiler output. Using VmModule.copy_buffer()
avoids the ASan issue.
import iree.runtime as ireert
from iree.compiler.api import (
Session,
Source,
Output,
)
session = Session()
session.set_flags("--iree-hal-target-backends=vmvx")
inv = session.invocation()
source = Source.wrap_buffer(
session,
b"""
builtin.module {
func.func @abs(%input : tensor<4xf32>) -> (tensor<4xf32>) {
%result = math.absf %input : tensor<4xf32>
return %result : tensor<4xf32>
}
}""",
)
inv.parse_source(source)
inv.execute()
out = Output.open_membuffer()
inv.output_vm_bytecode(out)
config = ireert.Config("local-sync")
# ASan issue goes away if this is commented out
vm_module = ireert.load_vm_module(
ireert.VmModule.wrap_buffer(config.vm_instance, out.map_memory()),
config,
)
(.venv) scotttodd@scotttodd-cpu:~/scratch/tests$ LD_PRELOAD=/usr/lib/llvm-14/lib/clang/14.0.0/lib/linux/libclang_rt.asan-x86_64.so ASAN_OPTIONS=detect_leaks=0 python asan_minimal_repro.py
AddressSanitizer:DEADLYSIGNAL
=================================================================
==39396==ERROR: AddressSanitizer: SEGV on unknown address 0x7fbf57968050 (pc 0x7fbffd1712d8 bp 0x000000000000 sp 0x7fffe15017f0 T0)
==39396==The signal is caused by a READ memory access.
#0 0x7fbffd1712d8 (/home/scotttodd/scratch/tests/.venv/lib/python3.11/site-packages/iree/_runtime_libs/_runtime.cpython-311-x86_64-linux-gnu.so+0xae2d8) (BuildId: 32e87a22f20d0241)
#1 0x7fbffd1e5d78 (/home/scotttodd/scratch/tests/.venv/lib/python3.11/site-packages/iree/_runtime_libs/_runtime.cpython-311-x86_64-linux-gnu.so+0x122d78) (BuildId: 32e87a22f20d0241)
#2 0x7fbffd1e5b86 (/home/scotttodd/scratch/tests/.venv/lib/python3.11/site-packages/iree/_runtime_libs/_runtime.cpython-311-x86_64-linux-gnu.so+0x122b86) (BuildId: 32e87a22f20d0241)
#3 0x7fbffd11882d (/home/scotttodd/scratch/tests/.venv/lib/python3.11/site-packages/iree/_runtime_libs/_runtime.cpython-311-x86_64-linux-gnu.so+0x5582d) (BuildId: 32e87a22f20d0241)
#4 0x5af471 (/usr/bin/python3.11+0x5af471) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#5 0x4d8278 (/usr/bin/python3.11+0x4d8278) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#6 0x6557cf (/usr/bin/python3.11+0x6557cf) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#7 0x654f99 (/usr/bin/python3.11+0x654f99) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#8 0x646d4d in Py_FinalizeEx (/usr/bin/python3.11+0x646d4d) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#9 0x64f5ea in Py_RunMain (/usr/bin/python3.11+0x64f5ea) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#10 0x61ee0c in Py_BytesMain (/usr/bin/python3.11+0x61ee0c) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#11 0x7fbfff629d8f (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f) (BuildId: c289da5071a3399de893d2af81d6a30c62646e1e)
#12 0x7fbfff629e3f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e3f) (BuildId: c289da5071a3399de893d2af81d6a30c62646e1e)
#13 0x61ec94 in _start (/usr/bin/python3.11+0x61ec94) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/home/scotttodd/scratch/tests/.venv/lib/python3.11/site-packages/iree/_runtime_libs/_runtime.cpython-311-x86_64-linux-gnu.so+0xae2d8) (BuildId: 32e87a22f20d0241)
==39396==ABORTING
interesting - the compiler Source
seems to hang on to the buffer provided in wrap_buffer
, but you could try adding a print to its close()
- the buffer as provided to ireeCompilerSourceWrapBuffer
is unowned (it's just a char*) so if the python side doesn't keep it live it'll go boom
iow, that Source
instance must remain live until the iree_vm_module_t
created from it is deleted - you could add a printf in there to see if they happen out of order
Try with copy_buffer instead of wrap_buffer just to eliminate variables. Removes any potential alignment or ownership issues.
Ok, so that is a definite bug with wrap_buffer used in that way. Will need to fortify testing of that and fix.
But probably not what you are trying to find...
Try with copy_buffer instead of wrap_buffer just to eliminate variables. Removes any potential alignment or ownership issues.
My tests pass a clean ASan report when I use copy_buffer
instead of wrap_buffer
. That might be enough to call the lowering/codegen/runtime for index_put_
correct (except for the unhandled broadcast case in the torch-mlir lowering, tracked here: llvm/torch-mlir#3433).
you could add a printf in there to see if they happen out of order
Trying this now. Printfs seem to be working from compiler Python code but I'm not seeing my changes to the runtime reflected in my venv... strange (my PYTHONPATH and build setup both seem fine...).
Sorta figured out my python bindings debug setup:
- Building with CMake and setting
PYTHONPATH
was not enough to get updated C++ code in the runtime, at least when I already hadiree-runtime
installed from pip. Maybe uninstalling from pip would help. What I ended up doing waspython -m pip wheel runtime/ -v
to build a wheel thenpip install iree_runtime-0.dev0+[...].whl
I found that if I change BoundModule
in system_api.py
to no longer retain the SystemContext
:
iree/runtime/bindings/python/iree/runtime/system_api.py
Lines 123 to 130 in 0a561c4
then the ASan error goes away. Something is trying to access memory that it shouldn't be, just not sure what specifically.
Printf debugging:
test.py calling Source.wrap_buffer(session, ...)
--- CompilerDriver.cpp: ireeCompilerSourceWrapBuffer, length: 167, isNullTerminated: 0 ---
test.py calling inv.parse_source(source)
test.py calling inv.execute()
test.py calling out = Output.open_membuffer()
--- CompilerDriver.cpp: ireeCompilerOutputOpenMembuffer ---
test.py calling inv.output_vm_bytecode(out)
--- CompilerDriver.cpp: ireeCompilerInvocationOutputVMBytecode ---
test.py calling ireert.VmModule.wrap_buffer
--- CompilerDriver.cpp: ireeCompilerOutputMapMemory ---
--- vm.cc: VmModule::WrapBuffer, (edit) close_buffer: 0 ---
test.py calling ireert.load_vm_module
system_api.py load_vm_module --> load_vm_modules
system_api.py load_vm_modules
system_api.py: SystemContext::__init__
system_api.py: SystemContext::__init__ is *not* dynamic
system_api.py: SystemContext::__init__ setup self._bound_modules
test.py finished
--- CompilerDriver.cpp: ireeCompilerSourceDestroy start ---
--- CompilerDriver.cpp: ireeCompilerSourceDestroy finish ---
AddressSanitizer:DEADLYSIGNAL
=================================================================
==189798==ERROR: AddressSanitizer: SEGV on unknown address
Let's fork this repro into an issue for wrap_buffer from the compiler API. I've seen evidence of something like this before but have had trouble getting it to repro.
Sure. I'm not sure if there's actually an issue with index_put_
at all... possibly all red herrings in my testing that were caused by wrap_buffer
. Can still start a more focused issue.
Frustrating.
This test looks quite similar to what I'm testing: https://github.com/iree-org/iree/blob/main/compiler/bindings/python/test/api/output_buffer_reference_test.py, and that is ASan-clean.
I'll see where the code diverges and try to file a more focused issue once I know more.
If it turns out this is the issue, we can just rename this issue and not have another one. Was this the only thing wrong the whole time?
This test looks quite similar to what I'm testing: https://github.com/iree-org/iree/blob/main/compiler/bindings/python/test/api/output_buffer_reference_test.py, and that is ASan-clean.
I'll see where the code diverges and try to file a more focused issue once I know more.
I mean, I've "fixed" this bug a couple of times. Something subtle is wrong and I think it needs a closer look at actual reference counts or something.
- output_buffer_reference_test stops at
module = VmModule.wrap_buffer(instance, mapped_memory) context = VmContext(instance, modules=[module])
- my test stops at
vm_module = ireert.load_vm_module( ireert.VmModule.wrap_buffer(config.vm_instance, out.map_memory()), config, )
The setup appears to all be the same.
Was this the only thing wrong the whole time?
Unclear. We originally hit crashes in decode()
for the sharktank llama model in Python and native tools that we were trying to reduce. Some forms of reducing the test hit unsupported lowerings (broadcasting behavior). I think there's still a different crash involved somewhere since this latest line of debugging involves only the Python bindings.
- Building with CMake and setting
PYTHONPATH
was not enough to get updated C++ code in the runtime, at least when I already hadiree-runtime
installed from pip. Maybe uninstalling from pip would help. What I ended up doing waspython -m pip wheel runtime/ -v
to build a wheel thenpip install iree_runtime-0.dev0+[...].whl
Aha! Uninstalling the python runtime wheel and just setting PYTHONPATH
to the build worked, and now I get stack traces from ASan!
scotttodd@scotttodd-cpu:~/scratch/tests$ LD_PRELOAD=/usr/lib/llvm-14/lib/clang/14.0.0/lib/linux/libclang_rt.asan-x86_64.so ASAN_OPTIONS=detect_leaks=0 ASAN_SYMBOLIZER_PATH=/usr/lib/llvm-14/bin/llvm-symbolizer python asan_repro_debugging.py
test.py calling Source.wrap_buffer(session, ...)
CompilerDriver.cpp: ireeCompilerSourceWrapBuffer, length: 167, isNullTerminated: 0 ---
test.py calling inv.parse_source(source)
test.py calling inv.execute()
test.py calling out = Output.open_membuffer()
CompilerDriver.cpp: ireeCompilerOutputOpenMembuffer
test.py calling inv.output_vm_bytecode(out)
CompilerDriver.cpp: ireeCompilerInvocationOutputVMBytecode
test.py calling out.map_memory()
ctypes_dl.py :: Output::map_memory
CompilerDriver.cpp: ireeCompilerOutputMapMemory
test.py calling ireert.VmModule.wrap_buffer()
vm.cc: VmModule::WrapBuffer, close_buffer: 0
vm.cc: VmModule::WrapBuffer, iree_vm_bytecode_module_create
VMFB Length = 5558
test.py calling ireert.load_vm_module
system_api.py load_vm_module --> load_vm_modules
system_api.py load_vm_modules
system_api.py: SystemContext::__init__
system_api.py: SystemContext::__init__ is *not* dynamic
system_api.py: SystemContext::__init__ setup self._bound_modules
test.py finished
CompilerDriver.cpp: ireeCompilerSourceDestroy start
CompilerDriver.cpp: ireeCompilerSourceDestroy finish
AddressSanitizer:DEADLYSIGNAL
=================================================================
==229852==ERROR: AddressSanitizer: SEGV on unknown address 0x7f66510ff050 (pc 0x7f66efa5f25e bp 0x7fff9db6e9d0 sp 0x7fff9db6e950 T0)
==229852==The signal is caused by a READ memory access.
+ #0 0x7f66efa5f25e in __flatbuffers_soffset_read /home/scotttodd/iree/third_party/flatcc/include/flatcc/flatcc_endian.h:89:2
+ #1 0x7f66efa5f25e in __flatbuffers_soffset_read_from_pe /home/scotttodd/iree/third_party/flatcc/include/flatcc/flatcc_endian.h:89:2
+ #2 0x7f66efa5f25e in iree_vm_BytecodeModuleDef_exported_functions /home/scotttodd/iree-build/runtime/src/iree/schemas/bytecode_module_def_reader.h:693:1
+ #3 0x7f66efa5f25e in iree_vm_bytecode_module_lookup_function /home/scotttodd/iree/runtime/src/iree/vm/bytecode/module.c:292:9
+ #4 0x7f66efb5b497 in iree_vm_context_run_function /home/scotttodd/iree/runtime/src/iree/vm/context.c:77:26
+ #5 0x7f66efb5b497 in iree_vm_context_release_modules /home/scotttodd/iree/runtime/src/iree/vm/context.c:269:5
+ #6 0x7f66efb5acba in iree_vm_context_destroy /home/scotttodd/iree/runtime/src/iree/vm/context.c:357:5
+ #7 0x7f66ef9c0cbe in iree::python::ApiPtrAdapter<iree_vm_context_t>::Release(iree_vm_context_t*) /home/scotttodd/iree/runtime/bindings/python/./vm.h:42:47
+ #8 0x7f66ef9c0cbe in iree::python::ApiRefCounted<iree::python::VmContext, iree_vm_context_t>::Release() /home/scotttodd/iree/runtime/bindings/python/./binding.h:107:7
+ #9 0x7f66ef9c0cbe in iree::python::ApiRefCounted<iree::python::VmContext, iree_vm_context_t>::~ApiRefCounted() /home/scotttodd/iree/runtime/bindings/python/./binding.h:59:22
+ #10 0x7f66ef9e40f5 in nanobind::detail::inst_dealloc(_object*) /home/scotttodd/iree/.venv/lib/python3.11/site-packages/nanobind/src/nb_type.cpp:229:13
#11 0x5af471 (/usr/bin/python3.11+0x5af471) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#12 0x4d8278 (/usr/bin/python3.11+0x4d8278) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#13 0x6557cf (/usr/bin/python3.11+0x6557cf) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#14 0x654f99 (/usr/bin/python3.11+0x654f99) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#15 0x646d4d in Py_FinalizeEx (/usr/bin/python3.11+0x646d4d) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#16 0x64f5ea in Py_RunMain (/usr/bin/python3.11+0x64f5ea) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#17 0x61ee0c in Py_BytesMain (/usr/bin/python3.11+0x61ee0c) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
#18 0x7f66f2229d8f (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f) (BuildId: c289da5071a3399de893d2af81d6a30c62646e1e)
#19 0x7f66f2229e3f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e3f) (BuildId: c289da5071a3399de893d2af81d6a30c62646e1e)
#20 0x61ec94 in _start (/usr/bin/python3.11+0x61ec94) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /home/scotttodd/iree/third_party/flatcc/include/flatcc/flatcc_endian.h:89:2 in __flatbuffers_soffset_read
==229852==ABORTING
So the VM is trying to run the __deinit
function on the module, but the module was already freed...?
iree/runtime/src/iree/vm/context.c
Lines 269 to 270 in 71c07fa
Yeah, ASan is happy if I clear the loaded module before letting the program exit on its own:
loaded_module = ireert.load_vm_module(
wrapped_buffer,
config,
)
+loaded_module = None
This passes ASan, finishing with no leaks or segfaults:
instance = VmInstance()
output = Output.open_membuffer()
output.write(vmfb_contents)
mapped_memory = output.map_memory()
module = VmModule.wrap_buffer(instance, mapped_memory)
context = VmContext(instance, modules=[module])
This crashes and trips ASan with a segfault on a read memory access:
instance = VmInstance()
output = Output.open_membuffer()
output.write(vmfb_contents)
mapped_memory = output.map_memory()
module = VmModule.wrap_buffer(instance, mapped_memory)
# note this line is different!
loaded_module = load_vm_module(module)
The source for that different line is here:
iree/runtime/bindings/python/iree/runtime/system_api.py
Lines 247 to 251 in 71c07fa
the SystemContext
class creates BoundModule
classes that retain a reference to the self
SystemContext
:
iree/runtime/bindings/python/iree/runtime/system_api.py
Lines 204 to 206 in 71c07fa
Is that reference cycle throwing off the usual garbage collector / shutdown / destruction ordering?
Adding a weakref here makes ASan happy for tests that just load a module and then exit. That's not quite the right fix though, since tests that actually use the module seem to then be missing the object (already gc'd?) :P
self._bound_modules = BoundModules(
- [(m.name, BoundModule(self, m)) for m in init_vm_modules]
+ [(m.name, BoundModule(weakref.ref(self), m)) for m in init_vm_modules]
)
We need to root cause where the ref isn't being retained... It shouldn't be possible to crash with pure python usage of the APIs.
This might actually be behind some other ghosts I've been carefully trying to catch for a while. There's a bit of art to tracking this to a root cause. I can't do it right now but could help.
You started down this path because of a specific failure scenario. Is this the root cause of that or just an incidental thing found along the way (just trying to understand whether we have more going on)?
You started down this path because of a specific failure scenario. Is this the root cause of that or just an incidental thing found along the way (just trying to understand whether we have more going on)?
I believe it's incidental along the way, but I'd love to be wrong there.
The sequence was roughly:
-
We observed that the llama model from sharktank was crashing during the
decode()
step: nod-ai/sharktank#22 (comment), https://github.com/rsuderman/sharktank/blob/d9b25900a069f8200948da322de693e5d6bae15f/shortfin/shortfin/llm/impl/service_v1_cli.py#L101-L106 . We suspected out of bounds write in the kv_cache relying on the in-placeindex_put_
(source here), but there were too many unknowns to say for sure.~/Repos/iree/build/tools/iree-run-module --module=/tmp/batch_llama_v1.vmfb --function=decode_bs4 --parameters=model=/media/rsuderman/Disk2/Models/llama.slycho.gguf --input=4x1xsi64=0 --input=4xsi64=1 --input=4xsi64=1 --input=4x1xsi64=0,1,2,3 --input=8x2662400xf16=0.0 EXEC @decode_bs4 Exception thrown at 0x000001AA8B4EBFF8 in iree-run-module.exe: 0xC0000005: Access violation writing location 0xFFFF9B96ECCF4300. iree-run-module.exe!iree_elf_call_i_ppp() Line 184 (d:\dev\projects\iree\runtime\src\iree\hal\local\elf\arch\x86_64_msvc.asm:184)
-
We attempted to reduce the full model down to a smaller model and some unit tests for
index_put_
. Some of those failed to compile, due to llvm/torch-mlir#3433 (unimplemented supported for broadcasting variants of the PyTorchindex_put_
op in torch-mlir). -
Other unit tests produced correct outputs but crashed at runtime. The crash looked to be consistent enough to bisect across releases and I thought I had a culprit range of commits, but that was not the case. That led to this issue, which eventually narrowed in on the Python tests themselves hitting this buggy
wrap_buffer
shutdown / garbage collect behavior, independent of what the actual code in the test modules was doing.
So if this gets fixed, writing tests and exercising the code from Python will be more stable (especially with ASan enabled), but there is likely more debugging ahead - either going back to the full program or trying to build up component by component (e.g. verifying the behavior of just PagedKVCache.write_timestep
and the other functions involved there).
Still debugging this, reading through the changes in #15975
The iree_compiler_output_t
Output
from compiler/bindings/python/iree/compiler/api/ctypes_dl.py
is hitting its weakref.finalize(pointer, lambda x: ..., self)
call before the deallcator from VmModule VmModule::WrapBuffer
runs.
iree/compiler/bindings/python/iree/compiler/api/ctypes_dl.py
Lines 273 to 290 in b4321ea
iree/runtime/bindings/python/vm.cc
Lines 289 to 387 in b4321ea
https://docs.python.org/3/library/gc.html#gc.set_debug this looks useful...
Output with gc.set_debug(gc.DEBUG_STATS | gc.DEBUG_COLLECTABLE | gc.DEBUG_UNCOLLECTABLE)
and a ton of printfs: https://gist.github.com/ScottTodd/453834cb5731598f6f8a15b27700d518
I'm skeptical of this pattern:
pointer = (c_char * size).from_address(contents.value)
weakref.finalize(pointer, lambda x: ..., self)
return pointer
I've made a number of attempts to add extra references, log what the garbage collector is doing, log which C++ constructors / destructors are running, etc. here: https://github.com/iree-org/iree/compare/main...ScottTodd:wrap-buffer-debugging?expand=1. Not sure what else to try. Reading various stackoverflow questions about python memoryview
, from_address
, garbage collection, and related systems now...
Found another project using the same code pattern 🤔 https://github.com/GridTools/stencil_benchmarks/blob/33658cd68f3e2248611f6a845a39ae8b9684af78/stencil_benchmarks/benchmarks_collection/stencils/cuda_hip/api.py#L54-L69
Also reading https://nanobind.readthedocs.io/en/latest/ownership.html now
I'm not seeing the destructors for BufferState
or PyBufferRequest
run before the weakref.finalize
on Output.map_memory()
runs, so they aren't holding onto a reference to the pointer / memoryview?
iree/runtime/bindings/python/vm.cc
Line 299 in 045bf32
I guess we're also interoping between nanobind and pybind here? The compiler uses pybind but the runtime uses nanobind.
Yes, lots of interop. It might have been more effective to just randomly rotate the pointer and hope that sometimes it worked out.
Tried some variations of py::keep_alive
here:
iree/runtime/bindings/python/vm.cc
Lines 945 to 947 in 045bf32
"Keep the buffer
argument alive while the returned VmModule
is alive"
Might have missed some syntax but that didn't seem to help.
Using gc.get_referrers
I can see that the weakref.finalize
code in map_memory()
is creating a reference to the Output
object. I see no other references to that object or any references to the returned pointer though.
Is there a way we can use from_buffer
here instead of from_address
?
iree/compiler/bindings/python/iree/compiler/api/ctypes_dl.py
Lines 273 to 290 in 045bf32
I'm wondering if we can get this instance variable to be populated:
_objects
This member is eitherNone
or a dictionary containing Python objects that need to be kept alive so that the memory block contents is kept valid. This object is only exposed for debugging; never modify the contents of this dictionary.
The code before #15975 used return memoryview((c_char * size).from_address(contents.value))
too.
Whatever can be made to work yes. The big hammer is to add a native pybind module for interop
Going to put this on hold for a bit.
- Crash repro: #17657
- ASan + Python gets useful information about the crash, see https://iree.dev/developers/debugging/sanitizers/#asan-addresssanitizer for instructions on setup
- Debug code: https://github.com/iree-org/iree/compare/main...ScottTodd:wrap-buffer-debugging?expand=1
- Workarounds:
- use
VmModule.copy_buffer
instead ofVmModule.wrap_buffer
- write to a file then load from it instead of staying in memory and using
Output.open_membuffer()
- use