iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.

Home Page: http://iree.dev/


Crash / ASan issues using VmModule.wrap_buffer from Python

ScottTodd opened this issue

(Lots of red herrings here, see #17635 (comment) for latest issue description)


Splitting off of llvm/torch-mlir#3433 to discuss IREE-specific details.

I wrote some test cases for https://pytorch.org/docs/stable/generated/torch.Tensor.index_put_.html starting from PyTorch and going through iree-turbine / torch-mlir to IREE: https://gist.github.com/ScottTodd/1e95795e79d17964078217ca98a3a398. Latest results:

test_single_value                  | PASS
test_multiple_values               | PASS (then crash)
test_broadcast_value_along_axis    | FAIL
test_broadcast_value_along_indices | FAIL
test_broadcast_values_along_axis   | PASS (then crash)
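For reference, the scatter these tests exercise can be sketched in NumPy. This is an illustration of the index_put_ semantics only (using the 3x6 shape and indices from the test_multiple_values case), not the IREE codegen path:

```python
import numpy as np

# torch.Tensor.index_put_ with two index tensors writes values[i] at
# position (rows[i], cols[i]); the NumPy equivalent is fancy-index
# assignment on the target array.
target = np.zeros((3, 6), dtype=np.float32)
rows = np.array([0, 1, 2])
cols = np.array([3, 4, 5])
values = np.array([0.1, 0.2, 0.3], dtype=np.float32)

target[rows, cols] = values
```

The result should match the expected output logged further down: 0.1 at (0, 3), 0.2 at (1, 4), 0.3 at (2, 5), zeros elsewhere.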

The two "pass then crash" test cases used to pass without crashing, and bisecting through IREE releases turned up some interesting results.

release                         test result
iree_compiler==20240514.893     PASS --> CRASH
iree_compiler==20240508.887
iree_compiler==20240507.886     Error invoking function ...
iree_compiler==20240506.885
iree_compiler==20240505.884
iree_compiler==20240504.883
iree_compiler==20240503.882
iree_compiler==20240502.881
iree_compiler==20240501.880     Error invoking function ...
iree_compiler==20240430.879
iree_compiler==20240429.878
iree_compiler==20240428.877
iree_compiler==20240427.876     Error invoking function ...
iree_compiler==20240426.875
iree_compiler==20240425.874     Error invoking function ...
iree_compiler==20240424.873     Error invoking function ...
iree_compiler==20240423.872     PASS --> CRASH
iree_compiler==20240422.871
iree_compiler==20240421.870
iree_compiler==20240420.869     PASS --> CRASH
iree_compiler==20240419.868     PASS --> CRASH
iree_compiler==20240418.867     PASS
iree_compiler==20240417.866     PASS
iree_compiler==20240416.865
iree_compiler==20240415.864
iree_compiler==20240414.863
iree_compiler==20240412.861
iree_compiler==20240411.860
iree_compiler==20240410.859     PASS
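The table above was produced by hand; mechanically it is just a binary search over ordered nightly releases. A pure-Python sketch (test_passes is a hypothetical callback that installs the package and runs the test; note that flaky crashes, as described further down, violate its single-transition assumption):

```python
def find_first_bad(releases, test_passes):
    """Binary search for the first release where the test fails.

    Assumes releases are ordered oldest -> newest, releases[0] passes,
    releases[-1] fails, and there is exactly one pass -> fail transition.
    Flaky failures break that assumption and mislead the search.
    """
    lo, hi = 0, len(releases) - 1  # invariant: lo passes, hi fails
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if test_passes(releases[mid]):
            lo = mid
        else:
            hi = mid
    return releases[hi]

# Example: daily builds b0..b9 with a deterministic break at b6.
builds = [f"b{i}" for i in range(10)]
first_bad = find_first_bad(builds, lambda b: int(b[1:]) < 6)
# first_bad == "b6"
```

With a deterministic failure, the month of releases above narrows down in a handful of installs rather than a linear scan.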

Crash (during shutdown, maybe from writing out of bounds?):

======================================== 1 passed in 3.60s ========================================
Exception Code: 0xC0000005
 #0 0x00007ffa6c89a495 PyInit__runtime (D:\dev\projects\SHARK-TestSuite\iree_tests\pytorch\operators\.venv\Lib\site-packages\iree\_runtime_libs\_runtime.cp311-win_amd64.pyd+0x4a495)
 ...
 #5 0x00007ffa6c91bf1c nanobind::python_error::what(void) const (D:\dev\projects\SHARK-TestSuite\iree_tests\pytorch\operators\.venv\Lib\site-packages\iree\_runtime_libs\_runtime.cp311-win_amd64.pyd+0xcbf1c)

"Error invoking function" (fixed by #17339):

    def _invoke(self, arg_list, ret_list):
>       self._vm_context.invoke(self._vm_function, arg_list, ret_list)
E       ValueError: Error invoking function: D:\a\iree\iree\c\runtime\src\iree\hal\command_buffer_validation.c:363: INVALID_ARGUMENT; source and target ranges overlap within the same buffer; while invoking native function hal.command_buffer.copy_buffer; while calling import;
E       [ 2]   native hal.command_buffer.copy_buffer:0 -
E       [ 1] bytecode module.main$async:526 d:\dev\projects\SHARK-TestSuite\iree_tests\pytorch\operators\index_put_test.py:88:0
E       [ 0] bytecode module.main:62 d:\dev\projects\SHARK-TestSuite\iree_tests\pytorch\operators\index_put_test.py:88:0

20240418.867 to 20240419.868, pass --> crash

Between 20240418.867 and 20240419.868, the test_multiple_values test case started crashing.

before: https://github.com/iree-org/iree/releases/tag/candidate-20240418.867
after: https://github.com/iree-org/iree/releases/tag/candidate-20240419.868

diff (ff624dd...a2476ce):

λ git log --oneline candidate-20240418.867..candidate-20240419.868
a2476ceef5 [metal] Disable failing semaphore submission test until fixing (#17100)
0e1e6bfe7a Clarify fusion heuristic (#17098)
125f4200d8 [CodeGen] Add a pattern to fold extract_slice consumer into xfer.write. (#17067)
f755b42289 [Codegen] Add folding in createBoundedTileSize for partially dynamic wgSize. (#17089)
a0b4853dc7 Fixes to enable out-of-tree plugin builds. (#17095)
bd1b10626c [CodeGen] Drop encoding for HAL and Flow ops when DT is not supported. (#17081)
886c4169ac [Winograd] Use TilingInterface for all levels of winograd op tiling (#17061)
244195960f Add GPU dialect dependencies to C/Python bindings. (#17090)
ab4babee07 [ConstEval] Add flag to adjust tensor size limit for hoisting (#17064)
eec081c2c6 [LLVMGPU] Fallback if dynamic dim found on vector distribute. (#17085)
f86f21cd80 [python] Expose MLIR python bindings for gpu and transform (#17088)
4778f5f909 Categorize matmul/metvec like generic ops for dispatches (#17084)
f32a87ccd2 [Flow] Move elementwise op fusion and bubble up expand shapes patterns into their own pass. (#17068)
3677fbcce8 [runtime] Add semaphore test where 2 batches wait on a former batch amongst 2 (#17080)

20240423.872 to 20240424.873, crash --> error invoking function

before: https://github.com/iree-org/iree/releases/tag/candidate-20240423.872
after: https://github.com/iree-org/iree/releases/tag/candidate-20240424.873

diff (f5660ee...59532d3):

λ git log --oneline candidate-20240423.872..candidate-20240424.873
59532d30c0 Extend DecomposeConvolutionToLowerDimOpsPass (#17069)
1b4c76f931 Adding a hoistable attr interface to allow attaching attrs to hoists. (#17139)
729ebc642f [runtime][metal] exclude properly the failing semaphore test (#17151)
30acc53257 [runtime][cts] add test where a device batch signals another and the host (#17138)
94728e34c2 [LinalgExt][Winograd] Add winograd.filter_transform op (#17102)
4bc90e7431 Run pytorch/models tests on AMDGPU with Vulkan. (#17129)
330651efa6 [plugins][ROCM] Fix minor loc source resize bug (#17133)
f7098e3f66 Moving regression suite to azure (#17140)
290d812d96 [runtime][cts] add semaphore test where a batch waits on another and a host signal (#17130)
8736479f5d Prevent stream.async.update folding unless confirmed safe. (#17135)
7192e8c195 [Preprocessing] Add support for general ContractionOps and handle dynamic dimension using pad + expand. (#17123)
d7db353adf Add initial website documentation for ONNX frontend. (#17004)
568bb31780 [runtime][cts] Add test waiting on a semaphore for finite time and fix Vulkan driver (#17126)
655b71a282 Executable library call hooks system, and a sample Linux/CPU event implementation (#15803)
3dde925d11 [VectorDistribution] Add distribution pattern for vector::MultiDimReductionOp (#17076)

Oooh, the test crash is flaky. That makes bisecting trickier... would be easier on Linux with ASan.

I tried around 10 times at iree_compiler==20240418.867 and can't repro there.

Still verifying with ASan on Linux to avoid the flakes, but my bisect on Windows seems to be pointing to f32a87ccd2 for the test crash. That's "mostly" an NFC according to @MaheshRavishankar 🤔

Bah, having trouble pinning this down.

  • Linux segfaults in my Python unit test on package versions as old as 20240410.859 (and possibly earlier).
  • Linux source builds with ASan at tip of tree report no issues with the test case when run through the iree-compile and iree-run-module CLI tools. Trying to also run through the Python bindings but hitting ImportError: cannot import name 'ir' from 'iree.compiler._mlir_libs._mlir' and other venv setup issues. (I had built in one venv and then tried to use the bindings from another venv)
  • On Windows I diffed the --mlir-print-ir-after-all before and after f32a87ccd2 and there aren't many changes: codegen IR is identical before SerializeTargetExecutablesPass and a few passes moved around but that's about it.

:/ yeah, .vmfb files produced before and after f32a87ccd2 are identical and that commit didn't modify runtime code or python bindings. That's what my bisect pointed to though.

Going to keep trying with ASan on Linux to see if I can get a 100% repro case instead of the flaky crashes on Windows.

Following https://iree.dev/developers/debugging/sanitizers/#asan-addresssanitizer got me a source Python build with ASan on Linux that reports errors in the test case I've been working with:

(.venv) scotttodd@scotttodd-cpu:~/scratch/tests$ LD_PRELOAD=/usr/lib/llvm-14/lib/clang/14.0.0/lib/linux/libclang_rt.asan-x86_64.so pytest --log-cli-level=info index_put_test.py::TestIndexPut::test_multiple_values
================================================================================================== test session starts ===================================================================================================
platform linux -- Python 3.11.0rc1, pytest-8.2.2, pluggy-1.5.0
rootdir: /home/scotttodd/scratch/tests
collected 1 item                                                                                                                                                                                                         

index_put_test.py::TestIndexPut::test_multiple_values 
----------------------------------------------------------------------------------------------------- live log call ------------------------------------------------------------------------------------------------------
INFO     index_put_test:index_put_test.py:110 module @module {
  func.func @main(%arg0: !torch.tensor<[3,6],f32>) -> !torch.vtensor<[3,6],f32> attributes {torch.assume_strict_symbolic_shapes} {
    %0 = torch.vtensor.literal(dense_resource<torch_tensor_3_torch.int64> : tensor<3xsi64>) : !torch.vtensor<[3],si64>
    %1 = torch.vtensor.literal(dense_resource<torch_tensor_3_torch.int64_1> : tensor<3xsi64>) : !torch.vtensor<[3],si64>
    %2 = torch.vtensor.literal(dense_resource<torch_tensor_3_torch.float32> : tensor<3xf32>) : !torch.vtensor<[3],f32>
    %3 = torch.copy.to_vtensor %arg0 : !torch.vtensor<[3,6],f32>
    %none = torch.constant.none
    %4 = torch.aten.clone %0, %none : !torch.vtensor<[3],si64>, !torch.none -> !torch.vtensor<[3],si64>
    %none_0 = torch.constant.none
    %5 = torch.aten.clone %1, %none_0 : !torch.vtensor<[3],si64>, !torch.none -> !torch.vtensor<[3],si64>
    %none_1 = torch.constant.none
    %6 = torch.aten.clone %2, %none_1 : !torch.vtensor<[3],f32>, !torch.none -> !torch.vtensor<[3],f32>
    %7 = torch.prim.ListConstruct %4, %5 : (!torch.vtensor<[3],si64>, !torch.vtensor<[3],si64>) -> !torch.list<optional<vtensor>>
    %false = torch.constant.bool false
    %8 = torch.aten.index_put %3, %7, %6, %false : !torch.vtensor<[3,6],f32>, !torch.list<optional<vtensor>>, !torch.vtensor<[3],f32>, !torch.bool -> !torch.vtensor<[3,6],f32>
    torch.overwrite.tensor.contents %8 overwrites %arg0 : !torch.vtensor<[3,6],f32>, !torch.tensor<[3,6],f32>
    return %8 : !torch.vtensor<[3,6],f32>
  }
}

{-#
  dialect_resources: {
    builtin: {
      torch_tensor_3_torch.int64: "0x08000000000000000000000001000000000000000200000000000000",
      torch_tensor_3_torch.int64_1: "0x08000000030000000000000004000000000000000500000000000000",
      torch_tensor_3_torch.float32: "0x04000000CDCCCC3DCDCC4C3E9A99993E"
    }
  }
#-}

INFO     index_put_test:index_put_test.py:128 [[0.  0.  0.  0.1 0.  0. ]
 [0.  0.  0.  0.  0.2 0. ]
 [0.  0.  0.  0.  0.  0.3]]
PASSED                                                                                                                                                                                                             [100%]

=================================================================================================== 1 passed in 5.95s ====================================================================================================
AddressSanitizer:DEADLYSIGNAL
=================================================================
==78453==ERROR: AddressSanitizer: SEGV on unknown address 0x7f04a3c4d074 (pc 0x7f04a12ad2d8 bp 0x000000000000 sp 0x7ffe78c08730 T0)
==78453==The signal is caused by a READ memory access.
    #0 0x7f04a12ad2d8  (/home/scotttodd/scratch/tests/.venv/lib/python3.11/site-packages/iree/_runtime_libs/_runtime.cpython-311-x86_64-linux-gnu.so+0xae2d8) (BuildId: 32e87a22f20d0241)
    #1 0x7f04a1321d78  (/home/scotttodd/scratch/tests/.venv/lib/python3.11/site-packages/iree/_runtime_libs/_runtime.cpython-311-x86_64-linux-gnu.so+0x122d78) (BuildId: 32e87a22f20d0241)
    #2 0x7f04a1321b86  (/home/scotttodd/scratch/tests/.venv/lib/python3.11/site-packages/iree/_runtime_libs/_runtime.cpython-311-x86_64-linux-gnu.so+0x122b86) (BuildId: 32e87a22f20d0241)
    #3 0x7f04a125482d  (/home/scotttodd/scratch/tests/.venv/lib/python3.11/site-packages/iree/_runtime_libs/_runtime.cpython-311-x86_64-linux-gnu.so+0x5582d) (BuildId: 32e87a22f20d0241)
    #4 0x5af471  (/usr/bin/python3.11+0x5af471) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
    #5 0x4d8278  (/usr/bin/python3.11+0x4d8278) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
    #6 0x62a28d  (/usr/bin/python3.11+0x62a28d) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
    #7 0x654986 in PyGC_Collect (/usr/bin/python3.11+0x654986) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
    #8 0x646d40 in Py_FinalizeEx (/usr/bin/python3.11+0x646d40) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
    #9 0x668cfb in Py_Exit (/usr/bin/python3.11+0x668cfb) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
    #10 0x6555de  (/usr/bin/python3.11+0x6555de) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
    #11 0x6555a5 in PyErr_PrintEx (/usr/bin/python3.11+0x6555a5) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
    #12 0x650a91 in _PyRun_SimpleFileObject (/usr/bin/python3.11+0x650a91) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
    #13 0x650832 in _PyRun_AnyFileObject (/usr/bin/python3.11+0x650832) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
    #14 0x64f786 in Py_RunMain (/usr/bin/python3.11+0x64f786) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
    #15 0x61ee0c in Py_BytesMain (/usr/bin/python3.11+0x61ee0c) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
    #16 0x7f04a4429d8f  (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f) (BuildId: c289da5071a3399de893d2af81d6a30c62646e1e)
    #17 0x7f04a4429e3f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e3f) (BuildId: c289da5071a3399de893d2af81d6a30c62646e1e)
    #18 0x61ec94 in _start (/usr/bin/python3.11+0x61ec94) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/home/scotttodd/scratch/tests/.venv/lib/python3.11/site-packages/iree/_runtime_libs/_runtime.cpython-311-x86_64-linux-gnu.so+0xae2d8) (BuildId: 32e87a22f20d0241) 
==78453==ABORTING

That's at tip of tree. Going to sync back to the suspected commit ranges to see if ASan similarly complains there.
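As a sanity check, the dense_resource blobs in the IR dump above can be decoded by hand. The sketch below assumes the leading 4-byte little-endian word is an alignment marker and the rest is raw little-endian element data; that assumption decodes consistently with the indices and values the test expects:

```python
import struct

def decode_dense_resource(blob: str, elem_fmt: str):
    """Decode a dense_resource hex blob.

    Assumption: a 4-byte little-endian alignment word, followed by raw
    little-endian elements ("q" = int64, "f" = float32).
    """
    raw = bytes.fromhex(blob[2:] if blob.startswith("0x") else blob)
    payload = raw[4:]  # skip the assumed alignment word
    count = len(payload) // struct.calcsize(elem_fmt)
    return list(struct.unpack(f"<{count}{elem_fmt}", payload))

rows = decode_dense_resource(
    "0x08000000000000000000000001000000000000000200000000000000", "q")
cols = decode_dense_resource(
    "0x08000000030000000000000004000000000000000500000000000000", "q")
vals = decode_dense_resource("0x04000000CDCCCC3DCDCC4C3E9A99993E", "f")
# rows -> [0, 1, 2], cols -> [3, 4, 5], vals -> approximately [0.1, 0.2, 0.3]
```

Those are exactly the indices and values in the expected 3x6 output, so the input side of the test looks healthy.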

Thanks @ScottTodd. If the bisect does come back to that commit, then let me know.

Having a hard time running my python test from an older commit.

  • Had to downgrade nanobind to build the IREE Python bindings from 2 months ago

  • Seeing this error when used together with the latest (pip installable) iree-turbine package:

    Traceback:
    /usr/lib/python3.11/importlib/__init__.py:126: in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
    index_put_test.py:12: in <module>
        import shark_turbine.aot as aot
    .venv/lib/python3.11/site-packages/shark_turbine/aot/__init__.py:7: in <module>
        from .builtins import *
    .venv/lib/python3.11/site-packages/shark_turbine/aot/builtins/__init__.py:7: in <module>
        from .globals import *
    .venv/lib/python3.11/site-packages/shark_turbine/aot/builtins/globals.py:12: in <module>
        from ..support.procedural import (
    .venv/lib/python3.11/site-packages/shark_turbine/aot/support/procedural/__init__.py:13: in <module>
        from .base import *
    .venv/lib/python3.11/site-packages/shark_turbine/aot/support/procedural/base.py:22: in <module>
        from ....support.ir_imports import (
    .venv/lib/python3.11/site-packages/shark_turbine/support/ir_imports.py:10: in <module>
        from iree.compiler.ir import (
    ../../iree-build/compiler/bindings/python/iree/compiler/ir.py:5: in <module>
        from ._mlir_libs._mlir.ir import *
    ../../iree-build/compiler/bindings/python/iree/compiler/_mlir_libs/__init__.py:180: in <module>
        _site_initialize()
    ../../iree-build/compiler/bindings/python/iree/compiler/_mlir_libs/__init__.py:78: in _site_initialize
        from ._mlir import ir
    E   ImportError: /home/scotttodd/iree-build/compiler/bindings/python/iree/compiler/_mlir_libs/_mlir.cpython-311-x86_64-linux-gnu.so: undefined symbol: mlirSetGlobalDebugTypes, version IREE_API_0.0
  • Seeing alignment issues after switching from iree-turbine to calling iree.compiler's compile_file directly on an already exported .mlir file:

            compiled_module = compile_file("/home/scotttodd/scratch/tests/index_put_multiple_values.mlir", target_backends=["llvm-cpu"])
    
            config = ireert.Config("local-sync")
            vm_module = ireert.load_vm_module(
    >           ireert.VmModule.wrap_buffer(
                    # config.vm_instance, compiled_module.map_memory()
                    config.vm_instance, compiled_module
                ),
                config,
            )
    E       ValueError: VmModule.from_aligned_memory received an unaligned buffer. Got 0x0x625001860120, expected alignment 64

Argh, red herrings all around.

Running a trivial test case instead of index_put

import torch

class TorchModule(torch.nn.Module):
    def forward(self, input):
        return input + torch.ones(3, 4)

through Python in this setup with ASan on tip of tree also produces

==246621==ERROR: AddressSanitizer: SEGV on unknown address 0x7f96b787e050 (pc 0x7f96b43422d8 bp 0x000000000000 sp 0x7ffdeacb5240 T0)
==246621==The signal is caused by a READ memory access.

the problematic lines for the ASan error are

vm_module = ireert.load_vm_module(
    ireert.VmModule.wrap_buffer(
        config.vm_instance, compiled_module.map_memory()
    ),
    config,
)

Soooo... why does my TestIndexPut::test_single_value test case succeed on Windows while the TestIndexPut::test_multiple_values test case succeeds then crashes? Maybe the test cases write to different elements, near the edges of the buffers:

# Passing
[  #  col 0   col 1   col 2   col 3
    [0.0000, 0.0000, 0.0000, 0.0000],  # row 0
    [0.0000, 0.0000, 0.5000, 0.0000],  # row 1
    [0.0000, 0.0000, 0.0000, 0.0000],  # row 2
]

# Failing
[  #  col 0   col 1   col 2   col 3  col 4   col 5
    [0.0000, 0.0000, 0.0000, 0.1000, 0.0000, 0.0000],  # row 0
    [0.0000, 0.0000, 0.0000, 0.0000, 0.2000, 0.0000],  # row 1
    [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.3000],  # row 2
]

Then also why is this behavior so difficult to pin down? Is ASan missing something? Is Python a relevant detail? The index put operations are trying to do in-place processing that could be hitting a special path in Python vs through iree-run-module. Could try building a .c test file using the C API.

Maybe the test cases write to different elements, near the edges of the buffers:

Yep! This put of a single value also passes:

# indices=[torch.tensor([0]), torch.tensor([3])],
[  #  col 0   col 1   col 2   col 3
    [0.0000, 0.0000, 0.0000, 0.5000],  # row 0
    [0.0000, 0.0000, 0.0000, 0.0000],  # row 1
    [0.0000, 0.0000, 0.0000, 0.0000],  # row 2
]

but these (writing a single value into the last row) crash:

# indices=[torch.tensor([2]), torch.tensor([3])],
[  #  col 0   col 1   col 2   col 3
    [0.0000, 0.0000, 0.0000, 0.0000],  # row 0
    [0.0000, 0.0000, 0.0000, 0.0000],  # row 1
    [0.0000, 0.0000, 0.0000, 0.5000],  # row 2
]

# indices=[torch.tensor([2]), torch.tensor([0])],
[  #  col 0   col 1   col 2   col 3
    [0.0000, 0.0000, 0.0000, 0.0000],  # row 0
    [0.0000, 0.0000, 0.0000, 0.0000],  # row 1
    [0.5000, 0.0000, 0.0000, 0.0000],  # row 2
]

Writing into the last row of a [3, 4] tensor crashes. Writing into the last row of a [4, 4] or [6, 6] tensor does not. The full model from nod-ai/sharktank#22 was using [?,16,32,100]. I'm wondering if the dynamic dim was odd?

After restarting my machine, I'm only seeing the Python test crash in about 1/30 runs. That's going to make it hard to verify which test cases are definitely working and which aren't. Results were very consistent yesterday...

Bah, just saw a crash writing into the middle of a 4x4 tensor. The crash I'm seeing might be unique to the Python bindings and maybe not even unique to index_put_. Logs: https://gist.github.com/ScottTodd/a0c0e68d1abeb3240f782045c4c70e80

Aw, saw the same Python crash performing an elementwise add on a 4x4 tensor...

Minimal repro for the Python AddressSanitizer report (note: this shows up when using iree-turbine with example code like https://github.com/iree-org/iree-turbine/blob/4b451f84b03f87af21a9b785b0ddd68094f43ed8/examples/aot_mlp/mlp_export_simple.py#L45-L49)

Seems to be related to VmModule.wrap_buffer() and map_memory() on a compiler output. Using VmModule.copy_buffer() avoids the ASan issue.

import iree.runtime as ireert

from iree.compiler.api import (
    Session,
    Source,
    Output,
)

session = Session()
session.set_flags("--iree-hal-target-backends=vmvx")
inv = session.invocation()
source = Source.wrap_buffer(
    session,
    b"""
builtin.module {
  func.func @abs(%input : tensor<4xf32>) -> (tensor<4xf32>) {
    %result = math.absf %input : tensor<4xf32>
    return %result : tensor<4xf32>
  }
}""",
)
inv.parse_source(source)
inv.execute()
out = Output.open_membuffer()
inv.output_vm_bytecode(out)

config = ireert.Config("local-sync")
# ASan issue goes away if this is commented out
vm_module = ireert.load_vm_module(
    ireert.VmModule.wrap_buffer(config.vm_instance, out.map_memory()),
    config,
)

(.venv) scotttodd@scotttodd-cpu:~/scratch/tests$ LD_PRELOAD=/usr/lib/llvm-14/lib/clang/14.0.0/lib/linux/libclang_rt.asan-x86_64.so ASAN_OPTIONS=detect_leaks=0 python asan_minimal_repro.py 
AddressSanitizer:DEADLYSIGNAL
=================================================================
==39396==ERROR: AddressSanitizer: SEGV on unknown address 0x7fbf57968050 (pc 0x7fbffd1712d8 bp 0x000000000000 sp 0x7fffe15017f0 T0)
==39396==The signal is caused by a READ memory access.
    #0 0x7fbffd1712d8  (/home/scotttodd/scratch/tests/.venv/lib/python3.11/site-packages/iree/_runtime_libs/_runtime.cpython-311-x86_64-linux-gnu.so+0xae2d8) (BuildId: 32e87a22f20d0241)
    #1 0x7fbffd1e5d78  (/home/scotttodd/scratch/tests/.venv/lib/python3.11/site-packages/iree/_runtime_libs/_runtime.cpython-311-x86_64-linux-gnu.so+0x122d78) (BuildId: 32e87a22f20d0241)
    #2 0x7fbffd1e5b86  (/home/scotttodd/scratch/tests/.venv/lib/python3.11/site-packages/iree/_runtime_libs/_runtime.cpython-311-x86_64-linux-gnu.so+0x122b86) (BuildId: 32e87a22f20d0241)
    #3 0x7fbffd11882d  (/home/scotttodd/scratch/tests/.venv/lib/python3.11/site-packages/iree/_runtime_libs/_runtime.cpython-311-x86_64-linux-gnu.so+0x5582d) (BuildId: 32e87a22f20d0241)
    #4 0x5af471  (/usr/bin/python3.11+0x5af471) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
    #5 0x4d8278  (/usr/bin/python3.11+0x4d8278) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
    #6 0x6557cf  (/usr/bin/python3.11+0x6557cf) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
    #7 0x654f99  (/usr/bin/python3.11+0x654f99) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
    #8 0x646d4d in Py_FinalizeEx (/usr/bin/python3.11+0x646d4d) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
    #9 0x64f5ea in Py_RunMain (/usr/bin/python3.11+0x64f5ea) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
    #10 0x61ee0c in Py_BytesMain (/usr/bin/python3.11+0x61ee0c) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
    #11 0x7fbfff629d8f  (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f) (BuildId: c289da5071a3399de893d2af81d6a30c62646e1e)
    #12 0x7fbfff629e3f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e3f) (BuildId: c289da5071a3399de893d2af81d6a30c62646e1e)
    #13 0x61ec94 in _start (/usr/bin/python3.11+0x61ec94) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/home/scotttodd/scratch/tests/.venv/lib/python3.11/site-packages/iree/_runtime_libs/_runtime.cpython-311-x86_64-linux-gnu.so+0xae2d8) (BuildId: 32e87a22f20d0241) 
==39396==ABORTING

interesting - the compiler Source seems to hang on to the buffer provided in wrap_buffer, but you could try adding a print to its close() - the buffer as provided to ireeCompilerSourceWrapBuffer is unowned (it's just a char*) so if the python side doesn't keep it live it'll go boom

iow, that Source instance must remain live until the iree_vm_module_t created from it is deleted - you could add a printf in there to see if they happen out of order

Try with copy_buffer instead of wrap_buffer just to eliminate variables. Removes any potential alignment or ownership issues.
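The difference this suggestion relies on can be shown without IREE at all: wrapping aliases the caller's storage while copying snapshots it. A pure-Python analogy, with memoryview and bytes standing in for VmModule.wrap_buffer and VmModule.copy_buffer:

```python
backing = bytearray(b"vmfb")

wrapped = memoryview(backing)  # aliases the storage; lifetimes are coupled
copied = bytes(backing)        # independent snapshot of the contents

backing[0:4] = b"XXXX"         # the owner mutates (or, in C, frees) it
assert bytes(wrapped) == b"XXXX"  # the wrapper observes the change
assert copied == b"vmfb"          # the copy is unaffected
```

copy_buffer trades a one-time memcpy of the .vmfb for immunity to both the alignment and the ownership problems seen above.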

Ok, so that is a definite bug with wrap_buffer used in that way. Will need to fortify testing of that and fix.

But probably not what you are trying to find...

Try with copy_buffer instead of wrap_buffer just to eliminate variables. Removes any potential alignment or ownership issues.

My tests pass a clean ASan report when I use copy_buffer instead of wrap_buffer. That might be enough to call the lowering/codegen/runtime for index_put_ correct (except for the unhandled broadcast case in the torch-mlir lowering, tracked here: llvm/torch-mlir#3433).

you could add a printf in there to see if they happen out of order

Trying this now. Printfs seem to be working from compiler Python code but I'm not seeing my changes to the runtime reflected in my venv... strange (my PYTHONPATH and build setup both seem fine...).

Sorta figured out my python bindings debug setup:

  • Building with CMake and setting PYTHONPATH was not enough to get updated C++ code in the runtime, at least when I already had iree-runtime installed from pip. Maybe uninstalling from pip would help. What I ended up doing was python -m pip wheel runtime/ -v to build a wheel then pip install iree_runtime-0.dev0+[...].whl

I found that if I change BoundModule in system_api.py to no longer retain the SystemContext:

class BoundModule:
    """Wraps a VmModule with its context and provides nice python accessors.

    Resolves item access (["foo"]) as function resolution.
    """

    def __init__(self, context: SystemContext, vm_module: _binding.VmModule):
        self._context = context

then the ASan error goes away. Something is trying to access memory that it shouldn't be, just not sure what specifically.

Printf debugging:

test.py calling Source.wrap_buffer(session, ...)
--- CompilerDriver.cpp: ireeCompilerSourceWrapBuffer, length: 167, isNullTerminated: 0 ---
test.py calling inv.parse_source(source)
test.py calling inv.execute()
test.py calling out = Output.open_membuffer()
--- CompilerDriver.cpp: ireeCompilerOutputOpenMembuffer ---
test.py calling inv.output_vm_bytecode(out)
--- CompilerDriver.cpp: ireeCompilerInvocationOutputVMBytecode ---
test.py calling ireert.VmModule.wrap_buffer
--- CompilerDriver.cpp: ireeCompilerOutputMapMemory ---
--- vm.cc: VmModule::WrapBuffer, (edit) close_buffer: 0 ---
test.py calling ireert.load_vm_module
system_api.py load_vm_module --> load_vm_modules
system_api.py load_vm_modules
system_api.py: SystemContext::__init__
system_api.py: SystemContext::__init__ is *not* dynamic
system_api.py: SystemContext::__init__ setup self._bound_modules
test.py finished
--- CompilerDriver.cpp: ireeCompilerSourceDestroy start ---
--- CompilerDriver.cpp: ireeCompilerSourceDestroy finish ---
AddressSanitizer:DEADLYSIGNAL
=================================================================
==189798==ERROR: AddressSanitizer: SEGV on unknown address

Let's fork this repro into an issue for wrap_buffer from the compiler API. I've seen evidence of something like this before but have had trouble getting it to repro.

Sure. I'm not sure if there's actually an issue with index_put_ at all... possibly all red herrings in my testing that were caused by wrap_buffer. Can still start a more focused issue.

Frustrating.

This test looks quite similar to what I'm testing: https://github.com/iree-org/iree/blob/main/compiler/bindings/python/test/api/output_buffer_reference_test.py, and that is ASan-clean.

I'll see where the code diverges and try to file a more focused issue once I know more.

If it turns out this is the issue, we can just rename this issue and not have another one. Was this the only thing wrong the whole time?

This test looks quite similar to what I'm testing: https://github.com/iree-org/iree/blob/main/compiler/bindings/python/test/api/output_buffer_reference_test.py, and that is ASan-clean.

I'll see where the code diverges and try to file a more focused issue once I know more.

I mean, I've "fixed" this bug a couple of times. Something subtle is wrong and I think it needs a closer look at actual reference counts or something.

  • output_buffer_reference_test stops at
        module = VmModule.wrap_buffer(instance, mapped_memory)
        context = VmContext(instance, modules=[module])
  • my test stops at
    vm_module = ireert.load_vm_module(
        ireert.VmModule.wrap_buffer(config.vm_instance, out.map_memory()),
        config,
    )

The setup appears to all be the same.

Was this the only thing wrong the whole time?

Unclear. We originally hit crashes in decode() for the sharktank llama model in Python and native tools that we were trying to reduce. Some forms of reducing the test hit unsupported lowerings (broadcasting behavior). I think there's still a different crash involved somewhere since this latest line of debugging involves only the Python bindings.

  • Building with CMake and setting PYTHONPATH was not enough to get updated C++ code in the runtime, at least when I already had iree-runtime installed from pip. Maybe uninstalling from pip would help. What I ended up doing was python -m pip wheel runtime/ -v to build a wheel then pip install iree_runtime-0.dev0+[...].whl

Aha! Uninstalling the python runtime wheel and just setting PYTHONPATH to the build worked, and now I get stack traces from ASan!

scotttodd@scotttodd-cpu:~/scratch/tests$ LD_PRELOAD=/usr/lib/llvm-14/lib/clang/14.0.0/lib/linux/libclang_rt.asan-x86_64.so ASAN_OPTIONS=detect_leaks=0 ASAN_SYMBOLIZER_PATH=/usr/lib/llvm-14/bin/llvm-symbolizer python asan_repro_debugging.py 
test.py calling Source.wrap_buffer(session, ...)
CompilerDriver.cpp: ireeCompilerSourceWrapBuffer, length: 167, isNullTerminated: 0 ---
test.py calling inv.parse_source(source)
test.py calling inv.execute()
test.py calling out = Output.open_membuffer()
CompilerDriver.cpp: ireeCompilerOutputOpenMembuffer 
test.py calling inv.output_vm_bytecode(out)
CompilerDriver.cpp: ireeCompilerInvocationOutputVMBytecode 
test.py calling out.map_memory()
ctypes_dl.py :: Output::map_memory
CompilerDriver.cpp: ireeCompilerOutputMapMemory 
test.py calling ireert.VmModule.wrap_buffer()
vm.cc: VmModule::WrapBuffer, close_buffer: 0 
vm.cc: VmModule::WrapBuffer, iree_vm_bytecode_module_create
VMFB Length = 5558
test.py calling ireert.load_vm_module
system_api.py load_vm_module --> load_vm_modules
system_api.py load_vm_modules
system_api.py: SystemContext::__init__
system_api.py: SystemContext::__init__ is *not* dynamic
system_api.py: SystemContext::__init__ setup self._bound_modules
test.py finished
CompilerDriver.cpp: ireeCompilerSourceDestroy start 
CompilerDriver.cpp: ireeCompilerSourceDestroy finish
AddressSanitizer:DEADLYSIGNAL
=================================================================
==229852==ERROR: AddressSanitizer: SEGV on unknown address 0x7f66510ff050 (pc 0x7f66efa5f25e bp 0x7fff9db6e9d0 sp 0x7fff9db6e950 T0)
==229852==The signal is caused by a READ memory access.
+   #0 0x7f66efa5f25e in __flatbuffers_soffset_read /home/scotttodd/iree/third_party/flatcc/include/flatcc/flatcc_endian.h:89:2
+   #1 0x7f66efa5f25e in __flatbuffers_soffset_read_from_pe /home/scotttodd/iree/third_party/flatcc/include/flatcc/flatcc_endian.h:89:2
+   #2 0x7f66efa5f25e in iree_vm_BytecodeModuleDef_exported_functions /home/scotttodd/iree-build/runtime/src/iree/schemas/bytecode_module_def_reader.h:693:1
+   #3 0x7f66efa5f25e in iree_vm_bytecode_module_lookup_function /home/scotttodd/iree/runtime/src/iree/vm/bytecode/module.c:292:9
+   #4 0x7f66efb5b497 in iree_vm_context_run_function /home/scotttodd/iree/runtime/src/iree/vm/context.c:77:26
+   #5 0x7f66efb5b497 in iree_vm_context_release_modules /home/scotttodd/iree/runtime/src/iree/vm/context.c:269:5
+   #6 0x7f66efb5acba in iree_vm_context_destroy /home/scotttodd/iree/runtime/src/iree/vm/context.c:357:5
+   #7 0x7f66ef9c0cbe in iree::python::ApiPtrAdapter<iree_vm_context_t>::Release(iree_vm_context_t*) /home/scotttodd/iree/runtime/bindings/python/./vm.h:42:47
+   #8 0x7f66ef9c0cbe in iree::python::ApiRefCounted<iree::python::VmContext, iree_vm_context_t>::Release() /home/scotttodd/iree/runtime/bindings/python/./binding.h:107:7
+   #9 0x7f66ef9c0cbe in iree::python::ApiRefCounted<iree::python::VmContext, iree_vm_context_t>::~ApiRefCounted() /home/scotttodd/iree/runtime/bindings/python/./binding.h:59:22
+   #10 0x7f66ef9e40f5 in nanobind::detail::inst_dealloc(_object*) /home/scotttodd/iree/.venv/lib/python3.11/site-packages/nanobind/src/nb_type.cpp:229:13
    #11 0x5af471  (/usr/bin/python3.11+0x5af471) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
    #12 0x4d8278  (/usr/bin/python3.11+0x4d8278) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
    #13 0x6557cf  (/usr/bin/python3.11+0x6557cf) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
    #14 0x654f99  (/usr/bin/python3.11+0x654f99) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
    #15 0x646d4d in Py_FinalizeEx (/usr/bin/python3.11+0x646d4d) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
    #16 0x64f5ea in Py_RunMain (/usr/bin/python3.11+0x64f5ea) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
    #17 0x61ee0c in Py_BytesMain (/usr/bin/python3.11+0x61ee0c) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)
    #18 0x7f66f2229d8f  (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f) (BuildId: c289da5071a3399de893d2af81d6a30c62646e1e)
    #19 0x7f66f2229e3f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e3f) (BuildId: c289da5071a3399de893d2af81d6a30c62646e1e)
    #20 0x61ec94 in _start (/usr/bin/python3.11+0x61ec94) (BuildId: ead95fcf0410547669743f801bc8c549efbdf7ce)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /home/scotttodd/iree/third_party/flatcc/include/flatcc/flatcc_endian.h:89:2 in __flatbuffers_soffset_read
==229852==ABORTING

So the VM is trying to run the __deinit function on the module, but the module was already freed...?

IREE_IGNORE_ERROR(iree_vm_context_run_function(
    context, stack, module, iree_make_cstring_view("__deinit")));

Yeah, ASan is happy if I clear the loaded module before letting the program exit on its own:

loaded_module = ireert.load_vm_module(
    wrapped_buffer,
    config,
)
+loaded_module = None

This passes ASan, finishing with no leaks or segfaults:

    instance = VmInstance()
    output = Output.open_membuffer()
    output.write(vmfb_contents)
    mapped_memory = output.map_memory()
    module = VmModule.wrap_buffer(instance, mapped_memory)
    context = VmContext(instance, modules=[module])

This crashes and trips ASan with a segfault on a read memory access:

    instance = VmInstance()
    output = Output.open_membuffer()
    output.write(vmfb_contents)
    mapped_memory = output.map_memory()
    module = VmModule.wrap_buffer(instance, mapped_memory)
    # note this line is different!
    loaded_module = load_vm_module(module)

The source for that different line is here:

def load_vm_modules(*vm_modules, config: Optional[Config] = None):
    """Loads VmModules into a new SystemContext and returns them."""
    context = SystemContext(vm_modules=vm_modules, config=config)
    bound_modules = [context.modules[m.name] for m in vm_modules]
    return bound_modules

the SystemContext class creates BoundModule classes that retain a reference to the self SystemContext:

self._bound_modules = BoundModules(
    [(m.name, BoundModule(self, m)) for m in init_vm_modules]
)

Is that reference cycle throwing off the usual garbage collector / shutdown / destruction ordering?
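That suspicion can be reproduced in isolation. A minimal sketch (`Ctx` and `Bound` are hypothetical stand-ins for `SystemContext` and `BoundModule`, not the real classes): a strong back-reference forms a cycle that plain refcounting can never free, so destruction is deferred to a gc pass, which at interpreter shutdown can run after native state is already torn down:

```python
import gc

collected = []

class Ctx:
    def __del__(self):
        collected.append("ctx")

class Bound:
    def __init__(self, ctx):
        self.ctx = ctx  # strong back-reference, like BoundModule holding its SystemContext
    def __del__(self):
        collected.append("bound")

ctx = Ctx()
ctx.bound = Bound(ctx)  # cycle: ctx -> bound -> ctx
del ctx
# Neither object is freed by refcounting alone; the cycle waits for
# the collector, so destruction order is out of our hands.
assert collected == []
gc.collect()
assert sorted(collected) == ["bound", "ctx"]
```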

garbage collection,

(says an old school vb/c#/js dev)

Adding a weakref here makes ASan happy for tests that just load a module and then exit. That's not quite the right fix though, since tests that actually use the module seem to then be missing the object (already gc'd?) :P

            self._bound_modules = BoundModules(
-               [(m.name, BoundModule(self, m)) for m in init_vm_modules]
+               [(m.name, BoundModule(weakref.ref(self), m)) for m in init_vm_modules]
            )

We need to root cause where the ref isn't being retained... It shouldn't be possible to crash with pure python usage of the APIs.
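For reference, the reason the plain `weakref.ref` version breaks tests that actually use the module: a weak reference does not keep its target alive, so once the last strong reference drops, a BoundModule would find its context already gone. A minimal sketch (`SystemCtx` is a hypothetical stand-in):

```python
import weakref

class SystemCtx:  # hypothetical stand-in for SystemContext
    pass

ctx = SystemCtx()
ref = weakref.ref(ctx)
assert ref() is ctx   # dereferencing works while a strong ref exists
del ctx               # drop the only strong reference
assert ref() is None  # target collected: callers now get None back
```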

This might actually be behind some other ghosts I've been carefully trying to catch for a while. There's a bit of art to tracking this to a root cause. I can't do it right now but could help.

You started down this path because of a specific failure scenario. Is this the root cause of that or just an incidental thing found along the way (just trying to understand whether we have more going on)?


I believe it's incidental along the way, but I'd love to be wrong there.

The sequence was roughly:

  1. We observed that the llama model from sharktank was crashing during the decode() step: nod-ai/sharktank#22 (comment), https://github.com/rsuderman/sharktank/blob/d9b25900a069f8200948da322de693e5d6bae15f/shortfin/shortfin/llm/impl/service_v1_cli.py#L101-L106 . We suspected an out-of-bounds write in the kv_cache relying on the in-place index_put_ (source here), but there were too many unknowns to say for sure.

    ~/Repos/iree/build/tools/iree-run-module --module=/tmp/batch_llama_v1.vmfb  --function=decode_bs4 --parameters=model=/media/rsuderman/Disk2/Models/llama.slycho.gguf --input=4x1xsi64=0 --input=4xsi64=1 --input=4xsi64=1 --input=4x1xsi64=0,1,2,3 --input=8x2662400xf16=0.0
    
    EXEC @decode_bs4
    Exception thrown at 0x000001AA8B4EBFF8 in iree-run-module.exe: 0xC0000005: Access violation writing location 0xFFFF9B96ECCF4300.
    
    iree-run-module.exe!iree_elf_call_i_ppp() Line 184 (d:\dev\projects\iree\runtime\src\iree\hal\local\elf\arch\x86_64_msvc.asm:184)
    
  2. We attempted to reduce the full model down to a smaller model and some unit tests for index_put_. Some of those failed to compile, due to llvm/torch-mlir#3433 (unimplemented support for broadcasting variants of the PyTorch index_put_ op in torch-mlir).

  3. Other unit tests produced correct outputs but crashed at runtime. The crash looked to be consistent enough to bisect across releases and I thought I had a culprit range of commits, but that was not the case. That led to this issue, which eventually narrowed in on the Python tests themselves hitting this buggy wrap_buffer shutdown / garbage collect behavior, independent of what the actual code in the test modules was doing.

So if this gets fixed, writing tests and exercising the code from Python will be more stable (especially with ASan enabled), but there is likely more debugging ahead - either going back to the full program or trying to build up component by component (e.g. verifying the behavior of just PagedKVCache.write_timestep and the other functions involved there).

Still debugging this, reading through the changes in #15975

The iree_compiler_output_t Output from compiler/bindings/python/iree/compiler/api/ctypes_dl.py is hitting its weakref.finalize(pointer, lambda x: ..., self) call before the deallocator registered by VmModule::WrapBuffer runs.

def map_memory(self) -> memoryview:
    contents = c_void_p()
    size = c_uint64()
    _handle_error(
        _dylib.ireeCompilerOutputMapMemory(
            self._output_p, byref(contents), byref(size)
        )
    )
    size = size.value
    pointer = (c_char * size).from_address(contents.value)
    # When the pointer is free'd, the no-op callback is invoked with
    # the argument `self`. This implicitly keeps `self` alive until
    # the callback is invoked, which keeps the compiler Output alive.
    # The typical use of this pointer is to read it via the buffer
    # protocol, and that will keep the pointer alive. Therefore, the
    # chain is secure.
    weakref.finalize(pointer, lambda x: ..., self)
    return pointer
VmModule VmModule::WrapBuffer(VmInstance* instance, py::object buffer_obj,
                              py::object destroy_callback, bool close_buffer) {
  IREE_TRACE_SCOPE_NAMED("VmModule::FromAlignedMemory");
  // State object that is retained for the life of the module.
  // It is responsible for keeping the backing resources alive and
  // holding the user-level destroy callback.
  // Note that the original buffer_obj is not captured explicitly but
  // is available as part of the Py_buffer underlying the PyBufferRequest.
  // Aside from being more efficient, avoiding redundant capture removes
  // destruction race potential.
  struct BufferState {
    BufferState(py::object buffer_obj, py::object destroy_callback,
                bool close_buffer)
        : buffer_info(buffer_obj, PyBUF_SIMPLE),
          destroy_callback(std::move(destroy_callback)),
          close_buffer(close_buffer) {}
    PyBufferRequest buffer_info;
    py::object destroy_callback;
    bool close_buffer;
    py::handle get_buffer() { return py::handle(buffer_info.view().obj); }
  };
  BufferState* state =
      new BufferState(buffer_obj, destroy_callback, close_buffer);
  PyBufferRequest& buffer_info = state->buffer_info;
  if (!iree_host_size_has_alignment((uintptr_t)buffer_info.view().buf,
                                    IREE_HAL_HEAP_BUFFER_ALIGNMENT)) {
    std::stringstream err;
    err << "VmModule.from_aligned_memory received an unaligned buffer. ";
    err << "Got 0x" << (void*)buffer_info.view().buf << ", expected alignment ";
    err << IREE_HAL_HEAP_BUFFER_ALIGNMENT;
    throw std::invalid_argument(err.str());
  }
  iree_vm_module_t* module = nullptr;
  auto ctl_fn = +([](void* self, iree_allocator_command_t command,
                     const void* params, void** inout_ptr) {
    py::gil_scoped_acquire gil;
    assert(command == IREE_ALLOCATOR_COMMAND_FREE);
    try {
      // Destruction sequencing is tricky. We must have released the
      // PyBufferRequest before calling close, so we first get what we
      // need out of the state into local variables, then delete the state
      // (releasing the PyBufferRequest), then closing and issuing the
      // destroy callback. Getting the order wrong will result in an
      // unrecoverable exception indicating that the buffer cannot be closed
      // with outstanding mappings.
      BufferState* state = static_cast<BufferState*>(self);
      py::object destroy_callback = std::move(state->destroy_callback);
      py::object buffer_to_close;
      if (state->close_buffer) {
        buffer_to_close = py::borrow(state->get_buffer());
      }
      delete state;
      if (buffer_to_close) {
        buffer_to_close.attr("close")();
      }
      if (!destroy_callback.is_none()) {
        destroy_callback();
      }
    } catch (std::exception& e) {
      // There are many situations where deallocation exceptions can be
      // swallowed, so carp loudly. This is almost always a critical issue
      // that needs to be visible.
      fprintf(
          stderr,
          "error: exception raised while deallocating storage for an "
          "iree.runtime.VmModule. This is unrecoverable and likely indicates a "
          "serious problem, minimally resulting in memory leaks: %s",
          e.what());
      return iree_make_status(
          IREE_STATUS_UNKNOWN,
          "exception raised while deallocating storage for an "
          "iree.runtime.VmModule. This is unrecoverable and likely indicates a "
          "serious problem, minimally resulting in memory leaks: %s",
          e.what());
    }
    return iree_ok_status();
  });
  iree_allocator_t deallocator{/*self=*/state, /*ctl=*/ctl_fn};
  auto status = iree_vm_bytecode_module_create(
      instance->raw_ptr(),
      {static_cast<const uint8_t*>(buffer_info.view().buf),
       static_cast<iree_host_size_t>(buffer_info.view().len)},
      deallocator, iree_allocator_system(), &module);
  if (!iree_status_is_ok(status)) {
    delete state;
  }
  CheckApiStatus(status, "Error creating vm module from aligned memory");
  auto py_module = VmModule::StealFromRawPtr(module);
  // Stash a reference to the flatbuffer at the Python instance level. This
  // is exposed to the tracing API, allowing it to get at the backing contents.
  py_module.stashed_flatbuffer_blob = buffer_obj;
  return py_module;
}

https://docs.python.org/3/library/gc.html#gc.set_debug this looks useful...

Output with gc.set_debug(gc.DEBUG_STATS | gc.DEBUG_COLLECTABLE | gc.DEBUG_UNCOLLECTABLE) and a ton of printfs: https://gist.github.com/ScottTodd/453834cb5731598f6f8a15b27700d518

I'm skeptical of this pattern:

pointer = (c_char * size).from_address(contents.value) 
weakref.finalize(pointer, lambda x: ..., self)
return pointer
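The hazard with that pattern can be shown with stdlib ctypes alone: weakref.finalize only ties the finalizer to the lifetime of the ctypes array object itself, so as soon as the last Python reference to the array drops, the finalizer fires, regardless of whether native code still holds the raw address. A small sketch under that assumption (the `backing` buffer stands in for the compiler output's mapped memory):

```python
import ctypes
import weakref

events = []
backing = ctypes.create_string_buffer(b"vmfb-bytes")
# Mimic map_memory: a ctypes array aliasing raw memory by address.
pointer = (ctypes.c_char * 10).from_address(ctypes.addressof(backing))
weakref.finalize(pointer, events.append, "finalized")
raw_address = ctypes.addressof(pointer)  # what the native side would keep
del pointer
# With the last Python reference gone, the finalizer has already run,
# even though native code may still be reading raw_address.
assert events == ["finalized"]
```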

I've made a number of attempts to add extra references, log what the garbage collector is doing, log which C++ constructors / destructors are running, etc. here: https://github.com/iree-org/iree/compare/main...ScottTodd:wrap-buffer-debugging?expand=1. Not sure what else to try. Reading various stackoverflow questions about python memoryview, from_address, garbage collection, and related systems now...

Also reading https://nanobind.readthedocs.io/en/latest/ownership.html now

I'm not seeing the destructors for BufferState or PyBufferRequest run before the weakref.finalize on Output.map_memory() runs, so they aren't holding onto a reference to the pointer / memoryview?

struct BufferState {
class PyBufferRequest {

I guess we're also interoperating between nanobind and pybind here? The compiler uses pybind but the runtime uses nanobind.

Yes, lots of interop. It might have been more effective to just randomly rotate the pointer and hope that sometimes it worked out.

Tried some variations of py::keep_alive here:

.def_static("wrap_buffer", &VmModule::WrapBuffer, py::arg("instance"),
            py::arg("buffer"), py::arg("destroy_callback") = py::none(),
            py::arg("close_buffer") = false, kWrapBufferDocstring)

"Keep the buffer argument alive while the returned VmModule is alive"

Might have missed some syntax but that didn't seem to help.

Using gc.get_referrers I can see that the weakref.finalize code in map_memory() is creating a reference to the Output object. I see no other references to that object or any references to the returned pointer though.

Is there a way we can use from_buffer here instead of from_address?

def map_memory(self) -> memoryview:
    contents = c_void_p()
    size = c_uint64()
    _handle_error(
        _dylib.ireeCompilerOutputMapMemory(
            self._output_p, byref(contents), byref(size)
        )
    )
    size = size.value
    pointer = (c_char * size).from_address(contents.value)
    # When the pointer is free'd, the no-op callback is invoked with
    # the argument `self`. This implicitly keeps `self` alive until
    # the callback is invoked, which keeps the compiler Output alive.
    # The typical use of this pointer is to read it via the buffer
    # protocol, and that will keep the pointer alive. Therefore, the
    # chain is secure.
    weakref.finalize(pointer, lambda x: ..., self)
    return pointer

I'm wondering if we can get this instance variable to be populated:

_objects
This member is either None or a dictionary containing Python objects that need to be kept alive so that the memory block contents is kept valid. This object is only exposed for debugging; never modify the contents of this dictionary.
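A quick stdlib-only check of the difference being discussed: from_address aliases raw memory and leaves _objects as None, while from_buffer records the source object there, which is what keeps the underlying memory valid:

```python
import ctypes

src = ctypes.create_string_buffer(b"hello")
# from_address takes a bare integer address; nothing is retained:
via_address = (ctypes.c_char * 5).from_address(ctypes.addressof(src))
assert via_address._objects is None
# from_buffer goes through the buffer protocol and keeps its source:
via_buffer = (ctypes.c_char * 5).from_buffer(src)
assert via_buffer._objects is not None
```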

The code before #15975 used return memoryview((c_char * size).from_address(contents.value)) too.

Whatever can be made to work, yes. The big hammer is to add a native pybind module for interop.

Going to put this on hold for a bit.