iree-org / iree

Summary of what happened:

LLVM integrate #17330 had to disable a number of e2e tests and benchmarks: specifically, every test that compiles a .mlirbc source containing a tensor.expand_shape op.
The reason is that llvm/llvm-project#90040 made a compatibility-breaking change to this op. The MLIR bytecode format version was not bumped, so compiling the old files fails with a cryptic error: #17330 (comment)

This issue is about re-enabling these tests. First, all these .mlirbc files need to be re-generated with tools rebuilt after llvm/llvm-project#90040.

> was a compatibility-breaking change to this op. The MLIR bytecode format version was not bumped

That's expected. The bytecode format version is not tied to any particular dialect, and the tensor dialect makes no guarantees about its format or ops (unlike, say, VHLO from StableHLO). We've been fairly lucky recently in avoiding similar breaks.
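For anyone debugging one of these files, here is a minimal sketch of reading the bytecode header. It assumes the layout described in the MLIR bytecode format documentation (magic bytes `ML\xefR`, a varint format version, then a null-terminated producer string) and only handles a single-byte varint. The function name is hypothetical. The point it illustrates: the header records the container format version and the producer, and nothing in it tracks per-dialect op changes, so a break like the one in tensor.expand_shape is invisible at this level.

```python
MLIR_BYTECODE_MAGIC = b"ML\xefR"  # 0x4D 0x4C 0xEF 0x52


def read_bytecode_header(data: bytes) -> tuple[int, str]:
    """Return (format_version, producer) from the start of an MLIR bytecode blob.

    Sketch only: assumes the version fits in a single-byte varint.
    """
    if not data.startswith(MLIR_BYTECODE_MAGIC):
        raise ValueError("not an MLIR bytecode file (magic bytes missing)")
    pos = len(MLIR_BYTECODE_MAGIC)
    if len(data) <= pos:
        raise ValueError("truncated header")

    first = data[pos]
    # In the prefix-varint encoding, a low bit of 1 means "one byte total",
    # with the value stored in the upper 7 bits.
    if first & 1 == 0:
        raise ValueError("multi-byte varint version not handled in this sketch")
    version = first >> 1
    pos += 1

    # The producer is a null-terminated string following the version.
    end = data.index(b"\x00", pos)
    producer = data[pos:end].decode("utf-8", errors="replace")
    return version, producer
```

Usage would be something like `read_bytecode_header(open(path, "rb").read())`. Comparing the producer string across files can at least show which toolchain wrote each one.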

I tried to regenerate the .mlirbc files for https://github.com/nod-ai/SHARK-TestSuite/tree/main/iree_tests/pytorch/models/resnet50 and https://github.com/nod-ai/SHARK-TestSuite/tree/main/iree_tests/pytorch/models/opt-125M, but hit issues with both. Need to apply more rigor to those frontend workflows.

resnet50 crashed in the compiler at mlir::iree_compiler::IREE::Util::serializeResourceRawData D:\dev\projects\iree\compiler\src\iree\compiler\Dialect\Util\IR\UtilAttrs.cpp:230:0
opt-125M regressed in PyTorch at some point - I can't even export it to torch-mlir (marked as failing, along with nearly all other models, at https://github.com/nod-ai/e2eshark-reports/blob/main/2024-05-07/turbine_reports/statusreport.md)

Looking through the history of https://github.com/iree-org/iree/commits/main/build_tools/python/e2e_test_framework/models/matmul.py, I can't tell how to regenerate those matmul test files. Quite a few PRs with completely empty descriptions :/

Possibly https://github.com/iree-org/iree-experimental/tree/main/iree-torch/library, https://github.com/iree-org/iree-experimental/tree/main/iree-jax/library, etc.?

We might be able to download the existing files, edit them manually to use the new tensor.expand_shape syntax, then upload them and update the URLs. I'm not sure who still has access to https://storage.googleapis.com/iree-model-artifacts/ though. We could push there if someone still has access, or push elsewhere (a GitHub repo with LFS, Azure, etc.).
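If we go the manual-edit route, a rough sketch of the text rewrite might look like the following. It assumes the files have already been converted to textual .mlir using tools built before llvm/llvm-project#90040, that the only change needed is adding the `output_shape [...]` clause, and that result shapes are fully static (dynamic dims need SSA size operands, which a regex can't supply, so those ops are left untouched). It also does not handle tensor types with nested `<...>`. The helper name and regex are illustrative, not an existing IREE or MLIR tool.

```python
import re

# Old textual form (before the op change), e.g.:
#   tensor.expand_shape %src [[0, 1], [2]] : tensor<6x4xf32> into tensor<2x3x4xf32>
OLD_EXPAND_SHAPE = re.compile(
    r"(?P<head>tensor\.expand_shape\s+%[\w.$#-]+\s+(?:\[\[.*?\]\]|\[\]))"
    r"(?P<types>\s*:\s*tensor<[^<>]*>\s+into\s+tensor<(?P<result>[^<>]*)>)"
)
# Leading "2x3x?x" dimension prefix of a tensor type body such as "2x3x?xf32".
DIM_PREFIX = re.compile(r"(?:(?:\d+|\?)x)*")


def add_static_output_shape(mlir_text: str) -> str:
    """Insert `output_shape [...]` into old-style ops with static result shapes."""

    def rewrite(match: re.Match) -> str:
        prefix = DIM_PREFIX.match(match.group("result")).group(0)
        dims = prefix.split("x")[:-1]  # "2x3x4x" -> ["2", "3", "4"]
        if "?" in dims:
            # Dynamic result dims need SSA values; leave for manual editing.
            return match.group(0)
        shape = ", ".join(dims)
        return f"{match.group('head')} output_shape [{shape}]{match.group('types')}"

    return OLD_EXPAND_SHAPE.sub(rewrite, mlir_text)
```

Ops that already carry `output_shape` don't match the pattern, so running this twice should be harmless. Anything with dynamic dims would still need hand edits.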

@mariecwhite, we're going to need help here! Context in the issue description above.

For https://github.com/iree-org/iree/tree/main/experimental/regression_suite/tests/pregenerated, tests are still disabled:

iree/.github/workflows/pkgci_regression_test.yml

Lines 190 to 197 in 2587078

    
    # TODO(#17344): regenerate .mlirbc files, test plat_rdna3_rocm on rocm
    # # In-tree tests
    # - name: Run experimental/regression_suite tests
    #   run: |
    #     source ${VENV_DIR}/bin/activate
    #     pytest \
    #       -rA -s -m "plat_host_cpu and presubmit" \
    #       experimental/regression_suite

Instructions for regenerating are at https://github.com/nod-ai/SHARK-Turbine/tree/main/models/turbine_models/custom_models#instructions, but that code hasn't been touched in a while, so it might need other updates too.

Okay, so https://github.com/nod-ai/SHARK-TestSuite/blob/main/.github/workflows/test_e2eshark.yml (the workflow that generates reports like the status report linked above) is actually pinned to a very old PyTorch version (2.1.0, 8 months old) because it uses the requirements files in https://github.com/nod-ai/SHARK-Turbine/tree/torch_2.1/core (that repo itself has also moved to https://github.com/iree-org/iree-turbine).

When I try those pinned versions I get RuntimeError: Windows not yet supported for torch.compile
When I try with PyTorch 2.4.0 I get TypeError: forward() got an unexpected keyword argument 'constraints'

Regenerate `.mlirbc` files for tests and benchmarks after LLVM integrate #17330