Forward-pass NN model with dynamic input doesn't work on CPU, but works on GPU and TPU.
vanbasten23 opened this issue
🐛 Bug
A forward-pass NN model with dynamic input doesn't work on CPU, but works on GPU and TPU, per PR.
To Reproduce
On a cloudtop Docker container created via:

```shell
$ sudo docker pull gcr.io/tpu-pytorch/xla_base:latest
$ sudo docker run -it gcr.io/tpu-pytorch/xla_base:latest
```
Steps to reproduce the behavior:
```shell
$ export XLA_EXPERIMENTAL="nonzero:masked_select"
$ python3 pytorch/xla/test/test_dynamic_shape_models.py TestDynamicShapeModels.test_forward_pass_dynamic_input_correctness
```
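To see why `nonzero` and `masked_select` are the ops gated behind `XLA_EXPERIMENTAL` here, note that their output *size* depends on the tensor's values, not just its static shape. A minimal plain-Python sketch (not the actual PyTorch/XLA test) of that value-dependence:

```python
# Illustrative sketch: the output size of a nonzero-style op depends on the
# input's *values*, so a shape-only compiler cannot know it ahead of time.

def nonzero_indices(values):
    """Return indices of nonzero entries, like torch.nonzero on a 1-D tensor."""
    return [i for i, v in enumerate(values) if v != 0]

# Two inputs with the *same* static shape (length 5)...
a = [0, 3, 0, 7, 1]
b = [0, 0, 0, 0, 2]

# ...produce outputs with *different* (dynamic) shapes:
print(len(nonzero_indices(a)))  # 3
print(len(nonzero_indices(b)))  # 1
```

This is exactly the kind of dimension the XLA compiler is asked to reason about in the test above.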
The error message we get is:

```
======================================================================
ERROR: test_forward_pass_dynamic_input_correctness (__main__.TestDynamicShapeModels)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/tmp/pytorch/xla/test/test_dynamic_shape_models.py", line 51, in test_forward_pass_dynamic_input_correctness
xm.mark_step()
File "/opt/conda/lib/python3.7/site-packages/torch_xla-1.14-py3.7-linux-x86_64.egg/torch_xla/core/xla_model.py", line 953, in mark_step
wait=xu.getenv_as('XLA_SYNC_WAIT', bool, False))
RuntimeError: INVALID_ARGUMENT: From /job:localservice/replica:0/task:0:
2 root error(s) found.
(0) INVALID_ARGUMENT: Fail to proof the equality of two dimensions at compile time: %reduce.56 = s32[] reduce(s32[10]{0} %convert.50, s32[] %constant.51), dimensions={0}, to_apply=%add_S32.52 vs %reduce.17 = s32[] reduce(s32[10]{0} %convert.11, s32[] %constant.12), dimensions={0}, to_apply=%add_S32.13
[[{{node XRTCompile}}]]
[[XRTCompile_G3]]
(1) INVALID_ARGUMENT: Fail to proof the equality of two dimensions at compile time: %reduce.56 = s32[] reduce(s32[10]{0} %convert.50, s32[] %constant.51), dimensions={0}, to_apply=%add_S32.52 vs %reduce.17 = s32[] reduce(s32[10]{0} %convert.11, s32[] %constant.12), dimensions={0}, to_apply=%add_S32.13
[[{{node XRTCompile}}]]
0 successful operations.
0 derived errors ignored.
Recent warning and error logs:
0 successful operations.
0 derived errors ignored.
Recent warning and error logs:
OP_REQUIRES failed at xrt_compile_ops.cc:221 : INVALID_ARGUMENT: Fail to proof the equality of two dimensions at compile time: %reduce.56 = s32[] reduce(s32[10]{0} %convert.50, s32[] %constant.51), dimensions={0}, to_apply=%add_S32.52 vs %reduce.17 = s32[] reduce(s32[10]{0} %convert.11, s32[] %constant.12), dimensions={0}, to_apply=%add_S32.13
OP_REQUIRES failed at xrt_compile_ops.cc:221 : INVALID_ARGUMENT: Fail to proof the equality of two dimensions at compile time: %reduce.56 = s32[] reduce(s32[10]{0} %convert.50, s32[] %constant.51), dimensions={0}, to_apply=%add_S32.52 vs %reduce.17 = s32[] reduce(s32[10]{0} %convert.11, s32[] %constant.12), dimensions={0}, to_apply=%add_S32.13
```
Per the comment, it fails because the CPU compiler (tensorflow/compiler/xla/service/cpu/cpu_compiler.cc) sets

```
dynamic_padder_options.shape_check_mode = DynamicDimensionInference::ShapeCheckMode::kCompileTime;
```

which fails whenever the compiler cannot verify at compile time that two shapes are equivalent; this pretty much blocks dynamic-shape work on CPU. On GPU the mode is set to kRuntime, which only checks shape equality at run time.
Right now we skip this test on CPU, but we need to investigate.
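The difference between the two modes can be sketched in plain Python. This is only an illustration of the behavior described above; the names (`ShapeCheckMode`, `check_dims_equal`) are hypothetical and are not XLA's actual API.

```python
# Hypothetical sketch of the kCompileTime vs kRuntime shape-check modes.
from enum import Enum

class ShapeCheckMode(Enum):
    COMPILE_TIME = "kCompileTime"  # CPU: must *prove* equality while compiling
    RUNTIME = "kRuntime"           # GPU: defer the check until sizes are known

def check_dims_equal(dim_a, dim_b, mode):
    """dim_a/dim_b are ints when statically known, None when dynamic."""
    statically_known = dim_a is not None and dim_b is not None
    if mode is ShapeCheckMode.COMPILE_TIME:
        if not statically_known:
            # Mirrors the CPU failure: equality cannot be proven at compile time.
            raise ValueError(
                "Fail to prove the equality of two dimensions at compile time")
        return dim_a == dim_b
    # RUNTIME mode: emit a deferred check instead of failing compilation.
    return lambda a, b: a == b  # evaluated later, with concrete sizes

# A dynamic dimension (e.g. nonzero's output size) compiles under RUNTIME...
deferred = check_dims_equal(None, None, ShapeCheckMode.RUNTIME)
print(deferred(3, 3))  # True once concrete sizes are available

# ...but fails under COMPILE_TIME, as on CPU:
try:
    check_dims_equal(None, 10, ShapeCheckMode.COMPILE_TIME)
except ValueError as e:
    print(e)
```

Under this framing, the CPU failure isn't a wrong answer but a stricter policy choice: refuse to compile anything whose dimension equality can't be proven statically.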
Expected behavior
Running the above test should pass without error.
Environment
- Reproducible on XLA backend [CPU/TPU]: CPU
- torch_xla version: nightly
Additional context
@miladm @JackCaoG Per our meeting, I know XLA wants to make CPU stricter than GPU/TPU. But I assume we still want models with dynamic shapes to run on CPU (for ease of debugging, testing, etc.), right? If so, do you think we should open an issue with XLA to allow dynamic shapes on CPU?
@vanbasten23 LMK if I wrongly assigned this to you.