pytorch/xla

Enabling PyTorch on XLA Devices (e.g. Google TPU)

Home Page: https://pytorch.org/xla

Forward-pass NN model with dynamic input doesn't work on CPU, but works on GPU and TPU.

vanbasten23 opened this issue · comments

πŸ› Bug

A forward-pass NN model with dynamic input doesn't work on CPU, but works on GPU and TPU, per the referenced PR.

To Reproduce

On a cloudtop Docker container created via:

$ sudo docker pull gcr.io/tpu-pytorch/xla_base:latest
$ sudo docker run -it gcr.io/tpu-pytorch/xla_base:latest

Steps to reproduce the behavior:

$ export XLA_EXPERIMENTAL="nonzero:masked_select"
$ python3 pytorch/xla/test/test_dynamic_shape_models.py TestDynamicShapeModels.test_forward_pass_dynamic_input_correctness
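
For reference, the failing test drives a small feed-forward model over an input whose leading dimension becomes dynamic via torch.nonzero. Below is a rough, hypothetical sketch of that pattern (not the verbatim test; the model layout and tensor sizes are made up), assuming XLA_EXPERIMENTAL is exported as above:

import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm

device = xm.xla_device()

# Hypothetical stand-in for the small feed-forward model the test builds.
model = nn.Sequential(nn.Linear(2, 4), nn.ReLU(), nn.Linear(4, 1)).to(device)

# With XLA_EXPERIMENTAL="nonzero:masked_select", nonzero() returns a tensor
# whose leading dimension is dynamic (only upper-bounded at compile time).
x = torch.ones(10, 2, device=device)
dyn_input = torch.nonzero(x).float()  # dynamic leading dim, 2 index columns

out = model(dyn_input)
xm.mark_step()  # compilation happens here; this is where the CPU backend fails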

The error message we get is

======================================================================
ERROR: test_forward_pass_dynamic_input_correctness (__main__.TestDynamicShapeModels)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/pytorch/xla/test/test_dynamic_shape_models.py", line 51, in test_forward_pass_dynamic_input_correctness
    xm.mark_step()
  File "/opt/conda/lib/python3.7/site-packages/torch_xla-1.14-py3.7-linux-x86_64.egg/torch_xla/core/xla_model.py", line 953, in mark_step
    wait=xu.getenv_as('XLA_SYNC_WAIT', bool, False))
RuntimeError: INVALID_ARGUMENT: From /job:localservice/replica:0/task:0:
2 root error(s) found.
  (0) INVALID_ARGUMENT: Fail to proof the equality of two dimensions at compile time: %reduce.56 = s32[] reduce(s32[10]{0} %convert.50, s32[] %constant.51), dimensions={0}, to_apply=%add_S32.52 vs %reduce.17 = s32[] reduce(s32[10]{0} %convert.11, s32[] %constant.12), dimensions={0}, to_apply=%add_S32.13
	 [[{{node XRTCompile}}]]
	 [[XRTCompile_G3]]
  (1) INVALID_ARGUMENT: Fail to proof the equality of two dimensions at compile time: %reduce.56 = s32[] reduce(s32[10]{0} %convert.50, s32[] %constant.51), dimensions={0}, to_apply=%add_S32.52 vs %reduce.17 = s32[] reduce(s32[10]{0} %convert.11, s32[] %constant.12), dimensions={0}, to_apply=%add_S32.13
	 [[{{node XRTCompile}}]]
0 successful operations.
0 derived errors ignored.
Recent warning and error logs:
  0 successful operations.
  0 derived errors ignored.
  Recent warning and error logs:
    OP_REQUIRES failed at xrt_compile_ops.cc:221 : INVALID_ARGUMENT: Fail to proof the equality of two dimensions at compile time: %reduce.56 = s32[] reduce(s32[10]{0} %convert.50, s32[] %constant.51), dimensions={0}, to_apply=%add_S32.52 vs %reduce.17 = s32[] reduce(s32[10]{0} %convert.11, s32[] %constant.12), dimensions={0}, to_apply=%add_S32.13
  OP_REQUIRES failed at xrt_compile_ops.cc:221 : INVALID_ARGUMENT: Fail to proof the equality of two dimensions at compile time: %reduce.56 = s32[] reduce(s32[10]{0} %convert.50, s32[] %constant.51), dimensions={0}, to_apply=%add_S32.52 vs %reduce.17 = s32[] reduce(s32[10]{0} %convert.11, s32[] %constant.12), dimensions={0}, to_apply=%add_S32.13

Per the comment, it fails because the CPU compiler (tensorflow/compiler/xla/service/cpu/cpu_compiler.cc) sets

dynamic_padder_options.shape_check_mode = DynamicDimensionInference::ShapeCheckMode::kCompileTime;

which fails if it cannot verify at compile time that two shapes are equivalent; this pretty much blocks the dynamic-shape work. On GPU it is set to kRuntime, which only checks shape equality at run time.
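
For intuition, here is a hedged, hypothetical sketch (not taken from the test) of the kind of graph that trips the compile-time check: two independently produced dynamic dimensions that are equal at run time but cannot be proven equal statically.

import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()

# Each nonzero() yields a 1-D tensor whose size is only known to be <= 10 at
# compile time. The two sizes are equal at run time, but the compiler cannot
# prove that statically.
a = torch.nonzero(torch.ones(10, device=device)).squeeze(1)
b = torch.nonzero(torch.ones(10, device=device)).squeeze(1)
c = a + b  # the element-wise add requires the two dynamic dimensions to match
xm.mark_step()  # kCompileTime (CPU) rejects this; kRuntime (GPU) checks it at run time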

Right now we skip this test on CPU, but we need to investigate further.

Expected behavior

Running the above test should not fail with an error.

Environment

  • Reproducible on XLA backend [CPU/TPU]: CPU
  • torch_xla version: nightly

Additional context

cc @miladm @JackCaoG

@miladm @JackCaoG Per our meeting, I know XLA wants to make CPU stricter than GPU/TPU. But I assume we still want models with dynamic shape to run on CPU (perhaps for the sake of ease of debugging, testing, etc.), right? If so, do you think we should open an issue for XLA to allow dynamic shape on CPU?

@vanbasten23 LMK if I wrongly assigned this to you.