microsoft / O-CNN

O-CNN: Octree-based Convolutional Neural Networks for 3D Shape Analysis

run_cls.py error: No OpKernel was registered to support Op 'OctreeProperty'

jeichelbaum opened this issue

I am currently experiencing an issue with run_cls.py. The cause seems to be that TensorFlow can't find my GPU when I run the O-CNN classifier, even though basic TensorFlow examples such as MNIST classification run on the same GPU without problems. I was previously able to run the classifier, but it stopped working yesterday evening.
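
One quick way to confirm what TensorFlow itself can see is to list its local devices. The snippet below is only a sketch: `visible_device_types` is a hypothetical helper name, and it relies on `device_lib.list_local_devices()` from `tensorflow.python.client`. If the returned list contains only CPU entries, TensorFlow is not detecting the GPU in that environment.

```python
def visible_device_types():
    """Return the device types TensorFlow reports, or None if TensorFlow is not installed."""
    try:
        # Imported lazily so the helper still runs in an environment without TensorFlow.
        from tensorflow.python.client import device_lib
    except ImportError:
        return None
    return [d.device_type for d in device_lib.list_local_devices()]


types = visible_device_types()
print("TensorFlow not installed" if types is None else types)
```

Run this in the same conda environment and shell as run_cls.py, since environment variables can change which devices are visible.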

Steps before it stopped working:

Troubleshooting so far:

  • reinstalled conda and created a new environment
  • reinstalled the O-CNN repo and did a clean build
  • tried switching to tf 1.12.0

System settings:

  • Ubuntu 18.04
  • tensorflow-gpu 1.14.0
  • O-CNN repo version: most recent
  • GPU GTX 1060 (mobile)
  • GCC 7.5.0
  • CUDA V10.1.243
  • CuDNN 7.6.5

Log output:

2020-03-25 11:51:35.086551: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.1
2020-03-25 11:51:36.614073: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2020-03-25 11:51:36.638301: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2599990000 Hz
2020-03-25 11:51:36.638761: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x558269dab940 executing computations on platform Host. Devices:
2020-03-25 11:51:36.638795: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2020-03-25 11:51:36.639629: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2020-03-25 11:51:36.642741: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-03-25 11:51:36.642768: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: GL502VM
2020-03-25 11:51:36.642775: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: GL502VM
2020-03-25 11:51:36.642820: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 435.21.0
2020-03-25 11:51:36.642841: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 435.21.0
2020-03-25 11:51:36.642847: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 435.21.0
Initialize ...
Traceback (most recent call last):
File "/home/jeri/.conda/envs/tf-1.14.0/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
File "/home/jeri/.conda/envs/tf-1.14.0/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1339, in _run_fn
    self._extend_graph()
File "/home/jeri/.conda/envs/tf-1.14.0/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1374, in _extend_graph
    tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'OctreeProperty' used by {{node ocnn/OctreeProperty}}with these attrs: [dtype=DT_FLOAT, depth=5, channel=3, property_name="feature"]
Registered devices: [CPU, XLA_CPU]
Registered kernels:
device='GPU'
[[ocnn/OctreeProperty]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "run_cls.py", line 65, in <module>
    solver.train()
File "/home/jeri/dev/O-CNN/tensorflow/script/tfsolver.py", line 56, in train
    self.initialize(sess)
File "/home/jeri/dev/O-CNN/tensorflow/script/tfsolver.py", line 33, in initialize
    sess.run(tf.global_variables_initializer())
File "/home/jeri/.conda/envs/tf-1.14.0/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
File "/home/jeri/.conda/envs/tf-1.14.0/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
File "/home/jeri/.conda/envs/tf-1.14.0/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
File "/home/jeri/.conda/envs/tf-1.14.0/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'OctreeProperty' used by node ocnn/OctreeProperty (defined at <string>:2356) with these attrs: [dtype=DT_FLOAT, depth=5, channel=3, property_name="feature"]
Registered devices: [CPU, XLA_CPU]
Registered kernels:
device='GPU'
    [[ocnn/OctreeProperty]]
Errors may have originated from an input operation.
Input Source operations connected to node ocnn/OctreeProperty:
dataset/IteratorGetNext (defined at /home/jeri/dev/O-CNN/tensorflow/script/dataset.py:137)
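
The key detail in the error above is the mismatch between "Registered devices" (only CPU and XLA_CPU) and "Registered kernels" (only GPU): the op has a GPU kernel, but no GPU device was registered in the session. A small sketch of how that comparison could be automated on a pasted error message follows; `diagnose_missing_kernel` is a hypothetical helper, and the regular expressions assume the message layout shown in this log.

```python
import re


def diagnose_missing_kernel(error_text):
    """Compare the devices TensorFlow registered with the devices the op has kernels for."""
    match = re.search(r"Registered devices:\s*\[([^\]]*)\]", error_text)
    registered = []
    if match:
        registered = [d.strip() for d in match.group(1).split(",") if d.strip()]
    # Each kernel entry in the message looks like: device='GPU'
    kernel_devices = re.findall(r"device='([^']+)'", error_text)
    # Devices that have a kernel but were not registered in this session.
    missing = [d for d in kernel_devices if d not in registered]
    return registered, kernel_devices, missing


sample = """No OpKernel was registered to support Op 'OctreeProperty'
Registered devices: [CPU, XLA_CPU]
Registered kernels:
device='GPU'
"""

registered, kernel_devices, missing = diagnose_missing_kernel(sample)
print("registered devices:", registered)
print("kernel devices:", kernel_devices)
print("kernel devices not registered:", missing)
```

A non-empty last list means the op itself was built correctly and the problem lies in device detection, not in the custom op build.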

The most recent commit fixed it for me. Many thanks! For anyone encountering a similar issue in the future: the fix was to point TensorFlow to the correct GPU by setting the correct device_id.
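
For readers who want to pin the GPU themselves, here is a minimal sketch of one generic approach. It is not necessarily what the commit changed, and `select_gpu` is a hypothetical helper. It uses the standard CUDA environment variables `CUDA_DEVICE_ORDER` and `CUDA_VISIBLE_DEVICES`, which must be set before TensorFlow is imported. As far as I know, pointing `CUDA_VISIBLE_DEVICES` at an index that does not exist hides every GPU and can produce the same CUDA_ERROR_NO_DEVICE message as in the log.

```python
import os


def select_gpu(device_id):
    """Expose only the given CUDA device to libraries imported afterwards."""
    # Order devices by PCI bus ID so the index matches what nvidia-smi shows.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = str(device_id)
    return os.environ["CUDA_VISIBLE_DEVICES"]


# Assumption: the machine has a single GPU, so its index is 0.
visible = select_gpu(0)
print("CUDA_VISIBLE_DEVICES =", visible)

# Import TensorFlow only after this point so it picks up the setting.
```

On a laptop with one discrete GPU, such as the GTX 1060 here, index 0 is the usual choice; on multi-GPU machines, check the indices with nvidia-smi first.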