pytorch / FBGEMM

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/


FBGEMM_GPU Build Issue

rkindi opened this issue

I am unable to build fbgemm_gpu due to the following error:

/usr/include/c++/7/bits/hashtable.h:268:7: error: static assertion failed: Cache the hash code or qualify your functors involved in hash code and bucket index computation with noexcept
       static_assert(noexcept(declval<const __hash_code_base_access&>()
       ^~~~~~~~~~~~~

Some relevant environment variables:

$ echo $TORCH_CUDA_ARCH_LIST, $CUDA_HOME, $CUDACXX
8.0, /usr/local/cuda-11.1, /usr/local/cuda-11.1/bin/nvcc
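For reference, an equivalent way to drive the build with these variables from Python (just an illustrative sketch, not the exact command I ran):

# Illustrative only: set the same environment shown above and launch the build.
import os
import subprocess

env = dict(os.environ)
env.update({
    "TORCH_CUDA_ARCH_LIST": "8.0",
    "CUDA_HOME": "/usr/local/cuda-11.1",
    "CUDACXX": "/usr/local/cuda-11.1/bin/nvcc",
})

# Same build command referenced below ("python3 setup.py build develop").
subprocess.run(["python3", "setup.py", "build", "develop"], env=env, check=True)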

My conda env:

name: my_fbgemm_setup
channels:
  - pytorch-nightly
  - bottler
  - iopath
  - nvidia
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=1_llvm
  - blas=1.0=mkl
  - ca-certificates=2021.10.26=h06a4308_2
  - colorama=0.4.4=pyh9f0ad1d_0
  - cudatoolkit=11.1.74=h6bb024c_0
  - iopath=0.1.9=py38
  - jinja2=3.0.3=pyhd8ed1ab_0
  - ld_impl_linux-64=2.36.1=hea4e1c9_2
  - libffi=3.3=h58526e2_2
  - libgcc-ng=11.2.0=h1d223b6_11
  - libstdcxx-ng=11.2.0=he4da1e4_11
  - libuv=1.42.0=h7f98852_0
  - libzlib=1.2.11=h36c2ea0_1013
  - llvm-openmp=12.0.1=h4bd325d_1
  - markupsafe=2.0.1=py38h497a2fe_1
  - mkl=2021.4.0=h8d4b97c_729
  - mkl-service=2.4.0=py38h7f8727e_0
  - mkl_fft=1.3.1=py38hd3c417c_0
  - mkl_random=1.2.2=py38h51133e4_0
  - ncurses=6.2=h58526e2_4
  - numpy=1.21.2=py38h20f2e39_0
  - numpy-base=1.21.2=py38h79a1101_0
  - nvidiacub=1.10.0=0
  - openssl=1.1.1l=h7f8727e_0
  - pip=21.3.1=pyhd8ed1ab_0
  - portalocker=2.3.2=py38h578d9bd_1
  - python=3.8.6=hffdb5ce_5_cpython
  - python_abi=3.8=2_cp38
  - pytorch=1.11.0.dev20211115=py3.8_cuda11.1_cudnn8.0.5_0
  - pytorch-mutex=1.0=cuda
  - readline=8.1=h46c0cb4_0
  - setuptools=59.1.0=py38h578d9bd_0
  - six=1.16.0=pyhd3eb1b0_0
  - sqlite=3.36.0=h9cd32fc_2
  - tbb=2021.4.0=h4bd325d_1
  - tk=8.6.11=h27826a3_1
  - tqdm=4.62.3=pyhd8ed1ab_0
  - typing_extensions=3.10.0.2=pyha770c72_0
  - wheel=0.37.0=pyhd8ed1ab_1
  - xz=5.2.5=h516909a_1
  - zlib=1.2.11=h36c2ea0_1013
prefix: /data/home/rahulkindi/miniconda3/envs/my_fbgemm_setup

Output of python3 setup.py build develop: setup.out.txt

Any help you can provide is appreciated!

Could you check if the new OSS build path from #665 can fix this issue?

The fixes from #665 resolve the build failures. However, the split_table_batched_embeddings_test and split_embedding_inference_converter_test tests are still failing.

% python test/split_table_batched_embeddings_test.py
Traceback (most recent call last):
  File "test/split_table_batched_embeddings_test.py", line 15, in <module>
    import fbgemm_gpu.split_table_batched_embeddings_ops as split_table_batched_embeddings_ops
  File "/home/rick/anaconda3/envs/pytorch/lib/python3.8/site-packages/fbgemm_gpu-0.0.1-py3.8-linux-x86_64.egg/fbgemm_gpu/split_table_batched_embeddings_ops.py", line 17, in <module>
    import fbgemm_gpu.split_embedding_codegen_lookup_invokers as invokers
  File "/home/rick/anaconda3/envs/pytorch/lib/python3.8/site-packages/fbgemm_gpu-0.0.1-py3.8-linux-x86_64.egg/fbgemm_gpu/split_embedding_codegen_lookup_invokers/__init__.py", line 19, in <module>
    import fbgemm_gpu.split_embedding_codegen_lookup_invokers.lookup_rowwise_weighted_adagrad as lookup_rowwise_weighted_adagrad  # noqa: F401
ModuleNotFoundError: No module named 'fbgemm_gpu.split_embedding_codegen_lookup_invokers.lookup_rowwise_weighted_adagrad'
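In case it helps debugging, here is a quick diagnostic sketch (just an illustrative helper, not part of fbgemm_gpu) that lists which lookup invoker modules actually got installed; it locates the package directory without running the failing __init__.py:

# Illustrative diagnostic: list the installed codegen lookup invoker modules
# without executing the package __init__.py that raises the error above.
import importlib.util
import pathlib

spec = importlib.util.find_spec("fbgemm_gpu.split_embedding_codegen_lookup_invokers")
pkg_dir = pathlib.Path(spec.origin).parent
for path in sorted(pkg_dir.glob("lookup_*")):
    print(path.name)
# If lookup_rowwise_weighted_adagrad is absent from this listing, the codegen
# or packaging step skipped that invoker, which matches the error above.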

Should be fixed by #766.