DingXiaoH / RepLKNet-pytorch

Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs (CVPR 2022)

Two errors during compilation of 19_large_depthwise_conv2d_torch_extension

ewrfcas opened this issue · comments

My environment:
Python 3.8.8
CUDA 11.1
PyTorch 1.7.1 / 1.8.1 / 1.9 (all failed)

2 errors detected in the compilation of "forward_fp32.cu".
error: command '/usr/local/cuda-11.1/bin/nvcc' failed with exit status 1

forward_fp32.cu(212): error: more than one instance of constructor "cutlass::Tensor4DCoord::Tensor4DCoord" matches the argument list:
            function "cutlass::Tensor4DCoord::Tensor4DCoord(cutlass::Tensor4DCoord::Index, cutlass::Tensor4DCoord::Index, cutlass::Tensor4DCoord::Index, cutlass::Tensor4DCoord::Index)"
            function "cutlass::Tensor4DCoord::Tensor4DCoord(cutlass::Tensor4DCoord::LongIndex, cutlass::Tensor4DCoord::LongIndex, cutlass::Tensor4DCoord::LongIndex, cutlass::Tensor4DCoord::LongIndex)"
            argument types are: (int64_t, int64_t, int64_t, int)
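For context, the ambiguity arises because the call mixes 64-bit and 32-bit integers, so neither the Index (int32) nor the LongIndex (int64) overload of cutlass::Tensor4DCoord is an exact match. The actual call at forward_fp32.cu(212) is not shown in this issue; the snippet below is only a minimal sketch, assuming the coordinate is built from PyTorch's int64_t tensor sizes, and the make_coord helper is hypothetical:

```cpp
// Minimal sketch, not the extension's actual code. Casting all four values to
// Tensor4DCoord::Index (32-bit) makes the Index constructor the unique match,
// which is the usual way to resolve this kind of overload ambiguity.
#include <cstdint>
#include <cutlass/tensor_coord.h>

// Hypothetical helper; n/h/w/c are assumed to come from tensor.size(...).
inline cutlass::Tensor4DCoord make_coord(int64_t n, int64_t h, int64_t w, int64_t c) {
  using Index = cutlass::Tensor4DCoord::Index;
  return cutlass::Tensor4DCoord(static_cast<Index>(n), static_cast<Index>(h),
                                static_cast<Index>(w), static_cast<Index>(c));
}
```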

forward_fp32.cu(232): error: no instance of constructor "cutlass::conv::kernel::ImplicitBatchedGemmTnDepthwiseConvolution<Mma_, Epilogue_, ThreadblockSwizzle_, ConvOperator, ConvProblemSize_>::Arguments::Arguments [with
Mma_=cutlass::conv::threadblock::MmaTnPrecompPipelined<ThreadblockShape, cutlass::conv::threadblock::Dwconv2dTileIterator<cutlass::MatrixShape<64, 8>, float, cutlass::layout::TensorNCHW, cutlass::transform::PitchLinearStripminedThreadMap<cutlass::layout::PitchLinearShape<8, 64>, 128, 1>, 1, 0>, cutlass::conv::threadblock::RegularTileIteratorTransposed<cutlass::MatrixShape<64, 8>, float, cutlass::layout::ColumnMajor, 1, cutlass::conv::threadblock::DefaultMmaCore<ThreadblockShape, WarpShape, cutlass::gemm::GemmShape<1, 1, 1>, float, cutlass::layout::TensorNCHW, 1, float, cutlass::layout::TensorNCHW, 1, ElementDst, LayoutDst, cutlass::arch::OpClassSimt, 2, cutlass::arch::OpMultiplyAdd, true, cutlass::conv::ImplicitGemmMode::GEMM_TN, cutlass::arch::CacheOperation::Global, cutlass::arch::CacheOperation::Global>::TransposedPitchLinearThreadMapVec, 4>, cutlass::conv::threadblock::Dwconv2dTileFilterIteratorFpropPrecomp<cutlass::MatrixShape<8, 128>, float, cutlass::layout::TensorNCHW, cutlass::conv::threadblock::PitchLinearStripminedThreadMapStrided<cutlass::layout::PitchLinearShape<128, 8>, 128, 1>, 1>, cutlass::transform::threadblock::RegularTileIterator<cutlass::MatrixShape<8, 128>, float, cutlass::layout::RowMajor, 0, cutlass::conv::threadblock::PitchLinearStripminedThreadMapStrided<cutlass::layout::PitchLinearShape<128, 8>, 128, 1>, 4>, ElementDst, LayoutDst, cutlass::gemm::threadblock::MmaPolicy<cutlass::gemm::warp::MmaSimt<WarpShape, float, cutlass::layout::ColumnMajor, float, cutlass::layout::RowMajor, ElementDst, cutlass::layout::RowMajor, cutlass::gemm::warp::MmaSimtPolicy<cutlass::MatrixShape<8, 4>, cutlass::layout::RowMajorInterleaved<2>, cutlass::gemm::GemmShape<4, 4, 1>>, 1, cutlass::ComplexTransform::kNone, cutlass::ComplexTransform::kNone, __nv_bool>, cutlass::MatrixShape<4, 0>, cutlass::MatrixShape<0, 0>, 1>, cutlass::NumericArrayConverter<float, float, 4, cutlass::FloatRoundStyle::round_to_nearest>, cutlass::NumericArrayConverter<float, float, 8, cutlass::FloatRoundStyle::round_to_nearest>, __nv_bool>,
Epilogue_=cutlass::epilogue::threadblock::ConvolutionEpilogue<ThreadblockShape, cutlass::layout::TensorNCHW, 1, cutlass::gemm::warp::MmaSimt<WarpShape, float, cutlass::layout::ColumnMajor, float, cutlass::layout::RowMajor, ElementDst, cutlass::layout::RowMajor, cutlass::gemm::warp::MmaSimtPolicy<cutlass::MatrixShape<8, 4>, cutlass::layout::RowMajorInterleaved<2>, cutlass::gemm::GemmShape<4, 4, 1>>, 1, cutlass::ComplexTransform::kNone, cutlass::ComplexTransform::kNone, __nv_bool>, cutlass::epilogue::threadblock::Dwconv2dPredicatedTileIterator<cutlass::epilogue::threadblock::OutputTileOptimalThreadMap<cutlass::epilogue::threadblock::OutputTileShape<128, 1, 8, 1, 1>, cutlass::epilogue::threadblock::OutputTileShape<1, 4, 2, 1, 8>, 128, 1, 32>, cutlass::layout::TensorNCHW, ElementDst>, cutlass::epilogue::warp::FragmentIteratorSimt<WarpShape, cutlass::gemm::thread::Mma<cutlass::gemm::GemmShape<8, 8, 1>, float, cutlass::layout::ColumnMajor, float, cutlass::layout::RowMajor, ElementDst, cutlass::layout::RowMajor, cutlass::arch::OpMultiplyAdd, __nv_bool>, cutlass::layout::RowMajor, cutlass::gemm::warp::MmaSimtPolicy<cutlass::MatrixShape<8, 4>, cutlass::layout::RowMajorInterleaved<2>, cutlass::gemm::GemmShape<4, 4, 1>>, cutlass::epilogue::warp::SimtPolicy<WarpShape, cutlass::gemm::thread::Mma<cutlass::gemm::GemmShape<8, 8, 1>, float, cutlass::layout::ColumnMajor, float, cutlass::layout::RowMajor, ElementDst, cutlass::layout::RowMajor, cutlass::arch::OpMultiplyAdd, __nv_bool>, cutlass::layout::RowMajor, cutlass::gemm::warp::MmaSimtPolicy<cutlass::MatrixShape<8, 4>, cutlass::layout::RowMajorInterleaved<2>, cutlass::gemm::GemmShape<4, 4, 1>>>>, cutlass::epilogue::warp::TileIteratorSimt<WarpShape, cutlass::gemm::thread::Mma<cutlass::gemm::GemmShape<8, 8, 1>, float, cutlass::layout::ColumnMajor, float, cutlass::layout::RowMajor, ElementDst, cutlass::layout::RowMajor, cutlass::arch::OpMultiplyAdd, __nv_bool>, ElementDst, cutlass::layout::RowMajor, cutlass::gemm::warp::MmaSimtPolicy<cutlass::MatrixShape<8, 4>, cutlass::layout::RowMajorInterleaved<2>, cutlass::gemm::GemmShape<4, 4, 1>>>, cutlass::epilogue::threadblock::SharedLoadIterator<cutlass::epilogue::threadblock::OutputTileOptimalThreadMap<cutlass::epilogue::threadblock::OutputTileShape<128, 1, 8, 1, 1>, cutlass::epilogue::threadblock::OutputTileShape<1, 4, 2, 1, 8>, 128, 1, 32>::CompactedThreadMap, ElementDst, 4>, cutlass::epilogue::threadblock::Dwconv2dBiasTileIterator<cutlass::layout::TensorNCHW, ElementDst, 1>, EpilogueOp, cutlass::MatrixShape<0, 17>, false>,
ThreadblockSwizzle_=SwizzleThreadBlock, ConvOperator=cutlass::conv::Operator::kFprop, ConvProblemSize_=cutlass::conv::Conv2dProblemSize]" matches the argument list
argument types are: ({...}, cutlass::TensorRef<ElementSrc, LayoutSrc>, cutlass::TensorRef<ElementSrc, LayoutSrc>, long, long, cutlass::TensorRef<ElementSrc, LayoutSrc>, {...})
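The second failure points at the same kind of type mismatch: two of the arguments are long (64-bit), which usually means raw int64_t values such as tensor sizes or strides are being passed where the Arguments constructor expects 32-bit ints or a wrapped problem-size type. I have not checked the actual Arguments overloads in this CUTLASS fork, so the helper below is only a hedged sketch of range-checked narrowing that could be applied to those values before building the kernel arguments; to_int32_checked is hypothetical, not code from forward_fp32.cu.

```cpp
// Sketch only: narrow an int64_t dimension/stride to a 32-bit int with a range
// check, so the value handed to the CUTLASS argument struct has the exact
// integer type its constructor expects.
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper, not part of the extension's source.
inline int to_int32_checked(int64_t v) {
  if (v > std::numeric_limits<int>::max() || v < std::numeric_limits<int>::min()) {
    throw std::runtime_error("value does not fit in a 32-bit int");
  }
  return static_cast<int>(v);
}
```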

The same errors occurred with PyTorch 1.10, CUDA 11.3/11.0, and cuDNN 8.4.1/8.2.0. We also got the following error from CUTLASS:

cutlass/include/cutlass/fast_math.h(741): error: no suitable conversion function from "__half" to "float" exists
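This one is a conversion complaint inside CUTLASS's own fast_math.h rather than in the extension code. I have not looked at line 741, but one plausible trigger (an assumption, not confirmed here) is that torch.utils.cpp_extension compiles CUDA sources with -D__CUDA_NO_HALF_CONVERSIONS__, which removes the implicit __half-to-float conversion operator. A minimal illustration of the explicit conversion that still compiles under that flag:

```cpp
// Minimal illustration, not the fast_math.h code: __half2float (cuda_fp16.h)
// converts explicitly, so it compiles even when the implicit __half -> float
// operator is disabled (e.g. by -D__CUDA_NO_HALF_CONVERSIONS__).
#include <cuda_fp16.h>

__host__ __device__ inline float half_to_float(__half h) {
  return __half2float(h);
}
```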

@ewrfcas
We tried to solve this problem by downgrading the Python version to 3.7, and it finally worked.

Could you please share the environment you used for installation, e.g. the OS version, the GCC version, and whether C++14 was used?

@sleeplessai

Python 3.7.1 still does not work. Which exact minor version did you use?