[BUG]: `STATIC_MAP_TEST` fails with `cudaErrorMisalignedAddress` when compiled with `-G`
wence- opened this issue · comments
Is this a duplicate?
- I confirmed there appear to be no duplicate issues for this bug (https://github.com/NVIDIA/cuCollections/issues)
Type of Bug
Runtime Error
Describe the bug
When compiling the test suite with -G
```diff
diff --git a/tests/CMakeLists.txt b/tests/CMakeLists.txt
index ebc37e3..d306245 100644
--- a/tests/CMakeLists.txt
+++ b/tests/CMakeLists.txt
@@ -38,7 +38,7 @@ function(ConfigureTest TEST_NAME)
   target_include_directories(${TEST_NAME} PRIVATE ${CMAKE_CURRENT_SOURCE_DIR})
   set_target_properties(${TEST_NAME} PROPERTIES
     RUNTIME_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/tests")
-  target_compile_options(${TEST_NAME} PRIVATE --compiler-options=-Wall --compiler-options=-Wextra
+  target_compile_options(${TEST_NAME} PRIVATE --compiler-options=-Wall --compiler-options=-Wextra -G
     --expt-extended-lambda --expt-relaxed-constexpr -Xcompiler -Wno-subobject-linkage)
   catch_discover_tests(${TEST_NAME} EXTRA_ARGS --allow-running-no-tests)
 endfunction(ConfigureTest)
```
`STATIC_MAP_TEST` does not complete but instead exits with SIGABRT.
How to Reproduce
Build with -G
$ ./STATIC_MAP_TEST
Randomness seeded to: 113517244
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
STATIC_MAP_TEST is a Catch2 v3.3.0 host application.
Run with -? for options
-------------------------------------------------------------------------------
User defined key and value type - key_pair_type<int32_t>, value_pair_type
<int32_t>
All inserted keys-value pairs should be contained
-------------------------------------------------------------------------------
/home/wence/Documents/src/rapids/first-party/cucollections/tests/static_map/custom_type_test.cu:209
...............................................................................
/home/wence/Documents/src/rapids/first-party/cucollections/tests/static_map/custom_type_test.cu:214: FAILED:
REQUIRE( cuco::test::all_of( insert_pairs, insert_pairs + num, [view] __attribute__((device))(cuco::pair<Key, Value> const& pair) { return view.contains(pair.first, hash_custom_key{}, custom_key_equals{}); }) )
due to unexpected exception with message:
CUDA error at: /home/wence/Documents/src/rapids/first-party/cucollections/
tests/utils.hpp54: cudaErrorMisalignedAddress misaligned address
terminate called after throwing an instance of 'thrust::system::system_error'
what(): CUDA free failed: cudaErrorMisalignedAddress: misaligned address
/home/wence/Documents/src/rapids/first-party/cucollections/tests/static_map/custom_type_test.cu:214: FAILED:
{Unknown expression after the reported line}
due to a fatal error condition:
SIGABRT - Abort (abnormal termination) signal
===============================================================================
test cases: 1 | 1 failed
assertions: 8 | 6 passed | 2 failed
[1] 80726 IOT instruction (core dumped) ./STATIC_MAP_TEST
Expected behavior
No such failure.
Reproduction link
No response
Operating System
No response
nvidia-smi output
No response
NVCC version
No response
compute sanitizer run:
compute-sanitizer ./STATIC_MAP_TEST_DEVICE_DEBUG
========= COMPUTE-SANITIZER
Randomness seeded to: 4277593713
========= Invalid __local__ read of size 8 bytes
========= at 0x120 in /home/wence/Documents/src/rapids/first-party/cucollections/include/cuco/detail/bitwise_compare.cuh:59:cuco::detail::bitwise_compare_impl<(unsigned long)8>::compare(const char *, const char *)
========= by thread (32,0,0) in block (0,0,0)
========= Address 0xfffc04 is misaligned
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame: [0x3050c2]
========= in /lib/x86_64-linux-gnu/libcuda.so.1
========= Host Frame: [0x14bec]
========= in /usr/local/cuda-12.2/lib64/libcudart.so.12
========= Host Frame:cudaLaunchKernel [0x6d57b]
========= in /usr/local/cuda-12.2/lib64/libcudart.so.12
========= Host Frame:void cuco::test::detail::count_if<thrust::transform_iterator<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), &(void CATCH2_INTERNAL_TEMPLATE_TEST_0<key_pair_type<int>, value_pair_type<int> >()), 3u>>, thrust::counting_iterator<int, thrust::use_default, thrust::use_default, thrust::use_default>, thrust::use_default, thrust::use_default>, __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), &(void CATCH2_INTERNAL_TEMPLATE_TEST_0<key_pair_type<int>, value_pair_type<int> >()), 9u>, cuco::static_map<key_pair_type<int>, value_pair_type<int>, (cuda::std::__4::__detail::thread_scope)1, cuco::cuda_allocator<char> >::device_view> >(thrust::transform_iterator<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), &(void CATCH2_INTERNAL_TEMPLATE_TEST_0<key_pair_type<int>, value_pair_type<int> >()), 3u>>, thrust::counting_iterator<int, thrust::use_default, thrust::use_default, thrust::use_default>, thrust::use_default, thrust::use_default>, thrust::transform_iterator<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), &(void CATCH2_INTERNAL_TEMPLATE_TEST_0<key_pair_type<int>, value_pair_type<int> >()), 3u>>, thrust::counting_iterator<int, thrust::use_default, thrust::use_default, thrust::use_default>, thrust::use_default, thrust::use_default>, int*, __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), &(void CATCH2_INTERNAL_TEMPLATE_TEST_0<key_pair_type<int>, value_pair_type<int> >()), 9u>, cuco::static_map<key_pair_type<int>, value_pair_type<int>, (cuda::std::__4::__detail::thread_scope)1, cuco::cuda_allocator<char> >::device_view>) [0x285e7]
========= in /home/wence/Documents/src/rapids/first-party/cucollections/build-12-2/tests/./STATIC_MAP_TEST_DEVICE_DEBUG
========= Host Frame:int cuco::test::count_if<thrust::transform_iterator<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), &(void CATCH2_INTERNAL_TEMPLATE_TEST_0<key_pair_type<int>, value_pair_type<int> >()), 3u>>, thrust::counting_iterator<int, thrust::use_default, thrust::use_default, thrust::use_default>, thrust::use_default, thrust::use_default>, __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), &(void CATCH2_INTERNAL_TEMPLATE_TEST_0<key_pair_type<int>, value_pair_type<int> >()), 9u>, cuco::static_map<key_pair_type<int>, value_pair_type<int>, (cuda::std::__4::__detail::thread_scope)1, cuco::cuda_allocator<char> >::device_view> >(thrust::transform_iterator<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), &(void CATCH2_INTERNAL_TEMPLATE_TEST_0<key_pair_type<int>, value_pair_type<int> >()), 3u>>, thrust::counting_iterator<int, thrust::use_default, thrust::use_default, thrust::use_default>, thrust::use_default, thrust::use_default>, thrust::transform_iterator<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), &(void CATCH2_INTERNAL_TEMPLATE_TEST_0<key_pair_type<int>, value_pair_type<int> >()), 3u>>, thrust::counting_iterator<int, thrust::use_default, thrust::use_default, thrust::use_default>, thrust::use_default, thrust::use_default>, __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), &(void CATCH2_INTERNAL_TEMPLATE_TEST_0<key_pair_type<int>, value_pair_type<int> >()), 9u>, cuco::static_map<key_pair_type<int>, value_pair_type<int>, (cuda::std::__4::__detail::thread_scope)1, cuco::cuda_allocator<char> >::device_view>, CUstream_st*) [clone .constprop.0] [0x2f33a]
========= in /home/wence/Documents/src/rapids/first-party/cucollections/build-12-2/tests/./STATIC_MAP_TEST_DEVICE_DEBUG
========= Host Frame:void CATCH2_INTERNAL_TEMPLATE_TEST_0<key_pair_type<int>, value_pair_type<int> >() [0x3608f]
========= in /home/wence/Documents/src/rapids/first-party/cucollections/build-12-2/tests/./STATIC_MAP_TEST_DEVICE_DEBUG
========= Host Frame:Catch::RunContext::runCurrentTest(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) [0x18d898]
========= in /home/wence/Documents/src/rapids/first-party/cucollections/build-12-2/tests/./STATIC_MAP_TEST_DEVICE_DEBUG
========= Host Frame:Catch::RunContext::runTest(Catch::TestCaseHandle const&) [0x18dc38]
========= in /home/wence/Documents/src/rapids/first-party/cucollections/build-12-2/tests/./STATIC_MAP_TEST_DEVICE_DEBUG
========= Host Frame:Catch::Session::runInternal() [0x162b28]
========= in /home/wence/Documents/src/rapids/first-party/cucollections/build-12-2/tests/./STATIC_MAP_TEST_DEVICE_DEBUG
========= Host Frame:Catch::Session::run() [0x162eb0]
========= in /home/wence/Documents/src/rapids/first-party/cucollections/build-12-2/tests/./STATIC_MAP_TEST_DEVICE_DEBUG
========= Host Frame:main [0x27f90]
========= in /home/wence/Documents/src/rapids/first-party/cucollections/build-12-2/tests/./STATIC_MAP_TEST_DEVICE_DEBUG
========= Host Frame:../sysdeps/nptl/libc_start_call_main.h:58:__libc_start_call_main [0x29d90]
========= in /lib/x86_64-linux-gnu/libc.so.6
========= Host Frame:../csu/libc-start.c:379:__libc_start_main [0x29e40]
========= in /lib/x86_64-linux-gnu/libc.so.6
========= Host Frame:_start [0x28035]
========= in /home/wence/Documents/src/rapids/first-party/cucollections/build-12-2/tests/./STATIC_MAP_TEST_DEVICE_DEBUG
I think that the template-specialised `bitwise_compare` implementations:
```cpp
template <>
struct bitwise_compare_impl<4> {
  __host__ __device__ inline static bool compare(char const* lhs, char const* rhs)
  {
    return *reinterpret_cast<uint32_t const*>(lhs) == *reinterpret_cast<uint32_t const*>(rhs);
  }
};
template <>
struct bitwise_compare_impl<8> {
  __host__ __device__ inline static bool compare(char const* lhs, char const* rhs)
  {
    return *reinterpret_cast<uint64_t const*>(lhs) == *reinterpret_cast<uint64_t const*>(rhs);
  }
};
```
are invoking UB since there is no requirement that the `char *` pointer inputs have enough alignment to be cast to `uint64_t *`.
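To illustrate the concern, here is a minimal host-only sketch of an 8-byte bitwise comparison that does not depend on pointer alignment, because it copies the bytes with `std::memcpy` before comparing. The names `bitwise_equal_8` and `is_aligned` are hypothetical helpers made up for this example. They are not part of cuco, and this is not necessarily how the library should fix the problem.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not cuco API): bitwise comparison of two 8-byte
// objects that makes no assumption about the alignment of either pointer.
inline bool bitwise_equal_8(char const* lhs, char const* rhs)
{
  std::uint64_t a;
  std::uint64_t b;
  std::memcpy(&a, lhs, sizeof(a));  // byte-wise copy, valid for any alignment
  std::memcpy(&b, rhs, sizeof(b));
  return a == b;
}

// Hypothetical helper: true when `p` is a multiple of `alignment` bytes.
inline bool is_aligned(void const* p, std::size_t alignment)
{
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

An address such as the `0xfffc04` reported by compute-sanitizer is 4-byte aligned but not 8-byte aligned, so `is_aligned(p, 8)` would be false for it, while `bitwise_equal_8` would still be well defined.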
> are invoking UB since there is no requirement that the `char *` pointer inputs have enough alignment to be cast to `uint64_t *`
In general, yes, but we control all of the addresses that are passed to `bitwise_compare` and can guarantee their alignment. We should only ever be comparing either an address in the storage of the container (which is already aligned), the empty sentinel value, or some user-provided value.
That said, we may need to be a bit more pedantic about ensuring proper alignment for values that aren't part of the container storage, especially for potentially under-aligned types. For example, the declaration of `empty_sentinel_` should probably be:
alignas(sizeof(T)) T empty_sentinel_;
since `alignof(T) < sizeof(T)` could be true.
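As a rough host-side illustration of why `alignof(T) < sizeof(T)` matters here: a struct of two 32-bit integers is 8 bytes wide but only needs 4-byte alignment on common platforms. In the sketch below, `pair_of_int32` is a stand-in that assumes the failing test's key type has this layout, and `sentinel_holder` is a hypothetical wrapper used only to show the effect of `alignas(sizeof(T))`.

```cpp
#include <cstdint>

// Stand-in for an 8-byte user-defined key (assumed layout, not copied from the tests).
struct pair_of_int32 {
  std::int32_t a;
  std::int32_t b;
};

// On common platforms the struct is 8 bytes but only 4-byte aligned, so an
// object of this type may legally live at an address like 0x...4.
static_assert(sizeof(pair_of_int32) == 8, "two 4-byte members, no padding");
static_assert(alignof(pair_of_int32) == 4, "alignment follows the widest member");

// Hypothetical holder mirroring the suggested declaration.
template <typename T>
struct sentinel_holder {
  alignas(sizeof(T)) T empty_sentinel_;  // over-align the member to its size
};

// The over-aligned member raises the alignment of the enclosing type to 8.
static_assert(alignof(sentinel_holder<pair_of_int32>) == 8,
              "member is now safe to read as a single 8-byte word");
```

Note that `alignas(sizeof(T))` only compiles when `sizeof(T)` is a power of two, which holds for the 4-byte and 8-byte cases that the specialised comparisons handle.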
Likewise, `include/cuco/detail/equal_wrapper.cuh` (lines 86 to 89 in fd7263c) could be:
__device__ constexpr equal_result operator()(T const& lhs, U const& rhs) const noexcept
{
alignas(sizeof(T)) T const __a{lhs};
return cuco::detail::bitwise_compare(__a, empty_sentinel_) ? equal_result::EMPTY
: this->equal_to(lhs, rhs);