NVIDIA / cuCollections

[BUG]: `STATIC_MAP_TEST` fails with `cudaErrorMisalignedAddress` when compiled with `-G`

wence- opened this issue

Is this a duplicate?

Type of Bug

Runtime Error

Describe the bug

When the test suite is compiled with -G, for example via this change:

diff --git a/tests/CMakeLists.txt b/tests/CMakeLists.txt
index ebc37e3..d306245 100644
--- a/tests/CMakeLists.txt
+++ b/tests/CMakeLists.txt
@@ -38,7 +38,7 @@ function(ConfigureTest TEST_NAME)
     target_include_directories(${TEST_NAME} PRIVATE ${CMAKE_CURRENT_SOURCE_DIR})
     set_target_properties(${TEST_NAME} PROPERTIES
                                        RUNTIME_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/tests")
-    target_compile_options(${TEST_NAME} PRIVATE --compiler-options=-Wall --compiler-options=-Wextra
+    target_compile_options(${TEST_NAME} PRIVATE --compiler-options=-Wall --compiler-options=-Wextra -G
       --expt-extended-lambda --expt-relaxed-constexpr -Xcompiler -Wno-subobject-linkage)
     catch_discover_tests(${TEST_NAME} EXTRA_ARGS --allow-running-no-tests)
 endfunction(ConfigureTest)

STATIC_MAP_TEST does not complete but instead exits with SIGABRT.

How to Reproduce

Build with -G, then run the test binary:

$ ./STATIC_MAP_TEST 
Randomness seeded to: 113517244

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
STATIC_MAP_TEST is a Catch2 v3.3.0 host application.
Run with -? for options

-------------------------------------------------------------------------------
User defined key and value type - key_pair_type<int32_t>, value_pair_type
<int32_t>
  All inserted keys-value pairs should be contained
-------------------------------------------------------------------------------
/home/wence/Documents/src/rapids/first-party/cucollections/tests/static_map/custom_type_test.cu:209
...............................................................................

/home/wence/Documents/src/rapids/first-party/cucollections/tests/static_map/custom_type_test.cu:214: FAILED:
  REQUIRE( cuco::test::all_of( insert_pairs, insert_pairs + num, [view] __attribute__((device))(cuco::pair<Key, Value> const& pair) { return view.contains(pair.first, hash_custom_key{}, custom_key_equals{}); }) )
due to unexpected exception with message:
  CUDA error at: /home/wence/Documents/src/rapids/first-party/cucollections/
  tests/utils.hpp54: cudaErrorMisalignedAddress misaligned address

terminate called after throwing an instance of 'thrust::system::system_error'
  what():  CUDA free failed: cudaErrorMisalignedAddress: misaligned address
/home/wence/Documents/src/rapids/first-party/cucollections/tests/static_map/custom_type_test.cu:214: FAILED:
  {Unknown expression after the reported line}
due to a fatal error condition:
  SIGABRT - Abort (abnormal termination) signal

===============================================================================
test cases: 1 | 1 failed
assertions: 8 | 6 passed | 2 failed

[1]    80726 IOT instruction (core dumped)  ./STATIC_MAP_TEST

Expected behavior

No such failure.

Reproduction link

No response

Operating System

No response

nvidia-smi output

No response

NVCC version

No response

compute sanitizer run:

compute-sanitizer ./STATIC_MAP_TEST_DEVICE_DEBUG 
========= COMPUTE-SANITIZER
Randomness seeded to: 4277593713
========= Invalid __local__ read of size 8 bytes
=========     at 0x120 in /home/wence/Documents/src/rapids/first-party/cucollections/include/cuco/detail/bitwise_compare.cuh:59:cuco::detail::bitwise_compare_impl<(unsigned long)8>::compare(const char *, const char *)
=========     by thread (32,0,0) in block (0,0,0)
=========     Address 0xfffc04 is misaligned
=========     Saved host backtrace up to driver entry point at kernel launch time
=========     Host Frame: [0x3050c2]
=========                in /lib/x86_64-linux-gnu/libcuda.so.1
=========     Host Frame: [0x14bec]
=========                in /usr/local/cuda-12.2/lib64/libcudart.so.12
=========     Host Frame:cudaLaunchKernel [0x6d57b]
=========                in /usr/local/cuda-12.2/lib64/libcudart.so.12
=========     Host Frame:void cuco::test::detail::count_if<thrust::transform_iterator<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), &(void CATCH2_INTERNAL_TEMPLATE_TEST_0<key_pair_type<int>, value_pair_type<int> >()), 3u>>, thrust::counting_iterator<int, thrust::use_default, thrust::use_default, thrust::use_default>, thrust::use_default, thrust::use_default>, __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), &(void CATCH2_INTERNAL_TEMPLATE_TEST_0<key_pair_type<int>, value_pair_type<int> >()), 9u>, cuco::static_map<key_pair_type<int>, value_pair_type<int>, (cuda::std::__4::__detail::thread_scope)1, cuco::cuda_allocator<char> >::device_view> >(thrust::transform_iterator<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), &(void CATCH2_INTERNAL_TEMPLATE_TEST_0<key_pair_type<int>, value_pair_type<int> >()), 3u>>, thrust::counting_iterator<int, thrust::use_default, thrust::use_default, thrust::use_default>, thrust::use_default, thrust::use_default>, thrust::transform_iterator<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), &(void CATCH2_INTERNAL_TEMPLATE_TEST_0<key_pair_type<int>, value_pair_type<int> >()), 3u>>, thrust::counting_iterator<int, thrust::use_default, thrust::use_default, thrust::use_default>, thrust::use_default, thrust::use_default>, int*, __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), &(void CATCH2_INTERNAL_TEMPLATE_TEST_0<key_pair_type<int>, value_pair_type<int> >()), 9u>, cuco::static_map<key_pair_type<int>, value_pair_type<int>, (cuda::std::__4::__detail::thread_scope)1, cuco::cuda_allocator<char> >::device_view>) [0x285e7]
=========                in /home/wence/Documents/src/rapids/first-party/cucollections/build-12-2/tests/./STATIC_MAP_TEST_DEVICE_DEBUG
=========     Host Frame:int cuco::test::count_if<thrust::transform_iterator<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), &(void CATCH2_INTERNAL_TEMPLATE_TEST_0<key_pair_type<int>, value_pair_type<int> >()), 3u>>, thrust::counting_iterator<int, thrust::use_default, thrust::use_default, thrust::use_default>, thrust::use_default, thrust::use_default>, __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), &(void CATCH2_INTERNAL_TEMPLATE_TEST_0<key_pair_type<int>, value_pair_type<int> >()), 9u>, cuco::static_map<key_pair_type<int>, value_pair_type<int>, (cuda::std::__4::__detail::thread_scope)1, cuco::cuda_allocator<char> >::device_view> >(thrust::transform_iterator<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), &(void CATCH2_INTERNAL_TEMPLATE_TEST_0<key_pair_type<int>, value_pair_type<int> >()), 3u>>, thrust::counting_iterator<int, thrust::use_default, thrust::use_default, thrust::use_default>, thrust::use_default, thrust::use_default>, thrust::transform_iterator<__nv_dl_wrapper_t<__nv_dl_tag<void (*)(), &(void CATCH2_INTERNAL_TEMPLATE_TEST_0<key_pair_type<int>, value_pair_type<int> >()), 3u>>, thrust::counting_iterator<int, thrust::use_default, thrust::use_default, thrust::use_default>, thrust::use_default, thrust::use_default>, __nv_dl_wrapper_t<__nv_dl_tag<void (*)(), &(void CATCH2_INTERNAL_TEMPLATE_TEST_0<key_pair_type<int>, value_pair_type<int> >()), 9u>, cuco::static_map<key_pair_type<int>, value_pair_type<int>, (cuda::std::__4::__detail::thread_scope)1, cuco::cuda_allocator<char> >::device_view>, CUstream_st*) [clone .constprop.0] [0x2f33a]
=========                in /home/wence/Documents/src/rapids/first-party/cucollections/build-12-2/tests/./STATIC_MAP_TEST_DEVICE_DEBUG
=========     Host Frame:void CATCH2_INTERNAL_TEMPLATE_TEST_0<key_pair_type<int>, value_pair_type<int> >() [0x3608f]
=========                in /home/wence/Documents/src/rapids/first-party/cucollections/build-12-2/tests/./STATIC_MAP_TEST_DEVICE_DEBUG
=========     Host Frame:Catch::RunContext::runCurrentTest(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) [0x18d898]
=========                in /home/wence/Documents/src/rapids/first-party/cucollections/build-12-2/tests/./STATIC_MAP_TEST_DEVICE_DEBUG
=========     Host Frame:Catch::RunContext::runTest(Catch::TestCaseHandle const&) [0x18dc38]
=========                in /home/wence/Documents/src/rapids/first-party/cucollections/build-12-2/tests/./STATIC_MAP_TEST_DEVICE_DEBUG
=========     Host Frame:Catch::Session::runInternal() [0x162b28]
=========                in /home/wence/Documents/src/rapids/first-party/cucollections/build-12-2/tests/./STATIC_MAP_TEST_DEVICE_DEBUG
=========     Host Frame:Catch::Session::run() [0x162eb0]
=========                in /home/wence/Documents/src/rapids/first-party/cucollections/build-12-2/tests/./STATIC_MAP_TEST_DEVICE_DEBUG
=========     Host Frame:main [0x27f90]
=========                in /home/wence/Documents/src/rapids/first-party/cucollections/build-12-2/tests/./STATIC_MAP_TEST_DEVICE_DEBUG
=========     Host Frame:../sysdeps/nptl/libc_start_call_main.h:58:__libc_start_call_main [0x29d90]
=========                in /lib/x86_64-linux-gnu/libc.so.6
=========     Host Frame:../csu/libc-start.c:379:__libc_start_main [0x29e40]
=========                in /lib/x86_64-linux-gnu/libc.so.6
=========     Host Frame:_start [0x28035]
=========                in /home/wence/Documents/src/rapids/first-party/cucollections/build-12-2/tests/./STATIC_MAP_TEST_DEVICE_DEBUG
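The address in the report can be checked by hand: 0xfffc04 is a multiple of 4 but not of 8, which is consistent with an 8-byte read from a location that is only 4-byte aligned. A minimal sketch of that arithmetic follows; `is_aligned` and `reported_address` are hypothetical names for illustration and are not part of cuco.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true when `address` is a multiple of `alignment`.
// `alignment` is assumed to be a power of two.
constexpr bool is_aligned(std::uintptr_t address, std::size_t alignment)
{
  return (address & (alignment - 1)) == 0;
}

// The address compute-sanitizer flagged in the report above.
constexpr std::uintptr_t reported_address = 0xfffc04;

// Fine for a 4-byte access, but not for the 8-byte read that was attempted.
static_assert(is_aligned(reported_address, 4), "address is 4-byte aligned");
static_assert(!is_aligned(reported_address, 8), "address is not 8-byte aligned");
```

Both static assertions hold at compile time, so the failing read is exactly the kind an 8-byte load on a 4-byte-aligned object would produce.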

I think that the template-specialised bitwise_compare implementations:

template <>
struct bitwise_compare_impl<4> {
  __host__ __device__ inline static bool compare(char const* lhs, char const* rhs)
  {
    return *reinterpret_cast<uint32_t const*>(lhs) == *reinterpret_cast<uint32_t const*>(rhs);
  }
};

template <>
struct bitwise_compare_impl<8> {
  __host__ __device__ inline static bool compare(char const* lhs, char const* rhs)
  {
    return *reinterpret_cast<uint64_t const*>(lhs) == *reinterpret_cast<uint64_t const*>(rhs);
  }
};

are invoking UB since there is no requirement that the char * pointer inputs have enough alignment to be cast to uint64_t *.
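For comparison, one way to write an 8-byte bitwise comparison that places no alignment requirement on its inputs is to copy the bytes into properly aligned locals first. This is only a host-side sketch under that assumption: `bitwise_compare8_memcpy` is a hypothetical name, it is not cuco's implementation, and its device-side code generation and performance are not assessed here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical alternative to an 8-byte reinterpret_cast comparison.
// Copying into locals means lhs and rhs may have any alignment.
inline bool bitwise_compare8_memcpy(char const* lhs, char const* rhs)
{
  std::uint64_t a;
  std::uint64_t b;
  std::memcpy(&a, lhs, sizeof(a));  // byte-wise copy, no alignment assumption
  std::memcpy(&b, rhs, sizeof(b));
  return a == b;
}
```

Calling it with pointers offset by 4 bytes into an 8-byte-aligned buffer is well defined, whereas dereferencing those same pointers as `uint64_t const*` would not be.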

> are invoking UB since there is no requirement that the char * pointer inputs have enough alignment to be cast to uint64_t *

In general, yes, but we control all of the addresses that are passed to bitwise_compare and can guarantee their alignment. We should only ever be comparing either an address in the storage of the container (which is already aligned), the empty sentinel value, or some user-provided value.

That said, we may need to be a bit more pedantic about ensuring proper alignment for values that aren't part of the container storage, especially for potentially under-aligned types. For example,

T empty_sentinel_; ///< Sentinel value

should probably be:

alignas(sizeof(T)) T empty_sentinel_;

As alignof(T) < sizeof(T) could be true.
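To illustrate the under-alignment case, here is a small sketch with a struct of two 32-bit integers. `pair_of_ints` and `sentinel_holder` are hypothetical stand-ins, not cuco types, and the `alignof` value of 4 is an assumption that holds on common ABIs such as x86-64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 8-byte user-defined key type.
struct pair_of_ints {
  std::int32_t a;
  std::int32_t b;
};

// Size is 8, but the natural alignment is only that of int32_t.
static_assert(sizeof(pair_of_ints) == 8, "two 4-byte members");
static_assert(alignof(pair_of_ints) == 4, "assumed: int32_t is 4-byte aligned");

// Hypothetical holder mirroring the suggested member declaration.
struct sentinel_holder {
  alignas(sizeof(pair_of_ints)) pair_of_ints empty_sentinel_;
};

// The alignas specifier raises the member's (and the holder's) alignment to 8.
static_assert(alignof(sentinel_holder) == 8, "over-aligned member");
```

Note that `alignas` only accepts valid alignments, which are powers of two, so `alignas(sizeof(T))` would not compile for a type whose size is, say, 12 bytes.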

Likewise here:

  __device__ constexpr equal_result operator()(T const& lhs, U const& rhs) const noexcept
  {
    return cuco::detail::bitwise_compare(lhs, empty_sentinel_) ? equal_result::EMPTY
                                                               : this->equal_to(lhs, rhs);
  }

could be:

  __device__ constexpr equal_result operator()(T const& lhs, U const& rhs) const noexcept
  {
   alignas(sizeof(T)) T const __a{lhs};
   return cuco::detail::bitwise_compare(__a, empty_sentinel_) ? equal_result::EMPTY
                                                               : this->equal_to(lhs, rhs);