NVIDIA / MatX

An efficient C++17 GPU numerical computing library with Python-like syntax

Home Page: https://nvidia.github.io/MatX


[BUG] matxException thrown

siegLoesch opened this issue · comments

Describe the bug
During execution of example1_simple_slice an exception is thrown:
Tensor{int32_t} Rank: 2, Sizes:[2, 2], Strides:[4,1]
terminate called after throwing an instance of 'matx::detail::matxException'
what(): matxException (matxNotSupported: mtype == CU_MEMORYTYPE_HOST || mtype == 0: Invalid memory type for printing) - /opt/matx/include/matx/core/tensor_utils.h:822
Aborted

To Reproduce
Steps to reproduce the behavior:
1a. Using CMake with the attached CMakeLists.zip, configure and build the following CUDA file:

#include <matx.h>

using namespace matx;

int main() {
	auto t2 = make_tensor<int>({5, 4});
	// Initialize the tensor linearly
	t2.SetVals({
		{1, 2, 3, 4},
		{5, 6, 7, 8},
		{9, 10, 11, 12},
		{13, 14, 15, 16},
		{17, 18, 19, 20}
	});
	// TODO: Create a slice of the view t2 starting at the second element and
	// ending at the third element (inclusive) in both dimensions
	auto t2s = slice(t2, {1,1}, {3,3});
	t2s.PrefetchDevice(0);
	print(t2s);
	return 0;
}

2a. Running the resulting executable ex1SimpleSlice throws the above exception.
To counter-check:
1b. Go to the directory MatX/docs_input/notebooks.
2b. Modify the shell script compile_and_run.sh inside the subdirectory exercises as shown in
compile_and_run.zip
(at least change 'ENABLE_CUTLASS' to 'MATX_ENABLE_CUTLASS', add -lnvToolsExt, add -lcuda, and adapt the Python version).
3b. Run bash exercises/compile_and_run.sh example1_simple_slice, which throws the same exception.
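The script edits described in step 2b might look roughly like this; a hedged sketch only, since the actual lines of compile_and_run.sh are in the attached compile_and_run.zip and may differ:

```shell
# Sketch of the changes to exercises/compile_and_run.sh (hypothetical lines):

# 1) The CMake option was renamed upstream:
#      -DENABLE_CUTLASS=OFF       ->   -DMATX_ENABLE_CUTLASS=OFF

# 2) Extra libraries appended to the link line:
#      ... -lnvToolsExt -lcuda

# 3) The Python include path adapted to the installed version, e.g.:
#      -I/usr/include/python3.9   # adjust to your Python version
```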

Expected behavior
A print of the sliced tensor.

Code snippets
See above attached files.

System details (please complete the following information):

  • OS: Debian 11, Bullseye
  • CUDA version: [e.g. CUDA 11.8]
  • g++ version: [e.g. 10.2.1]

Additional context
As soon as the print command inside the CUDA source file is commented out, the program terminates without error.
Appreciate any help from your side to proceed with the training material - thank you!
Siegfried

Hi @siegLoesch, I tried to reproduce this on my platform and saw:

Tensor{int32_t} Rank: 2, Sizes:[2, 2], Strides:[4,1]
000000: 6 7
000001: 10 11

That error indicates that it can't detect the type of pointer it is for printing. Can you please print out what mtype is right before that assert?

Hello @cliffburdick, the value of mtype before the assert:
mtype before assert = 2
I also checked the Create and Permute examples, which do not output mtype at all. I assume they do not enter the respective if clause:

#ifdef __CUDACC__
  cudaDeviceSynchronize();
  if constexpr (is_tensor_view_v<Op>) { ...

Thanks for your help and kind regards
Siegfried

That's interesting -- which GPU is this? I'm not sure why this wouldn't be triggering in our code. I can submit a patch if I can't reproduce it

The GPU is: GeForce RTX 2060
BR
Siegfried

Hi @siegLoesch can you please try the print_device branch?

Hello @cliffburdick,
that works well. The output from example1_simple_slice is (I added the original tensor's output to check the correctness of the slice):
Tensor{int32_t} Rank: 2, Sizes:[5, 4], Strides:[4,1]
000000: 1 2 3 4
000001: 5 6 7 8
000002: 9 10 11 12
000003: 13 14 15 16
000004: 17 18 19 20
Tensor{int32_t} Rank: 2, Sizes:[2, 2], Strides:[4,1]
000000: 6 7
000001: 10 11

I also tested example1_init and example1_permute, which likewise yield correct results.

Thank you for your efforts and kind regards
Siegfried

Resolved by #436