NVIDIA / MatX

An efficient C++17 GPU numerical computing library with Python-like syntax

Home Page: https://nvidia.github.io/MatX


[BUG] matxException thrown

siegLoesch opened this issue · comments

Describe the bug
During execution of example1_simple_slice an exception is thrown:
Tensor{int32_t} Rank: 2, Sizes:[2, 2], Strides:[4,1]
terminate called after throwing an instance of 'matx::detail::matxException'
what(): matxException (matxNotSupported: mtype == CU_MEMORYTYPE_HOST || mtype == 0: Invalid memory type for printing) - /opt/matx/include/matx/core/tensor_utils.h:822
Aborted

To Reproduce
Steps to reproduce the behavior:
1a. Using CMake with the attached CMakeLists.zip, configure and build the following CUDA file:

#include <matx.h>

using namespace matx;

int main() {
	auto t2 = make_tensor<int>({5, 4});
	// Initialize the tensor linearly
	t2.SetVals({
		{1, 2, 3, 4},
		{5, 6, 7, 8},
		{9, 10, 11, 12},
		{13, 14, 15, 16},
		{17, 18, 19, 20}
	});
	// TODO: Create a slice of the view t2 starting at the second element and
	// ending at the third element (inclusive) in both dimensions
	auto t2s = slice(t2, {1,1}, {3,3});
	t2s.PrefetchDevice(0);
	print(t2s);
	return 0;
}

2a. Running the resulting executable ex1SimpleSlice throws the above exception.
To counter-check:
1b. Go to the directory MatX/docs_input/notebooks.
2b. Modify the shell script compile_and_run.sh inside the subdirectory exercises as shown in
compile_and_run.zip
(at least change 'ENABLE_CUTLASS' to 'MATX_ENABLE_CUTLASS', add -lnvToolsExt, add -lcuda, and adapt the Python version).
3b. Run bash exercises/compile_and_run.sh example1_simple_slice, which throws the same exception.
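The script edits described in step 2b might look roughly like this; a hedged sketch only, since the actual lines of compile_and_run.sh are in the attached compile_and_run.zip and may differ:

```shell
# Sketch of the changes to exercises/compile_and_run.sh (hypothetical lines):

# 1) The CMake option was renamed upstream:
#      -DENABLE_CUTLASS=OFF       ->   -DMATX_ENABLE_CUTLASS=OFF

# 2) Extra libraries appended to the link line:
#      ... -lnvToolsExt -lcuda

# 3) The Python include path adapted to the installed version, e.g.:
#      -I/usr/include/python3.9   # adjust to your Python version
```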

Expected behavior
A print of the sliced tensor.

Code snippets
See above attached files.

System details (please complete the following information):

  • OS: Debian 11, Bullseye
  • CUDA version: [e.g. CUDA 11.8]
  • g++ version: [e.g. 10.2.1]

Additional context
As soon as the print command inside the CUDA source file is commented out, the program terminates without error.
Appreciate any help from your side to proceed with the training material - thank you!
Siegfried

Hi @siegLoesch, I tried to reproduce this on my platform and saw:

Tensor{int32_t} Rank: 2, Sizes:[2, 2], Strides:[4,1]
000000: 6 7
000001: 10 11

That error indicates that it can't detect the type of pointer it is for printing. Can you please print out what mtype is right before that assert?

Hello @cliffburdick, the value of mtype before the assert:
mtype before assert = 2
I also checked the Create and Permute examples, which do not output mtype at all. I assume they do not enter the respective if clause:

#ifdef __CUDACC__
  cudaDeviceSynchronize();
  if constexpr (is_tensor_view_v<Op>) { ...

Thanks for your help and kind regards
Siegfried

That's interesting -- which GPU is this? I'm not sure why this wouldn't be triggering in our code. I can submit a patch if I can't reproduce it

The GPU is: GeForce RTX 2060
BR
Siegfried

Hi @siegLoesch can you please try the print_device branch?

Hello @cliffburdick,
that works well. The output from example1_simple_slice is (I added the original tensor's output to check the correctness of the slice):
Tensor{int32_t} Rank: 2, Sizes:[5, 4], Strides:[4,1]
000000: 1 2 3 4
000001: 5 6 7 8
000002: 9 10 11 12
000003: 13 14 15 16
000004: 17 18 19 20
Tensor{int32_t} Rank: 2, Sizes:[2, 2], Strides:[4,1]
000000: 6 7
000001: 10 11

I also tested example1_init and example1_permute, which likewise yield correct results.

Thank you for your efforts and kind regards
Siegfried

Resolved by #436