zdevito / ATen

ATen: A TENsor library for C++11


Tensor factories (and functions) should accept Tensors/Scalars as arguments to reduce sync points (?)

c-hofer opened this issue

Currently, if you want to create a new Tensor whose specification (e.g. its sizes) resides on the GPU, a sync is needed (imho).
It would be great if something like the following worked:

void my_func(/* ... */) {
    // ...
    Tensor my_gpu_tensor = ...; // size N x 1

    // Now we want to create a new M x 1 tensor, where M = my_gpu_tensor[N - 1][0],
    // without copying that value back to the host first:
    auto new_size = Scalar(my_gpu_tensor[N - 1][0]);
    auto new_tensor = my_gpu_tensor.type().tensor({new_size});

    // ... or a tensor whose sizes are given by my_gpu_tensor.slice(0, N - 2):
    auto new_tensor_2 = my_gpu_tensor.type().tensor(my_gpu_tensor.slice(0, N - 2).squeeze());
    // ...
}

Currently I first have to do something like new_size = new_size.to<int>() to make this work.
But from my understanding this introduces a device->host copy.
Hence, it interrupts the asynchronous nature of the GPU calls and prevents me from launching
my_func asynchronously on several streams and then waiting on all of them together.
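
For concreteness, here is a minimal sketch of that workaround (make_from_gpu_size is a hypothetical helper name; the rest uses ATen's existing Type::tensor factory). The to<int64_t>() call is where the device->host copy, and hence the sync point, happens:

#include <ATen/ATen.h>

using namespace at;

// Hypothetical helper illustrating the current workaround.
Tensor make_from_gpu_size(const Tensor & my_gpu_tensor, int64_t N) {
    // Reading the size back to the host forces a device->host copy,
    // which synchronizes with the GPU stream.
    int64_t new_size = Scalar(my_gpu_tensor[N - 1][0]).to<int64_t>();

    // Only now can the factory run: it expects host-side integer
    // sizes (an IntList), not a device-resident Scalar/Tensor.
    return my_gpu_tensor.type().tensor({new_size});
}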

As I am not deep enough into the ATen sources: is it technically possible, with reasonable effort, to make this work? Or is there already a way to do this that I missed?

regards, c.hofer