zdevito / ATen

ATen: A TENsor library for C++11

moving tensors back and forth between CPU and GPU?

sflc6 opened this issue · comments

Super sorry if this is obvious, but -- how do I copy a tensor from CPU -> GPU and vice versa? I've been looking through the documentation and can't seem to find how to do this.

There might be a better way to do this, but ATen compiles the following functions in THCTensorCopy:
https://github.com/zdevito/ATen/blob/master/src/THC/generic/THCTensorCopy.h#L37

There might also be convenience functions, similar to how PyTorch lets you call tensor.cuda() and tensor.cpu().

I'm also running into a wall here. The API is a little confusing to me. It looks like one should do something like

int32_t data[] = ...; // data contains not only zeros
auto t_cpu = CPU(kInt).tensorFromBlob(&data[0], {10, 10}); // here the content is fine
auto t_gpu = t_cpu.toType(t_cpu.type().toBackend(kCUDA).toScalarType(kInt)); // contains just zeros

but this seems wrong as t_gpu contains just zeros after the operation.

@ezyang What am i missing?
@ezyang @zdevito It would be really great if a how-to could be added to the README file of ATen, since loading data and then moving it to the GPU is a very common workflow, imho :)

many thanks, chofer

If you are running reasonably recent master, I think the following should work:

at::Tensor t_gpu = t_cpu.to(at::kCUDA);

We should make t_cpu.cuda() work though...

CC @goldsborough

Hi,

my HEAD is 372d1d67356f054db64bdfb4787871ecdbbcbe0b.

to is not yet implemented, it seems.

However, it looks like the problem is the creation with fromBlob(...). If I create a Tensor differently, I can move it between CPU and GPU using the toBackend method of the Tensor class, e.g.
my_cpu_tensor.toBackend(Backend::CUDA); .

My workaround to bring externally allocated CPU data onto the GPU in a tensor:

  1. create the array data on the CPU
  2. use cudaMalloc and cudaMemcpy to bring it onto the GPU
  3. create a tensor with tensorFromBlob from the allocated device memory
  4. clone the tensor (in order not to mess with ATen's memory management engine?)
  5. cudaFree the allocated space.

So from my point of view it seems that there is a transportation issue when moving memory from the wild into the ATen-controlled regime. But it's just a guess ;)

cheers c.hofer

@c-hofer look at https://github.com/zdevito/ATen/blob/31d00ab7fdf00c258b0fad5b1b05af77e92b64a9/aten/src/ATen/test/dlconvertor_test.cpp

You can use the DLPack format which is a cross-framework, well-specified and simple format that we support importing from: https://github.com/dmlc/dlpack/

Thx, that's a valuable hint :)

You can also clone on the CPU first and then move it to GPU, if that's feasible: CPU(kInt).tensorFromBlob(&data[0], {10, 10}).clone().toBackend(at::kCUDA).
The to() functions landed 6 days ago and are on master here: https://github.com/zdevito/ATen/blob/master/aten/src/ATen/templates/Tensor.h#L90

thx, this is surely more elegant ... by the way, any plans for when the new ATen API will be more or less stable?