zdevito / ATen

ATen: A TENsor library for C++11

moving tensors back and forth between CPU and GPU?

sflc6 opened this issue · comments

Super sorry if this is obvious, but -- how do I copy a tensor from CPU -> GPU and vice versa? I've been looking through the documentation and can't seem to find how to do this.

There might be a better way to do this, but ATen compiles the following functions in THCTensorCopy:
https://github.com/zdevito/ATen/blob/master/src/THC/generic/THCTensorCopy.h#L37

There might also be convenience functions, similar to how PyTorch lets you call tensor.cuda() and tensor.cpu().

I'm also running into a wall here. The API is a little confusing to me. It looks like one should do something like

int32_t data[] = ...; // data contains not only zeros
auto t_cpu = CPU(kInt).tensorFromBlob(&data[0], {10, 10}); // here the content is fine
auto t_gpu = t_cpu.toType(t_cpu.type().toBackend(kCUDA).toScalarType(kInt)); // contains just zeros

but this seems wrong as t_gpu contains just zeros after the operation.

@ezyang What am i missing?
@ezyang @zdevito It would be really great if a how-to could be added to the README file of ATen, since loading data and then moving it to the GPU is a very common workflow, imho :)

many thanks, chofer

If you are running reasonably recent master, I think the following should work:

at::Tensor t_gpu = t_cpu.to(at::kCUDA);

We should make t_cpu.cuda() work though...

CC @goldsborough

Hi,

my HEAD is 372d1d67356f054db64bdfb4787871ecdbbcbe0b.

to is not yet implemented, it seems.

However, it looks like the problem is the creation with fromBlob(...). If I create a Tensor differently, I can move it between CPU and GPU using the toBackend method of the Tensor class, e.g.
my_cpu_tensor.toBackend(Backend::CUDA); .

My workaround to bring externally allocated CPU data onto the GPU in a tensor:

  1. create the array data on the CPU
  2. use cudaMalloc and cudaMemcpy to bring it onto the GPU
  3. create a tensor with tensorFromBlob from the allocated device memory
  4. clone the tensor (in order not to mess with ATen's memory management engine?)
  5. cudaFree the allocated space.

So from my point of view it seems that there is a transportation issue when moving memory from the wild into the ATen-controlled regime. But it's just a guess ;)

cheers c.hofer

@c-hofer look at https://github.com/zdevito/ATen/blob/31d00ab7fdf00c258b0fad5b1b05af77e92b64a9/aten/src/ATen/test/dlconvertor_test.cpp

You can use the DLPack format which is a cross-framework, well-specified and simple format that we support importing from: https://github.com/dmlc/dlpack/

Thx, that's a valuable hint :)

You can also clone on the CPU first and then move it to GPU, if that's feasible: CPU(kInt).tensorFromBlob(&data[0], {10, 10}).clone().toBackend(at::kCUDA).
The to() functions landed 6 days ago and are on master here: https://github.com/zdevito/ATen/blob/master/aten/src/ATen/templates/Tensor.h#L90

thx, this is surely more elegant ... by the way, any plans for when the new ATen API will be more or less stable?