Option for having output tensors allocated in device memory?
jkrause1 opened this issue
Hello,
I'm loading a model from a frozen graph and running it. When I check the device of the resulting output tensors, they all report
/job:localhost/replica:0/task:0/device:CPU:0
implying they reside in host memory. I don't know whether this is a result of how the graph is constructed or whether there are options I need to set, but I would prefer that they stay in device memory so I can access and process the data further via CUDA.
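For context, here is a minimal sketch of how one might check where an output tensor is actually backed and copy it onto the GPU explicitly. It assumes cppflow2, where `cppflow::tensor` exposes its underlying `TFE_TensorHandle` as the `tfe_handle` member and the eager context is reachable via `cppflow::context::get_context()`; `TFE_TensorHandleCopyToDevice` and `TFE_TensorHandleDevicePointer` (the latter experimental) come from the TensorFlow C eager API. The model path and input shape are placeholders:

```cpp
// Sketch only: check a tensor's backing device and copy it to the GPU.
#include <iostream>
#include "cppflow/cppflow.h"
#include "tensorflow/c/eager/c_api.h"
#include "tensorflow/c/eager/c_api_experimental.h"  // TFE_TensorHandleDevicePointer

int main() {
    cppflow::model model("model_dir");                    // placeholder path
    auto input = cppflow::fill({1, 224, 224, 3}, 1.0f);   // placeholder shape
    auto output = model(input);

    // device() names the device of the handle; device(true) names the
    // backing device that actually holds the buffer.
    std::cout << output.device(true) << std::endl;

    // Copy the handle onto the GPU explicitly via the C eager API.
    TF_Status* status = TF_NewStatus();
    TFE_TensorHandle* on_gpu = TFE_TensorHandleCopyToDevice(
        output.tfe_handle.get(), cppflow::context::get_context(),
        "/device:GPU:0", status);
    if (TF_GetCode(status) == TF_OK) {
        // Raw device pointer, usable from CUDA (experimental API).
        void* dev_ptr = TFE_TensorHandleDevicePointer(on_gpu, status);
        (void)dev_ptr;
        TFE_DeleteTensorHandle(on_gpu);
    }
    TF_DeleteStatus(status);
    return 0;
}
```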
Hi,
Were you able to solve this? I ran into a problem with a related task, namely loading a frozen model. In my case, the error I got is:

```
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
```

I can see it has something to do with memory. When I was loading the original model there was no issue at all; I ran into this only after trying the frozen model. The two models are nearly the same size, though the structure of the graphs could differ, which I didn't check.
Here is how I load my frozen model:

```cpp
cppflow::model model("Froozen_model_dir", cppflow::model::TYPE::FROZEN_GRAPH);
```
and here is how I call inference on it with a sample input:

```cpp
output_tensor = model(input_1);
```
and I got this:

```
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)
```
Any tips on how to solve this?
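One thing worth ruling out (a sketch, not a confirmed fix for the bad_alloc): cppflow's single-tensor call operator looks up SavedModel-style default operation names (`serving_default_input_1` / `StatefulPartitionedCall`), which typically don't exist in a frozen graph. With a frozen graph you can pass the actual operation names explicitly through the multi-input/multi-output overload; `x` and `Identity` below are placeholders you would replace with the names reported by `model.get_operations()`:

```cpp
// Sketch: run a frozen graph with explicit operation names.
#include <iostream>
#include <string>
#include <vector>
#include "cppflow/cppflow.h"

int main() {
    cppflow::model model("Froozen_model_dir",
                         cppflow::model::TYPE::FROZEN_GRAPH);

    // List the graph's operation names to find the real input/output ops.
    for (const std::string& op : model.get_operations()) {
        std::cout << op << std::endl;
    }

    auto input_1 = cppflow::fill({1, 224, 224, 3}, 1.0f);  // placeholder shape

    // Explicit op names instead of the SavedModel defaults
    // ("x" and "Identity" are placeholders).
    std::vector<cppflow::tensor> outputs =
        model({{"x", input_1}}, {"Identity"});

    std::cout << outputs[0].device(true) << std::endl;
    return 0;
}
```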