ggml_allocr_alloc_graph allocated overlapping tensor memory
bssrdf opened this issue · comments
Hi, I have encountered a strange issue using ggml_allocr_alloc_graph to allocate tensor data. When building the graph, I used a no_alloc context and later used ggml_allocr_alloc_graph to allocate all the tensors' data. However, I noticed that two particular tensors have exactly the same memory address for their data member. Is this a bug?
You can replicate the issue using my branch here. After building ggml, run ./bin/test-alloc-graph.
This is not a bug; it is actually the main function of ggml-alloc. The memory of tensors holding intermediate results is reused as soon as they are no longer needed, to reduce the size of the compute buffers. If you want every tensor to have a different address, you can use a context without no_alloc, or ggml_backend_alloc_ctx_tensors.
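To illustrate the second option, here is a minimal sketch (not a complete program; it assumes a `backend` has already been initialized and omits tensor creation): create the tensors in a no_alloc context, then let ggml_backend_alloc_ctx_tensors give each one its own, non-overlapping slice of a backend buffer.

```c
// Sketch: allocate every tensor in the context in one backend buffer,
// so no addresses are reused (unlike ggml-alloc's graph allocator).
struct ggml_init_params params = {
    /*.mem_size   =*/ ggml_tensor_overhead() * 128, // metadata only
    /*.mem_buffer =*/ NULL,
    /*.no_alloc   =*/ true, // tensors are created without data...
};
struct ggml_context * ctx = ggml_init(params);

// ... create tensors and build the graph here ...

// ...then give each tensor its own region of a single backend buffer:
ggml_backend_buffer_t buf = ggml_backend_alloc_ctx_tensors(ctx, backend);
```

The trade-off is memory: every tensor stays resident for the lifetime of the buffer, so the total allocation is larger than what ggml-alloc would compute for the graph.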
If you only want to inspect the results of intermediate computations, you can also compute the graph one node at a time, such as:
for (int i = 0; i < g1->n_nodes; i++) {
    struct ggml_tensor * t1 = g1->nodes[i];
    struct ggml_cgraph g1v = ggml_graph_view(g1, i, i + 1);
    ggml_backend_graph_compute(backend, &g1v);
}
There was also a callback added to ggml_backend_sched for this purpose in ggerganov/llama.cpp#4935.
If you want to keep some of the intermediate results, the recommended approach would be to pre-allocate some tensors in a different buffer and use ggml_cpy to copy the results there. Technically it is also possible to add a dependency at the end of the graph with a no-op such as ggml_scale(ctx, a, 1), but I wouldn't recommend that.
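A sketch of the copy approach (the names `ctx_keep`, `inter`, `gf`, and the tensor shape are placeholders, not from the original thread): the destination tensor lives in a separately allocated context, so ggml-alloc never reuses its memory, and the copy is added to the graph so it actually executes.

```c
// Sketch: keep an intermediate result by copying it into a tensor that
// lives in a separate, manually managed buffer.
// 'ctx_keep' is a no_alloc context whose tensors are allocated with
// ggml_backend_alloc_ctx_tensors, outside ggml-alloc's reuse scheme.
struct ggml_tensor * keep = ggml_new_tensor_2d(ctx_keep, GGML_TYPE_F32, ne0, ne1);
ggml_backend_alloc_ctx_tensors(ctx_keep, backend);

// In the graph context: copy the intermediate tensor 'inter' into 'keep'
// and expand the graph so the copy node is part of the computation.
struct ggml_tensor * cpy = ggml_cpy(ctx, inter, keep);
ggml_build_forward_expand(gf, cpy);
```

After ggml_backend_graph_compute runs, `keep` holds a stable copy of the intermediate result, regardless of how the graph's own buffers were reused.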
Thanks for the quick response.
Sorry, I am new to ggml. I understand this memory overwrite is fine for inference (i.e., the forward pass). What about the backward pass? Won't this overwriting defeat the purpose of backpropagation for training? I discovered this behavior while training a VAE.
I don't know much about training, but I believe the training examples in llama.cpp handle this by adding dependencies at the end of the graph with ggml_scale(ctx, a, 1), which may be the best way to do this at the moment if you need to keep many of the intermediate results.
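The dependency trick above might look like the following sketch (the array `to_keep` and count `n_keep` are assumptions for illustration; note that ggml_scale's scale argument is a float in recent ggml versions, while older versions took a tensor):

```c
// Sketch: force ggml-alloc to keep intermediates alive by appending
// no-op dependencies (scale by 1) at the end of the graph. Because these
// nodes read the intermediates last, their memory cannot be reused
// before the end of the graph.
for (int i = 0; i < n_keep; i++) {
    struct ggml_tensor * dep = ggml_scale(ctx, to_keep[i], 1.0f);
    ggml_build_forward_expand(gf, dep);
}
```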
Thanks for the suggestions.