ggerganov / ggml

Tensor library for machine learning

ggml_allocr_alloc_graph allocated overlapping tensor memory

bssrdf opened this issue

Hi, I have encountered a strange issue using ggml_allocr_alloc_graph to allocate tensor data. When building the graph, I used a no_alloc context and later used ggml_allocr_alloc_graph to allocate all tensors' data. However, I noticed that two particular tensors have exactly the same memory address in their data member. Is this a bug?

You can replicate the issue using my branch here. After building ggml, run ./bin/test-alloc-graph.
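The allocation pattern is roughly the following (a minimal sketch with placeholder tensors and sizes, not the exact code from my branch):

    // build the graph in a no_alloc context: only tensor metadata is created,
    // the data pointers stay NULL for now
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16*1024*1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ true,
    };
    struct ggml_context * ctx = ggml_init(params);

    struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1024);
    struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1024);
    struct ggml_tensor * c = ggml_add(ctx, a, b);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, c);

    // let ggml-alloc place all tensor data inside one compute buffer
    static uint8_t buf[16*1024*1024];
    struct ggml_allocr * alloc = ggml_allocr_new(buf, sizeof(buf), 32);
    ggml_allocr_alloc_graph(alloc, gf);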

The graph is a simple one:
[image: test-alloc-graph forward graph (dot rendering)]

This is not a bug; it is in fact the main purpose of ggml-alloc. The memory of tensors holding intermediate results is reused as soon as they are no longer needed, in order to reduce the size of the compute buffers. If you want every tensor to have a distinct address, you can use a context without no_alloc, or ggml_backend_alloc_ctx_tensors.
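For example (a rough sketch, assuming a CPU backend and a ctx built with no_alloc as above):

    // allocate a dedicated buffer for every tensor in the context,
    // so no two tensors share an address (at the cost of a larger buffer)
    ggml_backend_t backend = ggml_backend_cpu_init();
    ggml_backend_buffer_t buffer = ggml_backend_alloc_ctx_tensors(ctx, backend);

    // ... build and compute the graph ...

    ggml_backend_buffer_free(buffer);
    ggml_backend_free(backend);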
If you only want to inspect the results of intermediate computations, you can also compute the graph one node at a time, for example:

    for (int i = 0; i < g1->n_nodes; i++) {
        struct ggml_tensor * t1 = g1->nodes[i];
        // compute a single-node view of the graph, so that t1 can be
        // inspected before a later node reuses its memory
        struct ggml_cgraph g1v = ggml_graph_view(g1, i, i + 1);
        ggml_backend_graph_compute(backend, &g1v);
        // ... inspect t1->data here ...
    }

There was also a callback added to ggml_backend_sched for this purpose in ggerganov/llama.cpp#4935.
If you want to keep some of the intermediate results, the recommended approach would be to pre-allocate some tensors in a different buffer and use ggml_cpy to copy the results there. Technically it is also possible to add a dependency at the end of the graph with a no-op such as ggml_scale(ctx, a, 1), but I wouldn't recommend that.
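A rough sketch of the copy approach (here ctx_keep is an assumed second context whose tensors are allocated in their own buffer, and inter is the intermediate tensor you want to preserve):

    // create a destination tensor with the same shape and type in the
    // separately allocated context, and add a copy node to the graph
    struct ggml_tensor * keep = ggml_dup_tensor(ctx_keep, inter);
    struct ggml_tensor * copy = ggml_cpy(ctx, inter, keep);

    // expand the graph so the copy is actually executed; after the graph runs,
    // keep->data still holds the result even if inter's memory was reused
    ggml_build_forward_expand(gf, copy);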

Thanks for the quick response.

Sorry, I am new to ggml. I understand this memory reuse is fine for inference (i.e., the forward computation). How about the backward computation? Won't overwriting that memory defeat the purpose of backpropagation for training? I noticed this behavior while training a VAE.

I don't know much about training, but I believe the training examples in llama.cpp handle this by adding dependencies at the end of the graph with ggml_scale(ctx, a, 1), which may be the best way to do this at the moment if you need to keep a lot of the intermediate results.
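Something along these lines (a sketch; checkpoints is a hypothetical array of intermediate tensors you want to keep alive for the backward pass):

    // give each intermediate tensor a trivial consumer at the end of the graph,
    // so ggml-alloc does not reuse its memory before it is needed again
    for (int i = 0; i < n_checkpoints; i++) {
        ggml_build_forward_expand(gf, ggml_scale(ctx, checkpoints[i], 1.0f));
    }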

Thanks for the suggestions.