ggerganov / ggml

Tensor library for machine learning


Confirmation about the order of tensor dimensions

mwmercury opened this issue

Hello! Thank you so much for developing and sharing this awesome library!
Can I ask a silly question?

I'm investigating the source code of gpt-neox and I see these lines in the convert-h5-to-ggml.py file:

    # write the tensor shape one dimension at a time, in reverse order
    for i in range(n_dims):
        fout.write(struct.pack("i", data.shape[n_dims - 1 - i]))
    fout.write(str)  # 'str' holds the tensor name encoded as UTF-8 bytes

Could you please tell me why we need to store the dimensions in reverse order?
Thank you in advance!

This is done because the dimension order in GGML is the reverse of the dimension order used in PyTorch. In PyTorch the order is N x C x H x W; in GGML it is W x H x C x N.

Here N is the batch dimension, C is the channel dimension, H is the number of rows, and W is the number of columns.

In GGML, W x H x C x N corresponds to the ne[0] x ne[1] x ne[2] x ne[3] members of a tensor.

Say we have a 4-dimensional tensor named "t" in both GGML and PyTorch.
Here is the correspondence:

| GGML     | PyTorch    |
| -------- | ---------- |
| t->ne[0] | t.shape[3] |
| t->ne[1] | t.shape[2] |
| t->ne[2] | t.shape[1] |
| t->ne[3] | t.shape[0] |
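
To make the correspondence concrete, here is a minimal sketch using the ggml C API (ggml_init, ggml_new_tensor_4d, ggml_free, as declared in ggml.h; field and type names may differ slightly across versions). It creates the ggml counterpart of torch.zeros(2, 3, 4, 5) and prints its ne[] values:

    // minimal sketch, assuming the ggml C API as declared in ggml.h
    #include <stdio.h>
    #include "ggml.h"

    int main(void) {
        struct ggml_init_params params = {
            .mem_size   = 16 * 1024 * 1024, // small scratch buffer
            .mem_buffer = NULL,
            .no_alloc   = false,
        };
        struct ggml_context * ctx = ggml_init(params);

        // the ggml counterpart of torch.zeros(2, 3, 4, 5), i.e. N=2, C=3, H=4, W=5;
        // dimensions are passed fastest-varying first: W, H, C, N
        struct ggml_tensor * t = ggml_new_tensor_4d(ctx, GGML_TYPE_F32, 5, 4, 3, 2);

        // prints: ne = [5, 4, 3, 2]  (ne[0]=W, ne[1]=H, ne[2]=C, ne[3]=N)
        printf("ne = [%lld, %lld, %lld, %lld]\n",
               (long long) t->ne[0], (long long) t->ne[1],
               (long long) t->ne[2], (long long) t->ne[3]);

        ggml_free(ctx);
        return 0;
    }

The dimensions are passed fastest-varying first (W, H, C, N), which is exactly why the conversion script in the question writes data.shape in reverse.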

@YavorGIvanov
Thank you very much for your kind response!
I have another question: Why was the decision made to reverse the order of dimensions compared to the one used in PyTorch? Is there a specific reason, such as improved memory management or performance, for this choice?

ggerganov will be able to provide the best answer here, but I don't think the library was intended to match any other deep learning library, and the initial version was written relatively quickly.

When you design the data type representing a tensor, you may decide to limit the dimensions to a fixed upper count and then use a static array of integers to store the size of each dimension, plus an integer for the dimension count. This avoids dynamically allocating the dimension array, making it more memory/cache friendly. It also has the advantage that all tensors share the same dimension array (ne[4]) and count. However, to make the dimension array easy to use, you need to store the dimensions in a consistent order, starting from the fastest-varying one.

For example, this makes it easy to compare the width dimension of a 2D, 3D, and 4D tensor, as you know that the width is always at ne[0], rather than at ne[1], ne[2], and ne[3] respectively.
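
To make that concrete, here is a self-contained sketch of the idea (the tensor_hdr struct and make_hdr helper are hypothetical names for illustration, not ggml's actual definitions; the real ggml_tensor carries many more fields). Unused entries of the fixed-size ne array are padded with 1, so the width of a tensor of any rank is always at ne[0]:

    // hypothetical illustration of the fixed-size dimension array design
    #include <stdint.h>
    #include <stdio.h>

    #define MAX_DIMS 4

    struct tensor_hdr {
        int     n_dims;
        int64_t ne[MAX_DIMS]; // fixed-size, no heap allocation; unused entries stay 1
    };

    // dims[] is given fastest-varying first (W, H, C, N)
    static struct tensor_hdr make_hdr(int n_dims, const int64_t * dims) {
        struct tensor_hdr h = { .n_dims = n_dims, .ne = { 1, 1, 1, 1 } };
        for (int i = 0; i < n_dims; i++) {
            h.ne[i] = dims[i];
        }
        return h;
    }

    int main(void) {
        struct tensor_hdr t2 = make_hdr(2, (const int64_t[]) { 5, 4 });       // 2D: W=5, H=4
        struct tensor_hdr t4 = make_hdr(4, (const int64_t[]) { 5, 4, 3, 2 }); // 4D: W=5, H=4, C=3, N=2

        // the width of a tensor of any rank is always ne[0],
        // so no per-rank special casing is needed
        printf("widths: %lld and %lld\n",
               (long long) t2.ne[0], (long long) t4.ne[0]); // prints: widths: 5 and 5
        return 0;
    }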

If you have any additional questions, you can reopen the issue or open a new one with the "question" label.

@YavorGIvanov
I'm sorry for my late response.
Very detailed and helpful explanation. Thank you so much!