Issue with memory consumption

Question

shyams2 opened this issue 7 years ago · comments

Shyam Sundar Sankaran commented 7 years ago

I think there may be a memory allocation issue with the Python interface to ArrayFire. Consider the following code:

In [1]: import arrayfire as af

In [2]: a = af.randu(100, 100, 100, 100)

In [3]: af.print_mem_info()
Memory Info
---------------------------------------------------------
|     POINTER      |    SIZE    |  AF LOCK  | USER LOCK |
---------------------------------------------------------
|  0x7f4c9cb90010  |   381.5 MB |       Yes |        No |
|       0x26ff790  |       1 KB |        No |        No |
---------------------------------------------------------

In [4]: a = af.randu(100, 100, 100, 100)

In [5]: af.print_mem_info()
Memory Info
---------------------------------------------------------
|     POINTER      |    SIZE    |  AF LOCK  | USER LOCK |
---------------------------------------------------------
|  0x7f4c80287010  |   381.5 MB |       Yes |        No |
|  0x7f4c9cb90010  |   381.5 MB |        No |        No |
|       0x26ff790  |       1 KB |        No |        No |
---------------------------------------------------------

In [6]: del a

In [7]: af.print_mem_info()
Memory Info
---------------------------------------------------------
|     POINTER      |    SIZE    |  AF LOCK  | USER LOCK |
---------------------------------------------------------
|  0x7f4c9cb90010  |   381.5 MB |        No |        No |
|  0x7f4c80287010  |   381.5 MB |        No |        No |
|       0x26ff790  |       1 KB |        No |        No |
---------------------------------------------------------

In [4], why is a assigned to a new memory location while the older memory location is not deallocated? Additionally, del a seems only to remove the lock. What is happening here?

Even when using a[:] to change the values of the array, this is the result:

In [2]: a = af.randu(100, 100, 100, 100)

In [3]: af.print_mem_info()
Memory Info
---------------------------------------------------------
|     POINTER      |    SIZE    |  AF LOCK  | USER LOCK |
---------------------------------------------------------
|  0x7fcf5490e010  |   381.5 MB |       Yes |        No |
|       0x1e1d430  |       1 KB |        No |        No |
---------------------------------------------------------

In [4]: a[:] = 0

In [5]: af.print_mem_info()
Memory Info
---------------------------------------------------------
|     POINTER      |    SIZE    |  AF LOCK  | USER LOCK |
---------------------------------------------------------
|  0x7fcf5490e010  |   381.5 MB |       Yes |        No |
|  0x7fcf38287010  |   381.5 MB |        No |        No |
|       0x1e1d430  |       1 KB |        No |        No |
---------------------------------------------------------

A similar trial using numpy shows:

In [1]: import numpy as np

In [2]: a = np.random.rand(100, 100, 100, 100)

In [3]: # htop shows that 762.939453125 MiB is taken as expected

In [4]: a = np.random.rand(100, 100, 100, 100)

In [5]: # memory consumption remains at  762.939453125 MiB

In [6]: del a # memory gets freed

Pavan Yalamanchili · Answer 1 · Wed Jul 19 2017 23:41:03 GMT+0800 (China Standard Time)

@ShyamSS-95 This is because the memory can only be "deleted" / "freed" when all references to a are gone. The old reference only goes away after the assignment (=) completes, by which point the right-hand side has already been evaluated and allocated. I am pretty sure numpy is also using 762 MB x 2 when np.random.rand() is called on the RHS and freeing 762 MB once it assigns to a.

You only notice the issue in arrayfire because of the memory manager (it says only 381.5 MB is being used; the rest can be reused again). If you deleted a before assigning to it then it wouldn't be a problem.

That said a[:] = 0 shouldn't use more memory. I'll look into that.
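The point about the right-hand side being evaluated before the old reference is dropped can be checked without ArrayFire or numpy. The following is only a sketch under stated assumptions: it uses a plain bytearray as a stand-in for the large array and the standard-library tracemalloc module to record peak usage; SIZE and the two function names are made up for illustration.

```python
import tracemalloc

SIZE = 8 * 1024 * 1024  # 8 MiB stand-in for the large array (illustrative value)


def reassign_directly():
    """Peak traced bytes when `a` is rebound without deleting it first."""
    tracemalloc.start()
    a = bytearray(SIZE)
    a = bytearray(SIZE)  # new buffer is created while the old one is still alive
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


def delete_then_assign():
    """Peak traced bytes when `a` is deleted before the new allocation."""
    tracemalloc.start()
    a = bytearray(SIZE)
    del a                # old buffer is released before the new one is requested
    a = bytearray(SIZE)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


print("rebind without del:", reassign_directly() // SIZE, "x SIZE (approx.)")
print("del before rebind: ", delete_then_assign() // SIZE, "x SIZE (approx.)")
```

On CPython the first function should peak at roughly twice SIZE and the second at roughly SIZE, which is the same pattern described above for np.random.rand() and af.randu().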

Mani Chandra · Answer 2 · Thu Jul 20 2017 18:55:50 GMT+0800 (China Standard Time)

@pavanky You are right about numpy allocating memory and we confirmed it. This is our (mine and @ShyamSS-95's) understanding of the process:

In [2]: a = af.randu(100, 100, 100, 100)
In [4]: a = af.randu(100, 100, 100, 100)

In [2], the causal sequence of events is the following:

1. The call to af.randu allocates memory.
2. The pointer to the memory is passed to the variable a.

Now in [4]:

1. The call to af.randu allocates additional memory at a distinct location from that allocated earlier.
2. The pointer to the new location is then passed into a.
3. The memory allocated earlier is now not accessible, and I think this is what AF LOCK = NO means.
4. The earlier memory should have been cleared up by Python's garbage collector since all references to it are gone. But that doesn't seem to be happening. Could it be because of some bug in af-python?

Pavan Yalamanchili · Answer 3 · Thu Jul 20 2017 22:41:45 GMT+0800 (China Standard Time)

@mchandra @ShyamSS-95

The memory is "free" as in it is no longer locked by anything. It can be reused by a new array that tries to allocate memory. You can verify this by assigning to a or a new variable b.

If for some reason you are running out of memory and the memory manager isn't doing its job, call af.device_gc() as a workaround and file a bug. Otherwise, using the memory manager is much faster than having constant memory allocs and frees.
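To make the "unlocked but still held" behaviour concrete, here is a toy caching allocator in pure Python. It is only a sketch and not ArrayFire's actual implementation: ToyPool, ToyArray, and every method name are hypothetical, the buffers are plain bytearrays, and the release-on-rebind step relies on CPython's reference counting calling __del__ promptly.

```python
class ToyPool:
    """Hypothetical caching allocator: released buffers are kept for reuse."""

    def __init__(self):
        self.locked = {}     # id(buffer) -> buffer currently owned by an array
        self.free = {}       # size -> list of unlocked buffers awaiting reuse
        self.raw_allocs = 0  # how many times a brand-new buffer was created

    def alloc(self, size):
        cached = self.free.get(size)
        if cached:
            buf = cached.pop()      # reuse an unlocked buffer of the same size
        else:
            buf = bytearray(size)   # nothing cached: do a real allocation
            self.raw_allocs += 1
        self.locked[id(buf)] = buf
        return buf

    def release(self, buf):
        # Unlock only; the buffer stays in the pool ("AF LOCK = No").
        del self.locked[id(buf)]
        self.free.setdefault(len(buf), []).append(buf)

    def gc(self):
        # Drop all unlocked buffers, the role af.device_gc() plays in the thread.
        self.free.clear()

    def held_bytes(self):
        locked = sum(len(b) for b in self.locked.values())
        cached = sum(len(b) for bufs in self.free.values() for b in bufs)
        return locked, cached


class ToyArray:
    """Hypothetical array wrapper that returns its buffer to the pool."""

    def __init__(self, pool, size):
        self._pool = pool
        self._buf = pool.alloc(size)

    def __del__(self):
        self._pool.release(self._buf)


pool = ToyPool()
a = ToyArray(pool, 1024)
a = ToyArray(pool, 1024)   # second buffer is allocated before the first is unlocked
print(pool.held_bytes())   # (1024, 1024): one locked, one cached
b = ToyArray(pool, 1024)   # served from the cache, no new allocation
print(pool.raw_allocs)     # 2
del a, b
pool.gc()
print(pool.held_bytes())   # (0, 0)
```

In this model the second table in the original question corresponds to the state after the rebinding of a: one buffer locked and one cached. Allocating b then reuses the cached buffer rather than requesting new memory, which is the speed advantage mentioned above.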

Pavan Yalamanchili · Answer 4 · Fri Jul 21 2017 16:31:38 GMT+0800 (China Standard Time)

An example showing the memory manager at work: https://gist.github.com/pavanky/6a31486ba5d45c0484e2900bcac37362