sugarme / gotch

Go binding for PyTorch C++ API (libtorch)

Can memory leak in tensor-generated.go because of malloc(0)?

pjongy opened this issue

Creating a pointer with malloc(0) can leave a dangling pointer that is never freed, even when AtFree is called (it seems to escape the GC pool).

I'm not sure why this happens (I think the current code is fine as well); this was an experimental bug detection and fix.

You can reproduce this with the following program:

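package main

import (
	"fmt"
	"os"
	"os/signal"
	"sync"

	"github.com/sugarme/gotch"
	"github.com/sugarme/gotch/ts"
)

func main() {
	wg := &sync.WaitGroup{}
	// Spawn 10 goroutines, each creating and dropping 100,000 small tensors.
	for j := 0; j < 10; j++ {
		wg.Add(1)
		go func(kk int) {
			defer wg.Done()
			for i := 0; i < 100000; i++ {
				if i%1000 == 0 {
					fmt.Printf("ran(%d): %d\n", kk, i)
				}
				a := ts.MustRand([]int64{3, 3, 3}, gotch.Float, gotch.CPU)
				permuted, err := a.Permute([]int64{2, 1, 0}, false)
				if err != nil {
					panic(err)
				}
				// Free both C tensors explicitly.
				a.MustDrop()
				permuted.MustDrop()
			}
		}(j)
	}
	wg.Wait()

	// Keep the process alive after all iterations finish so that its
	// resident memory can still be inspected with top. Press Ctrl+C to exit.
	fmt.Println("all goroutines finished; press Ctrl+C to exit")
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, os.Interrupt)
	<-sig
}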

Then monitor the resident memory usage at the same time with the following command (in my case, the dummy package name is aaa.com):

top | grep aaa.com
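As an alternative to watching `top`, the process can log its own resident memory. This is a minimal Linux-only sketch, assuming `/proc/self/status` is available; `rssKiB` is a hypothetical helper, not part of gotch. It reads the `VmRSS` line, which is the resident-set figure that `top` shows in the RES column (in KiB).

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// rssKiB (hypothetical helper) returns the resident set size of the current
// process in KiB, parsed from the VmRSS line of /proc/self/status (Linux only).
// Unlike runtime.ReadMemStats, this includes memory allocated by C code.
func rssKiB() (int64, error) {
	f, err := os.Open("/proc/self/status")
	if err != nil {
		return 0, err
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "VmRSS:") {
			fields := strings.Fields(line) // e.g. ["VmRSS:", "112028", "kB"]
			if len(fields) < 2 {
				return 0, fmt.Errorf("malformed VmRSS line: %q", line)
			}
			return strconv.ParseInt(fields[1], 10, 64)
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, errors.New("VmRSS not found in /proc/self/status")
}

func main() {
	kib, err := rssKiB()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("RES: %d KiB\n", kib)
}
```

Calling `rssKiB()` inside the progress-printing branch of the reproduction program would print the memory next to each progress line. `runtime.ReadMemStats` is not a substitute here, because it only reports memory managed by the Go runtime, while allocations made through cgo live on the C heap.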

[AS-IS]
Before this patch:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   8687 root      20   0 2452832 112028  53760 S 238.0   0.3   0:02.38 aaa.com
   8687 root      20   0 2453228 119072  53760 S 283.2   0.4   0:05.24 aaa.com
   8687 root      20   0 2453624 123960  53760 R 282.0   0.4   0:08.06 aaa.com
   8687 root      20   0 2453888 128864  53760 S 286.0   0.4   0:10.92 aaa.com
   8687 root      20   0 2454152 146696  53760 S 300.0   0.4   0:13.92 aaa.com
   8687 root      20   0 2454680 155456  53760 S 392.0   0.5   0:17.84 aaa.com
   8687 root      20   0 2455076 157896  53760 S 390.0   0.5   0:21.74 aaa.com
   8687 root      20   0 2455472 159972  53760 S 393.0   0.5   0:25.67 aaa.com
   8687 root      20   0 2456000 167944  53760 S 396.0   0.5   0:29.63 aaa.com
   8687 root      20   0 2456528 180096  53760 R 390.0   0.6   0:33.53 aaa.com
   8687 root      20   0 2456528 181904  53760 S 393.0   0.6   0:37.46 aaa.com
   8687 root      20   0 2530392 171344  53760 S 112.9   0.5   0:38.60 aaa.com
   8687 root      20   0 2530392 171344  53760 S   0.0   0.5   0:38.60 aaa.com
   8687 root      20   0 2530392 171344  53760 S   0.0   0.5   0:38.60 aaa.com
   8687 root      20   0 2530392 171344  53760 S   0.0   0.5   0:38.60 aaa.com
   8687 root      20   0 2530392 171344  53760 S   0.0   0.5   0:38.60 aaa.com

Resident (physical) memory grows steadily, from 112,028 KiB to 181,904 KiB, and is not released even after all goroutines have finished their iterations (CPU usage has dropped to 0%, yet RES stays at 171,344 KiB).

[TO-BE]
After this patch:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   8926 root      20   0 2453212 108656  53952 S 262.0   0.3   0:02.67 aaa.com
   8926 root      20   0 2526944 112464  53952 R 275.2   0.3   0:05.45 aaa.com
   8926 root      20   0 2526944 112680  53952 S 279.0   0.3   0:08.24 aaa.com
   8926 root      20   0 2526944 112368  53952 R 277.0   0.3   0:11.01 aaa.com
   8926 root      20   0 2526944 122176  53952 S 309.0   0.4   0:14.10 aaa.com
   8926 root      20   0 2600676 120784  53952 S 382.0   0.4   0:17.92 aaa.com
   8926 root      20   0 2600676 121784  53952 S 384.0   0.4   0:21.76 aaa.com
   8926 root      20   0 2600676 115468  53952 S 384.0   0.4   0:25.60 aaa.com
   8926 root      20   0 2600676 125228  53952 S 382.0   0.4   0:29.42 aaa.com
   8926 root      20   0 2600676 116288  53952 S 386.0   0.4   0:33.28 aaa.com
   8926 root      20   0 2600676 122680  53952 S 378.2   0.4   0:37.10 aaa.com
   8926 root      20   0 2600676 107004  53952 S  73.0   0.3   0:37.83 aaa.com
   8926 root      20   0 2600676 107004  53952 S   0.0   0.3   0:37.83 aaa.com

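It seems the GC can now reclaim the memory: RES no longer climbs steadily and drops back to 107,004 KiB once the goroutines are idle. In my opinion, the leaked memory is the pointer allocation itself (about 8 bytes per ctensor pointer).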
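A rough sanity check on the per-pointer estimate, assuming that both `Rand` and `Permute` allocate one output slot with `malloc(0)` per call and never free it, and that glibc on 64-bit Linux serves each such request with its minimum 32-byte chunk (both are assumptions, not measurements):

```math
\begin{aligned}
N &= 10 \times 100{,}000 \times 2 = 2 \times 10^{6} \text{ slots} \\
N \times 8\ \text{B} &= 16 \times 10^{6}\ \text{B} \approx 15{,}625\ \text{KiB} \\
N \times 32\ \text{B} &= 64 \times 10^{6}\ \text{B} = 62{,}500\ \text{KiB} \\
171{,}344 - 112{,}028 &= 59{,}316\ \text{KiB (observed growth before the patch)}
\end{aligned}
```

The observed growth falls between the two figures and is closest to the 32-byte case, but the first `top` sample was taken a couple of seconds into the run, so this is only an order-of-magnitude comparison.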

(Sorry for opening the PR before raising an issue.)

@pjongy,

Thanks for working on gotch.

Please see my comments on your PR. Thanks.

Did you comment on #114?
I can't see any comment on it.

Sorry, but I still cannot see your comment.
Could you check again whether you submitted it? @sugarme

@pjongy,

You are right. Just submitted the comments. Ta.

Closing now, as discussed.

@sugarme, could you reopen this issue?
I've proposed a more graceful fix for it in #116.