nguyenq / lept4j

Java JNA Wrapper for Leptonica Image Processing Library

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Did you observe a severe memory leak when using Leptonica?

johnny5550822 opened this issue · comments

I observed there is severe memory leak when I am using Leptonica, did you observe that? I tried explicitly calling System.gc() or using JNA Memory.disposeAll(). None of this work. So, I am guess there is memory leak issue when using lept4J.

Which version are you using of leptonica, os, jdk?

Make sure you call relevant xxxDestroy() methods after use of Pix, Box, etc. to release native resources.

I am using lept1.2.3.

There are no xxxDestroy() method. Where can I find it?

Here is part of my code:

Pix pix1, pix2, pix3, pix4, pix5;
Leptonica leptInstance = Leptonica.INSTANCE;
...
pix2 = leptInstance.pixCloseGray(pix1, 51, 1);

There are no pix1.Destroy() or pix2.Destroy() function.

@nguyenq Sorry, where did it locate? I cannot see it in the document link.

@nguyenq I found an example in the repo about pixDestroy.
https://github.com/nguyenq/lept4j/blob/master/src/test/java/net/sourceforge/lept4j/DewarpTest.java

But, i performed that and there is still memory leak. In fact, after doing pixDestroy(pix1), I checked and found that pix1 is not null.

Pix pix1, pix2;
Leptonica leptInstance = Leptonica.INSTANCE;
...
pix2 = leptInstance.pixCloseGray(pix1, 51, 1);
pixDestroy(pix1);
pixDestroy(pix2);

  void pixDestroy(Pix pix){
    if (pix == null){
      return;
    }
    PointerByReference pRef = new PointerByReference();
    pRef.setValue(pix.getPointer());
    leptInstance.pixDestroy(pRef);
  }

The example mirrors the original c example. pix1's pointer will be set to null, not pix1 itself, which will then be reclaimed by gc when it goes out of scope.

@nguyenq I am running the code in a for-loop. So, I suppose after each iteration the java garbage collector recollect the memory. But, in my case, the memory usage is keep stacking up even I do pixDestroy(). Are you suggesting I should explicitly do System.gc() to force the Java Garbage Collector to check and it may do something?

Pix pix1, pix2;
Leptonica leptInstance = Leptonica.INSTANCE;
...
for-loop
  pix2 = leptInstance.pixCloseGray(pix1, 51, 1);
  pixDestroy(pix1);
  pixDestroy(pix2);
  System.gc(); // should I do this?

void pixDestroy(Pix pix){
  if (pix == null){
    return;
  }
  PointerByReference pRef = new PointerByReference();
  pRef.setValue(pix.getPointer());
  leptInstance.pixDestroy(pRef);
}

I think I found the problem. It is a bit tricky to use leptInstance. IN the below code, I have pix2 be reassigned twice. Normally, in Java, this is fine and the garbage collector will know to collect the original object of pix2 (from the first assigment). However, in leptInstance, the original first object of pix2 will be kept somewhere in the memory. Therefore, what I should do is to instead assign a new variable to take the second assignment of pix2, i.e..

Original code with memory leak

Pix pix1, pix2;
Leptonica leptInstance = Leptonica.INSTANCE;
...
for-loop
  pix2 = leptInstance.pixCloseGray(pix1, 51, 1);
  pix2 = leptInstance.pixErodeGray(pix2, 5, 1);
  pixDestroy(pix1);
  pixDestroy(pix2);

void pixDestroy(Pix pix){
  if (pix == null){
    return;
  }
  PointerByReference pRef = new PointerByReference();
  pRef.setValue(pix.getPointer());
  leptInstance.pixDestroy(pRef);
}

What I should change is this:

...
Pix pix1, pix2, pix3;
...
for-loop
  pix2 = leptInstance.pixCloseGray(pix1, 51, 1);
  pix2 = leptInstance.pixCloseGray(pix1, 51, 1);
  pix3 = leptInstance.pixErodeGray(pix2, 5, 1);

Although the above problem is solved, but when I call convertPixToImage(pix3) in a for-loop, there is severe memory leak again. I am pretty sure I pixDestroy(Pix3), so the problem is probably not related to pix3

BufferedImage bi = Leptutils.convertPixToImage(pix3);

I think it is a bug because in C++ code, we need to free the memory of "data" explicitly, which in the lept4j java version I cannot find a function to do that.

According to this example:
https://github.com/charlesw/tesseract-vs/blob/master/ccstruct/imagedata.cpp

void ImageData::SetPixInternal(Pix* pix, GenericVector<char>* image_data) {
  l_uint8* data;
  size_t size;
  pixWriteMem(&data, &size, pix, IFF_PNG);
  pixDestroy(&pix);
  image_data->init_to_size(size, 0);
  memcpy(&(*image_data)[0], data, size);
  free(data);
}

Yes, that seems to be the C-language standard practice of having separate variables to hold objects and subsequent destroying of these objects after use. That's why you see many separate instances of Pix and others when working with Leptonica data structures. The variables are not reused to hold different or new objects.

@nguyenq So, is there any way to solve the problem that I have when we are using pixWriteMem?

@nguyenq I guess we may have to request function add to C code of leptonica to free certain memory?

@johnny5550822, can you narrow down to the line and function where memory may not get released? Is it in convertPixToImage or SetPixInternal? If you think it's in Leptonica code, you'll need to bring Leptonica developers' attention to it.

@nguyenq
From here in the leptUtils, in the function convertPixToImage, you will call:
Leptonica.pixWriteMem(pdata, psize, pix, format);

In this line, the memory of pdata is not freed, which you need to free explicitly in C code (as shown in above comment.) I have already post an error on their github page, but seem like no one take care of it. :(

See:
DanBloomberg/leptonica#195

You may want to search through Lept4J's Leptonica API to see if there is any xxxDestroy() function that can take pdata type. Is bbufferDestroy() applicable?

Or search JNA API for any method that can free a byte array.

@nguyenq I have searched, but cannot find anything about it. Do you?

@nguyenq is this problem being resolved?

Hey @johnny5550822 @nguyenq assigned this issue to me by email.
I will try to find a solution together with you, but i need to dive in, so i am going to need a simple junit test class from you alltogether with some sample images. Also the JDK, Maven and Lept4J Version.

Deal?

@4F2E4A2E Sound good to me. I will create a sample Junit test, and then submit a pull request. Where should this test code go to?

Best thing would be a pull request, else gist, else if all fails as a comment.

Sure. I will do a pull request then.

@4F2E4A2E Sorry for delay, I will try to submit pull request as soon as I can. I have been busy for my work.

@johnny5550822 A fix has been committed. Please check it out.

@nguyenq Awesome! I will try that out. Thanks.