brianc / node-pg-copy-streams

COPY FROM / COPY TO for node-postgres. Stream from one database to another, and stuff.

Memory Usage directly proportional to the data

dennismphil opened this issue · comments

I used this library to load records from an Oracle database into a Postgres database. However, the node process crashes 💣 after running for a while with a garbage-collection exception.

I have narrowed the script down to just node-pg-copy-streams, reading from a file on disk.

Test with a small file, small.file:

> ls -lh small.file
387B 
> node -v
v8.11.3

I instrumented the script to measure the Resident Set Size (RSS); it reported a memory usage of 25 MB:

Starting Memory Usage 24 MB
Successfully truncated table timeline.score
The script used approximately 25 MB

Not bad. 🆒
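
A note on the measurement: figures like the ones above can be obtained with process.memoryUsage(). A minimal sketch (the actual instrumentation used in the script is not shown here):

const toMB = (bytes) => Math.round(bytes / 1024 / 1024);

// before the COPY
console.log(`Starting Memory Usage ${toMB(process.memoryUsage().rss)} MB`);

// ... run the COPY ...

// after the COPY
console.log(`The script used approximately ${toMB(process.memoryUsage().rss)} MB`);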

Now I tried the same program with a large file, big.file:

> ls -lh big.file
214M

This time the reported memory usage is directly proportional to the input: 244 MB ‼️

> node scripts/test.js
Starting Memory Usage 24 MB
Successfully truncated table timeline.score
The script used approximately 244 MB
Done

Relevant portions of the script:

const inputFileStream = fs.createReadStream('./big.file');
...
const writeClient = await this.writePoolPg.connect();
const copyQueryStream = copyFrom(`COPY ${destTableName} FROM STDIN`);

const writeStream = writeClient.query(copyQueryStream);
inputFileStream
    .pipe(writeStream);

Full code

I suspect the library is not providing the advantage we would expect from streams, namely that memory usage should not grow in direct proportion to the input.
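
For reference, here is a self-contained sketch of the same COPY FROM pattern with the completion handling and client release spelled out. The pool settings, table name and error handling are assumptions, not taken from the original script, and the exact completion event can vary between pg-copy-streams versions:

const fs = require('fs');
const { Pool } = require('pg');
const copyFrom = require('pg-copy-streams').from;

async function loadFile(pool, destTableName, path) {
  const client = await pool.connect();
  try {
    const copyStream = client.query(copyFrom(`COPY ${destTableName} FROM STDIN`));
    const fileStream = fs.createReadStream(path);
    await new Promise((resolve, reject) => {
      copyStream.on('finish', resolve); // assumed completion event; newer versions also offer a callback
      copyStream.on('error', reject);
      fileStream.on('error', reject);
      fileStream.pipe(copyStream); // pipe() applies backpressure, so memory should stay bounded
    });
  } finally {
    client.release();
  }
}

// usage (assumed table and connection settings):
// const pool = new Pool();
// loadFile(pool, 'timeline.score', './big.file').catch(console.error);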

Hello,
thank you for this report. I have not experienced such a leak myself, but it seems worth understanding what is happening here.

Could you launch the same test with a file of several GB to confirm that the leak is still proportional at those sizes? As far as I know, 244 MB could still be in the not-yet-garbage-collected space, or could simply be space allocated to the node process by the system.
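
One way to separate not-yet-collected garbage from a genuinely retained memory is to force a full collection before sampling RSS; a minimal sketch (requires starting node with --expose-gc):

// node --expose-gc scripts/test.js
if (global.gc) {
  global.gc(); // force a full garbage collection before sampling
}
console.log(`RSS after forced GC: ${Math.round(process.memoryUsage().rss / 1024 / 1024)} MB`);

If RSS stays high after a forced collection, the memory is genuinely retained; if it drops, the growth was just uncollected garbage or allocator slack.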

The test terminated early: it reached the 2 GB node memory limit and crashed with an out-of-memory exception.

I'm experiencing high memory usage as well. The relationship is not linear, but it's definitely using a lot of memory. I'm trying to move ~220,000 records from BigQuery to a Postgres database. Memory usage climbs to nearly 3 GB, drops some, then hovers around that mark after calling the done callback from within connect. It also hangs for a very long time after calling done. I was initially trying closer to 500,000 records, but it would just keep using memory until about 7 GB (the limit on the system I'm using) and was killed by the kernel.

I tried taking a memory snapshot, but I'm not that familiar with reading those. I did see arrays of 100K objects or more in memory, though, and they seemed to be attached to copyFrom.

Any hints as to how I can help debug this further? Or some way to get it to flush or something so it's not holding so many objects in memory?
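
One hedged suggestion for digging further: write heap snapshots at a few points during the copy and compare them in the Chrome DevTools Memory tab. On recent node versions the built-in v8 module can do this (on older runtimes the heapdump package offers something similar):

const v8 = require('v8');

// Writes a .heapsnapshot file that can be loaded in Chrome DevTools (Memory tab).
// Comparing a snapshot taken mid-copy with one taken after completion shows
// which objects are being retained and by what.
function dumpHeap(label) {
  const file = v8.writeHeapSnapshot(`${label}-${Date.now()}.heapsnapshot`);
  console.log(`heap snapshot written to ${file}`);
}

// e.g. dumpHeap('before-copy'); ... dumpHeap('after-copy');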

Sorry for the delay on this. I have unfortunately been AFK for most of 2018 and am now looking into this issue. If you have new experiences/results to share on this, do not hesitate!

Do you know of a node project that has tests around issues like these (memory & CPU usage with streams)?

I wrote a script to try to replicate the memory issue reported here.
To generate data, I use the seq unix command:

seq 0 29999999

This is equivalent to a 246 MB file.
The result is rssMin:22.52MB rssMax:60.75MB

seq 0 199999999

This is equivalent to a 1 GB file.
The result is rssMin:20.52MB rssMax:55.21MB
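
For context, rssMin/rssMax figures like these can be collected by sampling process.memoryUsage() on an interval while the copy runs. A rough sketch of the idea (an illustration, not necessarily what bench/copy-from-memory.js does):

// Start tracking RSS; returns a stop() function that reports min/max.
function trackRss() {
  let rssMin = Infinity;
  let rssMax = 0;
  const sample = () => {
    const rss = process.memoryUsage().rss;
    if (rss < rssMin) rssMin = rss;
    if (rss > rssMax) rssMax = rss;
  };
  const timer = setInterval(sample, 100);
  return () => {
    clearInterval(timer);
    sample();
    const mb = (b) => (b / 1024 / 1024).toFixed(2);
    return `rssMin:${mb(rssMin)}MB rssMax:${mb(rssMax)}MB`;
  };
}

// usage: const stop = trackRss(); ... run the copy ... console.log(stop());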

At this stage I can't reproduce the issue. If you still use this module, it would be very helpful if you could test the bench/copy-from-memory.js script I just committed. Otherwise I'll close the issue in a few days.

Thanks for your help.

Running into this problem as well. When I run it on a Windows machine, the node process's memory usage alternates between 60 and 80 MB, but when I attempt to run it on Linux, the memory usage just keeps rising until the program crashes. In both tests I used Node 8. The Windows machine is using Postgres 9.6 and the Linux machine is using Postgres 10.7. In both cases the database is on the same machine as the node program.

Hello, thanks for the report.
Are the results you mention coming from the bench/copy-from-memory.js script?
My tests are on Linux / Postgres 9.6 and the memory seems to be under control.

We need to find a basic script that highlights the issue.

Hmm, everything works with that script. I will check to make sure that it is this library that is causing the memory leak and not something else in my program.

It would be very interesting to pinpoint the difference between this script and your program in order to understand where the memory keeps growing.

On closer inspection of the heap dump, it seems that this library was a red herring: the memory leak was actually coming from a different library I was using for the stream that I'm feeding into the copy stream. That library doesn't seem to correctly dispose of its data as the PgCopyStream reads from it, and it ends up with an absurdly long linked list of now-useless data.
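
One way to confirm that kind of diagnosis (a sketch, not taken from the issue): swap the upstream source for a synthetic Readable that generates rows on demand. If memory stays flat with the synthetic source, the copy stream is honouring backpressure and the retained data belongs to the original source library.

const { Readable } = require('stream');

// Produces `total` rows lazily, so nothing can pile up upstream. Piping this
// into the COPY stream isolates pg-copy-streams from the original source library.
function syntheticRows(total) {
  let i = 0;
  return new Readable({
    read() {
      if (i >= total) {
        this.push(null); // signal end of stream
        return;
      }
      this.push(`${i}\tvalue-${i}\n`); // tab-separated row in COPY text format (assumed two-column table)
      i += 1;
    },
  });
}

// usage: syntheticRows(500000).pipe(copyStream);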

ok @mpharoah-d2l thanks for investigating. I will leave this issue open a few weeks more in case someone manages to find a test case highlighting an issue with the library.