brianc / node-pg-copy-streams

COPY FROM / COPY TO for node-postgres. Stream from one database to another, and stuff.


Large column value not split into chunks

pspi opened this issue

When reading a large column value, only one large chunk covering the complete value gets emitted. I was expecting the stream to emit multiple smaller chunks.

Test

const { Client } = require("pg");
const { to: copyTo } = require("pg-copy-streams");

(async () => {
  const client = new Client();
  await client.connect();

  await client.query(`create temporary table temp (field text)`);
  await client.query(`insert into temp(field) values (repeat('-', 1048576))`);
  
  const fromDatabase = client.query(
    copyTo(`
    COPY (SELECT field FROM temp)
      TO STDOUT`)
  );

  fromDatabase.on("data", (chunk) => {
    console.log(`read ${chunk.length} bytes`);
  });

  fromDatabase.on("end", () => {
    client.end();
  });
})();

Output

read 1048577 bytes

For large values (for example, a BYTEA column that holds a file), emitting a single chunk slows down performance significantly because the emitted Buffer has to be allocated for several megabytes. To give you an idea, here is the same test with larger field sizes:

field size    running time
1 MB          0.1 s
10 MB         0.9 s
20 MB         3.1 s
30 MB         6.4 s

Versions: pg 8.0.2, pg-copy-streams 2.2.2, node v13.8.0, PostgreSQL 12.1 on x86_64-apple-darwin16.7.0

Hello,
thanks for the report.

I browsed the copy to implementation here - https://github.com/brianc/node-pg-copy-streams/blob/master/copy-to.js#L41
It is true that we currently gather a full tuple from the database before emitting it, so I understand that with huge rows (or fields) the implementation is going to be memory-hungry (and perhaps time-hungry too, if memory allocation becomes slow after many rows).

I fear that some people using the library currently expect it to output full rows, so we would have to think about backward compatibility if we were to modify this behaviour.

I did not check the postgres documentation about this, but I remember that there were cases where postgres itself was handling full rows.

Can I ask what your use case is?

I fear that some people using the library currently expect it to output full rows, so we would have to think about backward compatibility if we were to modify this behaviour.

That is a very valid fear. I'm wondering whether the chosen abstraction is ideal. I was under the impression that the contract of a stream is that it delivers a larger whole efficiently in small chunks and avoids holding anything in memory in its entirety. One effectively receives parts of the end result.

For a contract that emits full rows, would an EventEmitter approach be a more intuitive solution? There one could explicitly state what is arriving:

rowsemitter.on('row', ...); // process a single row
rowsemitter.on('end', ...); // all rows have been read

Can I ask what your use case is?

To stream an image stored in a BYTEA column to the client through a Node.js server.

I browsed the copy to implementation here - https://github.com/brianc/node-pg-copy-streams/blob/master/copy-to.js#L41

I did not check the postgres documentation about this, but I remember that there were cases where postgres itself was handling full rows.

Inspecting this further: pg pushes 64kB chunks to _transform(). The chunks are interspersed with control data (CopyOutResponse, CopyData messages (always one per row) and CopyDone), but the chunks are not aligned at row boundaries - a CopyData message representing one row can span several chunks.

In theory, the pg-copy-streams implementation could strip control data and push payload from those 64kB chunks forward as they arrive. Would you be open to this kind of solution? I could create a PR.
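For illustration, a rough sketch of such a transform might look like the following. This is not the library's actual code; it assumes the standard backend message framing (one type byte - 'H' for CopyOutResponse, 'd' for CopyData, 'c' for CopyDone - followed by an Int32 length that includes itself but not the type byte) and, for simplicity, that control-message bodies never straddle a chunk boundary:

const { Transform } = require("stream");

// Sketch: strip the protocol framing and forward CopyData payload bytes as
// soon as they arrive, even when a single row spans several chunks.
class PayloadForwarder extends Transform {
  constructor() {
    super();
    this._pending = 0;              // payload bytes still expected for the current CopyData
    this._header = Buffer.alloc(0); // partial message header carried over between chunks
  }

  _transform(chunk, _encoding, callback) {
    let buffer = Buffer.concat([this._header, chunk]);
    this._header = Buffer.alloc(0);
    let offset = 0;

    while (offset < buffer.length) {
      if (this._pending > 0) {
        // forward the continuation of a CopyData row immediately
        const take = Math.min(this._pending, buffer.length - offset);
        this.push(buffer.slice(offset, offset + take));
        this._pending -= take;
        offset += take;
        continue;
      }
      if (buffer.length - offset < 5) {
        // incomplete message header: keep it for the next chunk
        this._header = buffer.slice(offset);
        break;
      }
      const type = buffer[offset];
      const length = buffer.readUInt32BE(offset + 1); // includes the 4 length bytes
      offset += 5;
      if (type === 0x64 /* 'd' = CopyData */) {
        this._pending = length - 4; // payload follows, possibly in later chunks
      } else {
        offset += length - 4; // skip CopyOutResponse, CopyDone, ...
      }
    }
    callback();
  }
}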

Hello, I was re-reading the protocol spec regarding COPY, and I think that the current implementation was misled by:

  • this sentence about CopyData: "Data that forms part of a COPY data stream. Messages sent from the backend will always correspond to single data rows, but messages sent by frontends might divide the data stream arbitrarily."
  • the fact that, usually, 64kB fits many rows, so nobody really bothered.

Since the goal of this library is to have high throughput and low memory requirements, I could envision a major 3.0 version that would break backward compatibility and more or less keep the chunk size that postgres outputs. Some people might not be happy, but we could document it and recommend a simple streaming csv parser that would re-aggregate the csv-compatible chunks - https://csv.js.org/parse/ might do the trick. We need to test it and add a warning with an upgrade path in the changelog.
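For what it's worth, a minimal sketch of that recommended upgrade path might look like this (assuming the csv-parse package from https://csv.js.org/parse/ and COPY's csv format; the exact import shape depends on the csv-parse version):

const { Client } = require("pg");
const { to: copyTo } = require("pg-copy-streams");
const { parse } = require("csv-parse");

(async () => {
  const client = new Client();
  await client.connect();

  const fromDatabase = client.query(
    copyTo(`COPY (SELECT n, repeat('-', 100) FROM generate_series(1, 1000) n)
      TO STDOUT WITH (FORMAT csv)`)
  );

  // csv-parse re-assembles full rows from arbitrarily sized chunks,
  // so the copy stream itself no longer has to buffer whole rows.
  const rows = fromDatabase.pipe(parse());

  let count = 0;
  rows.on("data", () => {
    count += 1;
  });

  rows.on("end", () => {
    console.log(`parsed ${count} full rows`); // 1000, however the bytes were chunked
    client.end();
  });
})();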

I am not in favor of emitting 'row' events, since we would need to buffer the whole row to pass it in the event, which is exactly what we want to avoid.

If you want to give it a try and send a PR I will find time to review it.

As a reminder, we will have to check the following boxes before a 3.0 can be published:

  • Stop buffering row data in copy-to
  • Make all tests pass
  • Add a test that highlights the fact that postgres chunks rows (see the sketch after this list)
  • Add a test with a streaming csv parser that shows that un-chunking works with the recommended solution
  • Modify the README to document the now-mandatory streaming csv parser
  • Modify the Changelog to mention the breaking change and clarify the upgrade path
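As a sketch of what that first new test could assert (not the repository's actual test code; it reuses the reproduction from the top of this issue and simply checks that the payload now arrives in more than one chunk):

const assert = require("assert");
const { Client } = require("pg");
const { to: copyTo } = require("pg-copy-streams");

(async () => {
  const client = new Client();
  await client.connect();

  const stream = client.query(
    copyTo(`COPY (SELECT repeat('-', 1048576)) TO STDOUT`)
  );

  let chunks = 0;
  let bytes = 0;
  stream.on("data", (chunk) => {
    chunks += 1;
    bytes += chunk.length;
  });

  stream.on("end", () => {
    // the 1MB field plus the trailing newline of the text format
    assert.strictEqual(bytes, 1048576 + 1);
    // the field must no longer arrive as one big buffer
    assert.ok(chunks > 1, "large field should be split across several chunks");
    client.end();
  });
})();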

Regarding the storage of images as BYTEA inside postgres, I have been tinkering for some time with ideas around this (small images as a BYTEA field, or big images split over many rows in a table with an offset for re-ordering). Maybe the problem you uncovered can lead to an acceptable solution for a "1 row per image" approach, whatever the image size. But there might still be a problem with > 1GB images, because oversized fields are TOASTed - https://www.postgresql.org/docs/9.5/storage-toast.html. Have you already encountered this issue?

Keep me posted on this

Ohhh yeah, it makes sense to emit fixed-size chunks. It would be a subtle backwards-compatibility break, but I think it's probably better overall for efficiency, as @jeromew said. I agree with all his thoughts FWIW! 😄

Hello @pspi, have you started working on a PR for this? I could do it but do not want to collide with your work.

@jeromew Yep, I'm working on it. The radio silence was because I don't have anything to show yet.

@pspi Great! Thanks for your work on this.

@jeromew You know what, you probably know your way around this module far better than I do, and this issue looks like a tricky nut to crack. If you're eager, why don't you give it a shot as well? We could join forces and compare notes off-channel. You can find my contact info in my profile.

OK, I'll take a look in parallel. I'll send you an email.

After reading the code once again, I better understand the problem:

  • postgres sends chunks of ~64kB
  • we prepare a 64kB buffer
  • for every full row in the chunk, we extract the relevant data rows into this buffer
  • once done, we push the extracted rows
  • if there is leftover data in the chunk, we prefix it to the next postgres chunk, creating a bigger chunk

Rows are extracted only when a full row has been seen, so when there is a huge row (>64kB) the buffer that we prefix to the pg chunk keeps growing until a full row can be seen. This is probably the degradation you experienced with a huge BYTEA, since the module needs to buffer at least the whole BYTEA before pushing it.
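To make that pattern concrete, here is a simplified sketch of this pre-3.0 behaviour (not the actual copy-to.js code; it assumes the same message framing as the sketch above): complete CopyData rows are extracted and pushed, everything left over is prefixed onto the next chunk, so a row larger than one chunk keeps growing the carried-over buffer.

const { Transform } = require("stream");

class RowBufferingCopyTo extends Transform {
  constructor() {
    super();
    this._remainder = Buffer.alloc(0); // leftover from the previous chunk
  }

  _transform(chunk, _encoding, callback) {
    // prefix the leftover to the new postgres chunk, creating a bigger chunk
    let buffer = Buffer.concat([this._remainder, chunk]);
    let offset = 0;

    // extract every *complete* CopyData message ('d', Int32 length, payload)
    while (offset + 5 <= buffer.length) {
      const length = buffer.readUInt32BE(offset + 1); // includes itself
      const end = offset + 1 + length;
      if (end > buffer.length) break; // message not fully buffered yet
      if (buffer[offset] === 0x64 /* 'd' = CopyData */) {
        this.push(buffer.slice(offset + 5, end)); // one complete row
      }
      offset = end; // control messages are skipped
    }

    // a huge (>64kB) row never completes above, so this keeps growing
    this._remainder = buffer.slice(offset);
    callback();
  }
}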

This issue was resolved with #94.

A test was also added in 8eff422 that verifies the functionality.

I temporarily re-open this issue to make sure we have a common understanding of what is happening.

pg-copy-streams version 3.0.0 now streams the data payload of a copyOut without buffering whole CopyData messages (if a CopyData message spans N chunks, these chunks will be streamed as they flow). This seems to fix the issue @pspi opened.

But after reading pg's internals, I understand that pg is still buffering the whole CopyData message in this code: https://github.com/brianc/node-postgres/blob/master/packages/pg/lib/connection.js#L110

@brianc do you confirm that, even with the line https://github.com/brianc/node-pg-copy-streams/blob/master/copy-to.js#L30 (this.connection.removeAllListeners('copyData')), pg will still buffer the whole CopyData message in packet-reader? Is there a way we could avoid this?

This would mean that even with pg-copy-streams 3.0.0, after COPYing a 100MB image out of postgres, pg will have allocated a 100MB buffer and will keep it until the connection is closed.

I suppose this does not trigger the same memory/GC issue that @pspi was seeing, because packet-reader is designed to grow and re-use an internal buffer, which avoids the constant memory alloc/free that was causing the initial problem, but it is still not ideal memory-wise.

Maybe we can find a solution for this.

I'll close this as I think it is fixed with 4.0.0.