mkabilov / pg2ch

Data streaming from postgresql to clickhouse via logical replication mechanism

CollapsingSortedBlockInputStream: Incorrect data: number of rows with sign = 1

importnil opened this issue

Hello.

We're using a 2 node cluster of ClickHouse with CollapsingMergeTree engine tables.

We're testing the pg2ch plugin and are running into problems with updates for this engine. CH sometimes logs warnings about a mismatched number of rows with sign 1 and sign -1. As far as we can tell, this happens most often when pg2ch receives a batch of DML operations for the same primary key.

The warning looks like this:

2019.09.24 13:42:14.484495 [ 17 ] {} <Warning> CollapsingSortedBlockInputStream: Incorrect data: number of rows with sign = 1 (4) differs with number of rows with sign = -1 (1) by more than one (for key: 1219488).

The key from the log above had 12 rows right after the warning and, strangely, they collapsed to 2 rows after a while.
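To make the condition in the warning concrete, here is a minimal sketch that counts rows per key and sign and flags keys where the two counts differ by more than one. It only mirrors the wording of the log message; it is not ClickHouse's implementation, and the function name and sample rows are hypothetical.

```python
from collections import defaultdict


def find_unbalanced_keys(rows):
    """Return {key: (plus_count, minus_count)} for keys whose counts
    of sign = 1 and sign = -1 rows differ by more than one.

    rows is an iterable of (key, sign) pairs, with sign being 1 or -1.
    Hypothetical helper for illustration only.
    """
    counts = defaultdict(lambda: {1: 0, -1: 0})
    for key, sign in rows:
        counts[key][sign] += 1
    return {
        key: (c[1], c[-1])
        for key, c in counts.items()
        if abs(c[1] - c[-1]) > 1
    }


# Sample data shaped like the warning: four sign = 1 rows and one
# sign = -1 row for key 1219488, plus a key that is still acceptable.
rows = [(1219488, 1)] * 4 + [(1219488, -1)] + [(7, 1), (7, -1), (7, 1)]
print(find_unbalanced_keys(rows))
```

Key 7 is not reported because its counts differ by exactly one, which matches the "by more than one" threshold in the message.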

Can you help with this issue?

For now I'd recommend checking out the development version (5a26c77) of pg2ch, which uses the ClickHouse HTTP protocol; you can find it in the http branch.

It contains various bug fixes and performance optimizations. You'll also have to change the ClickHouse schema of the table you're replicating: you need to add the lsn column to it. Please have a look at the README file in the http branch.

If you have any questions, please feel free to reach out to me.

I'm testing the http branch; here are the first two issues:

  1. After initial sync of tables from PG, pg2ch fails on some row with error:
Sep 24 17:05:14 pg2ch[5213]: 2019-09-24T17:05:14.679Z#011#033[31mFATAL#033[0m#011replicator/replicator.go:347#011could not sync table books: could not sync: could not load to CH: got 500 status code from cli$
Sep 24 17:05:14 pg2ch[5213]: ERROR: Line feed found where tab is expected. It's like your file has less columns than expected.

The problem is a source row that contains \n in one of its column values. The previous version could sync all the data without any error. The plugin shuts down after this error.

  2. After that, if you start the plugin again, there's another error:
Sep 24 20:14:51 pg2ch[7511]: could not start: could not init tables: could not init books: could not delete dirty tuples from default.books table: got 501 status code from clickhouse: Code: 48, e.displayText() = DB::Exception: Mutations are not supported by storage Distributed (version 19.13.3.26 (official build))

I guess it's not connected to the first one.

I guess the second problem could be resolved by pointing the ALTER query at the local table?

Okay, so the main cause of the original issue was a missing sharding key, which has to be specified when the table is created in CH. Without it, rows belonging to one logical item were sent to node 1 or node 2 at random, which broke collapsing in CH.

This is resolved, I guess.
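A small sketch of why the sharding key matters here: collapsing only happens among rows stored on the same node, so all rows for one key have to land on the same shard. The example assumes two equally weighted shards and an integer key, and uses a plain modulo as a simplified stand-in for key-based routing; the function names are hypothetical.

```python
import random


def route_by_key(key: int, num_shards: int) -> int:
    """Deterministic routing: the same key always maps to the same shard."""
    return key % num_shards


def route_randomly(num_shards: int, rng: random.Random) -> int:
    """Routing without a usable key: any shard may receive the row."""
    return rng.randrange(num_shards)


NUM_SHARDS = 2
KEY = 1219488  # key taken from the warning in this thread
rng = random.Random(0)

# Route 12 rows for the same key with each strategy and collect
# the set of shards that ended up holding rows for that key.
keyed = {route_by_key(KEY, NUM_SHARDS) for _ in range(12)}
scattered = {route_randomly(NUM_SHARDS, rng) for _ in range(12)}

print("keyed routing uses shards:", sorted(keyed))
print("random routing uses shards:", sorted(scattered))
```

With keyed routing the set always contains a single shard, so the sign = 1 and sign = -1 rows for a key can meet during a merge. With random routing nothing guarantees that.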

As for the http branch version, pg2ch still fails to export data that has \n in column values into CH.
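The "Line feed found where tab is expected" error is consistent with a raw newline reaching ClickHouse inside a tab-separated row. Assuming the data is sent in the TabSeparated format (not confirmed from the pg2ch code), a minimal sketch of the escaping that format expects could look like this. The helper names are hypothetical and NULL handling is left out.

```python
# Characters that must be backslash-escaped inside a TabSeparated value.
_TSV_ESCAPES = {
    "\\": "\\\\",
    "\t": "\\t",
    "\n": "\\n",
    "\r": "\\r",
}


def escape_tsv_value(value: str) -> str:
    """Escape one column value so it cannot break the row layout."""
    return "".join(_TSV_ESCAPES.get(ch, ch) for ch in value)


def tsv_line(values) -> str:
    """Join already-stringified column values into one TabSeparated row."""
    return "\t".join(escape_tsv_value(v) for v in values) + "\n"


# A value with an embedded line feed stays on a single physical line.
line = tsv_line(["1", "first line\nsecond line"])
print(repr(line))
```

Each character is escaped in a single pass, so the backslashes introduced for \n or \t are not escaped a second time.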

That's a good finding. I'll look into that. Thank you.