cybertec-postgresql / pg_squeeze

A PostgreSQL extension for automatic bloat cleanup

Running squeeze causes continuous lag build-up in a logical replication architecture

patrickmglynn opened this issue

Hello,

I have observed that running pg_squeeze on large tables, either manually or on a schedule, leads to replication lag building up constantly and requires intervention to prevent continuous WAL growth.

Our environment is PostgreSQL 12.1 running under Red Hat 7.9, with pg_squeeze version 1.3:

test_db=# \! rpm -qa | grep squeeze
pg_squeeze_12-1.3.1-1.rhel7.x86_64

test_db=# \dx pg_squeeze
                          List of installed extensions
    Name    | Version | Schema  |                  Description
------------+---------+---------+------------------------------------------------
 pg_squeeze | 1.3     | squeeze | A tool to remove unused space from a relation.
(1 row)

One publisher replicates data to one subscriber. max_replication_slots and max_wal_senders are both set to 10 on the publisher, max_wal_senders is 20 on the subscriber, and we currently have 6 subscriptions in total (one per replicated schema).
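
For reference, a quick way to confirm these values on each node is a generic pg_settings lookup along these lines (nothing here is specific to pg_squeeze):

test_db=# SELECT name, setting FROM pg_settings WHERE name IN ('max_replication_slots', 'max_wal_senders');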

Prior to running squeeze, the replication slots on the publisher are all active and the subscriptions are in the streaming state:

test_db=# SELECT * FROM pg_replication_slots;
      slot_name      |  plugin  | slot_type | datoid |  database   | temporary | active | active_pid | xmin | catalog_xmin |  restart_lsn  | confirmed_flush_lsn
---------------------+----------+-----------+--------+-------------+-----------+--------+------------+------+--------------+---------------+---------------------
 sub1                | pgoutput | logical   |  16402 | test_db     | f         | t      |      21570 |      |    987626303 | 36E0/23DB3188 | 36E0/23F900F8
 sub2                | pgoutput | logical   |  16402 | test_db     | f         | t      |      23310 |      |    987626303 | 36E0/23DB3188 | 36E0/23F900F8
 sub3                | pgoutput | logical   |  16402 | test_db     | f         | t      |      23311 |      |    987626303 | 36E0/23DB3188 | 36E0/23F900F8
 sub4                | pgoutput | logical   |  16402 | test_db     | f         | t      |      23313 |      |    987626303 | 36E0/23DB3188 | 36E0/23F900F8
 sub5                | pgoutput | logical   |  16402 | test_db     | f         | t      |      23390 |      |    987626303 | 36E0/23DB3188 | 36E0/23F900F8
 sub6                | pgoutput | logical   |  16402 | test_db     | f         | t      |      23389 |      |    987626303 | 36E0/23DB3188 | 36E0/23F900F8
(6 rows)
test_db=# SELECT * FROM pg_stat_replication;
  pid  | usesysid |       usename       |  application_name   | client_addr  |         client_hostname         | client_port |         backend_start         | backend_xmin |   state   |   sent_lsn    |   write_lsn   |   flush_lsn   |  replay_lsn   |    write_lag    |    flush_lag    |   replay_lag    | sync_priority | sync_state |       
   reply_time
-------+----------+---------------------+---------------------+--------------+---------------------------------+-------------+-------------------------------+--------------+-----------+---------------+---------------+---------------+---------------+-----------------+-----------------+-----------------+---------------+------------+-------------------------------
 21570 |   991114 | logical_replication | sub1                | **.***.**.** | my.fqdn.org                     |       40072 | 2022-02-11 15:51:01.531025+08 |              | streaming | 36E0/285D19C0 | 36E0/285D19C0 | 36E0/285D19C0 | 36E0/285D19C0 | 00:00:00.00045  | 00:00:00.00045  | 00:00:00.00045  |             0 | async      | 2022-02-21 15:32:17.793954+08
 23310 |   991114 | logical_replication | sub2                | **.***.**.** | my.fqdn.org                     |       40094 | 2022-02-11 15:51:06.703731+08 |              | streaming | 36E0/285D19C0 | 36E0/285D19C0 | 36E0/285D19C0 | 36E0/285D19C0 | 00:00:00.000435 | 00:00:00.000435 | 00:00:00.000435 |             0 | async      | 2022-02-21 15:32:17.793945+08
 23311 |   991114 | logical_replication | sub3                | **.***.**.** | my.fqdn.org                     |       40096 | 2022-02-11 15:51:06.71485+08  |              | streaming | 36E0/285D19C0 | 36E0/285D19C0 | 36E0/285D19C0 | 36E0/285D19C0 | 00:00:00.000411 | 00:00:00.000411 | 00:00:00.000411 |             0 | async      | 2022-02-21 15:32:17.793884+08
 23313 |   991114 | logical_replication | sub4                | **.***.**.** | my.fqdn.org                     |       40100 | 2022-02-11 15:51:06.726784+08 |              | streaming | 36E0/285D19C0 | 36E0/285D19C0 | 36E0/285D19C0 | 36E0/285D19C0 | 00:00:00.000395 | 00:00:00.000395 | 00:00:00.000395 |             0 | async      | 2022-02-21 15:32:17.793891+08
 23390 |   991114 | logical_replication | sub5                | **.***.**.** | my.fqdn.org                     |       40204 | 2022-02-11 15:51:12.325621+08 |              | streaming | 36E0/285D19C0 | 36E0/285D19C0 | 36E0/285D19C0 | 36E0/285D19C0 | 00:00:00.000588 | 00:00:00.000588 | 00:00:00.000588 |             0 | async      | 2022-02-21 15:32:17.793901+08
 23389 |   991114 | logical_replication | sub6                | **.***.**.** | my.fqdn.org                     |       40202 | 2022-02-11 15:51:12.322993+08 |              | streaming | 36E0/285D19C0 | 36E0/285D19C0 | 36E0/285D19C0 | 36E0/285D19C0 | 00:00:00.000343 | 00:00:00.000343 | 00:00:00.000343 |             0 | async      | 2022-02-21 15:32:17.794075+08
(6 rows)

If I attempt to run the below on a large table within the sub1 publication that occupies 83 GB on disk (68 GB of which is bloat):

test_db=# \dt+ myschema.bloated_table;
                        List of relations
  Schema  |     Name      | Type  | Owner  | Size  | Description
----------+---------------+-------+--------+-------+-------------
 myschema | bloated_table | table | myuser | 83 GB |
(1 row)
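
One way to arrive at a bloat estimate like the 68 GB figure above, assuming the pgstattuple extension is available (shown for illustration, not necessarily the exact method used -- note that pgstattuple scans the whole table):

test_db=# CREATE EXTENSION IF NOT EXISTS pgstattuple;
test_db=# SELECT pg_size_pretty(dead_tuple_len + free_space) AS approx_bloat FROM pgstattuple('myschema.bloated_table');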

test_db=# SELECT squeeze.squeeze_table('myschema', 'bloated_table', null, null, null);
 squeeze_table 
---------------

(1 row) 
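
For clarity, my reading of the call's arguments, based on the extension's documentation (please correct me if I have this wrong):

-- squeeze.squeeze_table(tabschema, tabname, clustering_index, rel_tablespace, ind_tablespaces)
-- The three NULLs mean: no clustering index, and the table and its indexes
-- stay in their current tablespaces.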

The operation finishes successfully (reducing the size of the table dramatically), but replication lags during the squeeze process and then never recovers:

test_db=# \dt+ myschema.bloated_table;
                         List of relations
  Schema  |     Name      | Type  | Owner  |  Size   | Description
----------+---------------+-------+--------+---------+-------------
 myschema | bloated_table | table | myuser | 9817 MB |
(1 row)

test_db=# SELECT * FROM pg_replication_slots;
      slot_name      |  plugin  | slot_type | datoid |  database   | temporary | active | active_pid | xmin | catalog_xmin |  restart_lsn  | confirmed_flush_lsn
---------------------+----------+-----------+--------+-------------+-----------+--------+------------+------+--------------+---------------+---------------------
 sub1                | pgoutput | logical   |  16402 | test_db     | f         | t      |      12535 |      |    988419346 | 36E3/2C8877A0 | 36E5/C43037F0
 sub2                | pgoutput | logical   |  16402 | test_db     | f         | t      |      12536 |      |    988419346 | 36E3/2C8877A0 | 36E5/C43037F0
 sub3                | pgoutput | logical   |  16402 | test_db     | f         | t      |      12537 |      |    988419346 | 36E3/2C8877A0 | 36E5/C43037F0
 sub4                | pgoutput | logical   |  16402 | test_db     | f         | t      |      12538 |      |    988419346 | 36E3/2C8877A0 | 36E5/C43037F0
 sub5                | pgoutput | logical   |  16402 | test_db     | f         | t      |      12539 |      |    988419346 | 36E3/2C8877A0 | 36E5/C43037F0
 sub6                | pgoutput | logical   |  16402 | test_db     | f         | t      |      12540 |      |    988419346 | 36E3/2C8877A0 | 36E5/C43037F0
(6 rows)

test_db=# SELECT * FROM pg_stat_replication;
  pid  | usesysid |       usename       |  application_name   | client_addr  |         client_hostname          | client_port |         backend_start         | backend_xmin | state   |   sent_lsn    |   write_lsn   |   flush_lsn   |  replay_lsn   | write_lag | flush_lag | replay_lag | sync_priority | sync_state | reply_time
-------+----------+---------------------+---------------------+--------------+----------------------------------+-------------+-------------------------------+--------------+---------+---------------+---------------+---------------+---------------+-----------+-----------+------------+---------------+------------+-------------------------------
 12535 |   991114 | logical_replication | sub1                | **.***.**.** | my.fqdn.org                      |       56826 | 2022-02-22 10:56:55.3298+08   |              | catchup | 36E5/C430CA30 | 36E5/C43037F0 | 36E5/C43037F0 | 36E5/C43037F0 |           |           |            |             0 | async      | 2022-02-22 11:05:28.787884+08
 12536 |   991114 | logical_replication | sub2                | **.***.**.** | my.fqdn.org                      |       56828 | 2022-02-22 10:56:55.339875+08 |              | catchup | 36E5/C430CA30 | 36E5/C43037F0 | 36E5/C43037F0 | 36E5/C43037F0 |           |           |            |             0 | async      | 2022-02-22 11:05:28.776368+08
 12537 |   991114 | logical_replication | sub3                | **.***.**.** | my.fqdn.org                      |       56830 | 2022-02-22 10:56:55.350883+08 |              | catchup | 36E5/C430CA30 | 36E5/C43037F0 | 36E5/C43037F0 | 36E5/C43037F0 |           |           |            |             0 | async      | 2022-02-22 11:05:28.73297+08
 12538 |   991114 | logical_replication | sub4                | **.***.**.** | my.fqdn.org                      |       56832 | 2022-02-22 10:56:55.361507+08 |              | catchup | 36E5/C430CA30 | 36E5/C43037F0 | 36E5/C43037F0 | 36E5/C43037F0 |           |           |            |             0 | async      | 2022-02-22 11:05:29.009785+08
 12539 |   991114 | logical_replication | sub5                | **.***.**.** | my.fqdn.org                      |       56834 | 2022-02-22 10:56:55.371779+08 |              | catchup | 36E5/C430CA30 | 36E5/C43037F0 | 36E5/C43037F0 | 36E5/C43037F0 |           |           |            |             0 | async      | 2022-02-22 11:05:28.800016+08
 12540 |   991114 | logical_replication | sub6                | **.***.**.** | my.fqdn.org                      |       56836 | 2022-02-22 10:56:55.382464+08 |              | catchup | 36E5/C430CA30 | 36E5/C43037F0 | 36E5/C43037F0 | 36E5/C43037F0 |           |           |            |             0 | async      | 2022-02-22 11:05:29.212577+08
(6 rows)

test_db=# \! du -sh /var/lib/pgsql/12/data/pg_wal
11G     /var/lib/pgsql/12/data/pg_wal

From the above we can see the slots are still active on the publisher, but the subscriptions are now stuck in the catchup state (and the restart_lsn and confirmed_flush_lsn no longer advance), while the pg_wal directory has grown to 11GB and continues to grow over time.
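
The amount of WAL being held back by the stuck slots can be quantified with a standard monitoring query such as the following (output omitted):

test_db=# SELECT slot_name, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal FROM pg_replication_slots;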

Checking the subscriber logs shows the replication workers timing out as the squeeze process runs:

<2022-02-22 12:38:02 AEDT  >ERROR:  terminating logical replication worker due to timeout
<2022-02-22 12:38:02 AEDT  >LOG:  background worker "logical replication worker" (PID 5085) exited with exit code 1
<2022-02-22 12:38:02 AEDT  >LOG:  logical replication apply worker for subscription "sub3" has started
<2022-02-22 12:38:02 AEDT  >LOG:  background worker "logical replication worker" (PID 5148) exited with exit code 1
<2022-02-22 12:38:02 AEDT  >ERROR:  could not start WAL streaming: ERROR:  replication slot "sub3" is active for PID 23390

wal_sender_timeout is 5min on the publisher and 1min on the subscriber side.
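
For reference, the effective values can be confirmed on each node with a simple pg_settings query:

test_db=# SELECT name, setting, unit FROM pg_settings WHERE name IN ('wal_sender_timeout', 'wal_receiver_timeout');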

At this point the only way to get the subscriptions working again is to DROP and CREATE them again, thereby losing the built-up WAL. ALTER SUBSCRIPTION ... DISABLE/ENABLE has no effect, and neither does restarting the postgresql-12 service on the subscriber side.
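
For the record, the drop/re-create cycle on the subscriber looks roughly like the following (the connection string and publication name below are placeholders, not our real values):

test_db=# DROP SUBSCRIPTION sub1;
test_db=# CREATE SUBSCRIPTION sub1 CONNECTION 'host=publisher.example.com dbname=test_db user=logical_replication' PUBLICATION pub1 WITH (copy_data = true);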

Is there a misconfiguration or setting I have overlooked here?

Any suggestions as to how I can incorporate pg_squeeze into our environment alongside our logical replication setup would be appreciated.

Can you please test the same scenario with pg_squeeze 1.4 (actually 1.4.1 is the latest release)? It should consume less disk space (see #51 (comment) for details), so the processing might also be a bit faster. If the replication still gets stuck due to the logical replication worker exiting, I suggest you increase the wal_receiver_timeout configuration variable (on the subscriber's side).
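
For example, on the subscriber (the parameter only needs a configuration reload, not a restart; the 10min value is just an example):

test_db=# ALTER SYSTEM SET wal_receiver_timeout = '10min';
test_db=# SELECT pg_reload_conf();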

Thanks for the quick reply.

I will work on building the 1.4.1 release; in the meantime, is there an .rpm available for this? I couldn't find anything in the PostgreSQL yum repo.

I thought the point of issue #47 (comment) was to add pg_squeeze 1.4 to the official repository, but it really appears not to be there. I've submitted a request to set up the build.

pg_squeeze 1.4 is now in the PGDG (yum) repository. It can only be installed on PG 14, though; you need to install it from source if you use a lower PG version.

I was able to install 1.4.1 from the official PG12 repository. The subscriber has wal_receiver_timeout set to 5min; unfortunately, the replication lag remains fixed at the size of the table being squeezed once the squeeze process completes.

I will re-test in PG14 once we are able to move to this version.