pgpartman / pg_partman

Partition management extension for PostgreSQL

Issue running multiple updates on partitioned table in a single transaction.

mraasvel opened this issue

To reproduce, I set up the tables below: a base table example_table and a history table example_table_history. example_table has a versioning trigger that runs on every insert, update, or delete on the main table in order to preserve history. The trigger calls this function: https://github.com/nearform/temporal_tables/blob/master/versioning_function.sql

CREATE TABLE example_table_history
(
    id              INTEGER      NOT NULL,
    col1            VARCHAR      NOT NULL,
    col2            VARCHAR      NOT NULL,
    last_changed    TIMESTAMPTZ  NOT NULL,

    system_period tstzrange    NOT NULL DEFAULT TSTZRANGE(CURRENT_TIMESTAMP, NULL)
) PARTITION BY RANGE (last_changed);

CREATE TABLE example_table_history_table_template
(
    LIKE example_table_history
);

ALTER TABLE example_table_history_table_template ADD PRIMARY KEY (id, last_changed);

SELECT partman.create_parent(
               'public.example_table_history',
               'last_changed',
               '1 month',
               p_template_table := 'public.example_table_history_table_template',
               p_premake := 6);
UPDATE partman.part_config SET retention = '2 months'::INTERVAL WHERE parent_table = 'public.example_table_history';


CREATE TABLE example_table
(
    id            INTEGER      NOT NULL,
    col1          VARCHAR      NOT NULL,
    col2          VARCHAR      NOT NULL,
    last_changed  TIMESTAMPTZ  NOT NULL DEFAULT NOW(),

    system_period tstzrange    NOT NULL DEFAULT TSTZRANGE(CURRENT_TIMESTAMP, NULL)
) PARTITION BY RANGE (last_changed);

CREATE TABLE example_table_table_template
(
    LIKE example_table
);

ALTER TABLE example_table_table_template
    ADD PRIMARY KEY (id);

SELECT partman.create_parent(
               'public.example_table',
               'last_changed',
               '1 month',
               p_template_table := 'public.example_table_table_template',
               p_premake := 6);


UPDATE partman.part_config
SET retention = '4 months'::INTERVAL
WHERE parent_table = 'public.example_table';


CREATE TRIGGER versioning_trigger
    BEFORE INSERT OR UPDATE OR DELETE
    ON example_table
    FOR EACH ROW
EXECUTE PROCEDURE versioning(
        'system_period', 'example_table_history', TRUE
                  );

INSERT INTO example_table (id, col1, col2)
VALUES (1, 'sad', '1'),
       (2, 'ok', '2'),
       (3, 'happy', '3');

So far so good, we have a table with 3 rows:

postgres@127:testdb> select * from example_table
+----+-------+------+-------------------------------+---------------------------------------+
| id | col1  | col2 | last_changed                  | system_period                         |
|----+-------+------+-------------------------------+---------------------------------------|
| 1  | sad   | 1    | 2024-03-04 12:42:26.666032+00 | [2024-03-04 12:42:26.666032+00, None) |
| 2  | ok    | 2    | 2024-03-04 12:42:26.666032+00 | [2024-03-04 12:42:26.666032+00, None) |
| 3  | happy | 3    | 2024-03-04 12:42:26.666032+00 | [2024-03-04 12:42:26.666032+00, None) |
+----+-------+------+-------------------------------+---------------------------------------+
SELECT 3

Now things go wrong if I run the following update and schema changes in a single transaction:

BEGIN;

UPDATE example_table SET col2 = '4', last_changed = now() WHERE id = 1;

ALTER TABLE example_table_history RENAME COLUMN col1 TO col1_old;
ALTER TABLE example_table_history ADD COLUMN col1 VARCHAR DEFAULT NULL;

ALTER TABLE example_table RENAME COLUMN col1 TO col1_old;
ALTER TABLE example_table ADD COLUMN col1 VARCHAR DEFAULT NULL;

-- Expected behavior: this query should set col1 on all rows
-- Actual behavior: col1 is only set on rows touched by the previous update
UPDATE example_table SET col1 = col1_old, last_changed = now() WHERE col1 IS NULL;

COMMIT;

The first update: UPDATE example_table SET col2 = '4', last_changed = now() WHERE id = 1 runs correctly:

postgres@127:testdb> UPDATE example_table SET col2 = '4', last_changed = now() WHERE id = 1;
 UPDATE 1
Time: 0.005s
postgres@127:testdb> select * from example_table
+----+-------+------+-------------------------------+---------------------------------------+
| id | col1  | col2 | last_changed                  | system_period                         |
|----+-------+------+-------------------------------+---------------------------------------|
| 2  | ok    | 2    | 2024-03-04 12:42:26.666032+00 | [2024-03-04 12:42:26.666032+00, None) |
| 3  | happy | 3    | 2024-03-04 12:42:26.666032+00 | [2024-03-04 12:42:26.666032+00, None) |
| 1  | sad   | 4    | 2024-03-04 12:44:55.052027+00 | [2024-03-04 12:44:55.052027+00, None) |
+----+-------+------+-------------------------------+---------------------------------------+
SELECT 3

The ALTER TABLE statements also work as expected:

postgres@127:testdb>
ALTER TABLE example_table_history RENAME COLUMN col1 TO col1_old;
ALTER TABLE example_table_history ADD COLUMN col1 VARCHAR DEFAULT NULL;
ALTER TABLE example_table RENAME COLUMN col1 TO col1_old;
ALTER TABLE example_table ADD COLUMN col1 VARCHAR DEFAULT NULL;
 
ALTER TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
Time: 0.014s
postgres@127:testdb> select * from example_table
+----+----------+------+-------------------------------+---------------------------------------+--------+
| id | col1_old | col2 | last_changed                  | system_period                         | col1   |
|----+----------+------+-------------------------------+---------------------------------------+--------|
| 2  | ok       | 2    | 2024-03-04 12:42:26.666032+00 | [2024-03-04 12:42:26.666032+00, None) | <null> |
| 3  | happy    | 3    | 2024-03-04 12:42:26.666032+00 | [2024-03-04 12:42:26.666032+00, None) | <null> |
| 1  | sad      | 4    | 2024-03-04 12:44:55.052027+00 | [2024-03-04 12:44:55.052027+00, None) | <null> |
+----+----------+------+-------------------------------+---------------------------------------+--------+
SELECT 3

Things only go wrong when I run the final query. Only the rows that were updated by the initial UPDATE statement get their col1 value set by this statement. Even though it reports UPDATE 3, col1 is populated for only 1 row; rows 2 and 3 get a new last_changed and system_period, but their col1 stays null.

postgres@127:testdb> UPDATE example_table SET col1 = col1_old, last_changed = now() WHERE col1 IS NULL;
UPDATE 3
Time: 0.007s
postgres@127:testdb> select * from example_table
+----+----------+------+-------------------------------+---------------------------------------+--------+
| id | col1_old | col2 | last_changed                  | system_period                         | col1   |
|----+----------+------+-------------------------------+---------------------------------------+--------|
| 2  | ok       | 2    | 2024-03-04 12:44:55.052027+00 | [2024-03-04 12:44:55.052027+00, None) | <null> |
| 3  | happy    | 3    | 2024-03-04 12:44:55.052027+00 | [2024-03-04 12:44:55.052027+00, None) | <null> |
| 1  | sad      | 4    | 2024-03-04 12:44:55.052027+00 | [2024-03-04 12:44:55.052027+00, None) | sad    |
+----+----------+------+-------------------------------+---------------------------------------+--------+
SELECT 3
Time: 0.008s

Expected behavior: no values in col1 are null, and every row has col1 set from col1_old.
Actual behavior: col1 is only set for rows that were already updated by an earlier statement in the same transaction.
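
A small diagnostic sketch that may help narrow this down, assuming the tables above and run inside the same transaction after the final UPDATE. These queries were not part of the original report. The first counts the rows the final UPDATE failed to fill, the second uses the tableoid system column to show which child partition holds each row, and the third lists what the versioning trigger has written to the history table so far.

```sql
-- Rows whose col1 was not populated by the final UPDATE.
-- The report expects this to be 0.
SELECT count(*) AS still_null
FROM example_table
WHERE col1 IS NULL;

-- Show the child partition that physically holds each row.
SELECT tableoid::regclass AS partition_name, id, col1_old, col1, last_changed
FROM example_table
ORDER BY id;

-- History rows written by the versioning trigger, with their partition.
SELECT tableoid::regclass AS partition_name, *
FROM example_table_history
ORDER BY id;
```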

The issue doesn't happen without the versioning trigger, though nothing in the behavior of the versioning function looks problematic to me. Maybe I'm misunderstanding how a trigger like this interacts with a partitioned table, or how to correctly update partitioned tables; please let me know if that's the case. Changing the order of the statements so that the ALTER TABLE statements run at the start of the transaction, before the updates, produces the correct behavior.
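
For reference, a minimal sketch of the reordered transaction described above. The report only says that the ALTER statements were moved to the start of the transaction, so the exact statement order shown here is an assumption.

```sql
BEGIN;

-- Schema changes first, before any row is updated in this transaction.
ALTER TABLE example_table_history RENAME COLUMN col1 TO col1_old;
ALTER TABLE example_table_history ADD COLUMN col1 VARCHAR DEFAULT NULL;

ALTER TABLE example_table RENAME COLUMN col1 TO col1_old;
ALTER TABLE example_table ADD COLUMN col1 VARCHAR DEFAULT NULL;

-- Data changes afterwards, in the same relative order as before.
UPDATE example_table SET col2 = '4', last_changed = now() WHERE id = 1;
UPDATE example_table SET col1 = col1_old, last_changed = now() WHERE col1 IS NULL;

COMMIT;
```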

Thank you for your time and help :)

Apologies, but I'm not quite sure why this is happening. From what I can see so far, it's not anything particular to pg_partman, just partitioning in general with PG. I think you may get better answers on the PG mailing lists. You can try the general list first and, if you don't get a response, maybe the hackers or bugs list.

https://www.postgresql.org/list/

If I have a chance I'll try and come back to revisit this as well and see if I can figure anything out, but I at least wanted to point you to some additional help in case I don't get back to it for a while. If you do find an answer on the lists, I'd appreciate you letting me know as well. Thanks!
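
One way to check the observation above, that this is general partitioning behavior rather than something pg_partman does, would be to repeat the test on natively partitioned tables with no pg_partman calls. The sketch below is only a setup under stated assumptions: the table and partition names (plain_example, plain_example_history, and their _default partitions) are hypothetical, the versioning function from the linked repository is assumed to be installed already, the primary keys are omitted for brevity, and a PostgreSQL version that allows BEFORE ROW triggers on partitioned tables (13 or later) is assumed. Whether the problem reproduces this way has not been verified here.

```sql
-- Hypothetical names; no pg_partman involved.
CREATE TABLE plain_example_history
(
    id            INTEGER     NOT NULL,
    col1          VARCHAR     NOT NULL,
    col2          VARCHAR     NOT NULL,
    last_changed  TIMESTAMPTZ NOT NULL,
    system_period tstzrange   NOT NULL DEFAULT tstzrange(CURRENT_TIMESTAMP, NULL)
) PARTITION BY RANGE (last_changed);

-- A single DEFAULT partition accepts every row.
CREATE TABLE plain_example_history_default
    PARTITION OF plain_example_history DEFAULT;

CREATE TABLE plain_example
(
    id            INTEGER     NOT NULL,
    col1          VARCHAR     NOT NULL,
    col2          VARCHAR     NOT NULL,
    last_changed  TIMESTAMPTZ NOT NULL DEFAULT now(),
    system_period tstzrange   NOT NULL DEFAULT tstzrange(CURRENT_TIMESTAMP, NULL)
) PARTITION BY RANGE (last_changed);

CREATE TABLE plain_example_default
    PARTITION OF plain_example DEFAULT;

-- Same trigger definition as in the report, pointed at the new tables.
CREATE TRIGGER versioning_trigger
    BEFORE INSERT OR UPDATE OR DELETE
    ON plain_example
    FOR EACH ROW
EXECUTE PROCEDURE versioning('system_period', 'plain_example_history', TRUE);

INSERT INTO plain_example (id, col1, col2)
VALUES (1, 'sad', '1'),
       (2, 'ok', '2'),
       (3, 'happy', '3');
```

Running the transaction from the report against plain_example and plain_example_history would then show whether the behavior depends on pg_partman at all.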

Thanks for your reply! I wasn't sure exactly what or where the problem was either. I'll take a look at the mailing lists and will post here if I find anything.