pgpartman / pg_partman

Partition management extension for PostgreSQL

Error while upgrading extension from 4.6.2 to 5.0.0

hyde1 opened this issue · comments

Hello,

I am running PostgreSQL on AWS RDS. I upgraded PostgreSQL from v14.7 to v14.10. In this Postgres version the pg_partman extension is available as v5.0.0, so I ran the following query:

ALTER EXTENSION pg_partman UPDATE TO '5.0.0';

And ran into the following error:

ERROR:  42501: cannot fire deferred trigger within security-restricted operation
LOCATION:  afterTriggerMarkEvents, trigger.c:4259
Time: 279.538 ms

Every call to a pg_partman function or procedure fails with an error (e.g. CALL partman.run_maintenance_proc();), so I am stuck with a non-working pg_partman extension.

Any idea what this error means? I am running this command as the main RDS superuser.

PS: all my existing partitions are native PostgreSQL partitions, and I only use yearly partitions with integer-range subpartitions.

Thanks

Mind sharing the \d+ output of one of your partition sets and the contents of the part_config table for that partition set?

Is there any other context around that error to say which function and which line of code is causing it?

I'm wondering if this may be an issue in RDS itself, since this almost looks like an event-level trigger, which I'm not using anywhere in pg_partman. https://www.postgresql.org/docs/current/event-triggers.html
I'd recommend also opening up a ticket with them to get started looking at it from their end too.

Hi @keithf4,

I've been able to reproduce the issue with this simple partition set, with no data inside, on a fresh PostgreSQL v14.7 with pg_partman v4.6.2:

-- Install extension
CREATE SCHEMA partman;
CREATE EXTENSION pg_partman WITH SCHEMA partman;

-- Create partitioned table
CREATE TABLE orders_new (
id BIGINT NOT NULL,
zdate DATE,
id_brand BIGINT,
id_restaurant BIGINT
) PARTITION BY RANGE (zdate);

-- Create constraint and index
ALTER TABLE orders_new ADD CONSTRAINT unique_id_by_part UNIQUE (id, id_brand, zdate);
CREATE INDEX orders_new_brand_restaurant_zdate ON orders_new (id_brand, id_restaurant, zdate);

-- Create template table
CREATE TABLE orders_template (like orders_new);

-- pg_partman: create partitions
SELECT partman.create_parent(
p_parent_table := 'public.orders_new'
, p_control := 'zdate'
, p_type := 'native'
, p_interval := 'yearly'
, p_start_partition := '2018-01-01'
, p_template_table := 'public.orders_template'
);

SELECT partman.create_sub_parent(
p_top_parent := 'public.orders_new'
, p_control := 'id_brand'
, p_type := 'native'
, p_native_check := 'yes'
, p_interval := '100'
, p_premake := 25
);

-- Update the number of subpartitions to create after the first partitions are created
UPDATE partman.part_config_sub set sub_premake = 4 where sub_parent = 'public.orders_new';

-- Call the pg_partman maintenance proc
CALL partman.run_maintenance_proc();

Then upgrade to RDS PostgreSQL v14.10.
Then run ALTER EXTENSION pg_partman UPDATE TO '5.0.0';

Please find below the output of \d+ on the main partition table orders_new:

+---------------+--------------------------+-----------+----------+
| Column        | Type                     | Modifiers | Storage  |
|---------------+--------------------------+-----------+----------|
| id            | bigint                   |  not null | plain    |
| zdate         | date                     |           | plain    |
| id_brand      | bigint                   |           | plain    |
| id_restaurant | bigint                   |           | plain    |
+---------------+--------------------------+-----------+----------+
Indexes:
    "unique_id_by_part" UNIQUE CONSTRAINT, btree (id, id_brand, zdate)
    "orders_new_brand_restaurant_zdate" btree (id_brand, id_restaurant, zdate)
Partition key: RANGE (zdate)
Partitions: public.orders_new_default DEFAULT
            public.orders_new_p2018 FOR VALUES FROM ('2018-01-01') TO ('2019-01-01')
            public.orders_new_p2019 FOR VALUES FROM ('2019-01-01') TO ('2020-01-01')
            public.orders_new_p2020 FOR VALUES FROM ('2020-01-01') TO ('2021-01-01')
            public.orders_new_p2021 FOR VALUES FROM ('2021-01-01') TO ('2022-01-01')
            public.orders_new_p2022 FOR VALUES FROM ('2022-01-01') TO ('2023-01-01')
            public.orders_new_p2023 FOR VALUES FROM ('2023-01-01') TO ('2024-01-01')
            public.orders_new_p2024 FOR VALUES FROM ('2024-01-01') TO ('2025-01-01')
            public.orders_new_p2025 FOR VALUES FROM ('2025-01-01') TO ('2026-01-01')
            public.orders_new_p2026 FOR VALUES FROM ('2026-01-01') TO ('2027-01-01')
            public.orders_new_p2027 FOR VALUES FROM ('2027-01-01') TO ('2028-01-01')

Then on one of the sub-partition parents, \d+ orders_new_p2018:

+---------------+--------------------------+-----------+----------+
| Column        | Type                     | Modifiers | Storage  |
|---------------+--------------------------+-----------+----------|
| id            | bigint                   |  not null | plain    |
| zdate         | date                     |           | plain    |
| id_brand      | bigint                   |           | plain    |
| id_restaurant | bigint                   |           | plain    |
+---------------+--------------------------+-----------+----------+
Indexes:
    "orders_new_p2018_id_id_brand_zdate_key" UNIQUE CONSTRAINT, btree (id, id_brand, zdate)
    "orders_new_p2018_id_brand_id_restaurant_zdate_idx" btree (id_brand, id_restaurant, zdate)
Partition of: public.orders_new FOR VALUES FROM ('2018-01-01') TO ('2019-01-01')
Partition constraint: ((zdate IS NOT NULL) AND (zdate >= '2018-01-01'::date) AND (zdate < '2019-01-01'::date))
Partition key: RANGE (id_brand)
Partitions: public.orders_new_p2018_default DEFAULT
            public.orders_new_p2018_p0 FOR VALUES FROM ('0') TO ('100')
            public.orders_new_p2018_p100 FOR VALUES FROM ('100') TO ('200')
            public.orders_new_p2018_p1000 FOR VALUES FROM ('1000') TO ('1100')
            public.orders_new_p2018_p1100 FOR VALUES FROM ('1100') TO ('1200')
            public.orders_new_p2018_p1200 FOR VALUES FROM ('1200') TO ('1300')
            public.orders_new_p2018_p1300 FOR VALUES FROM ('1300') TO ('1400')
            public.orders_new_p2018_p1400 FOR VALUES FROM ('1400') TO ('1500')
            public.orders_new_p2018_p1500 FOR VALUES FROM ('1500') TO ('1600')
            public.orders_new_p2018_p1600 FOR VALUES FROM ('1600') TO ('1700')
            public.orders_new_p2018_p1700 FOR VALUES FROM ('1700') TO ('1800')
            public.orders_new_p2018_p1800 FOR VALUES FROM ('1800') TO ('1900')
            public.orders_new_p2018_p1900 FOR VALUES FROM ('1900') TO ('2000')
            public.orders_new_p2018_p200 FOR VALUES FROM ('200') TO ('300')
            public.orders_new_p2018_p2000 FOR VALUES FROM ('2000') TO ('2100')
            public.orders_new_p2018_p2100 FOR VALUES FROM ('2100') TO ('2200')
            public.orders_new_p2018_p2200 FOR VALUES FROM ('2200') TO ('2300')
            public.orders_new_p2018_p2300 FOR VALUES FROM ('2300') TO ('2400')
            public.orders_new_p2018_p2400 FOR VALUES FROM ('2400') TO ('2500')
            public.orders_new_p2018_p2500 FOR VALUES FROM ('2500') TO ('2600')
            public.orders_new_p2018_p300 FOR VALUES FROM ('300') TO ('400')
            public.orders_new_p2018_p400 FOR VALUES FROM ('400') TO ('500')
            public.orders_new_p2018_p500 FOR VALUES FROM ('500') TO ('600')
            public.orders_new_p2018_p600 FOR VALUES FROM ('600') TO ('700')
            public.orders_new_p2018_p700 FOR VALUES FROM ('700') TO ('800')
            public.orders_new_p2018_p800 FOR VALUES FROM ('800') TO ('900')
            public.orders_new_p2018_p900 FOR VALUES FROM ('900') TO ('1000')

I agree with you it might be an issue in RDS itself. I have opened a ticket on my AWS account with this simple configuration and they escalated the issue internally. I'll keep you posted here.

Is there any other context around that error to say which function and which line of code is causing it?

Sadly I have no other context around that error...

Best,

I ran your example and ran into another issue:

github599=# ALTER EXTENSION pg_partman UPDATE TO '5.0.0';
ERROR:  cannot ALTER TABLE "part_config_sub" because it has pending trigger events

It didn't even let the upgrade proceed in my case. I think I ran into this before, so I'll continue looking into it as well.

Ran into this previously here. Will try and work out a resolution to at least allow the upgrade to proceed.

#167

You said you are already somehow running 5.0.0 in RDS?

No the update did not succeed so I am stuck in a non-working state.

What version does it say is installed?

Did you run anything else as part of the upgrade other than the ALTER EXTENSION command?

What errors are you getting?

I currently have version 4.6.2 installed:

select * from pg_available_extensions where name = 'pg_partman';
+------------+-----------------+-------------------+------------------------------------------------------+
| name       | default_version | installed_version | comment                                              |
|------------+-----------------+-------------------+------------------------------------------------------|
| pg_partman | 5.0.0           | 4.6.2             | Extension to manage partitioned tables by time or ID |
+------------+-----------------+-------------------+------------------------------------------------------+

It looks like AWS exposes only one version per PostgreSQL version:

+------------+---------+-----------+-----------+---------+-------------+--------+----------+------------------------------------------------------+
| name       | version | installed | superuser | trusted | relocatable | schema | requires | comment                                              |
|------------+---------+-----------+-----------+---------+-------------+--------+----------+------------------------------------------------------|
| pg_partman | 5.0.0   | False     | True      | False   | False       | <null> | <null>   | Extension to manage partitioned tables by time or ID |
+------------+---------+-----------+-----------+---------+-------------+--------+----------+------------------------------------------------------+
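(That per-version listing is presumably from the pg_available_extension_versions view; the query itself wasn't shown, so this is an assumption:)

SELECT * FROM pg_available_extension_versions WHERE name = 'pg_partman';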

So I don't know if everything is broken because 4.6.2 is not available on RDS PostgreSQL v14.10 or because the update failed.

The one and only query I ran is:

ALTER EXTENSION pg_partman UPDATE TO '5.0.0';

That gave me this error:

ERROR:  42501: cannot fire deferred trigger within security-restricted operation
LOCATION:  afterTriggerMarkEvents, trigger.c:4259
Time: 279.538 ms

Ok, is that the only error you've gotten then? What is leading you to believe you're in a broken state?

Sorry, after that upgrade (though I did not test before trying to upgrade), the query CALL partman.run_maintenance_proc(); fails with the following error:

Child table given does not exist (<NULL>)
CONTEXT: PL/pgSQL function show_partition_info(text,text,text) line 34 at RAISE
SQL statement "SELECT child_start_time                                              FROM partman.show_partition_info(v_parent_schema||'.'||v_last_partition, v_row.partition_interval, v_row.parent_table)"
PL/pgSQL function partman.run_maintenance(text,boolean,boolean) line 202 at SQL statement
SQL statement "SELECT partman.run_maintenance('public.orders_new', p_jobmon := 't')"
PL/pgSQL function partman.run_maintenance_proc(integer,boolean,boolean) line 42 at EXECUTE
DETAIL:
HINT:
CONTEXT:  PL/pgSQL function partman.run_maintenance(text,boolean,boolean) line 408 at RAISE
SQL statement "SELECT partman.run_maintenance('public.orders_new', p_jobmon := 't')"
PL/pgSQL function partman.run_maintenance_proc(integer,boolean,boolean) line 42 at EXECUTE

If you call partman.run_maintenance() or partman.run_maintenance('tableschema.tablename'), filling in one of your partition set names for the second call, does it work?

If those fail as well, is there any data in the part_config table on the broken system?

Just trying to narrow down where the issue is
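Spelled out, the checks above would be something like this (the partition set name is just the one used in this thread):

SELECT partman.run_maintenance();
SELECT partman.run_maintenance('public.orders_new');
-- and to see whether the config data is still there:
SELECT count(*) FROM partman.part_config;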

On my database with data, partman.run_maintenance() and partman.run_maintenance('public.orders_new') return the exact same error:

Child table given does not exist (<NULL>)
CONTEXT: PL/pgSQL function show_partition_info(text,text,text) line 34 at RAISE
SQL statement "SELECT child_start_time                                              FROM partman.show_partition_info(v_parent_schema||'.'||v_last_partition, v_row.partition_interval, v_row.parent_table)"
PL/pgSQL function partman.run_maintenance(text,boolean,boolean) line 202 at SQL statement
DETAIL:
HINT:
CONTEXT:  PL/pgSQL function partman.run_maintenance(text,boolean,boolean) line 408 at RAISE

On my test database without data there is no error with these calls, even if I insert data into the partitions after the failed upgrade and retry the run_maintenance() function.

Can you show me the \d+ for public.orders_new on the broken system as well as the entire entry in part_config for it? SELECT * from partman.part_config WHERE parent_table = 'public.orders_new'; If you could use the expanded view in psql (\x) to make it easier to read, I'd appreciate it (like below):

github599=# \x
Expanded display is on.

github599=# SELECT * from partman.part_config WHERE parent_table = 'public.orders_new';
-[ RECORD 1 ]--------------+-----------------------
parent_table               | public.orders_new
control                    | zdate
partition_type             | native
partition_interval         | 1 year
constraint_cols            | 
premake                    | 4
optimize_trigger           | 4
optimize_constraint        | 30
epoch                      | none
inherit_fk                 | t
retention                  | 
retention_schema           | 
retention_keep_table       | t
retention_keep_index       | t
infinite_time_partitions   | f
datetime_string            | YYYY
automatic_maintenance      | on
jobmon                     | t
sub_partition_set_full     | f
undo_in_progress           | f
trigger_exception_handling | f
upsert                     | 
trigger_return_null        | t
template_table             | public.orders_template
publications               | 
inherit_privileges         | f
constraint_valid           | t
subscription_refresh       | 
drop_cascade_fk            | f

Hi @keithf4, sorry, I made a mistake in my last comment.

Before the upgrade I renamed the partitioned table public.orders_new to public.orders (and the old orders table to orders_new, in a transaction). Thus partman.run_maintenance('public.orders_new') returns an error, but partman.run_maintenance('public.orders') does not complain.

It might be due to the renaming: orders_new is still referenced in part_config instead of orders, which is the actual partitioned table.

SELECT * from partman.part_config WHERE parent_table = 'public.orders_new';
-[ RECORD 1 ]-------------------------
parent_table               | public.orders_new
control                    | zdate
partition_type             | native
partition_interval         | 1 year
constraint_cols            | <null>
premake                    | 4
optimize_trigger           | 4
optimize_constraint        | 30
epoch                      | none
inherit_fk                 | True
retention                  | <null>
retention_schema           | <null>
retention_keep_table       | True
retention_keep_index       | True
infinite_time_partitions   | False
datetime_string            | YYYY
automatic_maintenance      | on
jobmon                     | True
sub_partition_set_full     | False
undo_in_progress           | False
trigger_exception_handling | False
upsert                     |
trigger_return_null        | True
template_table             | public.orders_template
publications               | <null>
inherit_privileges         | False
constraint_valid           | True
subscription_refresh       | <null>
drop_cascade_fk            | False

So I guess I can update the parent_table column of the part_config table to reflect the right table name?

The partition table being managed by partman must match the table name in the part_config table. However, if you're renaming the table, you're going to have to rename all the child tables as well. Prior to version 5.x, partman looks up the child tables by parsing the parent table name. If the parent and child table base names do not match, maintenance cannot work.

At this point, it may be easier to rename your table back to the way it originally was.

Indeed. I renamed the table back, and run_maintenance() and run_maintenance_proc() now work well (the upgrade still fails).

So post-5.0.0 I can rename the table safely and modify the part_config.parent_table column as well?

Thanks

Yes, I'll work on a fix for the upgrade and hopefully have it out relatively soon. Thankfully it is all just one transaction, so everything gets rolled back. Was worried the upgrade was somehow putting things into a bad state.

I can't say for certain it will work. I look up the child table names in the catalog for the most part, but I cannot say I've fully tested having child table names that differ from the parent base-name. If you really need to rename things, I would plan on renaming all child tables as well at this point. You can see in some of the documentation guides how to generate a bunch of SQL statements based on child names.

Long term I plan to see about specifically testing for and allowing child table names to differ from the parent base name.
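For illustration, here is one way such rename statements could be generated for the direct children of a parent table. This is only a sketch assuming the orders_new to orders rename from this thread, not something pg_partman does for you; sub-partition parents (e.g. orders_new_p2018) would need the same treatment.

-- Generate ALTER TABLE ... RENAME statements for the direct children of a parent
-- (sketch only; adjust the name mapping to match your own rename)
SELECT format(
           'ALTER TABLE %I.%I RENAME TO %I;',
           n.nspname,
           c.relname,
           replace(c.relname, 'orders_new', 'orders')
       )
FROM pg_catalog.pg_inherits i
JOIN pg_catalog.pg_class c ON c.oid = i.inhrelid
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE i.inhparent = 'public.orders_new'::regclass;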

I have a PR up for the next release that contains a fix for this issue.

#602

Basically the initial update to 5.x will have to be split into two parts to get around a transaction constraint condition. If you wouldn't mind testing this out (please don't run on production yet) to see if it solves the issue, I'd appreciate it.

If all seems good for you, I'll try and get the new version pushed out.

Hi @keithf4,

Good job and thanks a lot for your quick fix.

I will try your fix today, but I may need a little time to create a test environment, and I won't be able to test it on RDS, as RDS only allows whitelisted extensions and versions.

I'll keep you posted on whether the fix works with my use case on a fresh PostgreSQL install outside RDS.

Hi @keithf4,

I did some testing with v5.0.1.

I still could not update directly from 4.6.2 to 5.0.1, but upgrading to 5.0.0 first (which worked) and then to 5.0.1 succeeded 🎉
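
For reference, the two-step sequence that worked was simply (run as separate statements):

ALTER EXTENSION pg_partman UPDATE TO '5.0.0';
ALTER EXTENSION pg_partman UPDATE TO '5.0.1';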

Good job, I hope 5.0.1 will be adopted by AWS after you release it.

Thanks for the feedback!

Hi @keithf4,
After digging more into your fix, I realized that the error is due to this query in the updates/pg_partman--4.7.4--5.0.0.sql file at line 267 in the v5.0.0 branch:

CREATE TABLE @extschema@.part_config_sub (
    sub_parent text
    , sub_control text NOT NULL
    , sub_partition_interval text NOT NULL
    , sub_partition_type text NOT NULL
    , sub_premake int NOT NULL DEFAULT 4
    , sub_automatic_maintenance text NOT NULL DEFAULT 'on'
    , sub_template_table text
    , sub_retention text
    , sub_retention_schema text
    , sub_retention_keep_index boolean NOT NULL DEFAULT true
    , sub_retention_keep_table boolean NOT NULL DEFAULT true
    , sub_epoch text NOT NULL DEFAULT 'none'
    , sub_constraint_cols text[]
    , sub_optimize_constraint int NOT NULL DEFAULT 30
    , sub_infinite_time_partitions boolean NOT NULL DEFAULT false
    , sub_jobmon boolean NOT NULL DEFAULT true
    , sub_inherit_privileges boolean DEFAULT false
    , sub_constraint_valid boolean DEFAULT true NOT NULL
    , sub_ignore_default_data boolean NOT NULL DEFAULT true
    , sub_default_table boolean default true
    , sub_date_trunc_interval TEXT
    , CONSTRAINT part_config_sub_pkey PRIMARY KEY (sub_parent)
    , CONSTRAINT part_config_sub_sub_parent_fkey FOREIGN KEY (sub_parent) REFERENCES @extschema@.part_config (parent_table) ON DELETE CASCADE ON UPDATE CASCADE DEFERRABLE INITIALLY DEFERRED
    , CONSTRAINT positive_premake_check CHECK (sub_premake > 0)
);

If we remove the DEFERRABLE INITIALLY DEFERRED clause from the part_config_sub_sub_parent_fkey constraint, so the constraint checks are not deferred, the error no longer occurs. Maybe there is a reason why that clause was added? It might be a simpler fix to just remove it?

I made a simple test from the v5.0.0 branch with this clause removed and the update went well.

Anyway your fix works fine too.

The FK is deferred on purpose. I left the reason as a comment so I wouldn't forget why as well.

https://github.com/pgpartman/pg_partman/blob/master/sql/tables/tables.sql#L35

Maybe I'll see if I can simply change it to deferred as part of the 5.0.1 update vs moving all the constraint definitions. Thanks for narrowing that down further!
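For what it's worth, the deferrability of an existing foreign key can also be changed after creation without restating the whole definition. A sketch only, using the names from the snippet above; the actual 5.0.1 update script may do this differently:

-- Make the existing FK deferrable after it has been created non-deferred
ALTER TABLE @extschema@.part_config_sub
    ALTER CONSTRAINT part_config_sub_sub_parent_fkey
    DEFERRABLE INITIALLY DEFERRED;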

Ok it makes sense!

So I've greatly simplified the update. It also seems to work without splitting it into two transactions, at least with this simple case. I think I'm still going to leave those split transaction instructions in there just in case someone encounters it. Mind giving it another try?

Thanks so much for investigating that further!

Hi @keithf4,

Great, the update (by going through 5.0.0 first) is still working fine.

During my testing I noticed a small change between 4.x and 5.x, and I don't think it is intentional:

In 4.x we could create a native partition set with a nullable control column (but it was not possible for non-native partition sets, as we can see in this condition in create_parent.sql at line 116).

In 5.x there are only native partitions, and this check is now enforced, as we can see here in the same file.

Is it intentional or is it a mistake? I can open another issue if you prefer, but I think it can be fixed in 5.0.1 as well. In my case, I created a native partition set with a nullable control column (which is allowed by Postgres); I can upgrade to 5.0.1, but I would not be able to recreate the same partition set on the same schema now.

Best,

That not being enforced in native was an oversight. It should be enforced unless anyone can show me a good use-case for not enforcing it.

Null values in the control column for the time/integer-based partitioning supported by pg_partman can cause long-term issues, since the only table the NULL values can go into is the default table. Data building up in the default table can cause serious performance issues during maintenance the more it accumulates. Enforcing NOT NULL on the control column avoids a foot-gun for users.
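
A minimal illustration of that routing behavior in plain PostgreSQL (the table names here are made up for the example, not partman-managed):

-- With range partitioning, a NULL partition-key value can only go to the DEFAULT partition
CREATE TABLE demo (id bigint, zdate date) PARTITION BY RANGE (zdate);
CREATE TABLE demo_2024 PARTITION OF demo
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
CREATE TABLE demo_default PARTITION OF demo DEFAULT;

INSERT INTO demo VALUES (1, NULL);  -- accepted, but lands in demo_default
SELECT tableoid::regclass AS child, id, zdate FROM demo;
--    child     | id | zdate
-- demo_default |  1 |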

Sure, I agree with you, maybe you should add this to the docs or at least the 5.x upgrade guide?

Will add it to the changelog. I thought I had done that. Thanks!