postgres -> postgres include/exclude logic not working correctly
pob8888 opened this issue · comments
-
pgloader --version
3.6.3~devel and also 3.6.2079646 (compiled from src today)
-
did you test a fresh compile from the source tree?
yes
- did you search for other similar issues?
yes - issue #1153 similar, but was not resolved/progressed
- how can I reproduce the bug?
load database
from pgsql://adminuser:redacted@dbsrc.abcd.eu-west-2.rds.amazonaws.com:5432/postgres
into pgsql://adminuser:redacted@dbdst.abcd.eu-west-2.rds.amazonaws.com:5432/postgres
with drop schema
including only table names matching ~/.*/ in schema 'schema1'
including only table names matching ~/.*/ in schema 'schema2'
excluding table names matching ~/_staging/ in schema 'schema1'
-- excluding table names matching ~/_staging/ in schema 'schema2'
cast type "character varying" to "character varying" keep typemod
;
- pgloader output you obtain
2024-01-24T20:21:37.008000Z LOG pgloader version "3.6.3~devel"
2024-01-24T20:21:37.008000Z LOG Data errors in '/tmp/pgloader/'
2024-01-24T20:21:37.008000Z LOG Parsing commands from file #P"/home/pob/my.dms"
2024-01-24T20:21:37.196002Z LOG Migrating from #<PGSQL-CONNECTION pgsql://adminuser@dbsrc.abcd.eu-west-2.rds.amazonaws.com:5432/postgres {1007EE2E93}>
2024-01-24T20:21:37.196002Z LOG Migrating into #<PGSQL-CONNECTION pgsql://adminuser@dbdst.abcd.eu-west-2.rds.amazonaws.com:5432/postgres {1007EE46A3}>
2024-01-24T20:21:39.052005Z LOG report summary reset
table name errors rows bytes total time
------------------------- --------- --------- --------- --------------
fetch meta data 0 1 0.336s
Create Schemas 0 0 0.164s
Create SQL Types 0 0 0.052s
Create tables 0 2 0.144s
Set Table OIDs 0 1 0.020s
------------------------- --------- --------- --------- --------------
"schema1"."films_staging" 0 0 0.312s
------------------------- --------- --------- --------- --------------
COPY Threads Completion 0 4 0.316s
Index Build Completion 0 0 0.000s
Reset Sequences 0 0 0.260s
Primary Keys 0 0 0.000s
Create Foreign Keys 0 0 0.000s
Create Triggers 0 0 0.096s
Install Comments 0 0 0.000s
------------------------- --------- --------- --------- --------------
Total import time ✓ 0 0.672s
- data that is being loaded, if relevant
\d schema*.*
Table "schema1.films_staging"
Column | Type | Collation | Nullable | Default
---------+-----------------------+-----------+----------+-------------------------
title | character varying(20) | | | NULL::character varying
release | date | | |
awards | character varying(20) | | | NULL::character varying
- How the data is different from what you expected, if relevant
I am expecting it to NOT load the table schema1.films_staging
I am also expecting to see schema1.films_me, schema2.films, schema2.filmes_staging being copied
Instead, it is copying only schema1.films_staging - it is as though the exclude is inverted.
Ultimately, I am trying to achieve something like this config:
load database
from pgsql://adminuser:redacted@dbsrc.abcd.eu-west-2.rds.amazonaws.com:5432/postgres
into pgsql://adminuser:redacted@dbdst.abcd.eu-west-2.rds.amazonaws.com:5432/postgres
with drop schema
including only table names matching ~/.*/ in schema 'schema1'
including only table names matching ~/.*/ in schema 'schema2'
excluding table names matching ~/_staging/ in schema 'schema1'
excluding table names matching ~/_staging/ in schema 'schema2'
cast type "character varying" to "character varying" keep typemod
;
successfully copy all tables from schema1 and schema2, excluding any tables in schema1 or schema2 that match the string *_staging
Further explanation:
src database:
postgres=> \d schema*.*
Table "schema1.films_me"
Column | Type | Collation | Nullable | Default
---------+-----------------------+-----------+----------+---------
title | character varying(20) | | |
release | date | | |
awards | character varying(20) | | |
Table "schema1.films_staging"
Column | Type | Collation | Nullable | Default
---------+-----------------------+-----------+----------+---------
title | character varying(20) | | |
release | date | | |
awards | character varying(20) | | |
Table "schema2.films"
Column | Type | Collation | Nullable | Default
---------+-----------------------+-----------+----------+---------
title | character varying(20) | | |
release | date | | |
awards | character varying(20) | | |
Table "schema2.films_staging"
Column | Type | Collation | Nullable | Default
---------+-----------------------+-----------+----------+---------
title | character varying(20) | | |
release | date | | |
awards | character varying(20) | | |
postgres=>
destination database before running pgloader:
postgres=> \d schema*.*
Did not find any relation named "schema*.*".
postgres=>
run pgloader my.loader
pob@pob-laptop:~$ ./pgloader my.loader
2024-01-24T20:25:26.008000Z LOG pgloader version "3.6.2079646"
2024-01-24T20:25:26.008000Z LOG Data errors in '/tmp/pgloader/'
2024-01-24T20:25:26.008000Z LOG Parsing commands from file #P"/home/pob/my.loader"
2024-01-24T20:25:26.208001Z LOG Migrating from #<PGSQL-CONNECTION pgsql://adminuser@dbsrc.abcd.eu-west-2.rds.amazonaws.com:5432/postgres {1007E2A7A3}>
2024-01-24T20:25:26.208001Z LOG Migrating into #<PGSQL-CONNECTION pgsql://adminuser@dbdst.abcd.eu-west-2.rds.amazonaws.com:5432/postgres {1007E2BFB3}>
2024-01-24T20:25:27.924004Z LOG report summary reset
table name errors rows bytes total time
------------------------- --------- --------- --------- --------------
fetch meta data 0 1 0.412s
Create Schemas 0 0 0.080s
Create SQL Types 0 0 0.048s
Create tables 0 2 0.148s
Set Table OIDs 0 1 0.020s
------------------------- --------- --------- --------- --------------
"schema1"."films_staging" 0 0 0.236s
------------------------- --------- --------- --------- --------------
COPY Threads Completion 0 4 0.244s
Index Build Completion 0 0 0.000s
Reset Sequences 0 0 0.292s
Primary Keys 0 0 0.000s
Create Foreign Keys 0 0 0.000s
Create Triggers 0 0 0.040s
Install Comments 0 0 0.000s
------------------------- --------- --------- --------- --------------
Total import time ✓ 0 0.576s
pob@pob-laptop:~$
destination database after running pgloader
postgres=> \d schema*.*
Table "schema1.films_staging"
Column | Type | Collation | Nullable | Default
---------+-----------------------+-----------+----------+-------------------------
title | character varying(20) | | | NULL::character varying
release | date | | |
awards | character varying(20) | | | NULL::character varying
Facing a similar issue. My set up is slightly different, but the symptom seems to be the same.
I'm running in a docker container, so I didn't do a fresh build locally. My Dockerfile is very basic:
FROM dimitri/pgloader:latest
I have a subset of tables I want to migrate from sqlserver (2016) -> postgres. The source and target DBs are hosted in docker containers as well.
This example only has a single table, as it reproduces the error consistently (my ms.load file can be seen in the log below). I'm setting a fetch limit to avoid hitting out of heap errors.
I run pgloader with -d and -v to see what information i could get:
# uname -a
Linux 01d1a0a03d1c 5.10.102.1-microsoft-standard-WSL2 #1 SMP Wed Mar 2 00:30:59 UTC 2022 x86_64 GNU/Linux
# pgloader --version
pgloader version "3.6.7~devel"
compiled with SBCL 2.1.1.debian
# pgloader -d -v ms.load
pgloader version 3.6.7~devel
compiled with SBCL 2.1.1.debian
sb-impl::*default-external-format* :UTF-8
tmpdir: #P"/tmp/pgloader/"
2024-01-24T23:30:27.000000Z NOTICE Starting pgloader, log system is ready.
2024-01-24T23:30:27.010000Z INFO Starting monitor
2024-01-24T23:30:27.010000Z LOG pgloader version "3.6.7~devel"
2024-01-24T23:30:27.020000Z INFO Parsed command:
load database
from mssql://<user>@<host>/<src_db>
into pgsql://<user>@<host>/<dst_db>
including only table names like 'address' in schema 'application'
WITH prefetch rows = 10000
;
2024-01-24T23:30:27.090000Z DEBUG CONNECTED TO #<PGLOADER.PGSQL:PGSQL-CONNECTION pgsql://<user>@<host>/<dst_db> {1007E429A3}>
2024-01-24T23:30:27.090000Z DEBUG SET client_encoding TO 'utf8'
2024-01-24T23:30:27.090000Z DEBUG SET application_name TO 'pgloader'
2024-01-24T23:30:27.100000Z LOG Migrating from #<MSSQL-CONNECTION mssql://<user>@<host>/<src_db> {1007E41EE3}>
2024-01-24T23:30:27.100000Z LOG Migrating into #<PGSQL-CONNECTION pgsql://<user>@<host>/<dst_db> {1007E429A3}>
Max connections reached, increase value of TDS_MAX_CONN
2024-01-24T23:30:27.120000Z SQL MSSQL: sending query: -- params: dbname
-- table-type-name
-- including
-- filter-list-to-where-clause including
-- excluding
-- filter-list-to-where-clause excluding
select c.TABLE_SCHEMA,
c.TABLE_NAME,
c.COLUMN_NAME,
c.DATA_TYPE,
CASE
WHEN c.COLUMN_DEFAULT LIKE '((%' AND c.COLUMN_DEFAULT LIKE '%))' THEN
CASE
WHEN SUBSTRING(c.COLUMN_DEFAULT,3,len(c.COLUMN_DEFAULT)-4) = 'newid()' THEN 'GENERATE_UUID'
WHEN SUBSTRING(c.COLUMN_DEFAULT,3,len(c.COLUMN_DEFAULT)-4) LIKE 'convert(%varchar%,getdate(),%)' THEN 'today'
WHEN SUBSTRING(c.COLUMN_DEFAULT,3,len(c.COLUMN_DEFAULT)-4) = 'getdate()' THEN 'CURRENT_TIMESTAMP'
WHEN SUBSTRING(c.COLUMN_DEFAULT,3,len(c.COLUMN_DEFAULT)-4) LIKE '''%''' THEN SUBSTRING(c.COLUMN_DEFAULT,4,len(c.COLUMN_DEFAULT)-6)
ELSE SUBSTRING(c.COLUMN_DEFAULT,3,len(c.COLUMN_DEFAULT)-4)
END
WHEN c.COLUMN_DEFAULT LIKE '(%' AND c.COLUMN_DEFAULT LIKE '%)' THEN
CASE
WHEN SUBSTRING(c.COLUMN_DEFAULT,2,len(c.COLUMN_DEFAULT)-2) = 'newid()' THEN 'GENERATE_UUID'
WHEN SUBSTRING(c.COLUMN_DEFAULT,2,len(c.COLUMN_DEFAULT)-2) LIKE 'convert(%varchar%,getdate(),%)' THEN 'today'
WHEN SUBSTRING(c.COLUMN_DEFAULT,2,len(c.COLUMN_DEFAULT)-2) = 'getdate()' THEN 'CURRENT_TIMESTAMP'
WHEN SUBSTRING(c.COLUMN_DEFAULT,2,len(c.COLUMN_DEFAULT)-2) LIKE '''%''' THEN SUBSTRING(c.COLUMN_DEFAULT,3,len(c.COLUMN_DEFAULT)-4)
ELSE SUBSTRING(c.COLUMN_DEFAULT,2,len(c.COLUMN_DEFAULT)-2)
END
ELSE c.COLUMN_DEFAULT
END,
c.IS_NULLABLE,
COLUMNPROPERTY(object_id(c.TABLE_NAME), c.COLUMN_NAME, 'IsIdentity'),
c.CHARACTER_MAXIMUM_LENGTH,
c.NUMERIC_PRECISION,
c.NUMERIC_PRECISION_RADIX,
c.NUMERIC_SCALE,
c.DATETIME_PRECISION,
c.CHARACTER_SET_NAME,
c.COLLATION_NAME
from INFORMATION_SCHEMA.COLUMNS c
join INFORMATION_SCHEMA.TABLES t
on c.TABLE_SCHEMA = t.TABLE_SCHEMA
and c.TABLE_NAME = t.TABLE_NAME
where c.TABLE_CATALOG = 'axelerator_hannover'
and t.TABLE_TYPE = 'BASE TABLE'
and ((c.table_schema = 'application' and c.table_name LIKE 'address'))
order by c.table_schema, c.table_name, c.ordinal_position;
2024-01-24T23:30:27.150000Z SQL MSSQL: sending query: -- params: including
-- filter-list-to-where-clause including
-- excluding
-- filter-list-to-where-clause excluding
select schema_name(schema_id) as SchemaName,
o.name as TableName,
REPLACE(i.name, '.', '_') as IndexName,
co.[name] as ColumnName,
i.is_unique,
i.is_primary_key,
i.filter_definition
from sys.indexes i
join sys.objects o on i.object_id = o.object_id
join sys.index_columns ic on ic.object_id = i.object_id
and ic.index_id = i.index_id
join sys.columns co on co.object_id = i.object_id
and co.column_id = ic.column_id
where schema_name(schema_id) not in ('dto', 'sys')
and ((schema_name(schema_id) = 'application' and o.name LIKE 'address'))
order by SchemaName,
o.[name],
i.[name],
ic.is_included_column,
ic.key_ordinal;
2024-01-24T23:30:27.170000Z SQL MSSQL: sending query: -- params: dbname
-- including
-- filter-list-to-where-clause including
-- excluding
-- filter-list-to-where-clause excluding
SELECT
REPLACE(KCU1.CONSTRAINT_NAME, '.', '_') AS 'CONSTRAINT_NAME'
, KCU1.TABLE_SCHEMA AS 'TABLE_SCHEMA'
, KCU1.TABLE_NAME AS 'TABLE_NAME'
, KCU1.COLUMN_NAME AS 'COLUMN_NAME'
, KCU2.TABLE_SCHEMA AS 'UNIQUE_TABLE_SCHEMA'
, KCU2.TABLE_NAME AS 'UNIQUE_TABLE_NAME'
, KCU2.COLUMN_NAME AS 'UNIQUE_COLUMN_NAME'
, RC.UPDATE_RULE AS 'UPDATE_RULE'
, RC.DELETE_RULE AS 'DELETE_RULE'
FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS RC
JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE KCU1
ON KCU1.CONSTRAINT_CATALOG = RC.CONSTRAINT_CATALOG
AND KCU1.CONSTRAINT_SCHEMA = RC.CONSTRAINT_SCHEMA
AND KCU1.CONSTRAINT_NAME = RC.CONSTRAINT_NAME
JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE KCU2
ON KCU2.CONSTRAINT_CATALOG = RC.UNIQUE_CONSTRAINT_CATALOG
AND KCU2.CONSTRAINT_SCHEMA = RC.UNIQUE_CONSTRAINT_SCHEMA
AND KCU2.CONSTRAINT_NAME = RC.UNIQUE_CONSTRAINT_NAME
WHERE KCU1.ORDINAL_POSITION = KCU2.ORDINAL_POSITION
AND KCU1.TABLE_CATALOG = 'axelerator_hannover'
AND KCU1.CONSTRAINT_CATALOG = 'axelerator_hannover'
AND KCU1.CONSTRAINT_SCHEMA NOT IN ('dto', 'sys')
AND KCU1.TABLE_SCHEMA NOT IN ('dto', 'sys')
AND KCU2.TABLE_SCHEMA NOT IN ('dto', 'sys')
and ((kcu1.table_schema = 'application' and kcu1.table_name LIKE 'address'))
ORDER BY KCU1.CONSTRAINT_NAME, KCU1.ORDINAL_POSITION;
2024-01-24T23:30:27.180000Z ERROR MSSQL ERROR: %dbsqlexec fail
2024-01-24T23:30:27.180000Z LOG You might need to review the FreeTDS protocol version in your freetds.conf file, see http://www.freetds.org/userguide/choosingtdsprotocol.htm
2024-01-24T23:30:27.180000Z LOG report summary reset
table name errors read imported bytes total time read write
----------------- --------- --------- --------- --------- -------------- --------- ---------
fetch meta data 0 0 0 0.000s
----------------- --------- --------- --------- --------- -------------- --------- ---------
----------------- --------- --------- --------- --------- -------------- --------- ---------
2024-01-24T23:30:27.190000Z INFO Stopping monitor
#
I'm not an expert on sqlserver, but I believe there is an issue in the last query using kcu1 rather than KCU1 as the table alias. Running a simple query (just via SQL Server Management Studio):
select *
from INFORMATION_SCHEMA.KEY_COLUMN_USAGE KCU1
ORDER BY KCU1.CONSTRAINT_NAME
<returns rows>
select *
from INFORMATION_SCHEMA.KEY_COLUMN_USAGE KCU1
WHERE kcu1.table_name LIKE 'address'
Msg 4104, Level 16, State 1, Line 41
The multi-part identifier "kcu1.table_name" could not be bound.
Removing the including only table names like 'address' in schema 'application'
from the ms.load file removes this issue and my migration runs successfully.