dimitri / pgloader

Migrate to PostgreSQL in a single command!

Home Page:http://pgloader.io

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

pgloader repeatedly disconnecting from db during migration attempt

gdelia-pm opened this issue · comments

I am attempting to use pgloader to migrate a mysql database to postgres but having a heap of trouble doing it. Regardless of how I approach it I'm pretty consistently seeing what I assume is a bug that causes pgloader to crash out.

What I've Done

  1. created a simple azure hosted postgres database and stopped running connections to my working mysql instance.
  2. created a .load migration file that contains the parameters a vendor suggested we use for the migration. I've adjusted this a few times to account for changes I've seen other suggest but regardless I see the same error each time
LOAD DATABASE
  FROM    mysql://mmdbdevuser:password@127.0.0.1:3306/mattermost
  INTO    pgsql://mmdbdevuser:password@hostname:5432/mattermost

WITH data only,
  workers = 8, concurrency = 1,
  multiple readers per thread, rows per range = 50000,
  create no tables, create no indexes,
  preserve index names

SET PostgreSQL PARAMETERS
  maintenance_work_mem to '512MB',
  work_mem to '48MB'

SET MySQL PARAMETERS
  net_read_timeout  = '1500',
  net_write_timeout = '1500'

CAST column Channels.Type to "channel_type" drop typemod,
  column Teams.Type to "team_type" drop typemod,
  column UploadSessions.Type to "upload_session_type" drop typemod,
  column Drafts.Priority to text,
  type int when (= precision 11) to integer drop typemod,
  type bigint when (= precision 20) to bigint drop typemod,
  type text to varchar drop typemod,
  type tinyint when (<= precision 4) to boolean using tinyint-to-boolean,
  type json to jsonb drop typemod

EXCLUDING TABLE NAMES MATCHING ~<IR_>, ~<focalboard>

BEFORE LOAD DO
  $$ ALTER SCHEMA public RENAME TO mattermost; $$,
  $$ DROP INDEX IF EXISTS idx_posts_message_txt; $$,
  $$ DROP INDEX IF EXISTS idx_fileinfo_content_txt; $$

AFTER LOAD DO
  $$ UPDATE mattermost.db_migrations set name='add_createat_to_teamembers' where version=92; $$,
  $$ CREATE INDEX IF NOT EXISTS idx_posts_message_txt ON mattermost.posts USING gin(to_tsvector('english', message)); $$,
  $$ CREATE INDEX IF NOT EXISTS idx_fileinfo_content_txt ON mattermost.fileinfo USING gin(to_tsvector('english', content)); $$,
  $$ ALTER SCHEMA mattermost RENAME TO public; $$,
  $$ SELECT pg_catalog.set_config('search_path', '"$user", public', false); $$,
  $$ ALTER USER  SET SEARCH_PATH TO 'public'; $$;
  1. I create the postgres database and set the schema --I've tried skipping this step to and get the same error, I'm including it here to clarify that the psql database seems to work fine when I interact with it without using psqL
  2. I start the pgloader command and it crashes with the below error
2023-10-25T21:10:35.259000Z ERROR A thread failed with error: Database error: Connection to database server lost.
2023-10-25T21:10:35.624000Z ERROR Date/time: 2023-10-25-21:10!
An unhandled error condition has been signalled:
   Database error: Connection to database server lost.


Backtrace for: #<SB-THREAD:THREAD "lparallel" RUNNING {10090FBB53}>
0: (TRIVIAL-BACKTRACE:PRINT-BACKTRACE-TO-STREAM #<SB-IMPL::CHARACTER-STRING-OSTREAM {1017674163}>)
1: (TRIVIAL-BACKTRACE:PRINT-BACKTRACE #<CL-POSTGRES:DATABASE-CONNECTION-LOST {1017673EB3}> :OUTPUT NIL :IF-EXISTS :APPEND :VERBOSE NIL)
2: ((LAMBDA (CONDITION) :IN PGLOADER.LOAD:COPY-FROM) #<CL-POSTGRES:DATABASE-CONNECTION-LOST {1017673EB3}>)
3: (LPARALLEL.KERNEL::CONDITION-HANDLER #<CL-POSTGRES:DATABASE-CONNECTION-LOST {1017673EB3}>)
4: (SB-KERNEL::%SIGNAL #<CL-POSTGRES:DATABASE-CONNECTION-LOST {1017673EB3}>)
5: (ERROR #<CL-POSTGRES:DATABASE-CONNECTION-LOST {1017673EB3}>)
6: (SB-KERNEL:WITH-SIMPLE-CONDITION-RESTARTS ERROR NIL CL-POSTGRES:DATABASE-CONNECTION-LOST :MESSAGE "Connection to database server lost.")
7: (CL-POSTGRES::ENSURE-CONNECTION #<CL-POSTGRES:DATABASE-CONNECTION {1008D8EE73}> 0)
8: (CL-POSTGRES:EXEC-QUERY #<CL-POSTGRES:DATABASE-CONNECTION {1008D8EE73}> "ROLLBACK" CL-POSTGRES:IGNORE-ROW-READER)
9: (PGLOADER.PGCOPY::STREAM-ROWS-TO-COPY #1=#S(PGLOADER.CATALOG:TABLE :SOURCE-NAME #2="posts" :NAME "posts" :SCHEMA #3=#S(PGLOADER.CATALOG:SCHEMA :SOURCE-NAME #4="mattermost" :NAME "mattermost" :CATALOG #S(PGLOADER.CATALOG:CATALOG  ........................
10: (PGLOADER.PGCOPY:COPY-ROWS-FROM-QUEUE #<PGLOADER.SOURCE.MYSQL:COPY-MYSQL {100E181A13}> #S(LPARALLEL.VECTOR-QUEUE:VECTOR-QUEUE :IMPL #S(LPARALLEL.VECTOR-QUEUE::RAW-QUEUE :DATA #(NIL NIL NIL NIL NIL NIL NIL NIL NIL NIL ................................
11: ((FLET "BODY-FN0" :IN LPARALLEL.KERNEL::MAKE-CHANNELED-TASK))
12: ((LAMBDA NIL :IN LPARALLEL.KERNEL::MAKE-CHANNELED-TASK))
13: (LPARALLEL.KERNEL::EXEC-TASK/WORKER #<unavailable argument> #S(LPARALLEL.KERNEL::WORKER :HANDSHAKE/FROM-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1="Anonymous lock" (free)> :CVAR NIL) :HANDSHAKE/TO-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# (free)> :CVAR #<SB-THREAD:WAITQUEUE Anonymous condition variable {1007B276D3}>) :EXIT-NOTIFICATION #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# (free)> :CVAR NIL) :THREAD #<SB-THREAD:THREAD "lparallel" RUNNING {10090FBB53}> :RUNNING-CATEGORY :DEFAULT :INDEX 2 :TASKS #S(LPARALLEL.SPIN-QUEUE:SPIN-QUEUE :HEAD (LPARALLEL.SPIN-QUEUE::DUMMY (#<CLOSURE #2=(LAMBDA NIL :IN LPARALLEL.KERNEL::MAKE-CHANNELED-TASK) {100E184ECB}> . :DEFAULT) (#<CLOSURE #2# {100E18783B}> . :DEFAULT) (#<CLOSURE #2# {100F180EBB}> . :DEFAULT) (#<CLOSURE #2# {100F18366B}> . :DEFAULT) (#<CLOSURE #2# {100F185E7B}> . :DEFAULT) (#<CLOSURE #2# {100FBB067B}> . :DEFAULT) (#<CLOSURE #2# {100FBB2FFB}> . :DEFAULT) (#<CLOSURE #2# {100FBB583B}> . :DEFAULT) (#<CLOSURE #2# {100FEDFF9B}> . :DEFAULT) (#<CLOSURE #2# {101089989B}> . :DEFAULT) (#<CLOSURE #2# {101089C0CB}> . :DEFAULT) (#<CLOSURE #2# {101089E8FB}> . :DEFAULT) (#<CLOSURE #2# {10119B904B}> . :DEFAULT) (#<CLOSURE #2# {10119BB8AB}> . :DEFAULT) . #3=((#<CLOSURE #2# {10119BE02B}> . :DEFAULT))) :TAIL #3#)))
14: (LPARALLEL.KERNEL::WORKER-LOOP #<LPARALLEL.KERNEL:KERNEL :NAME "lparallel" :WORKER-COUNT 8 :USE-CALLER NIL :ALIVE T :SPIN-COUNT 2000 {10090FABC3}> #S(LPARALLEL.KERNEL::WORKER :HANDSHAKE/FROM-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1="Anonymous lock" (free)> :CVAR NIL) :HANDSHAKE/TO-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# (free)> :CVAR #<SB-THREAD:WAITQUEUE Anonymous condition variable {1007B276D3}>) :EXIT-NOTIFICATION #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# (free)> :CVAR NIL) :THREAD #<SB-THREAD:THREAD "lparallel" RUNNING {10090FBB53}> :RUNNING-CATEGORY :DEFAULT :INDEX 2 :TASKS #S(LPARALLEL.SPIN-QUEUE:SPIN-QUEUE :HEAD (LPARALLEL.SPIN-QUEUE::DUMMY (#<CLOSURE #2=(LAMBDA NIL :IN LPARALLEL.KERNEL::MAKE-CHANNELED-TASK) {100E184ECB}> . :DEFAULT) (#<CLOSURE #2# {100E18783B}> . :DEFAULT) (#<CLOSURE #2# {100F180EBB}> . :DEFAULT) (#<CLOSURE #2# {100F18366B}> . :DEFAULT) (#<CLOSURE #2# {100F185E7B}> . :DEFAULT) (#<CLOSURE #2# {100FBB067B}> . :DEFAULT) (#<CLOSURE #2# {100FBB2FFB}> . :DEFAULT) (#<CLOSURE #2# {100FBB583B}> . :DEFAULT) (#<CLOSURE #2# {100FEDFF9B}> . :DEFAULT) (#<CLOSURE #2# {101089989B}> . :DEFAULT) (#<CLOSURE #2# {101089C0CB}> . :DEFAULT) (#<CLOSURE #2# {101089E8FB}> . :DEFAULT) (#<CLOSURE #2# {10119B904B}> . :DEFAULT) (#<CLOSURE #2# {10119BB8AB}> . :DEFAULT) . #3=((#<CLOSURE #2# {10119BE02B}> . :DEFAULT))) :TAIL #3#)))
15: (LPARALLEL.KERNEL::%CALL-WITH-TASK-HANDLER #<unavailable argument>)
16: ((LAMBDA NIL :IN LPARALLEL.KERNEL::CALL-WITH-WORKER-CONTEXT))
17: (LPARALLEL.KERNEL::CALL-WITH-WORKER-CONTEXT #<CLOSURE (LAMBDA NIL :IN LPARALLEL.KERNEL::ENTER-WORKER-LOOP) {1007B276BB}> #<FUNCTION FUNCALL> #<LPARALLEL.KERNEL:KERNEL :NAME "lparallel" :WORKER-COUNT 8 :USE-CALLER NIL :ALIVE T :SPIN-COUNT 2000 {10090FABC3}> #S(LPARALLEL.KERNEL::WORKER :HANDSHAKE/FROM-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1="Anonymous lock" (free)> :CVAR NIL) :HANDSHAKE/TO-WORKER #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# (free)> :CVAR #<SB-THREAD:WAITQUEUE Anonymous condition variable {1007B276D3}>) :EXIT-NOTIFICATION #S(LPARALLEL.CONS-QUEUE:CONS-QUEUE :IMPL #S(LPARALLEL.RAW-QUEUE:RAW-QUEUE :HEAD NIL :TAIL NIL) :LOCK #<SB-THREAD:MUTEX #1# (free)> :CVAR NIL) :THREAD #<SB-THREAD:THREAD "lparallel" RUNNING {10090FBB53}> :RUNNING-CATEGORY :DEFAULT :INDEX 2 :TASKS #S(LPARALLEL.SPIN-QUEUE:SPIN-QUEUE :HEAD (LPARALLEL.SPIN-QUEUE::DUMMY (#<CLOSURE #2=(LAMBDA NIL :IN LPARALLEL.KERNEL::MAKE-CHANNELED-TASK) {100E184ECB}> . :DEFAULT) (#<CLOSURE #2# {100E18783B}> . :DEFAULT) (#<CLOSURE #2# {100F180EBB}> . :DEFAULT) (#<CLOSURE #2# {100F18366B}> . :DEFAULT) (#<CLOSURE #2# {100F185E7B}> . :DEFAULT) (#<CLOSURE #2# {100FBB067B}> . :DEFAULT) (#<CLOSURE #2# {100FBB2FFB}> . :DEFAULT) (#<CLOSURE #2# {100FBB583B}> . :DEFAULT) (#<CLOSURE #2# {100FEDFF9B}> . :DEFAULT) (#<CLOSURE #2# {101089989B}> . :DEFAULT) (#<CLOSURE #2# {101089C0CB}> . :DEFAULT) (#<CLOSURE #2# {101089E8FB}> . :DEFAULT) (#<CLOSURE #2# {10119B904B}> . :DEFAULT) (#<CLOSURE #2# {10119BB8AB}> . :DEFAULT) . #3=((#<CLOSURE #2# {10119BE02B}> . :DEFAULT))) :TAIL #3#)))
18: ((LAMBDA NIL :IN LPARALLEL.KERNEL::MAKE-WORKER-THREAD))
19: ((LABELS BORDEAUX-THREADS::%BINDING-DEFAULT-SPECIALS-WRAPPER :IN BORDEAUX-THREADS::BINDING-DEFAULT-SPECIALS))
20: ((FLET SB-UNIX::BODY :IN SB-THREAD::NEW-LISP-THREAD-TRAMPOLINE))
21: ((FLET "WITHOUT-INTERRUPTS-BODY-4" :IN SB-THREAD::NEW-LISP-THREAD-TRAMPOLINE))
22: ((FLET SB-THREAD::WITH-MUTEX-THUNK :IN SB-THREAD::NEW-LISP-THREAD-TRAMPOLINE))
23: ((FLET "WITHOUT-INTERRUPTS-BODY-1" :IN SB-THREAD::CALL-WITH-MUTEX))
24: (SB-THREAD::CALL-WITH-MUTEX #<CLOSURE (FLET SB-THREAD::WITH-MUTEX-THUNK :IN SB-THREAD::NEW-LISP-THREAD-TRAMPOLINE) {7F833B95ED7B}> #<SB-THREAD:MUTEX "thread result lock" owner: #<SB-THREAD:THREAD "lparallel" RUNNING {10090FBB53}>> NIL T NIL)
25: (SB-THREAD::NEW-LISP-THREAD-TRAMPOLINE #<SB-THREAD:THREAD "lparallel" RUNNING {10090FBB53}> NIL #<CLOSURE (LABELS BORDEAUX-THREADS::%BINDING-DEFAULT-SPECIALS-WRAPPER :IN BORDEAUX-THREADS::BINDING-DEFAULT-SPECIALS) {10090FBAFB}> NIL)
26: ("foreign function: call_into_lisp")
27: ("foreign function: new_thread_trampoline")
  1. I'm unsure how to proceed from here once it happens. I don't see the postgres db actually load any data, when this happens, and I don't see enough info in either mysql or psql that tells me which side's connection has crashed or why. Both database servers remain up the whole time and are accessible and queryable when this occurs.
  • pgloader --version

       pgloader version "3.6.999791d" 
       compiled with SBCL 2.0.1.debian```
    
    
  • did you test a fresh compile from the source tree?
    Yes, I tried it with a fresh compile in this example, I saw the same using the ubuntu rpm as well

  • did you search for other similar issues?
    yes, the only thing I saw that was similar came here #1122 but I had no success trying to adjust what they did to get around it (and they noted it only worked intermittently for them even with the changes)

Something worth noting is that if I use the INCLUDING ONLY TABLES parameters I can move a singe table at a time, but if I remove that and drop workers to 1 the migration seems to hang, and at 2 the migration behaves as described above.

I'm Getting the same error for Mattermost Database migration, My setup is as follows,

You can find the details here: https://forum.mattermost.com/t/migration-from-mysql-8-to-postgres-database/17166

I've 2 docker containers one for MySQL 8 and for Psql15, running on my VM (2 core, 4 GB), I've tried tweaking the WITH configurations like

WITH data only,
     workers = 4, concurrency = 1,
     multiple readers per thread, rows per range = 5000,
     create no tables, create no indexes,
     batch size = 50MB, prefetch rows = 500,
     preserve index names

And the pgloader command to.

pgloader --dynamic-space-size 1564 migration.load > migration.log

I have the original MySQL database sizes about 6 GB (MySQL data-directory size)…

But after running some time the pgloader CPU usage reaches very high and it stops with error…

Database error: Connection to database server lost.

It is happening for me after 2GB of data transferred to postgress data directory, I’ve tried tuning the values in the WITH, reducing --dynamic-space-size also even tried increasing the server capacity,

But getting the connection lost error when reaching the 1.8 GB - 2 GB mark.

@Akshaychdev Let me know if you've had any luck. I found a way around this (no thanks to my weeks long case with MM support) for my instance and I'm happy to share to see if it will work for more folks.

I found that only the posts table was causing crashes, so I did a 2 part move and some data transforms after the fact.

  1. Create a .load file for the posts table migration and run the migration against your instance. This should work without crashing

`
LOAD DATABASE
FROM mysql://user:PASSWORD@localhost:3306/mattermost
INTO pgsql://user:PASSWORd@localhost:5432/mattermost

				WITH
				  workers = 2,
				  concurrency = 1,
				  preserve index names
				
				SET PostgreSQL PARAMETERS
				  maintenance_work_mem to '512MB',
				  work_mem to '48MB'
				
				CAST type int when (= precision 11) to integer drop typemod,
				     type bigint when (= precision 20) to bigint drop typemod,
				     type text to varchar drop typemod,
				     type json to jsonb drop typemod
				
				SET MySQL PARAMETERS
				  net_read_timeout  = '1500',
				  net_write_timeout = '1500'
				
				INCLUDING ONLY TABLE NAMES MATCHING 'posts'
				
				BEFORE LOAD DO
				  $$ ALTER SCHEMA public RENAME TO mattermost; $$,
				  $$ ALTER TABLE public.posts ALTER COLUMN message   TYPE character varying(65535) COLLATE pg_catalog."default"; $$,
				  $$ ALTER TABLE public.posts ALTER COLUMN hashtags  TYPE character varying(1000)  COLLATE pg_catalog."default"; $$,
				  $$ ALTER TABLE public.posts ALTER COLUMN filenames TYPE character varying(4000)  COLLATE pg_catalog."default"; $$,
				  $$ ALTER TABLE public.posts ALTER COLUMN fileids   TYPE character varying(3000)  COLLATE pg_catalog."default"; $$
				
				AFTER LOAD DO
				        $$ ALTER SCHEMA mattermost RENAME TO public; $$

;
`

  1. Then update the postgres tables constraint to match the what the morph command sets. I also took some time to compare the schema set by the above to the schema the morph command sets to make sure they match. You can do this and make adjustments, but it will probably work either way
    ALTER TABLE public.posts RENAME CONSTRAINT random name TO posts_pkey;
  2. Use your previous pgloader command to move all other tables besides posts (either with an include only tables or exclude tables command'
  3. Update your matttermost config driver/dsn to the new database
  4. profit?

We have another user reported the same issue, where the connection got lost during the migration process. I wonder if this is a false-positive output as the connection should've been restored with cea82a6 cc @dimitri