dimitri / pgloader

Migrate to PostgreSQL in a single command!

Home Page: http://pgloader.io

MSSQL: How to speed up data copying?

makeitokay opened this issue

  • pgloader --version

    I'm using the ghcr.io/dimitri/pgloader:latest Docker image.

  • did you test a fresh compile from the source tree?

  • did you search for other similar issues?

  • how can I reproduce the bug?

load database
 from mssql://<user>:<password>@<host>/<database>
 into pgsql://<user>:<password>@<host>/<database>

with
 data only,
 truncate,
 create no schemas,
 create no tables,
 create no indexes,
 no foreign keys,
 quote identifiers

set mssql parameters textsize to '1073741824'

alter schema 'dbo' rename to 'public';

I am migrating a 42 GB database from SQL Server to PostgreSQL. At the moment the process takes 23-25 minutes, and I want to speed it up as much as possible.
Here is what has already been done to optimize it:

  • plenty of CPU and RAM have been allocated to both Postgres and pgloader, so neither of those is a bottleneck
  • the network has been tested with iperf3; it is not the bottleneck either
  • all indexes, foreign keys, and primary keys were dropped from the PostgreSQL database before the migration
  • PostgreSQL has been tuned: shared_buffers = 8GB, effective_cache_size = 15GB, maintenance_work_mem = 5GB, and work_mem was raised as well (see the SQL sketch after this list)
  • Postgres runs on a fast SSD (450 MB/s read and write)
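For reference, here is a sketch of that tuning expressed as SQL. The values match the list above, except that the work_mem value below is only a placeholder, and shared_buffers needs a server restart to take effect:

 -- Server-level tuning applied before the migration (values as listed above).
 -- shared_buffers only takes effect after a restart.
 ALTER SYSTEM SET shared_buffers = '8GB';
 ALTER SYSTEM SET effective_cache_size = '15GB';
 ALTER SYSTEM SET maintenance_work_mem = '5GB';
 -- work_mem was raised too; '256MB' is a placeholder, not the exact value.
 ALTER SYSTEM SET work_mem = '256MB';
 SELECT pg_reload_conf();  -- reload for the settings that do not need a restart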

These measures brought the migration down to the 23-25 minutes mentioned above. pgloader also has batching and concurrency settings, which I tried to tune as well.

  1. With the default settings, the migration summary looks like this:
    Total import time 268413205 268413205 42.2 GB 25m11.697s
  2. Increasing the number of workers and the concurrency by 1 (workers = 5, concurrency = 2) makes the migration twice as slow: only 20 GB were copied in the same 25 minutes.
  3. With the default concurrency settings but with batch rows = 100000, prefetch rows = 400000, and batch size = 1 GB (see the reconstructed WITH clause after this list), the result is:
    Total import time 268413205 268413205 42.2 GB 26m36.852s
    It did not get better; it actually got slightly worse.
  4. Increasing these further (batch rows = 200000, prefetch rows = 800000, batch size = 2 GB) did not help either:
    Total import time 268413205 268413205 42.2 GB 26m40.961s
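For clarity, this is roughly what the WITH clause looked like for experiment 3; the rest of the load file was unchanged from the one at the top:

with
 data only,
 truncate,
 create no schemas,
 create no tables,
 create no indexes,
 no foreign keys,
 quote identifiers,
 -- experiment 3: bigger batches, default workers/concurrency
 batch rows = 100000,
 prefetch rows = 400000,
 batch size = 1 GB

For experiment 2, the batch options were left at their defaults and workers = 5, concurrency = 2 were set instead.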

In addition to tuning pgloader's batching and parallelism, I also tried to speed up the migration by adjusting Postgres parameters such as archive_mode, max_wal_size, checkpoint_timeout, and synchronous_commit, but none of them affected the speed in any way.
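Concretely, the changes looked roughly like this; the values here are illustrative, not the exact ones I tried:

 -- Illustrative values; none of these changed the migration speed for me.
 ALTER SYSTEM SET synchronous_commit = 'off';    -- do not wait for WAL flush at commit
 ALTER SYSTEM SET max_wal_size = '16GB';         -- fewer forced checkpoints during the load
 ALTER SYSTEM SET checkpoint_timeout = '30min';  -- checkpoint less often
 ALTER SYSTEM SET archive_mode = 'off';          -- needs a server restart to take effect
 SELECT pg_reload_conf();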

Please tell me: how can I speed up the data copy using pgloader parameters? Do you have any tips on which Postgres settings to tune to speed up data insertion? And do you have any ideas about what else I could investigate to speed up the process?