Load CSV to Postgres very slow
xqin1 opened this issue
When loading a CSV file into Postgres, the way the file is streamed seems to limit the number of records inserted into the PG table in each batch.
The code:

```js
let {cmd, text, details} = await ogr2ogr(filePath, {
  format: 'PostgreSQL',
  options: ['-nln', 'test', '-gt', 2000000],
  destination: connection
});
```
On the PostgreSQL side, the following statement is executed many times, each time inserting only a small number of records:

```sql
COPY "test" ("wkb_geometry", "filer_id", "filer_desc") FROM STDIN;
```
This happens with v3 only; I don't have the issue with v2. Are there any settings specific to PostgreSQL for this?
I think the issue is the `-skipfailures` option being set by default. With this option, the `-gt` option is no longer valid.
Here's the relevant passage from the ogr2ogr documentation:
> When writing into transactional DBMS (SQLite/PostgreSQL, MySQL, etc.), it might be beneficial to increase the number of INSERT statements executed between BEGIN TRANSACTION and COMMIT TRANSACTION statements. This number is specified with the -gt option. For example, for SQLite, explicitly defining -gt 65536 ensures optimal performance while populating some table containing many hundreds of thousands or millions of rows. However, note that -skipfailures overrides -gt and sets the size of transactions to 1.
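For anyone hitting this in the meantime, one workaround is to bypass the wrapper and invoke GDAL's `ogr2ogr` binary directly, passing `-gt` yourself and leaving `-skipfailures` off. This is only a rough sketch, assuming `ogr2ogr` is installed and on PATH; the connection string, table name, and file path are placeholders, not values from this issue:

```js
// Workaround sketch: call the ogr2ogr binary directly so the wrapper's
// default -skipfailures is never added, letting -gt batch the inserts.
// Assumes GDAL's ogr2ogr is on PATH; connection string, table name, and
// file path are placeholders.
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

async function loadCsv(filePath, pgConnection) {
  const args = [
    "-f", "PostgreSQL",
    pgConnection,        // e.g. "PG:host=localhost dbname=mydb"
    filePath,
    "-nln", "test",
    "-gt", "65536",      // features per transaction; effective here because
                         // -skipfailures is not passed
  ];
  const { stdout, stderr } = await run("ogr2ogr", args);
  return { stdout, stderr };
}
```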
Is it possible to remove `-skipfailures` as a default option?
`-skipfailures` is very commonly used, so it is enabled by default; historically, that is the option most users want in order to get the results they expect. That said, we don't currently have a way to turn it off, and we should for cases like this. I wonder if we should add a `disableSkipFailures` option.
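For illustration only, a call with such an option might look like the snippet below. `disableSkipFailures` is the hypothetical option proposed above and is not part of the current API; `filePath` and `connection` are the same placeholders as in the snippet at the top of this issue:

```js
// Hypothetical: disableSkipFailures is only the option proposed in this
// thread, not part of the published package. Same setup as the earlier snippet.
let { cmd, text, details } = await ogr2ogr(filePath, {
  format: 'PostgreSQL',
  destination: connection,                  // e.g. a PG: connection string
  options: ['-nln', 'test', '-gt', 65536],  // batched transactions
  disableSkipFailures: true                 // proposed: do not inject -skipfailures
});
```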
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This issue was closed because it has been stalled for 5 days with no activity.
Sorry for being late on this. A `disableSkipFailures` option would solve the issue. Any chance to implement this feature?