ApsaraDB / PolarDB-for-PostgreSQL

A cloud-native database based on PostgreSQL developed by Alibaba Cloud.

Home Page: https://apsaradb.github.io/PolarDB-for-PostgreSQL/zh/


[Question] After testing, it feels much slower than PostgreSQL.

long2ice opened this issue

A very simple usage scenario: I run our integration tests locally. They take more than 60 seconds on PostgreSQL 11, but more than 130 seconds on PolarDB. Both are stand-alone instances deployed with Docker.

The difference shouldn't be this large, should it? Am I using it incorrectly? Are there any relevant parameters that can be tuned?

Hi @long2ice ~ Thanks for opening this issue! 🎉

Please make sure you have provided enough information for subsequent discussion.

We will get back to you as soon as possible. ❤️

@long2ice Hi, thanks for testing PolarDB-PG.

Can you describe your testing workload? Is it mainly DDL, DML, or something else? Also, what is the shared_buffers size of PostgreSQL 11?

@mrdrivingduck Hello, thanks for your quick reply! The testing workload is: create the database, insert a dataset, and then run the tests. Most of the test cases are DML. PostgreSQL's shared_buffers is 128MB; for PolarDB it is 2GB.

@long2ice Could you please add some timing logs around these three phases: [TIME] create database [TIME] insert data [TIME] run tests [TIME]? That way we can see which part is slow.

There could be other reasons. For example, in the PolarDB-PG container there are actually three database nodes running: one primary and two replicas, with synchronous_commit set to on. I'm not sure whether that is the problem.
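A minimal sketch of such phase timing, assuming the three phases are driven from a shell script (the `true` commands are placeholders for the actual test-suite commands):

```shell
#!/bin/sh
# Hypothetical timing wrapper: prints elapsed wall-clock seconds per phase.
phase() {
  name=$1; shift
  start=$(date +%s)
  "$@"                        # run the actual phase command
  end=$(date +%s)
  echo "[TIME] $name: $((end - start))s"
}

# Placeholders -- substitute the real commands of the test suite:
phase "create database" true   # e.g. createdb + migrations
phase "insert data"     true   # e.g. psql -f seed.sql
phase "run tests"       true   # e.g. the integration test runner
```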

polardb:
create database and run migration (DDL): 28s
init data: 22s
run all test: 65s
total: 115s

postgres:
create database and run migration (DDL): fast
init data: 13s
run all test: 55s
total: 68s

@long2ice

Try the following commands in the PolarDB-PG container:

Stop the two replica databases:

pg_ctl -D /var/polardb/replica_datadir1/ stop
pg_ctl -D /var/polardb/replica_datadir2/ stop

Drop the replication slots on the primary:

select pg_drop_replication_slot('replica1');
select pg_drop_replication_slot('replica2');
postgres@e086c61cd078:~$ pg_ctl -D /var/polardb/replica_datadir1/ stop
pg_ctl: PID file "/var/polardb/replica_datadir1/postmaster.pid" does not exist
Is server running?
postgres@e086c61cd078:~$ pg_ctl -D /var/polardb/replica_datadir2/ stop
pg_ctl: PID file "/var/polardb/replica_datadir2/postmaster.pid" does not exist
Is server running?

Maybe they are not running?
I used the polardb/polardb_pg_local_instance Docker image to deploy it.

@long2ice Can you run ps -ef to see if there are three process groups running? If there is only one, that's fine.

postgres@da9f3038df35:~$ ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
postgres       1       0  0 11:26 ?        00:00:00 /bin/bash ./docker-entrypoint.sh postgres
postgres      16       1  1 11:26 ?        00:00:00 /home/postgres/tmp_basedir_polardb_pg_1100_bld/bin/postgres -D /var/polardb/primary_datadir
postgres      17      16  0 11:26 ?        00:00:00 postgres(5432): logger  0
postgres      18      16  0 11:26 ?        00:00:00 postgres(5432): logger  1
postgres      19      16  0 11:26 ?        00:00:00 postgres(5432): logger  2
postgres      20      16  0 11:26 ?        00:00:00 postgres(5432): background flashback log inserter  
postgres      21      16  0 11:26 ?        00:00:00 postgres(5432): background flashback log writer  
postgres      23      16  0 11:26 ?        00:00:00 postgres(5432): polar worker process  
postgres      24      16  0 11:26 ?        00:00:00 postgres(5432): PSS dispatcher  
postgres      25      16  0 11:26 ?        00:00:00 postgres(5432): PSS dispatcher  
postgres      26      16  0 11:26 ?        00:00:00 postgres(5432): polar wal pipeliner  
postgres      28      16  0 11:26 ?        00:00:00 postgres(5432): checkpointer  
postgres      29      16  0 11:26 ?        00:00:00 postgres(5432): background writer  
postgres      30      16  0 11:26 ?        00:00:00 postgres(5432): walwriter  
postgres      31      16  1 11:26 ?        00:00:00 postgres(5432): background logindex writer  
postgres      32      16  0 11:26 ?        00:00:00 postgres(5432): autovacuum launcher  
postgres      33      16  0 11:26 ?        00:00:00 postgres(5432): stats collector  
postgres      34      16  0 11:26 ?        00:00:00 postgres(5432): TimescaleDB Background Worker Launcher  
postgres      35      16  0 11:26 ?        00:00:00 postgres(5432): logical replication launcher  
postgres      36      16  0 11:26 ?        00:00:00 postgres(5432): polar parallel bgwriter  
postgres      37      16  0 11:26 ?        00:00:00 postgres(5432): polar parallel bgwriter  
postgres      38      16  0 11:26 ?        00:00:00 postgres(5432): polar parallel bgwriter  
postgres      39      16  0 11:26 ?        00:00:00 postgres(5432): polar parallel bgwriter  
postgres      40      16  0 11:26 ?        00:00:00 postgres(5432): polar parallel bgwriter  
postgres      64       1  0 11:26 ?        00:00:00 tail -f /dev/null
postgres      65       0  0 11:27 pts/0    00:00:00 bash
postgres      77      65  0 11:27 pts/0    00:00:00 ps -ef

@long2ice It seems only the primary node is running. Can you drop the replication slots on the primary and run the test again?
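If it is unclear whether any slots exist, a standard PostgreSQL catalog query can list them first (a sketch; the slot names 'replica1'/'replica2' are taken from the earlier commands and may differ in your deployment):

```sql
-- List existing replication slots before dropping anything
SELECT slot_name, slot_type, active FROM pg_replication_slots;

-- Drop the slots only if they exist
SELECT pg_drop_replication_slot(slot_name)
FROM pg_replication_slots
WHERE slot_name IN ('replica1', 'replica2');
```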

There may be no replication slots.
(screenshot of the slot query output)

Weird. Is your image up to date?

docker pull polardb/polardb_pg_local_instance

By default, there will be three nodes running inside the container.

I recreated the container, and now there are three nodes. Running the tests took 143s.
Then I stopped the replica nodes, removed the slots, and ran the tests again, which took 82s. That's faster, but still slower than PostgreSQL.

Which phase do these (82 - 68) seconds come from?

Looks like create database and migration, just DDL.

@long2ice OK. In a real benchmark scenario, the preparation time (table schema creation, data import) is not counted; we usually care about the TPS (transactions per second) or QPS (queries per second) on CRUD (DML). DDL usually cannot be executed concurrently, so it cannot measure the throughput of a system.

The benefit of shutting down the replicas is that for some DDL, the primary writes a WAL record and must wait until the replicas have read and replayed that record before it can move on. This incurs extra I/O and latency. The reason I put three nodes in our Docker container is so that some cluster-level features can be tried out easily, not for performance benchmarking.
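One way to observe this wait from the primary, sketched with standard PostgreSQL views (assuming the replicas are attached):

```sql
-- Replication progress per standby: a gap between sent_lsn and replay_lsn
-- means standbys have not yet replayed what the primary sent
SELECT application_name, state, sent_lsn, replay_lsn
FROM pg_stat_replication;

-- Whether commits wait for standbys at all
SHOW synchronous_commit;
```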

OK, thanks for your help!

@mrdrivingduck Hello, another finding after testing: the write speed is slow. Is there anything that can be optimized?

We know that writes can be a weak point of PolarDB-PG, especially INSERT. If you are importing data, use PostgreSQL's COPY syntax instead of INSERT.
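For example, a bulk-load sketch using COPY instead of row-by-row INSERT (the table and file names here are hypothetical):

```sql
-- Server-side COPY: one command instead of many INSERT round trips;
-- the file path is read on the database server
COPY items (id, name, price)
FROM '/tmp/items.csv'
WITH (FORMAT csv, HEADER true);

-- Or client-side via psql, reading the file on the client machine:
-- \copy items (id, name, price) FROM 'items.csv' WITH (FORMAT csv, HEADER true)
```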

I know, thanks for your reply!

Hello, happy new week! Sorry to bother you again. I found another strange phenomenon: after I import data into PolarDB, if I run a SELECT query immediately, the result may differ from the expected result. But if I wait several minutes and query again, everything works fine. What could the problem be? What I did was stop the replica nodes, remove the replica slots, and set polar_enable_shared_server = off and polar_enable_shm_aset = off. I found that changing these two options resolves some query-timeout problems.

These two parameters control the shared server capability; changing them requires a database restart. Please set them to 'off' and restart the database to verify whether the inconsistency issue still persists. That said, the data consistency problem does not seem closely related to these two parameters. When you observe the two different query time points, check whether there is any difference in the background processes and whether a replay-like operation is taking place.
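To confirm which values are actually in effect after the restart, a sketch using standard PostgreSQL commands (the two GUCs themselves are PolarDB-specific):

```sql
-- Values currently in effect for the running server
SHOW polar_enable_shared_server;
SHOW polar_enable_shm_aset;

-- Whether a pending change still requires a restart to apply
SELECT name, setting, pending_restart
FROM pg_settings
WHERE name IN ('polar_enable_shared_server', 'polar_enable_shm_aset');
```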

(screenshot showing both parameters set to off)

They are both already off, but the inconsistency issue still persists. So I wonder whether there are other options that have an effect.

Have you restarted the database after completing the parameter settings?

Yes, of course. I use pg_ctl -D /var/polardb/primary_datadir/ restart

At the moment when the data inconsistency arises during the query, could you run the ps -ef command to check the background process activity?

OK. At the moment when the data is consistent, please also execute the ps -ef command to observe the background process activity. Let's compare the two to see whether there is a process that might be blocking data visibility.

OK, that's it.

(screenshot of ps -ef output)

Looks like a transaction connection.

Yes, the issue is likely caused by this user process. Do you know what operation it is executing? This process is probably performing some sort of DML on the queried data (akin to the data not yet being committed). Alternatively, you could use gdb to examine what this process is doing.

My workflow is: create the database, seed the data (execute a large SQL file in a transaction), and run migrations to keep the tables up to date (some DDL). I would expect that once the program finishes, the database actions have also finished. That doesn't seem to be the case.

Update: it looks unrelated to the transaction connection. I tried again with no transaction connection, but it still failed.

Could you send over the operational steps so we can verify them later? Right now I think the most likely possibility is that the workflow considers the data write complete before the transaction is committed.
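A sketch of ruling that out from psql: wrap the seed in an explicit transaction and only read after COMMIT returns (the file and table names are hypothetical):

```sql
BEGIN;
\i seed.sql          -- the large seed file runs inside this transaction
COMMIT;              -- do not start reading until this returns successfully

-- Only after COMMIT should the row count reflect the seeded data
SELECT count(*) FROM items;
```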

@long2ice Do you have minimal reproduction SQL and steps based on a container started from the Docker image? We can deploy a container from the same image, which makes it easier to reproduce.

I can't reproduce it with a minimal program. It seems to happen only when there's a lot of data or DDL.

Finally, we resolved it by turning off the preread-related settings.

Which settings did you change exactly? I'm curious.