[Question] After testing, it feels much slower than postgresql.
long2ice opened this issue · comments
A very simple usage scenario: I ran my integration tests locally. They take a bit over 60 seconds against PostgreSQL 11, but over 130 seconds against PolarDB. Both are stand-alone instances deployed with Docker.
The difference shouldn't be this large, should it? Am I using it incorrectly? Are there any parameters that can be tuned?
@long2ice Hi, thanks for testing PolarDB-PG.
Can you describe your testing workload? Is it mainly DDL, DML, or something else? Also, what is the shared_buffers size of PostgreSQL 11?
@mrdrivingduck Hello, thanks for your quick reply! The testing workload is: create a database, insert a dataset, and then run the tests. Most of the test cases are DML. PostgreSQL's shared_buffers is 128MB; PolarDB's is 2GB.
@long2ice Could you please add some timing logs around these three phases — [TIME] create database [TIME] insert data [TIME] run tests [TIME] — so we can see which part is slow?
There could be other reasons too. For example, the PolarDB-PG container actually runs three database nodes: one primary and two replicas, with synchronous_commit set to on. I'm not sure whether that is the problem.
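A minimal sketch of such phase timing as a shell wrapper (the `phase` helper and the `sleep` placeholders are hypothetical; substitute the real createdb/import/test commands):

```shell
#!/bin/sh
# Hypothetical timing helper: runs a command and prints its elapsed wall time.
phase() {
  label="$1"; shift
  start=$(date +%s)
  "$@"                 # run the actual phase command
  end=$(date +%s)
  echo "[TIME] $label: $((end - start))s"
}

phase "create database" sleep 1   # placeholder for createdb + migrations
phase "insert data"     sleep 1   # placeholder for the data import
phase "run tests"       sleep 1   # placeholder for the test suite
```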
polardb:
create database and run migration (DDL): 28s
init data: 22s
run all tests: 65s
total: 115s
postgres:
create database and run migration (DDL): fast
init data: 13s
run all tests: 55s
total: 68s
Try the following commands in the PolarDB-PG container:
Stop the two replica databases:
pg_ctl -D /var/polardb/replica_datadir1/ stop
pg_ctl -D /var/polardb/replica_datadir2/ stop
Drop the replication slots on the primary:
select pg_drop_replication_slot('replica1');
select pg_drop_replication_slot('replica2');
postgres@e086c61cd078:~$ pg_ctl -D /var/polardb/replica_datadir1/ stop
pg_ctl: PID file "/var/polardb/replica_datadir1/postmaster.pid" does not exist
Is server running?
postgres@e086c61cd078:~$ pg_ctl -D /var/polardb/replica_datadir2/ stop
pg_ctl: PID file "/var/polardb/replica_datadir2/postmaster.pid" does not exist
Is server running?
Maybe they are not running?
I used the polardb/polardb_pg_local_instance Docker image to deploy it.
@long2ice Can you run ps -ef to see if there are three process groups running? If there is only one, that's fine.
postgres@da9f3038df35:~$ ps -ef
UID PID PPID C STIME TTY TIME CMD
postgres 1 0 0 11:26 ? 00:00:00 /bin/bash ./docker-entrypoint.sh postgres
postgres 16 1 1 11:26 ? 00:00:00 /home/postgres/tmp_basedir_polardb_pg_1100_bld/bin/postgres -D /var/polardb/primary_datadir
postgres 17 16 0 11:26 ? 00:00:00 postgres(5432): logger 0
postgres 18 16 0 11:26 ? 00:00:00 postgres(5432): logger 1
postgres 19 16 0 11:26 ? 00:00:00 postgres(5432): logger 2
postgres 20 16 0 11:26 ? 00:00:00 postgres(5432): background flashback log inserter
postgres 21 16 0 11:26 ? 00:00:00 postgres(5432): background flashback log writer
postgres 23 16 0 11:26 ? 00:00:00 postgres(5432): polar worker process
postgres 24 16 0 11:26 ? 00:00:00 postgres(5432): PSS dispatcher
postgres 25 16 0 11:26 ? 00:00:00 postgres(5432): PSS dispatcher
postgres 26 16 0 11:26 ? 00:00:00 postgres(5432): polar wal pipeliner
postgres 28 16 0 11:26 ? 00:00:00 postgres(5432): checkpointer
postgres 29 16 0 11:26 ? 00:00:00 postgres(5432): background writer
postgres 30 16 0 11:26 ? 00:00:00 postgres(5432): walwriter
postgres 31 16 1 11:26 ? 00:00:00 postgres(5432): background logindex writer
postgres 32 16 0 11:26 ? 00:00:00 postgres(5432): autovacuum launcher
postgres 33 16 0 11:26 ? 00:00:00 postgres(5432): stats collector
postgres 34 16 0 11:26 ? 00:00:00 postgres(5432): TimescaleDB Background Worker Launcher
postgres 35 16 0 11:26 ? 00:00:00 postgres(5432): logical replication launcher
postgres 36 16 0 11:26 ? 00:00:00 postgres(5432): polar parallel bgwriter
postgres 37 16 0 11:26 ? 00:00:00 postgres(5432): polar parallel bgwriter
postgres 38 16 0 11:26 ? 00:00:00 postgres(5432): polar parallel bgwriter
postgres 39 16 0 11:26 ? 00:00:00 postgres(5432): polar parallel bgwriter
postgres 40 16 0 11:26 ? 00:00:00 postgres(5432): polar parallel bgwriter
postgres 64 1 0 11:26 ? 00:00:00 tail -f /dev/null
postgres 65 0 0 11:27 pts/0 00:00:00 bash
postgres 77 65 0 11:27 pts/0 00:00:00 ps -ef
@long2ice It seems only the primary node is running. Can you drop the replication slots on the primary and run the test again?
Weird. Is your image up to date?
docker pull polardb/polardb_pg_local_instance
By default, there will be three nodes running inside the container.
I recreated the container, and now there are three nodes. Then I ran the tests, which took 143s.
After stopping the replica nodes and removing the slots, the tests took 82s. That's faster, but still slower than PostgreSQL.
Which phase do these extra (82 - 68) seconds come from?
Looks like the create database and migration phase, i.e. just DDL.
@long2ice OK. In a real benchmark scenario, the preparation time (table schema creation, data import) is not counted; we usually care about the TPS (transactions per second) or QPS (queries per second) of DML (CRUD) workloads. DDL usually cannot be executed concurrently, so it cannot measure a system's throughput.
The benefit of shutting down the replicas is that for some DDL, the primary writes a WAL record and must wait until the replicas have read and replayed it before moving on, which incurs extra I/O and latency. The reason we put three nodes in our Docker container is so that cluster-level features can be tried out easily; it is not meant for performance benchmarking.
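For reference, a quick way to measure DML throughput (rather than DDL latency) is pgbench, which ships with PostgreSQL. A sketch, assuming a reachable database named postgres on the default port:

```shell
# Sketch: measure transactions per second with pgbench.
# The database name "postgres" and the scale/client numbers are example values.
pgbench -i -s 10 postgres          # initialize pgbench tables at scale factor 10
pgbench -c 8 -j 4 -T 60 postgres   # 8 clients, 4 threads, 60-second run; reports TPS
```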
OK, thanks for your help!
@mrdrivingduck Hello, after further testing, another finding is that the write speed is slow. Is there anything that can be optimized?
We know that writes can be a weak point of PolarDB-PG, especially INSERT. If you are importing data, use PostgreSQL's COPY command instead of INSERT.
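For example, a bulk load through psql's client-side \copy might look like this (the table name and file are hypothetical placeholders):

```shell
# Sketch: bulk-load a CSV in one COPY operation instead of row-by-row INSERTs.
# "mytable" and data.csv are hypothetical; adjust to your schema and file.
psql -c "\copy mytable FROM 'data.csv' WITH (FORMAT csv)"
```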
I know, thanks for your reply!
Hello, happy new week! Sorry to bother you again. I've found another strange phenomenon. After I import data into PolarDB, if I run a SELECT query immediately, the result may differ from the expected result. But if I wait several minutes and query again, everything works fine. What could the problem be? What I did was stop the replica nodes, remove the replica slots, and set polar_enable_shared_server = off and polar_enable_shm_aset = off. I found that changing these two options resolves some query-timeout problems.
These two parameters control the shared-server capability, and changing them requires a database restart. Please set them to off and restart the database to verify whether the inconsistency issue still persists. That said, the data-consistency problem does not seem closely related to these two parameters. When you observe the two different query times, you can check whether there is any difference in the background processes, and whether an operation similar to replay is taking place.
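Concretely, persisting the two settings and restarting might look like this (a sketch; the data directory path follows the pg_ctl commands earlier in this thread, and ALTER SYSTEM writes the values to postgresql.auto.conf):

```shell
# Sketch: persist the two GUCs, then restart the primary so they take effect.
psql -c "ALTER SYSTEM SET polar_enable_shared_server = off;"
psql -c "ALTER SYSTEM SET polar_enable_shm_aset = off;"
pg_ctl -D /var/polardb/primary_datadir/ restart
```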
Yes, of course. I used pg_ctl -D /var/polardb/primary_datadir/ restart
At the moment the data inconsistency shows up during the query, could you run the ps -ef command to check the background process activity?
OK. At the moment the data is consistent, please also execute the ps -ef command to observe the background process activity. Let's compare the two to see if there is a process that might be blocking data visibility.
Looks like a transaction connection.
Yes, it seems the issue is likely caused by this user process. Do you know what operation it is executing? It's probable that this process is performing some sort of DML on the queried data (akin to the data not yet being committed). Alternatively, you could use gdb to examine what this process is doing.
My workflow is: create the database, seed data (execute a large SQL file in a transaction), then run migrations to bring the tables up to date (some DDL). But I would think that once the program finishes, the corresponding actions in the database should also be finished. That doesn't seem to be the case.
Update: it looks unrelated to the transaction connection. I tried again with no transaction connection present, but it still failed.
Could you send over your operational steps so we can verify it later? Right now, the most likely possibility I see is that the workflow considers the data write complete before the transaction has actually committed.
@long2ice Do you have a minimal set of SQL statements and reproduction steps based on a container started from the Docker image? We can deploy a container from the same image, which makes it easier to reproduce.
I can't reproduce it in a minimal program. It only seems to happen when there's a lot of data or DDL.
Finally, we resolved it by turning off the preread-related settings.
Which setting did you set exactly? I am curious.