SQLD replica is super slow (due to blocking)?
mattatjff opened this issue
I've got a test setup of LibSQL SQLD running a primary and a replica as two Docker containers (both from the same image). The `docker-compose.yml` is as follows:
```yaml
version: "3"
services:
  # Primary node
  lsql-1:
    image: ghcr.io/tursodatabase/libsql-server:latest
    platform: linux/amd64
    container_name: lsql-1
    restart: "no"  # quoted so YAML does not parse it as boolean false
    ports:
      - "8051:8051"  # HTTP API
    expose:
      - 5001  # gRPC port the replica connects to
    volumes:
      - ./data/1:/var/lib/sqld
    environment:
      - SQLD_NODE=primary
      - SQLD_HTTP_AUTH=basic:YWRtaW46YWRtaW4=
      - SQLD_GRPC_LISTEN_ADDR=0.0.0.0:5001
      - SQLD_HTTP_LISTEN_ADDR=0.0.0.0:8051
  # Replica node
  lsql-2:
    image: ghcr.io/tursodatabase/libsql-server:latest
    platform: linux/amd64
    container_name: lsql-2
    restart: "no"
    ports:
      - "8052:8052"  # HTTP API
    expose:
      - 5002
    volumes:
      - ./data/2:/var/lib/sqld
    environment:
      - SQLD_NODE=replica
      - SQLD_HTTP_AUTH=basic:YWRtaW46YWRtaW4=
      - SQLD_PRIMARY_URL=http://lsql-1:5001/  # primary's gRPC address, resolved by service name
      - SQLD_GRPC_LISTEN_ADDR=0.0.0.0:5002
      - SQLD_HTTP_LISTEN_ADDR=0.0.0.0:8052
# Defined but not attached to either service, so both use the default network.
networks:
  lsql:
    driver: bridge
```
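For reference, the `SQLD_HTTP_AUTH` value is identical on both nodes, so the test client can use one set of credentials against either port. A minimal Python sketch decoding it, assuming the part after `basic:` is a standard base64 `user:password` pair as used in HTTP Basic auth (the variable names are only illustrative):

```python
import base64

# Copied from the compose file; lsql-1 and lsql-2 use the same value.
SQLD_HTTP_AUTH = "basic:YWRtaW46YWRtaW4="

# Split "basic:<token>" into the scheme and the base64 token.
scheme, _, token = SQLD_HTTP_AUTH.partition(":")
# Decode the token into "user:password" (assumed HTTP Basic credential format).
user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
# Header a client would send, assuming sqld expects standard HTTP Basic auth.
auth_header = {"Authorization": f"Basic {token}"}

print(scheme, user, password, auth_header["Authorization"])
```

This decodes to `admin`/`admin`, which is fine for a local test setup.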
This is running on a Minisforum EM780 with 32 GB of RAM. Specs can be found here: https://store.minisforum.com/products/minisforum-em680
Using a test script that performs a number of basic read/write operations (visible here: https://github.com/hiraeth-php/turso/blob/master/test/index.php), I get wildly disparate execution times running it against the primary vs. the replica. The replica run is more than an order of magnitude slower, taking over 1 second. Here are the results from `time` runs:
Running on the primary:

```
________________________________________________________
Executed in   75.91 millis    fish           external
   usr time   26.64 millis    1.46 millis   25.17 millis
   sys time   15.02 millis    0.63 millis   14.38 millis
```

Running on the replica:

```
________________________________________________________
Executed in    1.16 secs      fish           external
   usr time   39.42 millis  833.00 micros   38.59 millis
   sys time   10.52 millis    0.00 micros   10.52 millis
```
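To put numbers on the gap, here is a small Python calculation over the totals column of the two runs above (1.16 secs is taken as 1160 ms, since fish rounds it). CPU time is `usr` + `sys`; the rest of the wall-clock time is time the timed command spent off the CPU. The `Run` class is just a hypothetical helper for the arithmetic:

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One `time` measurement, in milliseconds (totals column of fish's output)."""
    wall_ms: float
    usr_ms: float
    sys_ms: float

    @property
    def cpu_ms(self) -> float:
        # CPU time actually consumed (user + kernel).
        return self.usr_ms + self.sys_ms

    @property
    def off_cpu_ms(self) -> float:
        # Wall-clock time not spent on the CPU (waiting on I/O, network, locks, ...).
        return self.wall_ms - self.cpu_ms

    @property
    def cpu_share(self) -> float:
        # Fraction of the wall-clock time that was CPU time.
        return self.cpu_ms / self.wall_ms


primary = Run(wall_ms=75.91, usr_ms=26.64, sys_ms=15.02)
replica = Run(wall_ms=1160.0, usr_ms=39.42, sys_ms=10.52)  # fish rounded to 1.16 secs

slowdown = replica.wall_ms / primary.wall_ms

for name, run in (("primary", primary), ("replica", replica)):
    print(f"{name}: cpu={run.cpu_ms:.2f} ms, off-cpu={run.off_cpu_ms:.2f} ms, "
          f"cpu share={run.cpu_share:.1%}")
print(f"wall-clock slowdown: {slowdown:.1f}x")
```

This gives a slowdown of roughly 15x, while the CPU share of the run drops from about 55% on the primary to about 4% on the replica even though the absolute CPU time barely moves.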
You can see the replica run takes over 1 second of wall-clock time, yet its `usr` and `sys` times are not wildly different from the primary's. Since those are the CPU times of the test client, most of that second is spent waiting for responses, which suggests the replica is doing a lot of blocking while waiting on and/or syncing with the primary. None of this should really be network latency, as both instances are just two Docker containers on the local system talking to one another.
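The reasoning here (a large wall-clock time with little extra CPU time points to waiting rather than work) can be reproduced in isolation. A minimal Python sketch, unrelated to sqld itself: `measure`, `busy`, and `blocked` are hypothetical names, and `time.sleep` stands in for a process waiting on a remote node:

```python
import time


def measure(fn, *args):
    """Return (wall_seconds, cpu_seconds) for one call of fn(*args)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time of this process
    fn(*args)
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


def busy(n):
    # CPU-bound work: wall time and CPU time should be roughly equal.
    total = 0
    for i in range(n):
        total += i * i
    return total


def blocked(seconds):
    # Stand-in for waiting on a remote node: the process sleeps and uses no CPU.
    time.sleep(seconds)


if __name__ == "__main__":
    for label, fn, arg in (("busy", busy, 2_000_000), ("blocked", blocked, 0.5)):
        wall, cpu = measure(fn, arg)
        print(f"{label:8s} wall={wall * 1000:7.1f} ms  cpu={cpu * 1000:7.1f} ms")
```

`busy` should report CPU time close to its wall time, while `blocked` should report about 500 ms of wall time and almost no CPU time, the same shape as the replica run.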