vttablet: enable_replication_reporter should be configurable
mdkent opened this issue · comments
Right now in vttablet, enable_replication_reporter defaults to yes in externalDatastoreFlags. This causes issues when we add a rdonly tablet backed by RDS - Vitess tries to check replication on a tablet that's not running it.
In our tests, I believe replication reporter still worked when the tablet was backed by RDS. @PrismaPhonic can you confirm?
I'm not too familiar with RDS myself. Is there a setting that might determine whether or not replication reporter works? If it's more common for RDS to be configured in such a way that replication reporter doesn't work, we should probably change the default for external datastores to off.
In the meantime, you can always override the flag with extraFlags, in case this is a blocker for you.
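A minimal sketch of what such an override might look like, assuming the flag is set under the tablet pool's vttablet.extraFlags map and that the value is passed as a quoted string like other flag values. The cell name and pool type here are placeholders, not a recommendation:

```yaml
# Hypothetical tablet pool entry; only the extraFlags line matters here.
tabletPools:
- cell: useast1                # placeholder cell name
  type: externalrdonly
  replicas: 1
  vttablet:
    extraFlags:
      # Assumed to override the operator's default for external datastores.
      enable_replication_reporter: "false"
```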
When I tested this it did seem to work fine. @mdkent what actual error state are you experiencing with replication reporter on?
Sorry, I should have specified we're seeing this on Aurora.
> When I tested this it did seem to work fine. @mdkent what actual error state are you experiencing with replication reporter on?
Sure! Given the following config:
shards:
- keyRange: {}
  databaseInitScriptSecret:
    name: foo-cluster-config
    key: init_db.sql
  replication:
    enforceSemiSync: false
  tabletPools:
  - cell: useast1
    type: externalmaster
    replicas: 1
    vttablet:
      extraFlags:
        db_charset: utf8mb4
        queryserver-config-pool-size: "5"
        queryserver-config-stream-pool-size: "5"
        queryserver-config-transaction-cap: "5"
      resources:
        requests:
          cpu: 250m
          memory: 256Mi
        limits:
          memory: 384Mi
    externalDatastore:
      user: admin
      host: foo-vitess.cluster-xn3akgqwgtt8.us-east-1.rds.amazonaws.com
      port: 3306
      database: foo_production
      credentialsSecret:
        name: foo-cluster-config
        key: db_creds.json
  - cell: useast1
    type: externalrdonly
    replicas: 1
    vttablet:
      extraFlags:
        db_charset: utf8mb4
        queryserver-config-pool-size: "5"
        queryserver-config-stream-pool-size: "5"
        queryserver-config-transaction-cap: "5"
      resources:
        requests:
          cpu: 250m
          memory: 256Mi
        limits:
          memory: 384Mi
    externalDatastore:
      user: admin
      host: foo-vitess.cluster-ro-xn3akgqwgtt8.us-east-1.rds.amazonaws.com
      port: 3306
      database: foo_production
      credentialsSecret:
        name: foo-cluster-config
        key: db_creds.json
The externalrdonly tablets that hit the read-only endpoint Aurora provides never enter normal service:
$ kubectl get pods
NAME                                           READY   STATUS    RESTARTS   AGE
foo-etcd-cf3bf24b-1                            1/1     Running   0          4m16s
foo-etcd-cf3bf24b-2                            1/1     Running   0          4m16s
foo-etcd-cf3bf24b-3                            1/1     Running   0          4m16s
foo-useast1-vtctld-305faec9-7db9949599-tcjmr   1/1     Running   2          4m16s
foo-useast1-vtgate-7ca66dd7-64bd485f47-26skv   1/1     Running   3          4m16s
foo-vttablet-useast1-0876642563-eb6ce985       1/1     Running   2          4m16s
foo-vttablet-useast1-2417040302-e6f2f418       0/1     Running   2          4m16s
foo-vttablet-useast1-2793830880-0ec53a62       0/1     Running   2          4m16s
foo-vttablet-useast1-3998625595-3da2ec65       1/1     Running   2          4m16s
vitess-operator-7f885997cb-ggdbd               1/1     Running   0          4m50s
vtgate says:
I0925 21:34:12.242053 1 tablet_health_check.go:110] HealthCheckUpdate(Serving State): tablet: useast1-2417040302 (10.119.15.162) serving false => false for mainunsharded/- (RDONLY) reason: healthCheck update error: vttablet error: no slave status
This read-only endpoint is a bit tricky. In single node operation it just hits the primary, but will spread load to additional replicas as we add them. None of those replicas will ever respond to show slave status though.
With
"enable_replication_reporter": false,
in the operator I'm able to use these read-only endpoints normally.
> In the meantime, you can always override the flag with extraFlags, in case this is a blocker for you.
Didn't realize I could override a default option like this. Thanks!
Working via:
- cell: useast1
  type: externalrdonly
  replicas: 1
  vttablet:
    extraFlags:
      db_charset: utf8mb4
      queryserver-config-pool-size: "5"
      queryserver-config-stream-pool-size: "5"
      queryserver-config-transaction-cap: "5"
      enable_replication_reporter: "false"