Support deployments that have multiple couch2pg writing to a single database
mrjones-plip opened this issue · comments
There is at least one major partner that has many CHT Core instances (>40) and runs many couch2pg instances, but they write all the data to one database in Postgres instead of each CHT Core instance having its own database.
We should see if couch2pg backlog monitoring can easily support this setup, so that alerting can be configured for each discrete CHT Core instance's backlog.
As a POC, I:
- set up two docker helper CHT Core instances:
- in the cht-couch2pg repo, made a new compose file that's a copy:

      services:
        cht-couch2pg2:
          image: medicmobile/cht-couch2pg:main-node-10
          environment:
            COUCHDB_URL: ${COUCHDB_URL2:-https://medic:password@localhost:5988/medic}
            POSTGRES_USER_NAME: ${COUCH2PG_USER:-cht_couch2pg}
            POSTGRES_PASSWORD: ${COUCH2PG_USER_PASSWORD:-cht_couch2pg_password}
            POSTGRES_SERVER_NAME: ${POSTGRES_SERVER_NAME:-postgres}
            POSTGRES_DB_NAME: ${POSTGRES_DB_NAME:-cht}
            COUCH2PG_CHANGES_LIMIT: ${COUCH2PG_CHANGES_LIMIT:-100}
            COUCH2PG_SLEEP_MINS: ${COUCH2PG_SLEEP_MINS:-60}
            COUCH2PG_DOC_LIMIT: ${COUCH2PG_DOC_LIMIT:-1000}
            COUCH2PG_RETRY_COUNT: ${COUCH2PG_RETRY_COUNT:-5}

- started two couch2pg instances with:

      COUCH2PG_SLEEP_MINS=0.1 \
      COUCHDB_URL2=https://medic:password@192-168-68-17.local-ip.medicmobile.org:10460/medic \
      COUCHDB_URL=https://medic:password@172-17-0-1.local-ip.medicmobile.org:10464/medic \
      docker compose -f docker-compose.yml -f compose.just-couch2pg.yml up -d

- ran the default SQL for getting sequence counts, which failed as expected:

  | sequence | db |
  | --- | --- |
  | 132 | medic |
  | 72 | medic-sentinel |
  | 232 | medic |
  | 130 | medic-sentinel |
  | 3 | medic-users-meta |
  | 20 | medic-logs |
  | 6 | medic-users-meta |
  | 5 | _users |
  | 19 | medic-logs |
  | 4 | _users |

- but if we tweak the SQL, we can actually extract the CHT instance for free (instead of depending on `cht-instances.yml` or `sql-servers.yml`):

      SELECT
        split_part(seq, '-', 1) AS sequence,
        split_part(source, '/', 2) AS db,
        split_part(source, '/', 1) AS cht_instance
      FROM couchdb_progress
      WHERE source LIKE '%/%' AND seq LIKE '%-%'
      ORDER BY cht_instance, db

- this results in a nice table, as expected:

  | sequence | db | cht_instance |
  | --- | --- | --- |
  | 232 | medic | 172-17-0-1.local-ip.medicmobile.org:10464 |
  | 19 | medic-logs | 172-17-0-1.local-ip.medicmobile.org:10464 |
  | 130 | medic-sentinel | 172-17-0-1.local-ip.medicmobile.org:10464 |
  | 6 | medic-users-meta | 172-17-0-1.local-ip.medicmobile.org:10464 |
  | 4 | _users | 172-17-0-1.local-ip.medicmobile.org:10464 |
  | 132 | medic | 192-168-68-17.local-ip.medicmobile.org:10460 |
  | 20 | medic-logs | 192-168-68-17.local-ip.medicmobile.org:10460 |
  | 72 | medic-sentinel | 192-168-68-17.local-ip.medicmobile.org:10460 |
  | 3 | medic-users-meta | 192-168-68-17.local-ip.medicmobile.org:10460 |
  | 5 | _users | 192-168-68-17.local-ip.medicmobile.org:10460 |

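The tweaked query relies on Postgres's `split_part` to pull three fields out of the `seq` and `source` columns. As a rough illustration only, the Python sketch below mimics that splitting on a couple of rows shaped like the results above. The helper names (`split_part`, `parse_progress_row`) and the text after the first `-` in each `seq` value are assumptions made for the example; this is not code from couch2pg or watchdog.

```python
def split_part(text: str, delimiter: str, n: int) -> str:
    """Rough stand-in for Postgres split_part: 1-based field, '' if missing."""
    parts = text.split(delimiter)
    return parts[n - 1] if 1 <= n <= len(parts) else ""


def parse_progress_row(seq: str, source: str):
    """Mirror the query's WHERE clause and SELECT list for a single row."""
    # WHERE source LIKE '%/%' AND seq LIKE '%-%'
    if "/" not in source or "-" not in seq:
        return None
    return {
        "sequence": split_part(seq, "-", 1),
        "db": split_part(source, "/", 2),
        "cht_instance": split_part(source, "/", 1),
    }


# Hypothetical rows: the suffix after the first '-' in seq is made up.
rows = [
    ("232-abc", "172-17-0-1.local-ip.medicmobile.org:10464/medic"),
    ("132-def", "192-168-68-17.local-ip.medicmobile.org:10460/medic"),
    ("0", "no-slash-here"),  # dropped, as the WHERE clause would drop it
]

parsed = [r for r in (parse_progress_row(s, src) for s, src in rows) if r]
for row in parsed:
    print(row["sequence"], row["db"], row["cht_instance"])
```

Because the instance is derived from `source` itself, two rows for the same `db` name no longer collide: each carries its own `cht_instance` value.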
Next I'll see if we can update watchdog to natively use this informed approach by default, instead of the prior naive approach. This way `cht_instance` will be ignored most of the time because there will only ever be one, but it will be forwards compatible in the rare circumstance where there's more than one.
Success! By updating the dashboard JSON to use the new field, we can reduce the amount of configuration in `sql_instances.yml` and still be backwards compatible.
So, in Grafana, `target` just needed to be updated to `cht_instance`:

    sum(couch2pg_progress_sequence{cht_instance=~"$cht_instance", db=~"medic|medic-sentinel|medic-users-meta"})

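To make the effect of that query concrete, here is a small Python sketch that approximates it over the sequence values from the POC. It assumes each sample is a pair of labels and a value, and uses `re.fullmatch` to stand in for Prometheus's fully anchored `=~` matcher. The `sum_sequence` function and the `samples` layout are hypothetical, chosen only for illustration.

```python
import re

A = "172-17-0-1.local-ip.medicmobile.org:10464"
B = "192-168-68-17.local-ip.medicmobile.org:10460"

# (labels, value) pairs taken from the POC result table.
samples = [
    ({"cht_instance": A, "db": "medic"}, 232),
    ({"cht_instance": A, "db": "medic-logs"}, 19),
    ({"cht_instance": A, "db": "medic-sentinel"}, 130),
    ({"cht_instance": A, "db": "medic-users-meta"}, 6),
    ({"cht_instance": A, "db": "_users"}, 4),
    ({"cht_instance": B, "db": "medic"}, 132),
    ({"cht_instance": B, "db": "medic-logs"}, 20),
    ({"cht_instance": B, "db": "medic-sentinel"}, 72),
    ({"cht_instance": B, "db": "medic-users-meta"}, 3),
    ({"cht_instance": B, "db": "_users"}, 5),
]

DB_PATTERN = "medic|medic-sentinel|medic-users-meta"


def sum_sequence(samples, instance_pattern, db_pattern=DB_PATTERN):
    """Sum values whose labels fully match both regexes."""
    total = 0
    for labels, value in samples:
        if re.fullmatch(instance_pattern, labels["cht_instance"]) and re.fullmatch(
            db_pattern, labels["db"]
        ):
            total += value
    return total


# One instance selected, as when the dashboard variable holds a single value.
print(sum_sequence(samples, re.escape(A)))  # 232 + 130 + 6 = 368
print(sum_sequence(samples, re.escape(B)))  # 132 + 72 + 3 = 207
```

With only one instance writing to the database, the `cht_instance` filter matches every series, so the total is the same as it would be without the label.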
Still outstanding: get fake-cht working with the new syntax.