Support deployments that have multiple couch2pg writing to a single database

Question

mrjones-plip opened this issue a month ago

At least one major partner has many CHT Core instances (>40) and runs many couch2pg instances, but all of them write to a single Postgres database instead of each CHT Core instance having its own database.

We should see if couch2pg backlog monitoring can easily support this setup, so that alerting can be configured for each discrete CHT Core instance's backlog.

mrjones · Answer 1 · Sun May 12 2024 01:07:36 GMT+0800 (China Standard Time)

As a POC, I:

set up two docker helper CHT Core instances:
- https://192-168-68-17.local-ip.medicmobile.org:10460/
- https://172-17-0-1.local-ip.medicmobile.org:10464

in the cht-couch2pg repo, made a new compose file that's a copy:

 services:
     cht-couch2pg2:
         image: medicmobile/cht-couch2pg:main-node-10
         environment:
             COUCHDB_URL: ${COUCHDB_URL2:-https://medic:password@localhost:5988/medic}
             POSTGRES_USER_NAME: ${COUCH2PG_USER:-cht_couch2pg}
             POSTGRES_PASSWORD: ${COUCH2PG_USER_PASSWORD:-cht_couch2pg_password}
             POSTGRES_SERVER_NAME: ${POSTGRES_SERVER_NAME:-postgres}
             POSTGRES_DB_NAME: ${POSTGRES_DB_NAME:-cht}
             COUCH2PG_CHANGES_LIMIT: ${COUCH2PG_CHANGES_LIMIT:-100}
             COUCH2PG_SLEEP_MINS: ${COUCH2PG_SLEEP_MINS:-60}
             COUCH2PG_DOC_LIMIT: ${COUCH2PG_DOC_LIMIT:-1000}
             COUCH2PG_RETRY_COUNT: ${COUCH2PG_RETRY_COUNT:-5}

started two couch2pg instances with:

 COUCH2PG_SLEEP_MINS=0.1 \
     COUCHDB_URL2=https://medic:password@192-168-68-17.local-ip.medicmobile.org:10460/medic \
     COUCHDB_URL=https://medic:password@172-17-0-1.local-ip.medicmobile.org:10464/medic \
     docker compose -f docker-compose.yml -f compose.just-couch2pg.yml up -d

ran the default SQL for getting sequence counts, which failed as expected:

sequence	db
132	medic
72	medic-sentinel
232	medic
130	medic-sentinel
3	medic-users-meta
20	medic-logs
6	medic-users-meta
5	_users
19	medic-logs
4	_users

but if we tweak the SQL, we can actually extract the CHT Instance for free (instead of depending on the cht-instances.yml or sql-servers.yml):

 SELECT
   split_part(seq,'-',1) as sequence,
   split_part(source,'/',2) as db,
   split_part(source,'/',1) as cht_instance
 FROM
   couchdb_progress
 WHERE
   source like '%/%' and
   seq like '%-%'
 ORDER BY
   cht_instance, db

this results in a nice table, as expected:

sequence	db	cht_instance
232	medic	172-17-0-1.local-ip.medicmobile.org:10464
19	medic-logs	172-17-0-1.local-ip.medicmobile.org:10464
130	medic-sentinel	172-17-0-1.local-ip.medicmobile.org:10464
6	medic-users-meta	172-17-0-1.local-ip.medicmobile.org:10464
4	_users	172-17-0-1.local-ip.medicmobile.org:10464
132	medic	192-168-68-17.local-ip.medicmobile.org:10460
20	medic-logs	192-168-68-17.local-ip.medicmobile.org:10460
72	medic-sentinel	192-168-68-17.local-ip.medicmobile.org:10460
3	medic-users-meta	192-168-68-17.local-ip.medicmobile.org:10460
5	_users	192-168-68-17.local-ip.medicmobile.org:10460
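
To make the string handling in the tweaked query concrete, here is a minimal Python sketch of the same splitting logic. It assumes, as the query and its results imply, that `couchdb_progress.source` is stored as `host:port/db` and that `seq` starts with a number followed by a hyphen. The `split_part` helper only imitates the PostgreSQL function for positive field numbers, `parse_progress_row` is a hypothetical name, and the `seq` suffix in the example row is made up for illustration.

```python
from typing import Optional


def split_part(text: str, delimiter: str, n: int) -> str:
    """Imitate PostgreSQL's split_part for positive n (1-based, '' if out of range)."""
    parts = text.split(delimiter)
    return parts[n - 1] if 1 <= n <= len(parts) else ""


def parse_progress_row(seq: str, source: str) -> Optional[dict]:
    """Apply the query's WHERE filter and column expressions to a single row."""
    # Same filter as: source LIKE '%/%' AND seq LIKE '%-%'
    if "/" not in source or "-" not in seq:
        return None
    return {
        "sequence": split_part(seq, "-", 1),
        "db": split_part(source, "/", 2),
        "cht_instance": split_part(source, "/", 1),
    }


# Hypothetical row: the part of seq after the first hyphen is invented.
row = parse_progress_row(
    "232-hypothetical-opaque-suffix",
    "172-17-0-1.local-ip.medicmobile.org:10464/medic",
)
print(row)
```

Because `source` is split only on `/`, the hyphens inside the host name do not interfere with extracting the instance.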

Next I'll see if we can update watchdog to natively use this informed approach by default, instead of the prior naive approach. This way cht_instance will be ignored most of the time because there will only ever be one, but it will be forwards compatible in the rare circumstance where there's more than one.

mrjones · Answer 2 · Mon May 13 2024 00:15:38 GMT+0800 (China Standard Time)

Success! By updating the dashboard JSON to use the new field, we can reduce the amount of configuration in sql_instances.yml and still be backwards compatible.

So, in Grafana, the target just needed to be updated to use cht_instance:

sum(couch2pg_progress_sequence{cht_instance=~"$cht_instance", db=~"medic|medic-sentinel|medic-users-meta"})
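
As a rough illustration of what that target computes, the Python sketch below emulates the label matching and the sum using the sequence numbers from the POC table. It assumes the exporter emits one sample per table row, labelled with `db` and `cht_instance`; the `SAMPLES` list and the `summed_sequence` name are hypothetical. PromQL's `=~` matcher is fully anchored, which `re.fullmatch` approximates here.

```python
import re

# Samples mirroring the POC table: (cht_instance, db, sequence).
SAMPLES = [
    ("172-17-0-1.local-ip.medicmobile.org:10464", "medic", 232),
    ("172-17-0-1.local-ip.medicmobile.org:10464", "medic-logs", 19),
    ("172-17-0-1.local-ip.medicmobile.org:10464", "medic-sentinel", 130),
    ("172-17-0-1.local-ip.medicmobile.org:10464", "medic-users-meta", 6),
    ("172-17-0-1.local-ip.medicmobile.org:10464", "_users", 4),
    ("192-168-68-17.local-ip.medicmobile.org:10460", "medic", 132),
    ("192-168-68-17.local-ip.medicmobile.org:10460", "medic-logs", 20),
    ("192-168-68-17.local-ip.medicmobile.org:10460", "medic-sentinel", 72),
    ("192-168-68-17.local-ip.medicmobile.org:10460", "medic-users-meta", 3),
    ("192-168-68-17.local-ip.medicmobile.org:10460", "_users", 5),
]

DB_PATTERN = "medic|medic-sentinel|medic-users-meta"


def summed_sequence(instance_pattern: str, db_pattern: str = DB_PATTERN) -> int:
    """Sum sequences whose labels fully match both regexes, like PromQL's =~."""
    return sum(
        seq
        for instance, db, seq in SAMPLES
        if re.fullmatch(instance_pattern, instance) and re.fullmatch(db_pattern, db)
    )


one_instance = summed_sequence(re.escape("172-17-0-1.local-ip.medicmobile.org:10464"))
all_instances = summed_sequence(".*")
print(one_instance, all_instances)
```

With a single instance selected, only that instance's `medic`, `medic-sentinel`, and `medic-users-meta` sequences are summed; `medic-logs` and `_users` are excluded because the match is anchored to the whole label value.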

Still outstanding: get fake-cht working with the new syntax.