pkdone / MongoDB-AUTO-HA

Easily Demonstrate Fast Failover & Auto-Healing for a MongoDB Replica Set


This project lets you easily demonstrate the fast failover and auto-healing of a MongoDB Replica Set, with everything run and shown from a single laptop/workstation. Note: For MDB-SAs there is also a demo video.

Demo UI

Demo Prerequisites

Ensure the following software is already installed on the laptop/workstation.

Demo Setup

Using the laptop/workstation's native OS terminal/shell, from the base directory of this project, launch the Terminator multi-paned terminal application, in which the whole demo will then be run (this uses the project's configuration file .terminator_config to open Terminator with the pane layout that best shows the demo):


    NOTE: If you are using macOS and have issues using Fink to install Terminator, you can use iTerm instead and lay out 5 instances of iTerm to roughly match the screenshot shown above (i.e. a 1st row of 2 iTerms + a 2nd row of 3 iTerms)

Demo Execution

  1. Using the 3 bottom panes shown in Terminator (or the iTerms), start 3 instances of the monitoring Bash/Mongo-Shell script, one in each pane, which will check the health of the local mongod servers listening on ports 27000, 27001 and 27002 respectively (IMPORTANT: do not change these ports as other scripts assume these specific ports are being used):
./ 27000
./ 27001
./ 27002

    (initially these monitoring scripts will report that the mongod servers are down, because they have not been started yet)

  1. In the top right pane, first show and explain the contents of the shell script which kills all existing mongod processes and then starts 3 mongod servers, each listening on a different local port, then run it:

    (the 3 monitoring scripts will now report that the servers are up but not initialised as a replica set)

  1. In the top right pane, clear the existing output and first show and explain the contents of the shell script which will configure a replica set using the 3 running mongod servers, then run it:

    (the 3 monitoring scripts will now report that the servers are all now configured, with one shown as the primary and two shown as secondaries, each showing the number of records that have currently been inserted into an arbitrary database collection - currently zero)

  1. In the top left pane, first show and explain the contents of the Python script which will insert new records into the arbitrary database collection, in the replica set, then run it:

    (the 3 monitoring scripts will now report that the number of records contained in the database collections is increasing over time)

  1. In the top right pane, clear the existing output and run the following script to list the 3 running mongod servers alongside their OS process IDs, then abruptly terminate (kill -9) the process corresponding to the mongod which is currently shown as being primary in the bottom 3 panes (replace the 12345 argument with the real process ID):
kill -9 12345

    (the 3 monitoring scripts will report that one of the servers has gone down; for a second or two the remaining 2 servers both stay as secondaries, with no more records being inserted, and then one of them automatically becomes the primary and records are continuously inserted again; also notice that in the top left pane the Python script reports a temporary connection problem before carrying on with its work, because retryable reads and writes have not been enabled for it)

  1. In the top right pane, clear the existing output and display the contents of the shell script, copy the one line corresponding to the killed mongod server, then paste and execute it in the same pane to restart the failed mongod server (the example below shows the command line for starting the first of the 3 mongod servers, which you will need to change if one of the other 2 servers was the one killed):
mongod --replSet TestRS --port 27000 --dbpath /tmp/data/r0 --fork --logpath /tmp/data/r0/log/mongod.log

    (the 3 monitoring scripts will now report that all 3 servers are happily running and the recovered server, shown now as a secondary, is catching up on the records it missed when it was down)

  1. In the top left pane, stop the running Python script and then restart it with the argument retry passed to it, which instructs the PyMongo driver to enable retryable reads & writes, to further insulate the client application from the short failover window that occurs when a primary goes down:
# Type CTRL-C / CMD-C
./ retry

    (notice that the output of this Python script now shows that retryable reads & writes are set to TRUE)

  1. In the top right pane, clear the existing output and list the mongod server process IDs again, then terminate the one currently shown as primary (replace the 12345 argument with the real process ID):
kill -9 12345

    (this time, in the top left pane, the Python script will not report any connection problems when the failover occurs; notice that the monitoring scripts in the bottom 3 panes show the number of inserted records stalling for a second or two, before increasing again once one of the two remaining mongod servers automatically becomes the primary)
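
The states reported by the monitoring scripts in the bottom panes (down, up but not initialised, primary, secondary) can be approximated in Python. The following is only a rough sketch, not the project's actual Bash/Mongo-Shell monitor: it assumes PyMongo is installed, and the function names `describe_member` and `fetch_status` are made up for illustration. It reads the reply of MongoDB's `replSetGetStatus` admin command and picks out the member listening on a given port.

```python
"""Sketch: summarise one replica set member from a replSetGetStatus reply."""


def describe_member(status, port):
    """Return 'PRIMARY', 'SECONDARY', 'DOWN', etc. for the member on `port`.

    `status` is the document returned by the replSetGetStatus admin command.
    """
    suffix = ":" + str(port)
    for member in status.get("members", []):
        if member.get("name", "").endswith(suffix):
            # health is 1 when the member is reachable, 0 otherwise
            if member.get("health", 0) != 1:
                return "DOWN"
            return member.get("stateStr", "UNKNOWN")
    return "NOT IN REPLICA SET"


def fetch_status(port):
    """Ask the local mongod on `port` for its view of the replica set.

    Hypothetical helper: connects directly to that one server, so it also
    works when the server is a secondary.
    """
    from pymongo import MongoClient  # imported lazily: only needed here

    client = MongoClient("localhost", port, directConnection=True,
                         serverSelectionTimeoutMS=1000)
    return client.admin.command("replSetGetStatus")
```

Calling `describe_member(fetch_status(27000), 27000)` in a loop would give a crude equivalent of one monitoring pane. Before the replica set has been configured, or while the server is down, `fetch_status` raises a PyMongo exception instead of returning a document, so a real monitor would need to catch that and report it.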

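For the two steps above that list the mongod process IDs and then run `kill -9`, the lookup from port to process ID can be sketched in Python as well. This is an illustration under assumptions rather than the project's own listing script: it assumes `ps -eo pid,args` is available (as on Linux and macOS) and that each mongod was started with a `--port` argument, as in the restart command shown earlier. The function names are hypothetical.

```python
"""Sketch: find which OS process is the mongod listening on a given port."""
import os
import re
import signal
import subprocess


def mongod_pids_by_port(ps_output):
    """Map each mongod's --port value to its PID, given `ps -eo pid,args` text."""
    pids = {}
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip the header row and blank lines
        pid, args = int(parts[0]), parts[1]
        argv = args.split()
        if not argv or not argv[0].endswith("mongod"):
            continue  # not a mongod process (e.g. a grep for "mongod")
        match = re.search(r"--port[ =](\d+)", args)
        if match:
            pids[int(match.group(1))] = pid
    return pids


def list_mongod_pids():
    """Run ps and return {port: pid} for the running mongod servers."""
    result = subprocess.run(["ps", "-eo", "pid,args"],
                            capture_output=True, text=True, check=True)
    return mongod_pids_by_port(result.stdout)


def kill_hard(pid):
    """Abruptly terminate a process, the equivalent of `kill -9 <pid>`."""
    os.kill(pid, signal.SIGKILL)
```

With this, `kill_hard(list_mongod_pids()[27000])` would terminate the server on port 27000, if that is the one the bottom panes currently show as primary.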

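The difference between the two runs of the Python insert script above (with and without the retry argument) comes down to the driver's retryable reads & writes settings. The following is a minimal sketch of such a client, assuming PyMongo is installed; it is not the project's actual script. The replica set name TestRS and the three ports come from the steps above, while `build_uri`, `insert_forever`, `demo_db` and `demo_coll` are hypothetical names chosen for illustration.

```python
"""Sketch: a client that inserts records with retryable operations on or off."""
from urllib.parse import urlencode

PORTS = (27000, 27001, 27002)  # the three ports used throughout the demo
REPLICA_SET = "TestRS"         # from the mongod command line shown in the demo


def build_uri(retry):
    """Build a replica set connection string with retries switched on or off."""
    hosts = ",".join("localhost:%d" % port for port in PORTS)
    flag = "true" if retry else "false"
    options = urlencode({
        "replicaSet": REPLICA_SET,
        "retryWrites": flag,
        "retryReads": flag,
    })
    return "mongodb://%s/?%s" % (hosts, options)


def insert_forever(retry, db_name="demo_db", coll_name="demo_coll"):
    """Insert one small document at a time, reporting any driver error.

    db_name and coll_name are placeholder names, not the demo's real ones.
    """
    import time
    from pymongo import MongoClient
    from pymongo.errors import PyMongoError  # base class of driver errors

    coll = MongoClient(build_uri(retry))[db_name][coll_name]
    while True:
        try:
            coll.insert_one({"inserted_at": time.time()})
        except PyMongoError as exc:
            # Expected briefly during a failover when retries are disabled
            print("Temporary problem:", exc)
        time.sleep(0.01)
```

Recent PyMongo versions enable retryable reads and writes by default, so a sketch like this has to set both options to false explicitly to reproduce the temporary connection problem seen in the first failover.
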
Credits / Thanks

  • Eugene Bogaart for coming up with the original server health detection and colour coding mechanism which I then adapted further in
  • Jim Blackhurst for the original suggestion to use Terminator for displaying the 5 required 'views'
  • Jake McInteer for testing on macOS + finding and reporting some bugs

