High Availability / Disaster Recovery in SonarW

SonarW provides high system availability by replicating the Master node's data to one or more Slave nodes through the sonarha package. Replication is asynchronous: each Slave node replicates data from the Master node at preset time intervals. The recommended synchronization interval is one hour, but users can set any interval to meet specific requirements.

Master: The Master SonarW node is the only write-enabled instance.

Slave: The Slave SonarW node synchronizes data with the Master node and serves as a live, read-only instance of the Master node. A sonarha script synchronizes data with the Master at specified intervals and merges all changes.

Automatic Redundancy: If the Master node fails due to a hardware or software failure, one of the Slave nodes can be promoted to become the Master node. This new Master node retains the data as of the last synchronization and holds the last document ID for each collection in a database.

Installation:

  1. Install SonarW if not already installed. The complete installation steps are described at http://sonarwdocs.jsonar.com/installsonar.html. By default, the sonarha package is installed in the /usr/lib/sonarw/sonarha directory.

  2. Generate an SSH key pair on the Slave node and copy the public key to the Master node. Verify that the Slave user can log in to the Master node over SSH without requiring a password.
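
     A minimal sketch, assuming the Slave's Linux user is sonarw and the Master host is master.example.com (both placeholders):

       ssh-keygen -t rsa                       # generate the key pair on the Slave
       ssh-copy-id sonarw@master.example.com   # install the public key on the Master
       ssh sonarw@master.example.com true      # verify passwordless login works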

  3. Provide the Slave and Master node credentials in /etc/sonar/sonarha.conf (see the Configuration section below for an example file).

  4. Execute the following command to set up sonarha:

    python sonarha_config.pyc
    
  5. Set the 'slavemode_on' flag to 'true' in /etc/sonar/sonard.conf to enable slave-mode features, including the write lock.
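
     A minimal sketch of the relevant line, assuming sonard.conf uses a key = value syntax (verify against your existing file):

       slavemode_on = true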

  6. Manually execute the python script to verify that the first synchronization completes:

    python /usr/lib/sonarw/sonarha/sonarha.pyc
    

The sync log is stored in the $SONAR_HOME/log directory.

  7. Schedule a crontab entry that executes the sonarha script at the desired frequency, for example as shown below.
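
     A sample crontab entry (added via crontab -e) for the recommended hourly interval, using the script path from step 6:

       0 * * * * python /usr/lib/sonarw/sonarha/sonarha.pyc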

The initial synchronization may take a long time to complete, depending on the size of the databases, because all data must be copied to the Slave node. Hence, it is recommended to first restore data from an existing backup if one exists. Subsequent sonarha runs synchronize only the incremental changes.

Configuration:

The slave configuration parameters are stored in the sonarha.conf file. Following is an example of a new configuration file:

[SLAVE_CLIENT]
host = 127.0.0.1
port = 27117

[MASTER_SERVER]
unix_username = sonarw
master_uri = mongodb://[username:password@]host1[:port1][,host2[:port2],...[,hostN[:portN]]][/[database][?options]]

[PRIORITY]
"db1" : ["col2", "col3", "col1"]
"db2" : ["col3", "col2", "col5"]

Users should edit the sonarha.conf file to fill in these fields. The Slave node will be unable to complete synchronization with the Master node if one or more of these configuration parameters are incorrect or missing. The PRIORITY section is optional; it specifies the order in which the listed collections are synchronized.

Installation Tips:

  1. Ensure that either the credentials of the root sonarw user are provided in the slaveuser.conf file or authentication is disabled before executing the slaveuser_installer.py script.
  2. The sonarw user on the Master should have roles equivalent to or including: {"userAdminAnyDatabase", "readWriteAnyDatabase"}.
  3. A common issue is that the Slave node's SSH key is not correctly installed on the Master node. Ensure that the Slave Linux user that executes the sonarha script is able to connect to the Master node without requiring a password.
  4. Ensure that authentication has been enabled (the "auth" parameter) and the slaveuser is set up before enabling the slave (the "slavemode_on" parameter).
  5. Watch out for file permissions: the Slave node Unix user that runs sonarha should have write access to the $SONAR_HOME directory and read access to the sonarha.conf file (see the sketch after this list).
  6. Ignore the --link-dest warnings that might show up during the first data sync. They occur because there is no existing data in the destination to compare against.
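
A minimal sketch for tip 5, assuming the Slave's Linux user is sonarw (a placeholder):

    sudo chown -R sonarw:sonarw "$SONAR_HOME"    # write access to $SONAR_HOME
    sudo chown sonarw /etc/sonar/sonarha.conf    # make the config readable by that user
    sudo chmod 600 /etc/sonar/sonarha.conf       # keep the credentials private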

Promoting Slave to Master:

In the event the Master node fails, one of the Slave nodes needs to be promoted to become the Master node. This conversion is performed manually through the following steps (a scripted sketch follows the list):

  1. Remove the write lock from the node by setting the 'slavemode_on' flag to 'false' in sonard.conf
  2. Restart SonarW
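
A minimal sketch of these two steps, assuming sonard.conf uses a key = value syntax and SonarW runs as a systemd service named sonarw (both assumptions; adjust to your system):

    # turn off slave mode, then restart the server
    sudo sed -i 's/^slavemode_on.*/slavemode_on = false/' /etc/sonar/sonard.conf
    sudo systemctl restart sonarw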

If the previous Master node recovers, the user needs to decide whether to keep the current Master node or restore Master status to the previous node. Running two Master nodes at the same time for the same database is not recommended, as it may cause data consistency issues. Once the Master node is decided, the next step is to manually transfer any documents missing from the current Master node from the previous Master node(s). One way to achieve this is to calculate the range of differing documents and then 'mongoimport' them manually, as sketched below. The Slave node always keeps a file named 'last_document' for each collection in a database; it stores the document ID of the last document synchronized in the previous synchronization call.
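
A sketch of the transfer, assuming the collection's _id values are ordered, the old Master is reachable at old-master.example.com and the new Master at new-master.example.com (placeholder hosts; db1 and col1 are the placeholder database and collection from the configuration example), and <last_id> is the value read from the collection's last_document file:

    # export the documents the new Master never received
    mongoexport --host old-master.example.com --port 27117 --db db1 --collection col1 \
        --query '{"_id": {"$gt": <last_id>}}' --out missing_col1.json

    # import them into the new Master
    mongoimport --host new-master.example.com --port 27117 --db db1 --collection col1 \
        --file missing_col1.json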

In case of multiple Slave nodes, update the sonarha.conf file on each of the Slaves with the URI of the new Master node, as in the example below. If the Slaves can connect to the new Master, they will update their data automatically.
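
For example, assuming the new Master is reachable at new-master.example.com (a placeholder) on the port from the configuration example above:

[MASTER_SERVER]
unix_username = sonarw
master_uri = mongodb://sonarw:password@new-master.example.com:27117/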