SonarW Trackers

The SonarW Oplog Tracker (SOT) is used to mirror or archive documents stored in MongoDB into SonarW.

SonarW Oplog Tracker for MongoDB

The SonarW Oplog Tracker (SOT) for MongoDB is a utility that tracks changes to specific namespaces (<db-name>.<collection-name>) on MongoDB clusters, including replica sets and shards, and mirrors them in SonarW. SOT is composed of a main program that runs one or more agent sub-programs; each agent tracks specific collections on one or more MongoDB shards by reading the MongoDB oplog, pulling changed documents from MongoDB and inserting the data into a SonarW server.

For each tracked collection, one of two tracking modes is used: Archive or Live.

  • Archive mode saves the history of changes to each document.
  • Live mode maintains the most recent document content.

SOT can be used in several configurations. Any number of MongoDB replica sets, sharded clusters or standalone instances can be mirrored to any number of SonarW warehouses. Some examples of supported configurations (in increasing levels of complexity) are:

  • A single standalone MongoDB instance is mirrored to a single SonarW warehouse.
  • A single MongoDB replica set is mirrored to a single SonarW warehouse.
  • Separate MongoDB instances and replica sets are mirrored to a single SonarW warehouse.
  • A sharded cluster with replica sets configured per shard is mirrored to a SonarW warehouse.
  • Any of the above configurations where, instead of the data flowing to a single SonarW warehouse, the data goes to multiple SonarW warehouses, to support a high query load.

SOT runs one main process and multiple agent processes. Each agent is responsible for mirroring one MongoDB replica set into one SonarW server. For example, if you are mirroring 3 replica sets and one sharded cluster with 4 shards into a single SonarW, you will have one main program and 7 agents. If you would like to have two SonarW instances mirroring the data, you will have one main program and 14 agents. All agents are managed by the single main program and by a single configuration file. SOT runs only on Linux and has only been tested with MongoDB instances running on Linux, although all access to MongoDB is through the C++ MongoDB driver, which is agnostic to the platform MongoDB runs on.

The namespaces to track, as well as the tracking modes, are defined in the SOT configuration file. Some settings are common to all agents, and in addition there is one configuration section per agent. The number of agents is determined by the number of sections in the configuration file.
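
For illustration, a minimal configuration with a single agent might look like the sketch below. The individual options are described in the Configuration file section further down; the host names, agent name and namespaces are placeholders, not defaults:

logfile = /var/log/sot.log
loglevel = 4

[sot1]
mongo.oplog_uri = mongodb://mongo1.local:27017/?replicaSet=rs0
mongo.fetcher_uri = mongodb://mongo1.local:27017
sonar.uri = mongodb://sonar.local:27117
mongo.namespaces = test.A:track,test.archive:archive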

Installing SOT

Set up the repository links using the same instructions as for installing SonarW.

To install SOT on Ubuntu, run:

sudo apt-get install sonarot

To install SOT on RedHat, run:

sudo yum install sonarot

SOT startup

When you run SOT, it reads the configuration file and launches the agents. If one of the agents fails to complete the startup checks, an error is logged and SOT exits.

The configuration file is in:

/etc/sonar/oplog_tracker.conf

Before you start the service, it is a good idea to check the configuration file and verify that all URIs are defined correctly (run as sonarw or root) using:

/usr/bin/sonaroplogtracker --verify=true

To start the service in Ubuntu run:

sudo service sonarot start

To start the service in RH run:

sudo systemctl start sonarot

If the service does not start, sudo to the sonarw user and run the main SOT program (/usr/bin/sonaroplogtracker) from the command line. Check the output for errors (e.g. a syntax issue in the config file).
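
On systems using systemd you can also inspect the service state and recent log output with the standard systemctl status command:

sudo systemctl status sonarot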

Agent startup checks

Upon startup, the agent will verify that:

  • It can connect to MongoDB and SonarW.
  • For any archived collection, automatic document last-update timestamp generation is set in SonarW.
  • For any live-tracked collection, SonarW is configured to save document signatures. That is, document_signatures_db in the SonarW config file includes the databases for which there are tracked (live) collections.

Collection tracking modes

There are two tracking modes that control how data is replicated from MongoDB to SonarW.

Archive mode

This is an append-only mode that accumulates all versions of each document. In this mode:

  • All versions of every document are saved. Even though normal queries will only see the latest version, SonarW supports a query modifier that enables analyzing and retrieving previous versions of a document.
  • Delete actions are not tracked. For example, if a document is inserted on Sunday, removed on Monday and re-inserted on Tuesday, you will have two documents.

Live (Tracking) mode

This mode keeps one copy of the most up-to-date version of each document. In this mode:

  • Insert, update and delete operations are tracked and performed in SonarW.
  • SonarW keeps a signature of each document. This is necessary in case an agent goes out of sync (see below). Without tracking the signatures, a re-sync operation would cause the transfer and re-import of all the data on the shard into SonarW.

IMPORTANT: If you plan to use tracking (live) mode you must enable the Document Signatures feature in the SonarW database. To do that, stop SonarW, and change the /etc/sonar/sonard.conf config file so that document_signatures_db includes the database where the collection lives. Then restart SonarW. See the SonarW documentation on starting and stopping SonarW and how to set configuration options.
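
As a sketch, the relevant line in /etc/sonar/sonard.conf might look like the following; the database names are placeholders and the exact syntax is described in the SonarW documentation:

document_signatures_db = test,customers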

Firehose

In case a very large amount of data is coming into SonarW, you can use a cluster of several hosts running as slaves, which pre-process the traffic coming from MongoDB before feeding it to a master SonarW host. That is, write operations are done in a distributed fashion using the slaves. Each agent can be defined in the configuration file as master, slave or standalone. A set of one master and many slaves are bound together by having the same cluster name defined for them in the configuration file. Only slave agents communicate with SonarW.

For example, if there are 15 shards, you might want to use 3 SonarW hosts to process their data and an additional SonarW to process queries. Define 15 agents as slaves, with each set of 5 agents configured to work with a different SonarW, and one master agent pointing to the SonarW master.

There is some special configuration required on the master and slaves, and an SSH key needs to be created so that the master and slaves can communicate. The collections to be tracked by SOT in firehose mode need to be defined, on the master, as ingested collections, and on the slaves as remote-ingested collections. See the Remote Ingestion section in the SonarW documentation on how to configure the collections to have these properties.
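
For illustration, the firehose-related settings of a slave agent and the master agent might look like the sketch below, using the firehose and cluster options described later (host names, agent names and the cluster name are placeholders; other per-agent options such as the mongo URIs are omitted):

[shard1]
firehose = slave
cluster = fh1
sonar.uri = mongodb://sonarw-slave1.local:27117

[fh-master]
firehose = master
cluster = fh1
sonar.uri = mongodb://sonarw-master.local:27117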

Agent Control

To control agents, use the /usr/bin/sonaroplogtracker program. You need to be the sonarw user to use this program - this user is created by the installer.

For every agent you configure you will have a process. For example, if you configure three sections sot1, sot2 and sot3 and you look at ps, you will see three processes running, called sot_sot1, sot_sot2 and sot_sot3.
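
For example, a standard ps/grep pipeline will list the agent processes (the grep pattern assumes the sot_ prefix described above):

ps -ef | grep sot_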

An agent will normally track replica sets but can also track stand-alone servers. In all cases oplog.rs must be enabled in MongoDB so that it can be tracked by the SOT agent. Make sure that the oplog.rs collection is in the local database by setting replSet in mongod.conf and initiating the replica set, e.g.:

> rs.initiate()
rs0:PRIMARY> use local
rs0:PRIMARY> show collections
me
oplog.rs
startup_log
system.indexes
system.replset
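
The corresponding mongod.conf entry is sketched below (rs0 is a placeholder replica set name; which key applies depends on whether your MongoDB version uses the legacy or the YAML configuration format):

# legacy mongod.conf format
replSet = rs0

# YAML configuration format (newer MongoDB versions)
replication:
  replSetName: rs0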

First sync and recovery when getting out of sync

On a system that wasn’t previously tracked there may be a lot of data that needs to be synced, and this data won’t have entries in the MongoDB oplog, given that the oplog is a capped collection. When you start SOT for the first time, all the agents will be in an out-of-sync (OOS) state.

An agent can also reach an OOS state for other reasons, for example if the network connection to the mongo server was down for a long time and the oplog.rs collection has rolled over. During normal operation, an agent performs periodic checks to validate that it is not OOS. If an agent recognizes an OOS state, it writes an error message to the log file and exits.

Syncing an agent must be activated by the user, so the administrator can decide when to perform this operation, since it can be heavy in terms of network load and read load on the mongod servers.

Checking for OOS agents

Calling the main program with the --list=true flag returns a list of OOS agents. If no agent is in an OOS state, an empty list is returned. For example:

>/usr/bin/sonaroplogtracker --list=true

SOT1

means that the agent named ‘SOT1’ is OOS.

Resyncing

Triggering a resync is done by passing the main program a resync parameter. You can use a comma-separated list of agent names, or ‘allout’ to resync all agents that are OOS.

For example, if we have 3 agents, named SOT1, SOT2 and SOT3, and all three are OOS:

>/usr/bin/sonaroplogtracker --list=true

SOT1

SOT2

SOT3

We trigger a sync of two of the agents by passing their names:

>/usr/bin/sonaroplogtracker --resync=SOT1,SOT3

Note: The syncing operation may take a long time, depending on the amount of data that needs to be synced; checking the agent state will show OOS until the re-sync is done. After the sync finishes, the agent will be back to normal operation.

After the sync, SOT1 and SOT3 are back to normal; SOT2 is listed because we haven’t synced it:

>/usr/bin/sonaroplogtracker --list=true

SOT2

The second sync option is to sync all OOS agents without specifying names. This can be useful when many agents need to be synced at once, for example on the first-run sync, when all the agents are OOS.

Syntax:

sonaroplogtracker --resync=allout

Re-sync process

Each SOT agent does the following during a sync:

  • Connects to mongod and saves the current mongod oplog timestamp. On failure to get this timestamp, the agent exits with an error. In this example, let’s call this timestamp T0.
  • Tracks ongoing changes that occurred after T0 in the oplog.rs collection without applying them to SonarW (yet).
  • Goes over all documents in tracked collections and checks whether SonarW has the same data.
  • Inserts into SonarW any document that was not matched. In track mode, the previous version of a document (if one exists) is deleted first.
  • When the bulk sync finishes, the agent applies the saved ongoing changes (T0 and on) from the oplog.rs.
  • Switches over to normal operation.
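
For reference, the current oplog head (the T0 above) can also be inspected manually from the mongo shell:

db.getSiblingDB("local").oplog.rs.find({}, {ts: 1}).sort({$natural: -1}).limit(1)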

Periodic check for OOS condition

If an agent has been disconnected for very long and the oplog.rs has rolled past the last timestamp that the agent has, the agent will need to do a re-sync. Each agent maintains the last timestamp in a file called ‘<agent-name>_initial_cutoff_time.txt’. This file is typically located under /var/lib/SOT/. The file holds the last timestamp that was successfully synced to SonarW, as well as the agent PID.

If, for example, that file contains (1446579246, 467, 28549), then whether the agent needs to re-sync depends on whether the following find is empty; if it returns no documents, the oldest entry remaining in the oplog is newer than the saved timestamp, so the agent cannot be sure it has seen every change and must re-sync:

db.oplog.rs.find({ts:{$lt:Timestamp(1446579246, 467)}},{ts:1})

Sharding

Tracking a sharded MongoDB system requires one SOT agent per shard. For example, to track a 3-shard system, each shard being a replica set, you will need three agents, each reading from one replica-set mongod. All three agents write to one SonarW. Together, the agents track insert/update/delete operations for the complete sharded system. A document may not be available to an agent from the mongod (for example, data that was migrated from one shard to another); therefore documents are read through the mongos.
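
As an illustrative sketch (host names, ports and namespaces are placeholders), the agent sections for such a 3-shard system would differ only in mongo.oplog_uri, all fetching documents through the same mongos and writing to the same SonarW:

[shard1]
mongo.oplog_uri = mongodb://shard1.local:27018/?replicaSet=rs1
mongo.fetcher_uri = mongodb://mongos.local:27017
sonar.uri = mongodb://sonar.local:27117
mongo.namespaces = test.A:track

[shard2]
mongo.oplog_uri = mongodb://shard2.local:27018/?replicaSet=rs2
mongo.fetcher_uri = mongodb://mongos.local:27017
sonar.uri = mongodb://sonar.local:27117
mongo.namespaces = test.A:track

(and similarly for shard3)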

Configuration file: oplog_tracker.conf

The configuration file is composed of a shared part that is relevant to the SOT main process, and one or more per-agent sections. logfile and loglevel are global settings; the rest are set per agent section.

logfile

The path of a log file that the main SOT process will write to, or ‘syslog’ to write to the OS syslog file.

default: /tmp/sotlog

loglevel

The log level to use, corresponding to syslog severity levels:

  • 0 emergency
  • 1 alert
  • 2 critical
  • 3 error
  • 4 warning
  • 5 notice
  • 6 informational
  • 7 debug

For example, if loglevel=4 is set, events that fall into categories 0-4 will be logged.

default: 4

Agent part options

For each agent, a separate section should be entered in the conf file; the agent name (affecting the process name, agent log file name, etc.) is specified in the section header:

[<agent-name>]

firehose

In firehose mode, whether this agent is a master, a slave or standalone.

default: standalone

cluster

In firehose mode, the name of the cluster this agent belongs to.

default: <no cluster defined>

sync.interval

The interval, in seconds, at which the agent syncs the actual data between MongoDB and SonarW, based on the operations logged in oplog.rs.

default: 300

read.lag

Minimal lag, in seconds, between the time a change is noted in the oplog and the time at which the document is read from MongoDB. The actual time can be longer due to the interval parameter, but this parameter ensures that the document is never read too close to the noted change. The main reason for this parameter is when you configure reads from a secondary, in which case you might want to leave time so that the change is more likely to have made it to the secondary.
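
For example, to ensure documents are read no sooner than 30 seconds after their oplog entries appear (the value is illustrative, not a default):

read.lag = 30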

read.secondaries

Set to true to make the agent read from the secondaries to reduce load on the primaries. Oplog tracking is still from the primary - but documents are retrieved from the secondaries.

default: true

mongo.oplog_uri

Connection to the mongod server for reading oplogs.

Format:

mongodb://[username:password@]host1[:port1][,host2[:port2],...[,hostN[:portN]]][/[database][?options]]

Example:

mongo.oplog_uri = mongodb://oplog.local:27050,oplog.local:27051,oplog.local:27052/?replicaSet=rs1

See Mongo URI documentation

The credentials in use must be mapped to an account that has the clusterAdmin and readAnyDatabase roles.

mongo.fetcher_uri

Connection to the mongo server for reading documents. This will be mongos for a sharded system and mongod for an un-sharded system.

Format:

mongodb://[username:password@]host1[:port1][,host2[:port2],...[,hostN[:portN]]][/[database][?options]]

Example:

mongo.fetcher_uri = mongodb://oplog.local:27040

See Mongo URI documentation

The credentials in use must be mapped to an account that has the clusterAdmin and readAnyDatabase roles.

sonar.uri

Connection to the SonarW server.

Format:

mongodb://[username:password@]host1[:port1][,host2[:port2],...[,hostN[:portN]]][/[database][?options]]

Example:

sonar.uri = mongodb://yar.local:27117

See Mongo URI documentation

The credentials in use must be mapped to an account that has the readWriteAnyDatabase role.

mongo.namespaces

List of comma-separated namespaces (<db-name>.<collection-name>) on the Mongo server that SOT should track or archive.

Example:

mongo.namespaces = test.A:track,test.B:track,test.C:track,test.archive:archive

Notes:

  • While spaces in collection names are supported by SOT, SOT will not allow you to pull from collections that have a comma in their name.
  • SOT does not drop or rename collections on SonarW.
  • Do not add spaces between the comma and the database name.

translate

By default, collections from database FOO in MongoDB go to collections in database FOO in SonarW. You can map to a different database, e.g. by specifying:

translate = test:new_test, customers:mycustomers

all collections being tracked in test will go to new_test and all collections tracked in customers will go to mycustomers. This is useful when you are tailing more than one MongoDB environment using a single SonarW and the tracked MongoDB instances may have databases with the same name.
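
Putting translate and mongo.namespaces together (an illustrative sketch; the names are placeholders), with the two lines below, changes to test.A in MongoDB land in new_test.A in SonarW:

mongo.namespaces = test.A:track
translate = test:new_test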

Default file locations (Ubuntu)

configuration file:

/etc/sonar/oplog_tracker.conf

binary:

/usr/bin/sonaroplogtracker

When SOT runs, it checks for the following files, and creates them if they do not exist:

/var/lib/SOT/<agent-name>_initial_cutoff_time.txt

/var/lib/SOT/Sonaroplogtracker.pid

SOT Dashboard

If you are using JSON Studio alongside SonarW then you can install and use a dashboard add-on for monitoring your agents.

To install the dashboard, all you need to do is use mongorestore to insert the dashboard metadata definitions into your sonar_log database (where the SOT statistics and state information reside).

The SOT dashboard is part of the SOT installer. After the SOT installer runs, a directory named sot_dashboard exists in /usr/share/sonarot. To install the dashboard, import the data into your sonar_log database using mongorestore, for example:

mongorestore -h <your host> --port <your port> -u <your user> -p <your pwd>
   --authenticationDatabase <e.g. admin> -d sonar_log /usr/share/sonarot/dashboard/sonar_log

The user with which you are restoring needs to have readWrite privileges to sonar_log.

At this point you can use JSON Studio and navigate your browser to the dashboard login page at https://<your studio host>:8443/dboard.xhtml and log in as shown below. The dashboard is named SOT and is published by lmrm__sot:

_images/login.jpg

The dashboard comprises five tabs as shown below. The first two tabs display data for all SOT agents and the last three display more granular data for one agent at a time. You select which agent to display granular data for by clicking on Edit Bind Variables and entering an agent name. Note that the value you enter is used to match agent names as a regular expression, so you can display granular data for multiple agents together; for example, a pattern such as SOT[13] would match both SOT1 and SOT3. When you first log in to the dashboard there is no value for the agent regex. After you save the value once, the next time you log in to the dashboard that saved value will be used.

The first tab, labeled Uptime, has three frames. The top frame lists the agents and gives their last configuration and status. The second frame shows a daily view of agent uptime for the past two weeks. Coloring is based on the relative number of heartbeats missed from the agent (indicating availability), so green means fewer heartbeats missed and therefore higher availability. The last frame is similar but shows an hourly uptime heatmap for the last week.

_images/uptime.jpg

The second tab, labeled Throughput, has four heatmap displays showing relative operation throughput for all agents: total operations processed, insert operations processed, update operations processed and delete operations processed. Counts are hourly and provided for a week. Heatmap colors are relative within each heatmap - i.e. they show relative load without regard for the other heatmaps on that page. Note that “red” is not bad - it merely denotes higher load than “green”.

_images/throughput.jpg

The third tab shows operation throughput for the selected agent (set in Edit Bind Variables). The chart is a line-with-zoom plotting hourly counts of operation throughput. You can zoom on a subset of time using the bottom control panel by dragging the sliding window.

_images/agent_throughput.jpg

The fourth tab includes two detailed reports for the selected agent. On the left are all messages received by the agent, sorted in descending order of time. On the right are all startup messages along with the configuration values for that agent, also sorted by time descending.

_images/agent_history.jpg

The fifth tab provides a candlestick chart (HLOC) that shows the oplog window within MongoDB plus how much of that window the SOT agent has read. The thick bar denotes what SOT has read and the thin bar denotes the oplog window.

_images/oplog.jpg