SonarW-specific Command Extensions

In addition to standard MongoDB commands, this page describes SonarW-specific commands and extensions.

serverStatus

Use db.serverStatus() for general information such as the SonarW version, pid, host etc. as well as information about each database in the instance. The “databases” field lists all databases and for each presents stats such as memory usage, blocks in the cache, etc. It will also show data about the network, cursors, locks, opcounters, number of connections and more. To run the command:

> db.serverStatus()

blockInfo

The blockInfo command takes a collection name and a column name and outputs details about the column’s blocks. For example:

>  db.runCommand({blockinfo: {collection: "claims", column: "person.first_name"}})
{
   "fields" : [
      {
         "collection_name" : "claims__ARRAY__person",
         "name" : "first_name",
         "full_name" : "person.first_name",
         "array" : false,
         "compound" : false,
         "reference" : false
      }
   ],
   "header" : {
      "collection name" : "claims__ARRAY__person",
      "collection id" : NumberLong(3),
      "column name" : "person.first_name",
      "column id" : NumberLong(50),
      "number of blocks" : NumberLong(103),
      "blocks in memory" : NumberLong(0),
      "gathered" : false
   },
   "ok" : 1
}

Use “verbose: true” to get detailed block data such as:

{
   "first_document" : NumberLong(293601000),
   "last_document" : NumberLong(294300050),
   "number_of_documents" : NumberLong(699050),
   "number_of_null_documents" : NumberLong(0),
   "number_of_undefined_documents" : NumberLong(0),
   "markers_size" : NumberLong(1),
   "document_type" : 2,
   "part_number" : NumberLong(12),
   "file_position_offset" : NumberLong(198455705),
   "block_size" : NumberLong(13981000),
   "created_time" : ISODate("2015-02-10T11:09:20Z"),
   "updated_time" : ISODate("1970-01-01T00:00:00Z"),
   "filter" : {
      "0" : [
         "",
         ""
      ],
      "1" : [
         "",
         ""
      ],
      "2" : [
         "",
         ""
      ],
      "3" : [
         "",
         ""
      ]
   }
}, ...

PartsInfo

Data lives in blocks and blocks live in part files in SonarW. To know how many parts there are per collection use the partsinfo command:

> db.runCommand({partsinfo:"rectest"})
{
   "collection_id" : NumberLong(2),
   "synthetic" : false,
   "parts" : [
      {
         "part" : 0,
         "begin" : NumberLong(0),
         "end" : NumberLong(10000),
         "size" : NumberLong(10000),
         "path" : "/home/ury/sonarw/data/test/2.collection/0",
         "deleted_count" : NumberLong(2000),
         "deleted_ratio" : 0.2
      },
      {
         "part" : 1,
         "begin" : NumberLong(10000),
         "end" : NumberLong(11000),
         "size" : NumberLong(1000),
         "path" : "/home/ury/sonarw/data/test/2.collection/1",
         "deleted_count" : NumberLong(200),
         "deleted_ratio" : 0.2
      }
   ],
   "references" : {
      "subdoc_array" : {
         "column_id" : NumberLong(4),
         "ref_collection_name" : "rectest__ARRAY__subdoc_array",
         "ref_collection_id" : NumberLong(6)
      }
   }
}

This command shows you how many parts exist, the doc ids in them and how much is deleted. It will also show all the columns in the collection that are object arrays (synthetic collections) and you can also run this command on those arrays.

PriorityStatus and Memory Pegging

SonarW will manage block mapping into memory automatically. On occasion you might want to hand-tune for even better performance. One of the options is to peg certain collections, columns or even partitions to the block cache.

To peg to memory, you set a priority by inserting a document into the db.system.cache, for example:

> db.system.cache.insert({
   cache_group:'a',
   collection_name:'claims',
   column_name: 'person.address',
   partition: '123',
   priority: 0})
WriteResult({ "nInserted" : 1 })

Each inserted document has the form:

{
   cache_group: 'Some user level name',
   collection_name: 'regex|.*',
   column_name: 'regex|.*',
   partition: '<document_id>',
   priority: [0-100] // 0 is best
}

To see the contents of the cache priorities run:

> db.runCommand({ PriorityStatus:1 })

Each database has it’s own priority cache. To see which blocks are currently in the block cache run the serverStatus command and look for the information within the database collection. For example, the following shows that 220 blocks in the cache are for the justOne collection and the t column, that 26 are for country and 10 for city:

"claims" : {
         "maxMemoryLimit" : NumberLong("8589934592"),
         "currentMemoryUsed" : NumberLong("3164660857"), "maxBlockCount" :
         NumberLong(256), "currentBlockCount" : NumberLong(256),
         "claims__ARRAY__person" : {
            "person.country" : {
               "count" : NumberLong(26),
               "size" : NumberLong(315038688),
               "highPriority" : false
            }, "person.city" : {
               "count" : NumberLong(10),
               "size" : NumberLong(120512316),
               "highPriority" : false
            }
         },
         "justOne" : {
            "t" : {
               "count" : NumberLong(220),
               "size" : NumberLong("2729109853"),
               "highPriority" : true
            }
         }
      }

Use caution when using these commands as this could have an adverse effect on other query types since the block cache becomes effectively smaller for general queries.

Cache Management

Data in SonarW is maintained as compressed columnar blocks. When you run queries the storage subsystem loads data blocks into a block cache subsystem. When queries run they first look at the cache to see if blocks are there and thus save I/O time.

You can define per database which collections/columns/partition and which blocks are priority blocks.

The system keeps priority blocks permanently in RAM (if sufficient space is available). This is defined in the collection system.cache:

{
   cache_group: 'Some user level name',
   collection_name: 'regex|.*',
   column_name: 'regex|.*',
   partition: [<value>,<value>]'
   partition_columns: ["col1", "col2"]
   priority: [0-100] // 0 is highest
}

The usage of cache_group is for describing/naming the rule, for example for updating the rule later. When caching only on collection/column, the partition and partition_columns fields may be omitted.

When using partitioning, sometimes it is useful to cache a database block when certain values of a column appear. List each of the values in the partition field if you want to ensure that some partitions have a higher priority than others and more likely to remain in memory. Note that the collection must be partitioned for this to be possible.

For example: partition: [123, 788, 999]

The type of the partition value may be one of int/long/double/string/OID/date.

You can also list the columns to be cached when this value is detected (in the column identified by the column_name field).

For example::
partition_columns: [“price”, “account_number”]

In this version of SonarW only priority 0 affects the system - i.e. blocks are either pegged to memory or behave normally.

The cache has a size limit (as a hard limit) and a limit on the number of blocks (as a soft limit).

You can view the current cache settings of a database using db.system.cache.find() or by using the db.runCommand({priority_status: 1}) command.

Note: Changes to the system.cache configuration parameter only take affect after a server restart.

Memory Stats

MemoryStats is an internal command used for troubleshooting and tuning that returns statistics for SonarW memory usage.

Example:

> db.runCommand({memorystats:1})
{
"cursors" : {
        "number_of_cursors" : NumberLong(5),
        "cursor" : "78.984375 KB"
},
"operations" : {
        "13" : "1.445312 KB"
},
"databases" : {
        "admin" : {
                "cache" : "234.000000 B",
                "number_of_blocks" : NumberLong(19),
                "blocks" : "2.226562 KB",
                "size_per_block" : "120.000000 B",
                "index" : "320.000000 B",
                "metadata" : "23.957031 KB",
                "total" : "26.724609 KB"
        },
        "test" : {
                "cache" : "1.520692 GB",
                "number_of_blocks" : NumberLong(20149),
                "blocks" : "2.305870 MB",
                "size_per_block" : "120.000000 B",
                "index" : "320.000000 B",
                "metadata" : "4.418102 MB",
                "total" : "1.527259 GB"
        }
},
"Summary" : {
        "blocks" : "2.308044 MB",
        "number_of_blocks" : NumberLong(20168),
        "size_per_block" : "120.000000 B",
        "cache" : "1.520692 GB",
        "cursor" : "78.984375 KB",
        "operations" : "1.445312 KB",
        "index" : "640.000000 B",
        "metadata" : "4.441498 MB",
        "large_buffer" : "18.000000 MB",
        "total" : "1.544939 GB"
},
"ok" : 1
}

The above output shows five open cursors that take 79KB of RAM, 1.5GB being used by the SonarW block cache, most of it for the ‘test’ DB. The cursors and operations statistics are global, other parameters are listed per DB. In the example, test DB has 20149 total blocks on disk. For each of these blocks we keep 120 Bytes in memory, so a total of 2.3MB in memory is used for the blocks.

Statistics

Use the cacheStats command to get valuable information on cache runtime usage.

Example:

> db.runCommand({cacheStats: 1})
{
"per priority" : {
        "0" : {
                "label" : "Vancouver branch speedup",
                "collection name" : "orders",
                "column name" : "branch",
                "hit" : NumberLong(0),
                "miss" : NumberLong(2),
                "replacement" : NumberLong(0)
        },
        "total" : {
                "hit" : NumberLong(0),
                "miss" : NumberLong(2),
                "replacement" : NumberLong(0)
        }
},
"no priority" : {
        "hit" : NumberLong(14),
        "miss" : NumberLong(36),
        "replacement" : NumberLong(0)
},
"database" : {
        "hit" : NumberLong(14),
        "miss" : NumberLong(38),
        "replacement" : NumberLong(0)
},
"ok" : 1
}

Clearing the Cache

To clear the cache on runtime use the cacheReset command. For example:

> db.runCommand({cacheReset: 1})
{ "ok" : 1 }
> db.runCommand({cacheStats: 1})
{
"per priority" : {
        "0" : {
                "label" : "Vancouver branch speedup",
                "collection name" : "orders",
                "column name" : "branch",
                "hit" : NumberLong(0),
                "miss" : NumberLong(0),
                "replacement" : NumberLong(0)
        },
        "total" : {
                "hit" : NumberLong(0),
                "miss" : NumberLong(0),
                "replacement" : NumberLong(0)
        }
},
"no priority" : {
        "hit" : NumberLong(0),
        "miss" : NumberLong(14),
        "replacement" : NumberLong(0)
},
"database" : {
        "hit" : NumberLong(0),
        "miss" : NumberLong(14),
        "replacement" : NumberLong(0)
},
"ok" : 1
}

debugQueries

This admin command displays the queries and/or query results on the server. You can specify printing debugging info for any or all of ‘msg’, ‘update’, ‘insert’, ‘query’, ‘getmore’, ‘delete’, or ‘killcursor’ requests by issuing a debugQueries command that specifies one or more of those options, or using the word ‘all’ to indicate that you want to set all of them at once. Setting to a value of 1 means that inputs will be printed, setting to a value of 2 means that outputs will be printed, setting to a value of 3 means that both inputs and outputs will be printed, and setting to 0 turns off printing.

Example:

db.runCommand({debugQueries:{all:3}})

would cause all input and output to the server from each client to be displayed on server output for debugging purposes.

Syntax:

db.runCommand({debugQueries: debug-level (a number 0-3) })

0 will stop debugging output. When the level is 3, the maximum debug information is printed. For example, at level 3, SonarW will explain the range of documents it scans during a find(), and within each range, what blocks were read from disk, and which did not.

Information is written to the standard sonard.log file.

You need to have dbAdmin privileges to turn debugging on.

Stop Collection

The stopCollection command stops the specified collection and adds the collection to the list of stopped collections. While a collection is stopped, it is unavailable for further queries and can be seen as stopped in the list of collections. For example:

> show collections
bonds
> db.runCommand ({ stopCollection : "bonds" })
{ "ok" : true }
> show collections
bonds (stopped)

Start Collection

The startCollection command initiates and loads data from the specified collection either from the list of stopped collections or as an entirely new collection if its data is already copied to the database directory. The collection should exist along with its metadata in the selected database. Once the collection is started, it becomes available for queries. For example:

> show collections
bonds (stopped)
> db.runCommand ( { startCollection : "bonds" } )
{ "ok" : true }
> show collections
bonds

> show collections
> db.runCommand ( { startCollection : "bonds" } )
{ "ok" : true }
> show collections
bonds

Enabling GPU-Based Execution

The setGPU command takes a boolean or numeric value and enables or disables the use of GPU for subsequent operations (a compatible GPU must be installed on the SonarW host):

 > db.runCommand ( {  setGPU : true } )
{ "gpu" : true, "ok" : 1 }

Union Command

Use the aggregation_union database command to “stitch together” results from two or more aggregation pipelines into a single result. This command runs multiple pipelines and then aggregates the results into a single stream. For example:

db.runCommand(
   {aggregation_union:[
      {collection:"harolds_courses",pipeline:[{$project:{course_id:1}}]},
      {collection:"maudes_courses",pipeline:[{$project:{course_id:1}}]}],
   out:"harold_and_maudes_courses",
   distinct:true} )

This command runs aggregations on the specified collections, and accrues the combined distinct results into the collection ‘harold_and_maudes_courses’.

Note that a distinct value of true will be much slower than a distinct value of false. Distinct can be true, false or a number; a number means only a certain number of distinct documents will be in the output. The _id is not considered when assessing the distinct.

Get and Set Parameters

The getParameter commands returns the currently set value of SonarW parameters. These parameters are read from the sonard.conf file at the start of SonarW. If the parameter is not set in the config file, the default value for the parameter is used. Using “*” as the name of the parameter prints the complete list of parameters and their currently set values. For example:

> db.runCommand ( { getParameter : "authSchemaVersion" } )
{ "ok" : 1, "value" : 3 }

Most parameter changes require a restart of the server and setParameter will usually return an error. For those parameters that can be changed at runtime no error is produced by setParameter but the change will not persist a server restart and should also be changed in sonard.conf.

Query Priorities

SonarW assigns three levels of priority to all the queries from a connection, ranging between 1 as the highest and 3 as the lowest priority. By default, a priority of 2 is assigned for all connections. The setPriority command can be used to modify the priority for the current connection - affecting all queries issued by that connection. It is recommended to only modify this priority only in special cases. For example:

> db.runCommand({setPriority:1})
{ "priority" : 1, "was" : 2, "ok" : 1 }

To get the current priority for the connection:

> db.runCommand({ getPriority : 0 })
{ "priority" : 2, "ok" : 1 }

You need have a managePriorities role to use setPriority. The dbAdmin role has this privilege as well. The managePriorities role is defined as:

> db.createRole({role: "managePriorities", privileges: [{resource: {db: 'admin',collection: ''},actions: ["setPriority"]} ], roles: [] } )
{
    "role" : "managePriorities",
    "privileges" : [
        {
            "resource" : {
                "db" : "admin",
                "collection" : ""
            },
            "actions" : [
                "setPriority"
            ]
        }
    ],
    "roles" : [ ]
}

Pipeline Optimizer

By default the optimizer is used on all aggregation queries to improve performance (unless specified to be off by the Optimizer flag or in the configuration file). If you wish to see how this affects your pipeline you are running, you can use the optimize run command. The format is:

db.runCommand({optimize:{pipeline:<your pipeline>}})

As an example:

db.runCommand({optimize:{pipeline:[{$project:{rectangle_area:{$multiply:['$base','$height']}}},{$project:{rectangle_area:1,triangle_area:{$divide:['$rectangle_area',2]}}}]}})
{
        "ok" : 1,
        "result" : {
        "pipeline" : [
        {
                "$project" : {
                "rectangle_area" : {
                        "$multiply" : ["$base","$height"]
                },
                "triangle_area" : {
                        "$divide" : [{"$multiply" : ["$base","$height"]},2]
                }
                }
        }
        ]
        }
}

In this example the original pipeline had two projects in a row, meaning we had to materialize documents twice. The optimizer combined the two pipelines into a single more efficient stage using replacement.

PID for the sonard Process

This command verifies the current sonar instance and shows pid of current process

Syntax:

db.runCommand({sonar: 1})

Returns:

isSonar: <boolean>
pid: current process pid

Generate data for debug purposes

It is sometimes required to get both the data as well as SonarW information in order to be able to review and investigate issues. SonarW has a special command that will extract the relevant information into a compressed file that can then be used for investigating and reviewing.

The command is “dump_collection”, it is an admin command and is executed through the “db.runCommand” mechanism.

Syntax::
db.runCommand( { dump_collection: { collection: <collection-name>, output: <output-filename>, allow_big_collections: <boolean> } })

collection-name - (Mandatory) the name of the collection to be dumped output-filename - (Mandatory) full path name of the target file Allow_big_collections- (Optional - default is ‘false’) to allow dump of collections > 1GB

Example:

db.runCommand({ dump_collection: { collection: "fact", output: "/tmp/fact.tbz2" } })

Note: In order to complete the process the output file should be located in a directory where the ‘sonarw’ user (on the SonarW server) has write permissions.

If the collection is bigger than 1GB (the size value defined by the configuration parameter ‘dump_collection_limit_size’ and is set to 1GB by default), the command will not create the file. This behavior can be overridden by setting the boolean parameter allow_big_collections to true.