Views in SonarW

SonarW supports two types of views. Read-only views supported by MongoDB as describe in https://docs.mongodb.com/manual/core/views/ are supported by SonarW as part of compatibility. In addition, Javascript based views are supported as described below.

With SonarW you can create views on collections. Like views in an RDBMS, a view is a “virtual collection” that is computed from an underlying collection (or another view - i.e. you can have a view that is based on a collection and then have a second view that is based on that view). You can even have views that are based on a join between two collections (since SonarW supports joins).

Views are based on Javascript functions that you write and that usually create a cursor. You iterate over a view using a cursor in the same way that you iterate over a collection using a cursor. The Javascript method you write that defines the view needs to yield documents and these are the documents that “appear” on the cursor.

A view that is based on a collection is normally a transient object - i.e. you open a cursor on it and receive computed documents back.

Note - in all the examples in this section be aware that if you copy paste code into the shell the view code needs to be entered as one line or as strings with a ‘+’ concatenating the different lines. The mongo shell does not allow multi-line strings.

Creating and Using Views

Create a view by inserting the Javascript source code to the system.views collection. The function should follow a generator pattern and yield documents to be used by a view’s cursor.

For example, if you have a collection that has documents of the form:

{
  "_id": {
    "$oid": "53c94abbb60d47606c5665c6"
    },
    "Year": 1987,
    "Month": 10,
    "DayofMonth": 14,
    "DayOfWeek": 3,
    "DepTime": 741,
    "CRSDepTime": 730,
    "ArrTime": 912,
    "CRSArrTime": 849,
    "UniqueCarrier": "PS",
    "FlightNum": 1451,
    "TailNum": 0,
    "ActualElapsedTime": 91,
    "CRSElapsedTime": 79,
    "AirTime": 0,
    "ArrDelay": 23,
    "DepDelay": 11,
    "Origin": "SAN",
    "Dest": "SFO",
    "Distance": 447,
    "TaxiIn": 0,
    "TaxiOut": 0,
    "Cancelled": 0,
    "CancellationCode": 0,
    "Diverted": 0,
    "CarrierDelay": 0,
    "WeatherDelay": 0,
    "NASDelay": 0,
    "SecurityDelay": 0,
    "LateAircraftDelay": 0
  }

you can create a view that shows only those documents where the difference between ActualElapsedTime and CRSElapsedTime is larger than a parameter by inserting:

db.system.views.insert (
{_id:'elapsed_time_diff',
 code:'function *(arg) {
    var d = arg["diff"];
    var cursor = db.flights.find();
    while(cursor.hasNext()) {
        result = cursor.next();
        if (result.ActualElapsedTime - result.CRSElapsedTime > d)
            yield result;
    }
   }'
 })

At this point you have a new view that you can use as you would use a standard collection. E.g. to iterate and get all flights with a difference between actual elapased time and CRS elapsed time more than 11 run:

db.elapsed_time_diff.find({$params:{diff: 11}})

Everything you can do on a collection you can do on a view. For example, to add projections to the find:

db.elapsed_time_diff.find({$params:{diff: 11}},{FlightNum:1, TailNum:1, ActualElapsedTime:1, CRSElapsedTime:1})

To get a count of how many flights there are with a difference over 11 do:

db.elapsed_time_diff.find({$params:{diff: 11}}).count()

or:

db.elapsed_time_diff.count({$params:{diff: 11}})

To get all the distinct flight numbers of this view:

db.elapsed_time_diff.distinct("FlightNum", {$params:{diff: 11}})

View performance depends on the complexity of the Javascript code, the machine on which the database is running on, and on the underlying documents. In terms of order-of-magnitude you can expect 10K/sec-50K/sec documents to be produced by a view on a small server with 2 quad-core CPUs. In any case, aggregation pipelines on views will never be nearly as fast as on native columnar collections since all the execution optimizations that are so much a part of SonarW cannot be used.

Views as collections

Views can be made to look very similar to collections. For example, when you do show collections in the shell you will see your views as well. When you do db.<your view name>.stats() you will get an indicator that this is a collection.

By far the most important thing in making a view look as close to a collection as possible is writing your Javascript code in a way that will pass-through any filter conditions (query). There are two ways to do that - explicitly or implicitly and your users will always prefer the implicit method.

When you make a call such as:

db.elapsed_time_diff.find({$params:{diff: 11, FlightNum: 1234}})

there is a difference between the diff and the FlightNum. The diff is used to compute the view and the FligthNum is something you want to pass to the underlying cursor. This call is an example of the explicit method and in your code you would write something like:

db.system.views.insert (
{_id:'elapsed_time_diff',
 code:'function *(arg) {
    var d = arg["diff"];
    var fn = arg["FlightNum"];
    var cursor = db.flights.find({"FlightNum": fn});
    while(cursor.hasNext()) {
        result = cursor.next();
        if (result.ActualElapsedTime - result.CRSElapsedTime > d)
            yield result;
    }
   }'
 })

This is both ugly and, worse, if the underlying collection has 30 fields you would need to pass this down to all 30. Instead, use the implicit $query subdocument [1]. For example, if you write your code so:

db.system.views.insert (
{_id:'elapsed_time_diff',
 code:'function *(arg) {
    var d = arg["diff"];
    var q = arg["$query"];
    var cursor = db.flights.find(q);
    while(cursor.hasNext()) {
        result = cursor.next();
        if (result.ActualElapsedTime - result.CRSElapsedTime > d)
            yield result;
    }
   }'
 })

Then you can make the call to the view look like:

db.elapsed_time_diff.find({FlightNum: 1234, $params:{diff: 11}})

More importantly, any additional filter you add will be passed right through to the underlying cursor, so you can also call:

db.elapsed_time_diff.find({FlightNum: 1234,  Carrier: 'DL', Delay: {$gt: 10}, $params:{diff: 11}})

In this case you are still distinguishing between the pass-through parameters and the view parameter(s).

You can even go one step further and blur that line completely and make a call that looks like:

db.elapsed_time_diff.find({FlightNum: 1234,  Carrier: 'DL', Delay: {$gt: 10}, diff: 11})

But in this case you must make sure to remove the diff from the pass-through query or the cursor might be empty or you’ll get the wrong results. For example your code might look like:

db.system.views.insert (
{_id:'elapsed_time_diff',
 code:'function *(arg) {
    var q = arg["$query"];
    var d = q["diff"];
    delete q["diff"];
    var cursor = db.flights.find(q);
    while(cursor.hasNext()) {
        result = cursor.next();
        if (result.ActualElapsedTime - result.CRSElapsedTime > d)
            yield result;
    }
   }'
 })

The same is true for projection - i.e. your Javascript code will always include a $projection subdocument which you can use to pass-through to the underlying cursor(s).

Example - Implementing a JOIN using a view

SonarW natively supports JOINs so this example is artificial and merely used to show usage of views.

Assume two collections - flights1 and carriers with respective document structures of:

> db.flights1.findOne()
   "ActualElapsedTime" : 256,
   "AirTime" : 177,
   "ArrDelay" : 52,
   "ArrTime" : 2122,
   "CRSArrTime" : 2030,
   "CRSDepTime" : 1500,
   "CRSElapsedTime" : 210,
   "CancellationCode" : "",
   "Cancelled" : 0,
   "CarrierDelay" : 6,
   "DayOfWeek" : 4,
   "DayofMonth" : 3,
   "DepDelay" : 6,
   "DepTime" : 1506,
   "Dest" : "MDW",
   "Distance" : 1521,
   "Diverted" : 0,
   "FlightNum" : 74,
   "LateAircraftDelay" : 0,
   "Month" : 1,
   "NASDelay" : 46,
   "Origin" : "LAS",
   "SecurityDelay" : 0,
   "TailNum" : "N764SW",
   "TaxiIn" : 62,
   "TaxiOut" : 17,
   "UniqueCarrier" : "WN",
   "WeatherDelay" : 0,
   "Year" : 2008,
   "_id" : ObjectId("5408ddd7cdcd5e43000000dc")
}
> db.carriers.findOne()
{
   "Code" : "02Q",
   "Description" : "Titan Airways",
   "_id" : ObjectId("534208031c479f6149b0ed59")
}

Using the following view definition:

db.system.views.insert(
   {_id: 'view_join',
   code: 'function *(arg) {
      var cursor = db.flights.find();
      while (cursor.hasNext()) {
         result=cursor.next();
         var car = db.carriers.findOne({"Code": result["UniqueCarrier"]},{"Description":1});
         result["Description"] = car["Description"];
         yield result;
      }
   }'})

we can get the carrier description to enrich the data that is in the flights collection:

> db.view_join.findOne()
{
   "ActualElapsedTime" : 128,
   "AirTime" : 116,
   "ArrDelay" : -14,
   "ArrTime" : 2211,
   "CRSArrTime" : 2225,
   "CRSDepTime" : 1955,
   "CRSElapsedTime" : 150,
   "CancellationCode" : "",
   "Cancelled" : 0,
   "CarrierDelay" : 0,
   "DayOfWeek" : 4,
   "DayofMonth" : 3,
   "DepDelay" : 8,
   "DepTime" : 2003,
   "Dest" : "TPA",
   "Distance" : 810,
   "Diverted" : 0,
   "FlightNum" : 335,
   "LateAircraftDelay" : 0,
   "Month" : 1,
   "NASDelay" : 0,
   "Origin" : "IAD",
   "SecurityDelay" : 0,
   "TailNum" : "N712SW",
   "TaxiIn" : 4,
   "TaxiOut" : 8,
   "UniqueCarrier" : "WN",
   "WeatherDelay" : 0,
   "Year" : 2008,
   "_id" : ObjectId("54466e8127433a490000002b"),
   "Description" : "Southwest Airlines Co."
}

Example - Fine-grained access control using views

Views are often used to provide incerased security. For example, suppose that you have a collection that has some sensitive data and you want some users to be able to view this data while others should not. You can use a view to provide an encapsulation of the data and require users to access data through the view. Within the view you can then determine which documents to return based on the security privileges of the currently calling user. This section shows a simple such implementation and includes the following steps:

  • Define a view that checks the current user and determines what fields of the documents to return for that user. In the real-world you might do this with role or some other credentials mechanism - here we will just use two users.
  • Define the appropriate run-as privileges for the view.
  • Define a custom role that gives acecss to the view only (so that users cannot access the data directly from the collection.

Let’s look at an example. Suppose we have a collection called customers with documents that look like:

> db.customers.findOne()
{
   "_id" : ObjectId("536d11854b3db2427e79d6a5"),
   "addressLine1" : "54, rue Royale",
   "city" : "Nantes",
   "contactFirstName" : "Carine ",
   "contactLastName" : "Schmitt",
   "country" : "France",
   "creditLimit" : 21000,
   "customerName" : "Atelier graphique",
   "customerNumber" : 103,
   "phone" : "40.32.2555",
   "postalCode" : 44000,
   "salesRepEmployeeNumber" : 1370
}

And suppose we want to create a role called “sales” that can only view the contactFirstName, contactLastName, phone, country, city and postalCode fields. Furthermore, each seller should only be able to see their customers and the mapping between the logged-in user and the employee number is in another collection in the database called sellers with documents such as:

> db.sellers.findOne()
{
   "_id" : ObjectId("5453e761dd39c8419f791244"),
   "user" : "seller1",
   "empNo" : 1370
}

First create a custom role so that you will be able to create users that can read the view but not the underlying collection:

> use admin
switched to db admin
> db.createRole({role: "seller", privileges: [{resource: {db: "flights", collection: "customersView"}, actions: ["find"]}], roles: []})
{
   "role" : "seller",
   "privileges" : [
      {
         "resource" : {
            "db" : "flights",
            "collection" : "customersView"
         },
         "actions" : [
            "find"
         ]
      }
   ],
   "roles" : [ ]
}

And make sure that the seller1 user has this role:

> db.createUser({user: "seller1", pwd: "seller1", roles: ["seller"]})
Successfully added user: { "user" : "seller1", "roles" : [ "seller" ] }

Now test to make sure your authorizations are correct. Insert a document into a collection by the name of customersView:

> db.customersView.insert({a:1})
WriteResult({ "nInserted" : 1 })

Now exit and login using the seller1 user:

$ ./mongo qa1.jsonar.com:27117/admin -u seller1 -p seller1
MongoDB shell version: 2.6.1
connecting to: bigdell:27117/admin
Server has startup warnings:
SonarW Server Started.
> use flights
switched to db flights

And make sure you can read from customersView but not from customers:

> db.customersView.find()
{ "_id" : ObjectId("5453eb5bdd39c8419f791245"), "a" : 1 }
> db.customers.find()
error: {
   "$err" : "Authorization failure",
   "errmsg" : "Authorization failure",
   "code" : -1,
   "ok" : 0
}

Now re-login using your full-privileged user and drop the dummy collection:

> db.customersView.drop()
true

Now let’s define the view:

db.system.views.insert({_id: "customersView", code: 'function *(arg)
{
   var q = arg["$query"];
   var currentUser = db.$cmd.findOne({connectionStatus: 1})["authInfo"]["authenticatedUsers"][0]["user"];
   var empNo = db.sellers.findOne({"user": currentUser})["empNo"];
   q["salesRepEmployeeNumber"] = empNo;
   var cursor = db.customers.find(q, {"contactFirstName":1, "contactLastName":1, "phone":1, "country":1, "postalCode":1});
   while (cursor.hasNext()) {
      yield cursor.next();
   }
}',
runAs: "ron2"
})

Then when logging in as seller 1:

> db.customersView.findOne()
{
   "_id" : ObjectId("536d11854b3db2427e79d6a5"),
   "contactFirstName" : "Carine ",
   "contactLastName" : "Schmitt",
   "country" : "France",
   "phone" : "40.32.2555",
   "postalCode" : 44000
}

> db.customers.findOne()
2014-10-31T17:04:15.213-0400 error: {
   "$err" : "Authorization failure",
   "errmsg" : "Authorization failure",
   "code" : -1,
   "ok" : 0
} at src/mongo/shell/query.js:131

Two important things to note about the view definition:

  • The view is defined using runAs. This is important because the view accessed the customers collection. If runAs is not used then the view runs under the caller’s privileges and access to customers is denied. Using runAs: “admin” ensures that access to the underlying collection is allowed - but only from within the view. IMPORTANT: The runAs user MUST be defined as a user in the database - it is not enough for the user to be defined in the admin database.
  • The first part of the view computes the currently logged-in user to add the additional filter that ensures that only the appropriate contacts are shown to seller1.

Example - INFORMATION_SCHEMA Using Views

SonarW has a metadata collection called lmrm__medata2 where every field is described. Each such document has a structure that describes a field within a collection and what type it is, for example:

>  db.lmrm__metadata2.findOne()
{
   "_id" : ObjectId("5540241814d4f87e0000002a"),
   "field_name" : "KRB",
   "docs_sampled" : NumberLong(14),
   "types" : {
      "<type 'bool'>" : 1
   },
   "name" : "logins",
   "last_updated" : ISODate("2015-04-29T00:21:44Z")
}

In this case the logins collection has a field (column) called KRB which always has a boolean value.

SonarSQL, the SQL-enabling layer of the jSonar Analytics Platform, lets you access SonarW using mysql’s SQL language. You can do DESCRIBE per table to see it’s structure (and under the covers, SonarSQL looks at lmrm__metadata2).

But mysql (and other databases) also have an INFORMATION_SCHEMA database that is used for metadata. As an example, the TABLES and COLUMNS tables in https://dev.mysql.com/doc/refman/5.0/en/information-schema.html describe the table and columns. In this example a simple view is created to mimic the important aspects of these tables using lmrm__metadata2 so that you can make queries that make it seem like these tables also exist in SonarW.

Two simplistic views that simulate these tables are shown below:

db.system.views.insert ({_id:'information_schema.tables', code:'function *(arg)
   {
      var q = arg["$query"];
      q2={};
      q2["table_name"]=q.table_name;
      var cursor = db.lmrm__metadata2.find(q2);
      while(cursor.hasNext()) {
         result = cursor.next();
         var res2 = {};
         res2["table_name"]=result.name;
         res2["table_comment"]="";
         res2["table_type"]="BASE TABLE";
         res2["table_schema"]=db._name;
         yield res2;
      }
   }'
})


db.system.views.insert ({_id:'information_schema.columns',code:'function *(arg)
   {
      var q = arg["$query"];
      q2={};
      q2["name"]=q.table_name;
      q2["field_name"]=q.column_name;
      var cursor = db.lmrm__metadata2.find(q2);
      var ordinal = 1;
      while(cursor.hasNext()) {
         result = cursor.next();
         var res2 = {};
         res2["table_name"]=result.name;
         res2["column_name"]=result.field_name;
         res2["column_comment"]="";
         res2["table_schema"]=db._name;
         res2["ordinal_position"]=ordinal++;
         res2["is_nullable"]=true;
         res2["column_default"]=null;
         yield res2;
      }
   }'
})

These views do not implement the full functionality of the information schema tables; they are meant to exemplify a usage of views. Once these views are defined you can run queries of the form:

> db.information_schema.columns.find({column_name: /a/}).pretty()
{
   "table_name" : "logins",
   "column_name" : "acceptAnyCert",
   "column_comment" : "",
   "table_schema" : "test",
   "ordinal_position" : 1,
   "is_nullable" : true,
   "column_default" : null
}
{
   "table_name" : "logins",
   "column_name" : "app",
   "column_comment" : "",
   "table_schema" : "test",
   "ordinal_position" : 2,
   "is_nullable" : true,
   "column_default" : null
}
...

> db.information_schema.tables.distinct("table_name")
[
   "t5",
   "t30",
   "q1",
   "fx",
   "tree",
   "t2",
   "lmrm__audit_trail",
   "t1",
   "t4",
   "t3"
]

> db.information_schema.columns.aggregate({$group: {_id: "$table_name", number_of_columns: {$sum: 1}}})
{ "_id" : "t5", "number_of_columns" : 3 }
{ "_id" : "t4", "number_of_columns" : 5 }
{ "_id" : "t30", "number_of_columns" : 3 }
{ "_id" : "t3", "number_of_columns" : 4 }
{ "_id" : "q1", "number_of_columns" : 1 }
{ "_id" : "fx", "number_of_columns" : 3 }
{ "_id" : "tree", "number_of_columns" : 7 }
{ "_id" : "t2", "number_of_columns" : 3 }
{ "_id" : "lmrm__audit_trail", "number_of_columns" : 17 }
{ "_id" : "t1", "number_of_columns" : 5 }

Best Practices in Writing Views

  1. Always have defaults for parameters. For example, rather than use:

    db.system.views.insert (
    {_id:'elapsed_time_diff',
     code:'function *(arg) {
        var d = arg["diff"];
        var cursor = db.flights.find();
        while(cursor.hasNext()) {
            result = cursor.next();
            if (result.ActualElapsedTime - result.CRSElapsedTime > d)
                yield result;
        }
       }'
     })
    

prefer to use:

db.system.views.insert (
{_id:'elapsed_time_diff',
 code:'function *(arg) {
    var d = arg["diff"];
    if (d == null) d = 0;
    var cursor = db.flights.find();
    while(cursor.hasNext()) {
        result = cursor.next();
        if (result.ActualElapsedTime - result.CRSElapsedTime > d)
            yield result;
    }
   }'
 })

This way if someone simply does a db.elapsed_time_diff.find() results will come back.

2. Always make views look as similar to collections for queries - i.e. use implicit pass-through for conditions and projections as described above.

3. Always pass-through as much to the underlying cursor. For example, even though views will do projections for you, if you use the $projection variable in your underlying cursor then SobarDB will have far fewer columns to build and therefore your view will operate faster (when your caller projects certain fields).

Debugging and Troubleshooting

SonarW has a general debugging facility for all constructs that use Javascript code (such as views, streams, stored procedures etc.). If you use the print function from within your code, anything passed into the print function is written to a collection under your name in the lmrm__debug database. For example, assume you are logged in as the user qa1 and you print a document such as:

result = cursor.next();
print(result);

then in the lmrm__debug database you will gave a collection named qa1 with a new document such as:

> db.qa1.find()
{ "_id" : ObjectId("541a0dde8176d758000000f5"),
"log" : "{ActualElapsedTime:128,AirTime:116,ArrDelay:-14,ArrTime:2211,CRSArrTime:2225,CRSDepTime:1955,
CRSElapsedTime:150,CancellationCode:\"\",Cancelled:0,CarrierDelay:0,DayOfWeek:4,DayofMonth:3,DepDelay:8,
DepTime:2003,Dest:\"TPA\",Distance:810,Diverted:0,FlightNum:335,LateAircraftDelay:0,Month:1,NASDelay:0,
Origin:\"IAD\",SecurityDelay:0,TailNum:\"N712SW\",TaxiIn:4,TaxiOut:8,UniqueCarrier:\"WN\",
WeatherDelay:0,Year:2008,_id:ObjectId('5408ddd7cdcd5e430000003a')}" }

You can print whatever you want - whether static or based on variables. If you do not have authentication and not logged in as a user then the collection name will be none i.e.:

> db.none.find()
{ "_id" : ObjectId("541a0dde8176d758000000f5"),
"log" : "{ActualElapsedTime:128,AirTime:116,ArrDelay:-14,ArrTime:2211,CRSArrTime:2225,CRSDepTime:1955,
CRSElapsedTime:150,CancellationCode:\"\",Cancelled:0,CarrierDelay:0,DayOfWeek:4,DayofMonth:3,DepDelay:8,
DepTime:2003,Dest:\"TPA\",Distance:810,Diverted:0,FlightNum:335,LateAircraftDelay:0,Month:1,NASDelay:0,
Origin:\"IAD\",SecurityDelay:0,TailNum:\"N712SW\",TaxiIn:4,TaxiOut:8,UniqueCarrier:\"WN\",
WeatherDelay:0,Year:2008,_id:ObjectId('5408ddd7cdcd5e430000003a')}" }

In addition, you can explicitly refer to these collection and do whatever you require, not through the print function. If you use the collection called lmrm__debug from within your code you are in effect using this collection in the other database. For example, if your code does:

lmrm__debug.insert({a:1});

then your collection in the lmrm__debug database will have:

> db.qa1.find() { "_id" : ObjectId("541a0dde8176d758000000f5"), "log" :
"{ActualElapsedTime:128,AirTime:116,ArrDelay:-14,ArrTime:2211,CRSArrTime:2225,CRSDepTime:1955,
CRSElapsedTime:150,CancellationCode:\"\",Cancelled:0,CarrierDelay:0,DayOfWeek:4,DayofMonth:3,DepDelay:8,
DepTime:2003,Dest:\"TPA\",Distance:810,Diverted:0,FlightNum:335,LateAircraftDelay:0,Month:1,NASDelay:0,
Origin:\"IAD\",SecurityDelay:0,TailNum:\"N712SW\",TaxiIn:4,TaxiOut:8,UniqueCarrier:\"WN\",
WeatherDelay:0,Year:2008,_id:ObjectId('5408ddd7cdcd5e430000003a')}" }
{ "_id" : ObjectId("541a0dde8176d758000000f6"), "a" : 1 }

These per-user collections have a (configurable) cap on them in order not to fill up the disk too much and if you exceed this cap the collection is dropped.

These collections are “born” with privileges assigned to the user you are logged in so that only you can see the debug data. But these are standard collections so your DBA can assign additional privileges to the collection.

Footnotes

[1]

When a query is decorated with a comment or a sort, the implicit query subdocument will be further nested with another query or $query key. Use something like the following snippet at the top of the view function to handle nested queries:

var query = arg["$query"];
/* if query is decorated, it will be nested */
if ("$query" in query) {
  query = query["$query"];
}  else if ("query" in query) {
  query = query["query"];
}
...