Window Functions

Window functions provide the ability to perform calculations across sets of documents that are related to the current document. This is comparable to the type of calculation that can be done with an aggregate function. But unlike regular aggregate functions, use of a window function does not cause documents to become grouped into a single output document — the documents retain their separate identities. Behind the scenes, the window function is able to access more than just the current document.

Note

Great explanations of the general concepts of window functions can be found at PostgreSQL’s docs and at Oracle’s docs.

Syntax

Full Syntax:

"$window" : {
    "$analyze" : [
        {
            "field_name1" : {
                "$function1" : [args]|1
            },
            "field_name2" : {
                "$function2" : [args]|1
            },
            "$over" : {
                "$partitionBy" : <group _id syntax>,
                "$orderBy" : [["$field1", 1|-1], ["$field2", 1|-1]],
                "$rows"|"$range": [
                    <expr|"$unbounded"|"$currentRow">,
                    <expr|"$unbounded"|"$currentRow">
                ]
            }
            "$ignoreNulls": 1,
            "$maxDocsForRam": Natural
        }
    ],
    "$project" : <project syntax>,
    "$having": <match syntax>
}

Minimal Syntax:

"$window" : {
    "$analyze" : [
        {
            "field_name" : {
                "$function" : [args]|1
            }
        }
    ]
}
field_name
Defines the output field name for the calculation made in the window function.
$function
Defines the function used for the calculation. Any function from the List of window functions can be used. more than one function can be used per window.
[args]|1
Every window function is given its arguments in an array, If no arguments are expected then 1 is used.
$project

Defines the fields that get passed onto the next stage of the pipeline.

The _id field is added by default. To remove the _id field you must add “_id”: 0 to the $project stage.

“*”: 1 syntax does not work withing the window project, every column to project needs to specified explicitly.

If omitted or empty:
Only the _id field and the field calculated from the window function will continue to the next stage of the pipeline.
$having

Defines restrictions to the output data from the $window stage. Similar to adding a $match stage as the next stage in the pipeline.

If omitted:
No filtering is done on the results.
$over

Defines the window in which the functions will operate

If omitted:
Window function is calculated on all the documents, without a partition or an order.
$partitionBy

Defines a separation of the documents to groups that share the same value in the $partitionBy expression. All functions are calculated within their partition and ignore the documents outside of the partition. The syntax for the partition is the same as the $group _id syntax.

If omitted:
The window function will be calculated over all documents in the collection, without separation.
$orderBy

Defines the order of the documents in which the window functions will be calculated. If the order is nondeterministic and the $rows frame is used, the results may be nondeterministic.

Data types:
  • Data Types:
    • Numeric
    • Dates
    • Other: only if calculated in ram. see $maxDocsForRam.
  • When using $range, only numerical types and dates will work as expected. anything else will be viewed as null.
  • When using dates with $range, the offset must be set in milliseconds.
Syntax:
$orderBy can be defined in the following ways:
  • Can be an array of arrays: [[expr1, 1], [expr2, -1]]
  • Can be one long array: [expr1, 1, expr2, -1]

The second parameter: 1|-1 represent the direction of the order. 1 means ascending, -1 means descending.

If omitted:
For frame functions: the frame will be the whole partition. For non-frame functions: calculation will be nondeterministic.
$rows

Defines the frame limits in which the window functions will be calculated. The first variable in the array defines the frame starting point. The second variable in the array defines the frame ending point. The variables need to be one of the following: numeric offset, $unbounded or $currentRow The numeric offset can be constant or an expression, it represents the offset from the current document.

Examples:

$rows:[-2, 2] Describes a frame that includes all the documents two rows before the current document until two rows after the current document. A total of 5 documents.

$rows:[-3, -1] Describes a frame that includes all the documents 3 rows before the current document until 1 row before the current document. A total of 3 documents. the current document will not be included in this frame.

$rows:[“$unbounded”, -1] Describes a frame that includes all of the previous documents in the partition.

$rows:[-1, -1] Describes a frame that includes only the previous document.

$rows:[“$unbounded”, “$unbounded”] Describes a frame that includes all the documents in the partition.

$range

Defines the frame limits in which the window functions will be calculated. Unlike the $rows that correspond with the physical documents, the $range corresponds with the values from the first ordering expression specified in the $orderBy stage. $orderBy clause is mandatory for using $range.

The first variable in the array defines the frame starting point. The second variable in the array defines the frame ending point.

The variables need to be one of the following: numeric offset, $unbounded or $currentRow The numeric offset can be constant or an expression, it represents the offset from the current document’s value in the ordering expression.

Examples:

For the next examples we are assuming an $orderBy:[“$field1”] and the current document’s field1 value is 300:

$range:[-100, 100]: Describes a frame that include all the documents whose field1’s value is between 200 and 400.

$range:[-300, -100] Describes a frame that include all the documents whose field1’s value is between 0 and 200. the current document will not be included in this frame.

$rows:[“$unbounded”, 0] Describes a frame that includes all of the previous documents in the partition and all the documents that have the same order value as the current document (300).

$rows:[-1, -1] Describes a frame that includes only the documents documents whose field1’s value is 299.

$rows:[“$unbounded”, “$unbounded”] Describes a frame that includes all the documents in the partition.

If omitted:
  • If there is no $orderBy, the frame defaults to $rows: [“$unbounded”, “$unbounded”] which is the whole partition.
  • If there is an $orderBy and no $range the frame defaults to $range:[“$unbounded”, “$currentRow”] which is all the documents from the start of the partition through the last peer of the current document.
$unbounded
Can be used as a frame limit. If specified as the first argument then it represents the first document of the partition as the frame’s starting point. If specified as the second argument then it represents the last document of the partition as the frame’s ending point.
$currentRow

Can be used as a frame limit.

For $rows:
If specified as the first argument then it represents the current document as the frame’s starting point. If specified as the second argument then it represents the current document as the frame’s ending point.
For $range:
If specified as the first argument then it represents the first document with the current document’s sorting value as the frame’s starting point. If specified as the second argument then it represents the last document with the current document’s sorting value as the frame’s ending point.

Note

  • If the window function used does not support frames, the frame will be ignored.
  • $rows and $range cannot both be used together, since they would describe conflicting frame limits.
  • If the frame starting point is after the frame’s ending point the results of the functions will be null.
  • If frame limits are Omitted and there is no $orderBy the defaulted frame will be: $rows:[“$unbounded”, “$unbounded”] which is the whole partition.
  • If frame limits are Omitted and there is an $orderBy the defaulted frame will be: $range:[“$unbounded”, “$currentRow”] which is all the documents from the start of the partition through the last peer of the current document.
$maxDocsForRam

When calculating the window function a copy of the documents need to be stored locally. If the collection is small enough then it is stored in memory and the command runs faster. If the collection is too big it is stored on disk and takes longer to calculate. The number you would use as the argument will represent the maximum number of documents used in the window command you would still like to store in ram while running. Any number of documents higher than this number will automatically be saved to disk for the calculation.

Defaults to 2*1024*1024.

Note

To permanently change this number, the following global variable can be set: window_function_max_rows_for_ram_usage

$ignoreNulls
Relevant for the following functions:
  • $first : Will return the first non-null value in the frame. If none exist, returns null.
  • $last : Will return the last non-null value in the frame. If none exist, returns null.
  • $nthValue : Will return the nth non-null value in the frame. If there aren’t n, returns null.
  • $lag : Will return the nth non-null value behind the current doc. If there aren’t n, returns null.
  • $lead : Will return the nth non-null value ahead of the current doc. If there aren’t n, returns null.

Example

Here is an example that shows how to compare each employee’s salary with the average salary in his or her department:

Command:

"$window" : {
    "$analyze" : [
        {
            "avg_per_dept" : {
                "$avg" : ["$salary"]
            },
            "$over" : {
                "$partitionBy" : "$department"
            }
        }
    ],
    "$project" : {
        "_id": 0,
        "department" : 1,
        "salary" : 1,
        "employee_number": 1
    }
}

The results will look like:

{department: 1, employee_number: 11, salary: 5200, avg_per_dept: 5020}
{department: 1, employee_number:  7, salary: 4200, avg_per_dept: 5020}
{department: 1, employee_number:  9, salary: 4500, avg_per_dept: 5020}
{department: 1, employee_number:  8, salary: 6000, avg_per_dept: 5020}
{department: 1, employee_number: 10, salary: 5200, avg_per_dept: 5020}
{department: 2, employee_number:  5, salary: 3500, avg_per_dept: 3700}
{department: 2, employee_number:  2, salary: 3900, avg_per_dept: 3700}
{department: 3, employee_number:  3, salary: 4800, avg_per_dept: 4866.66666}
{department: 3, employee_number:  1, salary: 5000, avg_per_dept: 4866.66666}
{department: 3, employee_number:  4, salary: 4800, avg_per_dept: 4866.66666}

List of window functions

* Function supports frames

$avg

Supports frames

The $avg window function returns the average (arithmetic mean) of the input expression values. The $avg function works with numeric values and ignores null values.

If the frame is empty or contains only nulls, the calculation returns null.

Syntax:

$avg:[expr]

Example

Average sale per product

Command:

db.sale.aggregate({
    "$window" : {
        "$analyze" : [
            {
                "avg_sale_amount" : {
                    "$avg" : ["$amount"]
                },
                "$over" : {
                    "$partitionBy" : "$productid"
                }
            }
        ],
        "$project" : {
            "_id": 0,
            "amount" : 1,
            "productid" : 1
        }
    }
})

Results:

{ "avg_sale_amount" : 60,                "amount" : 15,  "productid" : 1 }
{ "avg_sale_amount" : 60,                "amount" : 15,  "productid" : 1 }
{ "avg_sale_amount" : 60,                "amount" : 150, "productid" : 1 }
{ "avg_sale_amount" : 10,                "amount" : 10,  "productid" : 2 }
{ "avg_sale_amount" : 37.5,              "amount" : 95,  "productid" : 3 }
{ "avg_sale_amount" : 37.5,              "amount" : 45,  "productid" : 3 }
{ "avg_sale_amount" : 37.5,              "amount" : 5,   "productid" : 3 }
{ "avg_sale_amount" : 37.5,              "amount" : 5,   "productid" : 3 }
{ "avg_sale_amount" : 66.66666666666667, "amount" : 50,  "productid" : 4 }
{ "avg_sale_amount" : 66.66666666666667, "amount" : 100, "productid" : 4 }
{ "avg_sale_amount" : 66.66666666666667, "amount" : 50,  "productid" : 4 }

Data used for examples

$count

Supports frames

The $count window function counts the documents defined by the expression.

The $count function has two variations. $count:1 counts all the documents in the frame whether they include nulls or not. “$count”:[expr] computes the number of documents with non-null values in a specific field or expression.

Syntax:

$count:[expr]|1

Example

Count sales per day

Command:

db.sale.aggregate({
    "$window" : {
        "$analyze" : [
            {
                "sales_per_day" : {
                    "$count" : 1
                },
                "$over" : {
                    "$partitionBy" : {"$dateToString": {format: "%Y-%m-%d", date: "$saledate"}}
                }
            }
        ],
        "$project" : {
            "_id": 0,
            "date" : {"$dateToString": {format: "%Y-%m-%d", date: "$saledate"}},
            "saledate" : 1,
            "productid": 1
        }
    }
})

Results:

{ "sales_per_day" : 4, "date" : "2016-12-31", "productid" : 2, "saledate" : ISODate("2016-12-31T11:00:00Z") }
{ "sales_per_day" : 4, "date" : "2016-12-31", "productid" : 1, "saledate" : ISODate("2016-12-31T12:00:00Z") }
{ "sales_per_day" : 4, "date" : "2016-12-31", "productid" : 1, "saledate" : ISODate("2016-12-31T15:30:00Z") }
{ "sales_per_day" : 4, "date" : "2016-12-31", "productid" : 3, "saledate" : ISODate("2016-12-31T12:00:00Z") }
{ "sales_per_day" : 5, "date" : "2016-12-30", "productid" : 4, "saledate" : ISODate("2016-12-30T17:00:00Z") }
{ "sales_per_day" : 5, "date" : "2016-12-30", "productid" : 3, "saledate" : ISODate("2016-12-30T17:00:00Z") }
{ "sales_per_day" : 5, "date" : "2016-12-30", "productid" : 4, "saledate" : ISODate("2016-12-30T16:00:00Z") }
{ "sales_per_day" : 5, "date" : "2016-12-30", "productid" : 1, "saledate" : ISODate("2016-12-30T12:10:00Z") }
{ "sales_per_day" : 5, "date" : "2016-12-30", "productid" : 3, "saledate" : ISODate("2016-12-30T12:10:00Z") }
{ "sales_per_day" : 2, "date" : "2017-01-01", "productid" : 3, "saledate" : ISODate("2017-01-01T11:30:00Z") }
{ "sales_per_day" : 2, "date" : "2017-01-01", "productid" : 4, "saledate" : ISODate("2017-01-01T12:10:00Z") }

Data used for examples

$cumeDist

does not support frames

The $cumeDist window function calculates the cumulative distribution of a value within a partition.

The formula for determining the cumulative distribution is:

cumulative_count / total_count

  • total_count: total documents in partition.
  • cumulative_count: count of documents with sort values that are smaller or equal to current sort value.

Note

If no $orderBy is specified the return value is 1 for all documents.

Syntax:

$cumeDist:1

Example

Calculate cumulative distribution of totalUSD per year

Command:

db.sale.aggregate({
    "$window" : {
        "$analyze" : [
            {
                "cume_dist" : {
                    "$cumeDist" : 1
                },
                "$over" : {
                    "$partitionBy" : {"$year": "$saledate"},
                    "$orderBy": ["$totalUSD", 1]
                }
            }
        ],
        "$project" : {
            "_id": 0,
            "year" : {"$year": "$saledate"},
            "totalUSD" : 1,
            "productid": 1
        }
    }
})

Results:

{ "cume_dist" : 0.1111111111111111, "productid" : 1, "totalUSD" : null,    "year" : 2016 }
{ "cume_dist" : 0.3333333333333333, "productid" : 3, "totalUSD" : 3500,    "year" : 2016 }
{ "cume_dist" : 0.3333333333333333, "productid" : 3, "totalUSD" : 3500,    "year" : 2016 }
{ "cume_dist" : 0.4444444444444444, "productid" : 2, "totalUSD" : 25000,   "year" : 2016 }
{ "cume_dist" : 0.5555555555555556, "productid" : 3, "totalUSD" : 56000,   "year" : 2016 }
{ "cume_dist" : 0.6666666666666666, "productid" : 4, "totalUSD" : 100000,  "year" : 2016 }
{ "cume_dist" : 0.7777777777777778, "productid" : 1, "totalUSD" : 150000,  "year" : 2016 }
{ "cume_dist" : 0.8888888888888888, "productid" : 4, "totalUSD" : 200000,  "year" : 2016 }
{ "cume_dist" : 1,                  "productid" : 1, "totalUSD" : 1500000, "year" : 2016 }
{ "cume_dist" : 0.5,                "productid" : 4, "totalUSD" : null,    "year" : 2017 }
{ "cume_dist" : 1,                  "productid" : 3, "totalUSD" : 25000,   "year" : 2017 }

Data used for examples

$denseRank

does not support frames

The $denseRank window function determines the rank of a value in a partition, based on the $orderBy expression. Documents with equal values for the ranking criteria receive the same rank.

The $denseRank function differs from $rank in one respect: If two or more documents tie, there is no gap in the sequence of ranked values. For example, if two documents are ranked 1, the next rank is 2.

Note

If no $orderBy is specified the return value is 1 for all documents.

Syntax:

$denseRank:1

Example

Calculate denseRank of productid and sale year

Command:

db.sale.aggregate({
    "$window" : {
        "$analyze" : [
            {
                "group_number_year_product" : {
                    "$denseRank" : 1
                },
                "$over" : {
                    "$orderBy": ["$productid", -1, {"$year": "$saledate"}, 1]
                }
            }
        ],
        "$project" : {
            "_id": 0,
            "year" : {"$year": "$saledate"},
            "productid": 1
        }
    }
})

Results:

{ "group_number_year_product" : NumberLong(1), "productid" : 4, "year" : 2016 }
{ "group_number_year_product" : NumberLong(1), "productid" : 4, "year" : 2016 }
{ "group_number_year_product" : NumberLong(2), "productid" : 4, "year" : 2017 }
{ "group_number_year_product" : NumberLong(3), "productid" : 3, "year" : 2016 }
{ "group_number_year_product" : NumberLong(3), "productid" : 3, "year" : 2016 }
{ "group_number_year_product" : NumberLong(3), "productid" : 3, "year" : 2016 }
{ "group_number_year_product" : NumberLong(4), "productid" : 3, "year" : 2017 }
{ "group_number_year_product" : NumberLong(5), "productid" : 2, "year" : 2016 }
{ "group_number_year_product" : NumberLong(6), "productid" : 1, "year" : 2016 }
{ "group_number_year_product" : NumberLong(6), "productid" : 1, "year" : 2016 }
{ "group_number_year_product" : NumberLong(6), "productid" : 1, "year" : 2016 }

Data used for examples

$first

Supports frames

The $first window function returns the evaluated expression from the first document in the frame.

The specified expression can return any data type.

When ignoreNulls is specified the function returns the first non-null evaluated expression, if it exists.

Syntax:

$first:[expr]

Example

Find the first sale in the last 4 hours with totalUSD that is not null

Command:

db.sale.aggregate({
    "$window" : {
        "$analyze" : [
            {
                "prev_4_hours" : {
                    "$first" : ["$totalUSD"]
                },
                "$over" : {
                    "$orderBy": ["$saledate", 1],
                    "$range":[-4 * 60 * 60 * 1000 , 0] // 4 hours
                },
                "$ignoreNulls":1
            }
        ],
        "$project" : {
            "_id": 0,
            "productid": 1,
            "saledate": 1,
            "totalUSD": 1
        }
    }
})

Results:

{ "prev_4_hours" : 1500000, "productid" : 1, "saledate" : ISODate("2016-12-30T12:10:00Z"), "totalUSD" : 1500000 }
{ "prev_4_hours" : 1500000, "productid" : 3, "saledate" : ISODate("2016-12-30T12:10:00Z"), "totalUSD" : 3500 }
{ "prev_4_hours" : 1500000, "productid" : 4, "saledate" : ISODate("2016-12-30T16:00:00Z"), "totalUSD" : 200000 }
{ "prev_4_hours" : 200000,  "productid" : 4, "saledate" : ISODate("2016-12-30T17:00:00Z"), "totalUSD" : 100000 }
{ "prev_4_hours" : 200000,  "productid" : 3, "saledate" : ISODate("2016-12-30T17:00:00Z"), "totalUSD" : 56000 }
{ "prev_4_hours" : 25000,   "productid" : 2, "saledate" : ISODate("2016-12-31T11:00:00Z"), "totalUSD" : 25000 }
{ "prev_4_hours" : 25000,   "productid" : 1, "saledate" : ISODate("2016-12-31T12:00:00Z"), "totalUSD" : null }
{ "prev_4_hours" : 25000,   "productid" : 3, "saledate" : ISODate("2016-12-31T12:00:00Z"), "totalUSD" : 3500 }
{ "prev_4_hours" : 3500,    "productid" : 1, "saledate" : ISODate("2016-12-31T15:30:00Z"), "totalUSD" : 150000 }
{ "prev_4_hours" : 25000,   "productid" : 3, "saledate" : ISODate("2017-01-01T11:30:00Z"), "totalUSD" : 25000 }
{ "prev_4_hours" : 25000,   "productid" : 4, "saledate" : ISODate("2017-01-01T12:10:00Z"), "totalUSD" : null }

Data used for examples

$last

Supports frames

The $last window function returns the evaluated expression from the last document in the frame.

The specified expression can return any data type.

When ignoreNulls is specified the function returns the last non-null evaluated expression, if it exists.

Syntax:

$last:[expr]

Example

Find the last sale in the next 4 hours with totalUSD that is not null

Command:

db.sale.aggregate({
    "$window" : {
        "$analyze" : [
            {
                "next_4_hours" : {
                    "$last" : ["$totalUSD"]
                },
                "$over" : {
                    "$orderBy": ["$saledate", 1],
                    "$range":[0, 4 * 60 * 60 * 1000 ]
                },
                "$ignoreNulls":1
            }
        ],
        "$project" : {
            "_id": 0,
            "productid": 1,
            "saledate": 1,
            "totalUSD": 1
        }
    }
})

Results:

{ "next_4_hours" : 1500000, "productid" : 1, "saledate" : ISODate("2016-12-30T12:10:00Z"), "totalUSD" : 1500000 }
{ "next_4_hours" : 1500000, "productid" : 3, "saledate" : ISODate("2016-12-30T12:10:00Z"), "totalUSD" : 3500 }
{ "next_4_hours" : 200000,  "productid" : 4, "saledate" : ISODate("2016-12-30T16:00:00Z"), "totalUSD" : 200000 }
{ "next_4_hours" : 100000,  "productid" : 4, "saledate" : ISODate("2016-12-30T17:00:00Z"), "totalUSD" : 100000 }
{ "next_4_hours" : 100000,  "productid" : 3, "saledate" : ISODate("2016-12-30T17:00:00Z"), "totalUSD" : 56000 }
{ "next_4_hours" : 25000,   "productid" : 2, "saledate" : ISODate("2016-12-31T11:00:00Z"), "totalUSD" : 25000 }
{ "next_4_hours" : 3500,    "productid" : 1, "saledate" : ISODate("2016-12-31T12:00:00Z"), "totalUSD" : null }
{ "next_4_hours" : 3500,    "productid" : 3, "saledate" : ISODate("2016-12-31T12:00:00Z"), "totalUSD" : 3500 }
{ "next_4_hours" : 150000,  "productid" : 1, "saledate" : ISODate("2016-12-31T15:30:00Z"), "totalUSD" : 150000 }
{ "next_4_hours" : 25000,   "productid" : 3, "saledate" : ISODate("2017-01-01T11:30:00Z"), "totalUSD" : 25000 }
{ "next_4_hours" : null,    "productid" : 4, "saledate" : ISODate("2017-01-01T12:10:00Z"), "totalUSD" : null }

Data used for examples

$lag

does not support frames

The $lag window function returns the values for a document at a given offset above (before) the current document in the partition.

If not offset is given, the offset will default to 1.

When ignoreNulls is specified, the $lag function counts the non-null evaluated expressions and when it reaches the given offset, It returns that expression if it exists.

Syntax:

$lag:[expr, offset] | [expr]

Example

Find the previous sale amount of the same day that is not null

Command:

db.sale.aggregate({
    "$window" : {
        "$analyze" : [
            {
                "prev_total" : {
                    "$lag" : ["$totalUSD", 1]
                },
                "$over" : {
                    "$orderBy": ["$saledate", 1],
                    "$partitionBy": {"$dateToString": {format: "%Y-%m-%d", date: "$saledate"}}
                },
                "$ignoreNulls":1
            }
        ],
        "$project" : {
            "_id": 0,
            "productid": 1,
            "saledate": 1,
            "totalUSD": 1
        }
    }
})

Results:

{ "prev_total" : null,    "productid" : 2, "saledate" : ISODate("2016-12-31T11:00:00Z"), "totalUSD" : 25000 }
{ "prev_total" : 25000,   "productid" : 1, "saledate" : ISODate("2016-12-31T12:00:00Z"), "totalUSD" : null }
{ "prev_total" : 25000,   "productid" : 3, "saledate" : ISODate("2016-12-31T12:00:00Z"), "totalUSD" : 3500 }
{ "prev_total" : 3500,    "productid" : 1, "saledate" : ISODate("2016-12-31T15:30:00Z"), "totalUSD" : 150000 }
{ "prev_total" : null,    "productid" : 1, "saledate" : ISODate("2016-12-30T12:10:00Z"), "totalUSD" : 1500000 }
{ "prev_total" : 1500000, "productid" : 3, "saledate" : ISODate("2016-12-30T12:10:00Z"), "totalUSD" : 3500 }
{ "prev_total" : 3500,    "productid" : 4, "saledate" : ISODate("2016-12-30T16:00:00Z"), "totalUSD" : 200000 }
{ "prev_total" : 200000,  "productid" : 4, "saledate" : ISODate("2016-12-30T17:00:00Z"), "totalUSD" : 100000 }
{ "prev_total" : 100000,  "productid" : 3, "saledate" : ISODate("2016-12-30T17:00:00Z"), "totalUSD" : 56000 }
{ "prev_total" : null,    "productid" : 3, "saledate" : ISODate("2017-01-01T11:30:00Z"), "totalUSD" : 25000 }
{ "prev_total" : 25000,   "productid" : 4, "saledate" : ISODate("2017-01-01T12:10:00Z"), "totalUSD" : null }

Data used for examples

$lead

does not support frames

The $lead window function returns the values for a document at a given offset below (after) the current document in the partition.

If not offset is given, the offset will default to 1.

When ignoreNulls is specified, the $lead function counts the non-null evaluated expressions and when it reaches the given offset, It returns that expression if it exists.

Syntax:

$lead:[expr, offset] | [expr]

Example

Find the next sale amount of the same day that is not null

Command:

db.sale.aggregate({
    "$window" : {
        "$analyze" : [
            {
                "prev_total" : {
                    "$lead" : ["$totalUSD", 1]
                },
                "$over" : {
                    "$orderBy": ["$saledate", 1],
                    "$partitionBy": {"$dateToString": {format: "%Y-%m-%d", date: "$saledate"}}
                },
                "$ignoreNulls":1
            }
        ],
        "$project" : {
            "_id": 0,
            "productid": 1,
            "saledate": 1,
            "totalUSD": 1
        }
    }
})

Results:

{ "prev_total" : 3500,   "productid" : 2, "saledate" : ISODate("2016-12-31T11:00:00Z"), "totalUSD" : 25000 }
{ "prev_total" : 150000, "productid" : 1, "saledate" : ISODate("2016-12-31T12:00:00Z"), "totalUSD" : null }
{ "prev_total" : 150000, "productid" : 3, "saledate" : ISODate("2016-12-31T12:00:00Z"), "totalUSD" : 3500 }
{ "prev_total" : null,   "productid" : 1, "saledate" : ISODate("2016-12-31T15:30:00Z"), "totalUSD" : 150000 }
{ "prev_total" : 3500,   "productid" : 1, "saledate" : ISODate("2016-12-30T12:10:00Z"), "totalUSD" : 1500000 }
{ "prev_total" : 200000, "productid" : 3, "saledate" : ISODate("2016-12-30T12:10:00Z"), "totalUSD" : 3500 }
{ "prev_total" : 100000, "productid" : 4, "saledate" : ISODate("2016-12-30T16:00:00Z"), "totalUSD" : 200000 }
{ "prev_total" : 56000,  "productid" : 4, "saledate" : ISODate("2016-12-30T17:00:00Z"), "totalUSD" : 100000 }
{ "prev_total" : null,   "productid" : 3, "saledate" : ISODate("2016-12-30T17:00:00Z"), "totalUSD" : 56000 }
{ "prev_total" : null,   "productid" : 3, "saledate" : ISODate("2017-01-01T11:30:00Z"), "totalUSD" : 25000 }
{ "prev_total" : null,   "productid" : 4, "saledate" : ISODate("2017-01-01T12:10:00Z"), "totalUSD" : null }

Data used for examples

$listAgg

does not support frames

The $listAgg function returns an array of all the values evaluated by the expression in the partition.

Syntax:

$listAgg:[expr]

Example

List all the dates that a product was sold

Command:

db.sale.aggregate({
    "$window" : {
        "$analyze" : [
            {
                "dates_per_product" : {
                    "$listAgg" : [{$dateToString:{date:"$saledate", format: "%Y-%m-%d %H:%M"}}]
                },
                "$over" : {
                    "$orderBy": ["$saledat", 1],
                    "$partitionBy": "$productid"
                }
            }
        ],
        "$project" : {
            "_id": 0,
            "productid": 1,
            "saledate": 1,
            "totalUSD": 1
        }
    }
})

Results:

{
        "dates_per_product" : [
                "2016-12-31 12:00",
                "2016-12-31 15:30",
                "2016-12-30 12:10"
        ],
        "productid" : 1,
        "saledate" : ISODate("2016-12-31T12:00:00Z"),
        "totalUSD" : null
}
{
        "dates_per_product" : [
                "2016-12-31 12:00",
                "2016-12-31 15:30",
                "2016-12-30 12:10"
        ],
        "productid" : 1,
        "saledate" : ISODate("2016-12-31T15:30:00Z"),
        "totalUSD" : 150000
}
{
        "dates_per_product" : [
                "2016-12-31 12:00",
                "2016-12-31 15:30",
                "2016-12-30 12:10"
        ],
        "productid" : 1,
        "saledate" : ISODate("2016-12-30T12:10:00Z"),
        "totalUSD" : 1500000
}
{
        "dates_per_product" : [
                "2016-12-31 11:00"
        ],
        "productid" : 2,
        "saledate" : ISODate("2016-12-31T11:00:00Z"),
        "totalUSD" : 25000
}
{
        "dates_per_product" : [
                "2016-12-30 17:00",
                "2017-01-01 11:30",
                "2016-12-30 12:10",
                "2016-12-31 12:00"
        ],
        "productid" : 3,
        "saledate" : ISODate("2016-12-30T17:00:00Z"),
        "totalUSD" : 56000
}
{
        "dates_per_product" : [
                "2016-12-30 17:00",
                "2017-01-01 11:30",
                "2016-12-30 12:10",
                "2016-12-31 12:00"
        ],
        "productid" : 3,
        "saledate" : ISODate("2017-01-01T11:30:00Z"),
        "totalUSD" : 25000
}
{
        "dates_per_product" : [
                "2016-12-30 17:00",
                "2017-01-01 11:30",
                "2016-12-30 12:10",
                "2016-12-31 12:00"
        ],
        "productid" : 3,
        "saledate" : ISODate("2016-12-30T12:10:00Z"),
        "totalUSD" : 3500
}
{
        "dates_per_product" : [
                "2016-12-30 17:00",
                "2017-01-01 11:30",
                "2016-12-30 12:10",
                "2016-12-31 12:00"
        ],
        "productid" : 3,
        "saledate" : ISODate("2016-12-31T12:00:00Z"),
        "totalUSD" : 3500
}
{
        "dates_per_product" : [
                "2016-12-30 17:00",
                "2016-12-30 16:00",
                "2017-01-01 12:10"
        ],
        "productid" : 4,
        "saledate" : ISODate("2016-12-30T17:00:00Z"),
        "totalUSD" : 100000
}
{
        "dates_per_product" : [
                "2016-12-30 17:00",
                "2016-12-30 16:00",
                "2017-01-01 12:10"
        ],
        "productid" : 4,
        "saledate" : ISODate("2016-12-30T16:00:00Z"),
        "totalUSD" : 200000
}
{
        "dates_per_product" : [
                "2016-12-30 17:00",
                "2016-12-30 16:00",
                "2017-01-01 12:10"
        ],
        "productid" : 4,
        "saledate" : ISODate("2017-01-01T12:10:00Z"),
        "totalUSD" : null
}

Data used for examples

$max

Supports frames

The $max function returns the maximum evaluated expression within a given frame.

Works with the same data types as $max $group function.

Syntax:

$max:[expr]

Example

Calculate the latest saledate per product in a running frame by amount

Command:

db.sale.aggregate({
    "$window" : {
        "$analyze" : [
            {
                "max_running_date" : {
                    "$max" : ["$saledate"]
                },
                "$over" : {
                    "$orderBy": ["$amount", 1],
                    "$partitionBy": "$productid"
                }
            }
        ],
        "$project" : {
            "_id": 0,
            "productid": 1,
            "saledate": 1,
            "totalUSD": 1
        }
    }
})

Results:

{ "max_running_date" : ISODate("2016-12-31T15:30:00Z"), "productid" : 1, "saledate" : ISODate("2016-12-31T12:00:00Z"), "totalUSD" : null }
{ "max_running_date" : ISODate("2016-12-31T15:30:00Z"), "productid" : 1, "saledate" : ISODate("2016-12-31T15:30:00Z"), "totalUSD" : 150000 }
{ "max_running_date" : ISODate("2016-12-31T15:30:00Z"), "productid" : 1, "saledate" : ISODate("2016-12-30T12:10:00Z"), "totalUSD" : 1500000 }
{ "max_running_date" : ISODate("2016-12-31T11:00:00Z"), "productid" : 2, "saledate" : ISODate("2016-12-31T11:00:00Z"), "totalUSD" : 25000 }
{ "max_running_date" : ISODate("2016-12-31T12:00:00Z"), "productid" : 3, "saledate" : ISODate("2016-12-30T12:10:00Z"), "totalUSD" : 3500 }
{ "max_running_date" : ISODate("2016-12-31T12:00:00Z"), "productid" : 3, "saledate" : ISODate("2016-12-31T12:00:00Z"), "totalUSD" : 3500 }
{ "max_running_date" : ISODate("2017-01-01T11:30:00Z"), "productid" : 3, "saledate" : ISODate("2017-01-01T11:30:00Z"), "totalUSD" : 25000 }
{ "max_running_date" : ISODate("2017-01-01T11:30:00Z"), "productid" : 3, "saledate" : ISODate("2016-12-30T17:00:00Z"), "totalUSD" : 56000 }
{ "max_running_date" : ISODate("2017-01-01T12:10:00Z"), "productid" : 4, "saledate" : ISODate("2016-12-30T17:00:00Z"), "totalUSD" : 100000 }
{ "max_running_date" : ISODate("2017-01-01T12:10:00Z"), "productid" : 4, "saledate" : ISODate("2017-01-01T12:10:00Z"), "totalUSD" : null }
{ "max_running_date" : ISODate("2017-01-01T12:10:00Z"), "productid" : 4, "saledate" : ISODate("2016-12-30T16:00:00Z"), "totalUSD" : 200000 }

Data used for examples

$median

does not support frames

The $median function returns the calculated median of the evaluated expression within a given partition.

The $median function is only calculated for numeric types and ignores nulls.

If you want to find the discrete median of non numeric types: use the $percentDisc function.

Syntax:

$median:[expr]

Example

Calculate the median amount per product

Command:

db.sale.aggregate({
    "$window" : {
        "$analyze" : [
            {
                "median_amount" : {
                    "$median" : ["$amount"]
                },
                "$over" : {
                    "$orderBy": ["$saledate", 1],
                    "$partitionBy": "$productid"
                }
            }
        ],
        "$project" : {
            "_id": 0,
            "productid": 1,
            "amount": 1,
            "saledate": 1
        }
    }
})

Results:

{ "median_amount" : 15, "amount" : 150, "productid" : 1, "saledate" : ISODate("2016-12-30T12:10:00Z") }
{ "median_amount" : 15, "amount" : 15,  "productid" : 1, "saledate" : ISODate("2016-12-31T12:00:00Z") }
{ "median_amount" : 15, "amount" : 15,  "productid" : 1, "saledate" : ISODate("2016-12-31T15:30:00Z") }
{ "median_amount" : 10, "amount" : 10,  "productid" : 2, "saledate" : ISODate("2016-12-31T11:00:00Z") }
{ "median_amount" : 25, "amount" : 5,   "productid" : 3, "saledate" : ISODate("2016-12-30T12:10:00Z") }
{ "median_amount" : 25, "amount" : 95,  "productid" : 3, "saledate" : ISODate("2016-12-30T17:00:00Z") }
{ "median_amount" : 25, "amount" : 5,   "productid" : 3, "saledate" : ISODate("2016-12-31T12:00:00Z") }
{ "median_amount" : 25, "amount" : 45,  "productid" : 3, "saledate" : ISODate("2017-01-01T11:30:00Z") }
{ "median_amount" : 50, "amount" : 100, "productid" : 4, "saledate" : ISODate("2016-12-30T16:00:00Z") }
{ "median_amount" : 50, "amount" : 50,  "productid" : 4, "saledate" : ISODate("2016-12-30T17:00:00Z") }
{ "median_amount" : 50, "amount" : 50,  "productid" : 4, "saledate" : ISODate("2017-01-01T12:10:00Z") }

Data used for examples

$min

Supports frames

The $min function returns the minimum evaluated expression within a given frame.

Works with the same data types as $max $group function.

Syntax:

$min:[expr]

Example

Calculate the latest saledate per product in a running frame by amount

Command:

db.sale.aggregate({
    "$window" : {
        "$analyze" : [
            {
                "min_running_date" : {
                    "$min" : ["$saledate"]
                },
                "$over" : {
                    "$orderBy": ["$amount", 1],
                    "$partitionBy": "$productid"
                }
            }
        ],
        "$project" : {
            "_id": 0,
            "productid": 1,
            "saledate": 1,
            "totalUSD": 1
        }
    }
})

Results:

{ "min_running_date" : ISODate("2016-12-31T12:00:00Z"), "productid" : 1, "saledate" : ISODate("2016-12-31T12:00:00Z"), "totalUSD" : null }
{ "min_running_date" : ISODate("2016-12-31T12:00:00Z"), "productid" : 1, "saledate" : ISODate("2016-12-31T15:30:00Z"), "totalUSD" : 150000 }
{ "min_running_date" : ISODate("2016-12-30T12:10:00Z"), "productid" : 1, "saledate" : ISODate("2016-12-30T12:10:00Z"), "totalUSD" : 1500000 }
{ "min_running_date" : ISODate("2016-12-31T11:00:00Z"), "productid" : 2, "saledate" : ISODate("2016-12-31T11:00:00Z"), "totalUSD" : 25000 }
{ "min_running_date" : ISODate("2016-12-30T12:10:00Z"), "productid" : 3, "saledate" : ISODate("2016-12-30T12:10:00Z"), "totalUSD" : 3500 }
{ "min_running_date" : ISODate("2016-12-30T12:10:00Z"), "productid" : 3, "saledate" : ISODate("2016-12-31T12:00:00Z"), "totalUSD" : 3500 }
{ "min_running_date" : ISODate("2016-12-30T12:10:00Z"), "productid" : 3, "saledate" : ISODate("2017-01-01T11:30:00Z"), "totalUSD" : 25000 }
{ "min_running_date" : ISODate("2016-12-30T12:10:00Z"), "productid" : 3, "saledate" : ISODate("2016-12-30T17:00:00Z"), "totalUSD" : 56000 }
{ "min_running_date" : ISODate("2016-12-30T17:00:00Z"), "productid" : 4, "saledate" : ISODate("2016-12-30T17:00:00Z"), "totalUSD" : 100000 }
{ "min_running_date" : ISODate("2016-12-30T17:00:00Z"), "productid" : 4, "saledate" : ISODate("2017-01-01T12:10:00Z"), "totalUSD" : null }
{ "min_running_date" : ISODate("2016-12-30T16:00:00Z"), "productid" : 4, "saledate" : ISODate("2016-12-30T16:00:00Z"), "totalUSD" : 200000 }

Data used for examples

$nthValue

Supports frames

The $nthValue window function returns the expression value of the specified document in the window frame relative to the first document of the frame.

The specified expression can return any data type.

When ignoreNulls is specified the function returns the nth non-null evaluated expression, if it exists.

Syntax:

$nthValue:[expr, n]

Example

Find the second sale in the last 4 hours with totalUSD that is not null

Command:

db.sale.aggregate({
    "$window" : {
        "$analyze" : [
            {
                "second_sale_in_last_4_hours" : {
                    "$nthValue" : ["$totalUSD", 2]
                },
                "$over" : {
                    "$orderBy": ["$saledate", 1],
                    "$range":[-4 * 60 * 60 * 1000 , 0] // 4 hours
                },
                "$ignoreNulls":1
            }
        ],
        "$project" : {
            "_id": 0,
            "productid": 1,
            "saledate": 1,
            "totalUSD": 1
        }
    }
})

Results:

{ "second_sale_in_last_4_hours" : 3500,   "productid" : 1, "saledate" : ISODate("2016-12-30T12:10:00Z"), "totalUSD" : 1500000 }
{ "second_sale_in_last_4_hours" : 3500,   "productid" : 3, "saledate" : ISODate("2016-12-30T12:10:00Z"), "totalUSD" : 3500 }
{ "second_sale_in_last_4_hours" : 3500,   "productid" : 4, "saledate" : ISODate("2016-12-30T16:00:00Z"), "totalUSD" : 200000 }
{ "second_sale_in_last_4_hours" : 100000, "productid" : 4, "saledate" : ISODate("2016-12-30T17:00:00Z"), "totalUSD" : 100000 }
{ "second_sale_in_last_4_hours" : 100000, "productid" : 3, "saledate" : ISODate("2016-12-30T17:00:00Z"), "totalUSD" : 56000 }
{ "second_sale_in_last_4_hours" : null,   "productid" : 2, "saledate" : ISODate("2016-12-31T11:00:00Z"), "totalUSD" : 25000 }
{ "second_sale_in_last_4_hours" : 3500,   "productid" : 1, "saledate" : ISODate("2016-12-31T12:00:00Z"), "totalUSD" : null }
{ "second_sale_in_last_4_hours" : 3500,   "productid" : 3, "saledate" : ISODate("2016-12-31T12:00:00Z"), "totalUSD" : 3500 }
{ "second_sale_in_last_4_hours" : 150000, "productid" : 1, "saledate" : ISODate("2016-12-31T15:30:00Z"), "totalUSD" : 150000 }
{ "second_sale_in_last_4_hours" : null,   "productid" : 3, "saledate" : ISODate("2017-01-01T11:30:00Z"), "totalUSD" : 25000 }
{ "second_sale_in_last_4_hours" : null,   "productid" : 4, "saledate" : ISODate("2017-01-01T12:10:00Z"), "totalUSD" : null }

Data used for examples

$nTile

does not support frames

The $nTile function splits the partition to n buckets of document of equal size. When equal size is not possible, the size will deviate by one.

Syntax:

$nTile:[n]

Example

Split sales into 4 buckets of equal size

Command:

db.sale.aggregate({
    "$window" : {
        "$analyze" : [
            {
                "bucket_year" : {
                    "$nTile" : [4]
                },
                "$over" : {
                    "$orderBy": ["$saledate", -1],
                    "$partitionBy":{$year: "$saledate"}
                }
            }
        ],
        "$project" : {
            "_id": 0,
            "year": {$year: "$saledate"},
            "saledate": 1
        }
    }
})

Results:

{ "bucket_year" : NumberLong(1), "saledate" : ISODate("2016-12-31T15:30:00Z"), "year" : 2016 }
{ "bucket_year" : NumberLong(1), "saledate" : ISODate("2016-12-31T12:00:00Z"), "year" : 2016 }
{ "bucket_year" : NumberLong(1), "saledate" : ISODate("2016-12-31T12:00:00Z"), "year" : 2016 }
{ "bucket_year" : NumberLong(2), "saledate" : ISODate("2016-12-31T11:00:00Z"), "year" : 2016 }
{ "bucket_year" : NumberLong(2), "saledate" : ISODate("2016-12-30T17:00:00Z"), "year" : 2016 }
{ "bucket_year" : NumberLong(3), "saledate" : ISODate("2016-12-30T17:00:00Z"), "year" : 2016 }
{ "bucket_year" : NumberLong(3), "saledate" : ISODate("2016-12-30T16:00:00Z"), "year" : 2016 }
{ "bucket_year" : NumberLong(4), "saledate" : ISODate("2016-12-30T12:10:00Z"), "year" : 2016 }
{ "bucket_year" : NumberLong(4), "saledate" : ISODate("2016-12-30T12:10:00Z"), "year" : 2016 }
{ "bucket_year" : NumberLong(1), "saledate" : ISODate("2017-01-01T12:10:00Z"), "year" : 2017 }
{ "bucket_year" : NumberLong(2), "saledate" : ISODate("2017-01-01T11:30:00Z"), "year" : 2017 }

Data used for examples

$percentRank

does not support frames

The $percentRank function calculated the percent rank of a row based on the following formula:

(rank - 1) / (total_count - 1)

  • total_count: total documents in partition.
  • rank: count of documents with sort values that are smaller or equal to current sort value.

Note

If no $orderBy is specified the return value is 0 for all documents.

Syntax:

$percentRank:1

Example

Calculate percent rank of totalUSD per year

Command:

db.sale.aggregate({
    "$window" : {
        "$analyze" : [
            {
                "percent_rank" : {
                    "$percentRank" : 1
                },
                "$over" : {
                    "$partitionBy" : {"$year": "$saledate"},
                    "$orderBy": ["$totalUSD", 1]
                }
            }
        ],
        "$project" : {
            "_id": 0,
            "year" : {"$year": "$saledate"},
            "totalUSD" : 1,
            "productid": 1
        }
    }
})

Results:

{ "percent_rank" : 0,     "productid" : 1, "totalUSD" : null,    "year" : 2016 }
{ "percent_rank" : 0.125, "productid" : 3, "totalUSD" : 3500,    "year" : 2016 }
{ "percent_rank" : 0.125, "productid" : 3, "totalUSD" : 3500,    "year" : 2016 }
{ "percent_rank" : 0.375, "productid" : 2, "totalUSD" : 25000,   "year" : 2016 }
{ "percent_rank" : 0.5,   "productid" : 3, "totalUSD" : 56000,   "year" : 2016 }
{ "percent_rank" : 0.625, "productid" : 4, "totalUSD" : 100000,  "year" : 2016 }
{ "percent_rank" : 0.75,  "productid" : 1, "totalUSD" : 150000,  "year" : 2016 }
{ "percent_rank" : 0.875, "productid" : 4, "totalUSD" : 200000,  "year" : 2016 }
{ "percent_rank" : 1,     "productid" : 1, "totalUSD" : 1500000, "year" : 2016 }
{ "percent_rank" : 0,     "productid" : 4, "totalUSD" : null,    "year" : 2017 }
{ "percent_rank" : 1,     "productid" : 3, "totalUSD" : 25000,   "year" : 2017 }

Data used for examples

$percentCont

does not support frames

$percentCont is an inverse distribution function that assumes a continuous distribution model. It takes a percentile value and a sort specification, and returns an interpolated value that would fall into the given percentile value with respect to the sort specification.

The percentile value of each document is calculated similarly to $percentRank, The difference is that the $rowNumber is used instead of the $rank value for the calculation. This means that two documents with the same sort value can have different percentile values.

When the percentile value given does not fall on a percent value calculated from the sort value, the value is calculated via a linear interpolated.

$orderBy with numerical values is required for this function.

Note

If no $orderBy is specified the return value is null for all documents.

Syntax:

$percentCont:[p]

p should be between 0 and 1.
If p < 0 we match it to 0 and if p is > 1 we match it to 1.

Example

Calculate different percentiles of sale amount per year

Command:

db.sale.aggregate({
    "$window" : {
        "$analyze" : [
            {
                "total" : {
                    "$count" : 1
                },
                "row_number": {
                    "$rowNumber": 1
                },
                "quarter": {
                     "$percentCont": [0.25]
                },
                "third": {
                     "$percentCont": [{$divide: [1, 3]}]
                },
                "half": {
                     "$percentCont": [0.5]
                },
                "two_thirds": {
                     "$percentCont": [{$divide: [2, 3]}]
                },
                "three_quarters": {
                     "$percentCont": [0.75]
                },
                "$over" : {
                    "$partitionBy" : {"$year": "$saledate"},
                    "$orderBy": ["$amount", 1],
                    "$rows":["$unbounded", "$unbounded"]
                }
            }
        ],
        "$project" : {
            "_id": 0,
            "year" : {"$year": "$saledate"},
            "amount" : 1,

        }
    }
},
{
    "$project": {
        "year": 1,
        "amount": 1,
        "percentile": {
            "$divide":[
                {"$subtract": ["$row_number", 1]},
                {"$subtract": ["$total", 1]}
            ]
        },
        "quarter": 1,
        "third": 1,
        "half": 1,
        "two_thirds": 1,
        "three_quarters":1
    }
})

Results:

{ "year" : 2016, "amount" : 5,   "percentile" : 0,     "quarter" : 10,    "third" : 13.333333333333332, "half" : 15,   "two_thirds" : 64.99999999999999,  "three_quarters" : 95 }
{ "year" : 2016, "amount" : 5,   "percentile" : 0.125, "quarter" : 10,    "third" : 13.333333333333332, "half" : 15,   "two_thirds" : 64.99999999999999,  "three_quarters" : 95 }
{ "year" : 2016, "amount" : 10,  "percentile" : 0.25,  "quarter" : 10,    "third" : 13.333333333333332, "half" : 15,   "two_thirds" : 64.99999999999999,  "three_quarters" : 95 }
{ "year" : 2016, "amount" : 15,  "percentile" : 0.375, "quarter" : 10,    "third" : 13.333333333333332, "half" : 15,   "two_thirds" : 64.99999999999999,  "three_quarters" : 95 }
{ "year" : 2016, "amount" : 15,  "percentile" : 0.5,   "quarter" : 10,    "third" : 13.333333333333332, "half" : 15,   "two_thirds" : 64.99999999999999,  "three_quarters" : 95 }
{ "year" : 2016, "amount" : 50,  "percentile" : 0.625, "quarter" : 10,    "third" : 13.333333333333332, "half" : 15,   "two_thirds" : 64.99999999999999,  "three_quarters" : 95 }
{ "year" : 2016, "amount" : 95,  "percentile" : 0.75,  "quarter" : 10,    "third" : 13.333333333333332, "half" : 15,   "two_thirds" : 64.99999999999999,  "three_quarters" : 95 }
{ "year" : 2016, "amount" : 100, "percentile" : 0.875, "quarter" : 10,    "third" : 13.333333333333332, "half" : 15,   "two_thirds" : 64.99999999999999,  "three_quarters" : 95 }
{ "year" : 2016, "amount" : 150, "percentile" : 1,     "quarter" : 10,    "third" : 13.333333333333332, "half" : 15,   "two_thirds" : 64.99999999999999,  "three_quarters" : 95 }
{ "year" : 2017, "amount" : 45,  "percentile" : 0,     "quarter" : 46.25, "third" : 46.66666666666667,  "half" : 47.5, "two_thirds" : 48.333333333333336, "three_quarters" : 48.75 }
{ "year" : 2017, "amount" : 50,  "percentile" : 1,     "quarter" : 46.25, "third" : 46.66666666666667,  "half" : 47.5, "two_thirds" : 48.333333333333336, "three_quarters" : 48.75 }

Data used for examples

$percentDisc

does not support frames

$percentDisc is an inverse distribution function that assumes a discrete cumulative distribution distribution model. It takes a percentile value and a sort specification, and returns a sort value from the given set.

The percentile value of each document is calculated similarly to $cumeDist.

When the percentile value given (p) does not fall on a percent value calculated from the sort value, the value is given is the sort value that matches the smallest cumulative distribution value that is grater than p.

$orderBy with numerical values is required for this function.

Note

If no $orderBy is specified the return value is null for all documents.

Syntax:

$percentDisc:[p]

p should be between 0 and 1.
If p < 0 we match it to 0 and if p is > 1 we match it to 1.

Example

Calculate different percentiles of sale amount per year

Command:

db.sale.aggregate({
    "$window" : {
        "$analyze" : [
            {
                "quarter": {
                     "$percentDisc": [0.25]
                },
                "third": {
                     "$percentDisc": [{$divide: [1, 3]}]
                },
                "half": {
                     "$percentDisc": [0.5]
                },
                "two_thirds": {
                     "$percentDisc": [{$divide: [2, 3]}]
                },
                "three_quarters": {
                     "$percentDisc": [0.75]
                },
                "percentile" : {
                    "$cumeDist" : 1
                },
                "$over" : {
                    "$partitionBy" : {"$year": "$saledate"},
                    "$orderBy": ["$amount", 1],
                    "$rows":["$unbounded", "$unbounded"]
                }
            }
        ],
        "$project" : {
            "_id": 0,
            "year" : {"$year": "$saledate"},
            "amount" : 1,

        }
    }
})

Results:

{ "quarter" : 10, "third" : 10, "half" : 15, "two_thirds" : 50, "three_quarters" : 95, "percentile" : 0.2222222222222222, "amount" : 5,   "year" : 2016 }
{ "quarter" : 10, "third" : 10, "half" : 15, "two_thirds" : 50, "three_quarters" : 95, "percentile" : 0.2222222222222222, "amount" : 5,   "year" : 2016 }
{ "quarter" : 10, "third" : 10, "half" : 15, "two_thirds" : 50, "three_quarters" : 95, "percentile" : 0.3333333333333333, "amount" : 10,  "year" : 2016 }
{ "quarter" : 10, "third" : 10, "half" : 15, "two_thirds" : 50, "three_quarters" : 95, "percentile" : 0.5555555555555556, "amount" : 15,  "year" : 2016 }
{ "quarter" : 10, "third" : 10, "half" : 15, "two_thirds" : 50, "three_quarters" : 95, "percentile" : 0.5555555555555556, "amount" : 15,  "year" : 2016 }
{ "quarter" : 10, "third" : 10, "half" : 15, "two_thirds" : 50, "three_quarters" : 95, "percentile" : 0.6666666666666666, "amount" : 50,  "year" : 2016 }
{ "quarter" : 10, "third" : 10, "half" : 15, "two_thirds" : 50, "three_quarters" : 95, "percentile" : 0.7777777777777778, "amount" : 95,  "year" : 2016 }
{ "quarter" : 10, "third" : 10, "half" : 15, "two_thirds" : 50, "three_quarters" : 95, "percentile" : 0.8888888888888888, "amount" : 100, "year" : 2016 }
{ "quarter" : 10, "third" : 10, "half" : 15, "two_thirds" : 50, "three_quarters" : 95, "percentile" : 1,                  "amount" : 150, "year" : 2016 }
{ "quarter" : 45, "third" : 45, "half" : 45, "two_thirds" : 50, "three_quarters" : 50, "percentile" : 0.5,                "amount" : 45,  "year" : 2017 }
{ "quarter" : 45, "third" : 45, "half" : 45, "two_thirds" : 50, "three_quarters" : 50, "percentile" : 1,                  "amount" : 50,  "year" : 2017 }

Data used for examples

$rank

does not support frames

The $rank window function determines the rank of a value in a partition, based on the $orderBy expression. Documents with equal values for the ranking criteria receive the same rank.

The $rank function differs from $denseRank in one respect: If two or more documents tie, there is a gap in the sequence of ranked values. For example, if two documents are ranked 1, the next rank is 3.

Note

If no $orderBy is specified the return value is 1 for all documents.

Syntax:

$rank:1

Example

Calculate rank of productid and sale year

Command:

db.sale.aggregate({
    "$window" : {
        "$analyze" : [
            {
                "rank_year_product" : {
                    "$rank" : 1
                },
                "$over" : {
                    "$orderBy": ["$productid", -1, {"$year": "$saledate"}, 1]
                }
            }
        ],
        "$project" : {
            "_id": 0,
            "year" : {"$year": "$saledate"},
            "productid": 1
        }
    }
})

Results:

{ "rank_year_product" : NumberLong(1), "productid" : 4, "year" : 2016 }
{ "rank_year_product" : NumberLong(1), "productid" : 4, "year" : 2016 }
{ "rank_year_product" : NumberLong(3), "productid" : 4, "year" : 2017 }
{ "rank_year_product" : NumberLong(4), "productid" : 3, "year" : 2016 }
{ "rank_year_product" : NumberLong(4), "productid" : 3, "year" : 2016 }
{ "rank_year_product" : NumberLong(4), "productid" : 3, "year" : 2016 }
{ "rank_year_product" : NumberLong(7), "productid" : 3, "year" : 2017 }
{ "rank_year_product" : NumberLong(8), "productid" : 2, "year" : 2016 }
{ "rank_year_product" : NumberLong(9), "productid" : 1, "year" : 2016 }
{ "rank_year_product" : NumberLong(9), "productid" : 1, "year" : 2016 }
{ "rank_year_product" : NumberLong(9), "productid" : 1, "year" : 2016 }

Data used for examples

$ratioToReport

does not support frames

The $ratioToReport function finds the ratio of an expression to the sum of that expression across the partition.

Syntax:

$ratioToReport:[expr]

Example

Calculate the ration between the sale and the total sales per year per product

Command:

db.sale.aggregate({
    "$window" : {
        "$analyze" : [
            {
                "sales_ratio" : {
                    "$ratioToReport" : ["$totalUSD"]
                },
                "total": {
                    "$sum": ["$totalUSD"]
                },
                "$over" : {
                    "$partitionBy": {"year": {"$year": "$saledate"}, "product": "$productid"},
                    "$rows": ["$unbounded", "$unbounded"]
                }
            }
        ],
        "$project" : {
            "_id": 0,
            "year" : {"$year": "$saledate"},
            "productid": 1,
            "totalUSD": 1
        }
    }
},
{
    $sort: {
        "year": -1,
        "productid": 1
    }
})

Results:

{ "sales_ratio" : 1,                   "total" : 25000,   "productid" : 3, "totalUSD" : 25000,   "year" : 2017 }
{ "sales_ratio" : null,                "total" : null,    "productid" : 4, "totalUSD" : null,    "year" : 2017 }
{ "sales_ratio" : null,                "total" : 1650000, "productid" : 1, "totalUSD" : null,    "year" : 2016 }
{ "sales_ratio" : 0.09090909090909091, "total" : 1650000, "productid" : 1, "totalUSD" : 150000,  "year" : 2016 }
{ "sales_ratio" : 0.9090909090909091,  "total" : 1650000, "productid" : 1, "totalUSD" : 1500000, "year" : 2016 }
{ "sales_ratio" : 1,                   "total" : 25000,   "productid" : 2, "totalUSD" : 25000,   "year" : 2016 }
{ "sales_ratio" : 0.8888888888888888,  "total" : 63000,   "productid" : 3, "totalUSD" : 56000,   "year" : 2016 }
{ "sales_ratio" : 0.05555555555555555, "total" : 63000,   "productid" : 3, "totalUSD" : 3500,    "year" : 2016 }
{ "sales_ratio" : 0.05555555555555555, "total" : 63000,   "productid" : 3, "totalUSD" : 3500,    "year" : 2016 }
{ "sales_ratio" : 0.3333333333333333,  "total" : 300000,  "productid" : 4, "totalUSD" : 100000,  "year" : 2016 }
{ "sales_ratio" : 0.6666666666666666,  "total" : 300000,  "productid" : 4, "totalUSD" : 200000,  "year" : 2016 }

Data used for examples

$rowNumber

does not support frames

The $rowNumber function returns the count of how many documents have been so far in the partition including the current document.

Syntax:

$rowNumber:1

Example

Calculate the ration between the sale and the total sales per year per product

Command:

db.sale.aggregate({
    "$window" : {
        "$analyze" : [
            {
                "row_number": {
                    "$rowNumber": 1
                },
                "$over" : {
                    "$partitionBy" : {
                        "year": {"$year": "$saledate"},
                        "month": {"$month": "$saledate"},
                        "day": {"$dayOfMonth": "$saledate"}
                    }
                }
            }
        ],
        "$project" : {
            "_id": 0,
            "date" : {"$dateToString": {"date": "$saledate", "format":"%Y-%m-%d"}},
            "productid": 1,
            "totalUSD": 1
        }
    }
})

Results:

{ "row_number" : 1, "date" : "2016-12-31", "productid" : 2, "totalUSD" : 25000 }
{ "row_number" : 2, "date" : "2016-12-31", "productid" : 1, "totalUSD" : null }
{ "row_number" : 3, "date" : "2016-12-31", "productid" : 1, "totalUSD" : 150000 }
{ "row_number" : 4, "date" : "2016-12-31", "productid" : 3, "totalUSD" : 3500 }
{ "row_number" : 1, "date" : "2017-01-01", "productid" : 3, "totalUSD" : 25000 }
{ "row_number" : 2, "date" : "2017-01-01", "productid" : 4, "totalUSD" : null }
{ "row_number" : 1, "date" : "2016-12-30", "productid" : 4, "totalUSD" : 100000 }
{ "row_number" : 2, "date" : "2016-12-30", "productid" : 3, "totalUSD" : 56000 }
{ "row_number" : 3, "date" : "2016-12-30", "productid" : 4, "totalUSD" : 200000 }
{ "row_number" : 4, "date" : "2016-12-30", "productid" : 1, "totalUSD" : 1500000 }
{ "row_number" : 5, "date" : "2016-12-30", "productid" : 3, "totalUSD" : 3500 }

Data used for examples

$stddev

Supports frames

The $stddev window function returns the sample standard deviation of the input expression values. The $stddev function works with numeric values and ignores null values.

If the frame contains less than two document with non-null values, the calculation returns null.

Syntax:

$stddev:[expr]

Example

Running standard deviation of amount per product

Command:

db.sale.aggregate({
    "$window" : {
        "$analyze" : [
            {
                "stddev_amount" : {
                    "$stddev" : ["$amount"]
                },
                "$over" : {
                    "$partitionBy" : "$productid",
                    "$orderBy": ["$saledate", 1]
                }
            }
        ],
        "$project" : {
            "_id": 0,
            "date": {$dateToString: {"date": "$saledate", "format": "%Y-%m-%d %H:%M"}},
            "amount" : 1,
            "productid" : 1
        }
    }
})

Results:

{ "stddev_amount" : 0,                  "amount" : 150, "date" : "2016-12-30 12:10", "productid" : 1 }
{ "stddev_amount" : 95.45941546018392,  "amount" : 15,  "date" : "2016-12-31 12:00", "productid" : 1 }
{ "stddev_amount" : 77.94228634059948,  "amount" : 15,  "date" : "2016-12-31 15:30", "productid" : 1 }
{ "stddev_amount" : 0,                  "amount" : 10,  "date" : "2016-12-31 11:00", "productid" : 2 }
{ "stddev_amount" : 0,                  "amount" : 5,   "date" : "2016-12-30 12:10", "productid" : 3 }
{ "stddev_amount" : 63.63961030678928,  "amount" : 95,  "date" : "2016-12-30 17:00", "productid" : 3 }
{ "stddev_amount" : 51.96152422706632,  "amount" : 5,   "date" : "2016-12-31 12:00", "productid" : 3 }
{ "stddev_amount" : 42.720018726587654, "amount" : 45,  "date" : "2017-01-01 11:30", "productid" : 3 }
{ "stddev_amount" : 0,                  "amount" : 100, "date" : "2016-12-30 16:00", "productid" : 4 }
{ "stddev_amount" : 35.35533905932738,  "amount" : 50,  "date" : "2016-12-30 17:00", "productid" : 4 }
{ "stddev_amount" : 28.867513459481287, "amount" : 50,  "date" : "2017-01-01 12:10", "productid" : 4 }

Data used for examples

$stddevPop

Supports frames

The $stddevPop window function returns the population standard deviation of the input expression values. The $stddevPop function works with numeric values and ignores null values.

If the frame contains no document with non-null values, or the frame is empty, the calculation returns null.

Syntax:

$stddevPop:[expr]

Example

Running population standard deviation of amount per product

Command:

db.sale.aggregate({
    "$window" : {
        "$analyze" : [
            {
                "stddev_pop_amount" : {
                    "$stddevPop" : ["$amount"]
                },
                "$over" : {
                    "$partitionBy" : "$productid",
                    "$orderBy": ["$saledate", 1]
                }
            }
        ],
        "$project" : {
            "_id": 0,
            "date": {$dateToString: {"date": "$saledate", "format": "%Y-%m-%d %H:%M"}},
            "amount" : 1,
            "productid" : 1
        }
    }
})

Results:

{ "stddev_pop_amount" : 0,                  "amount" : 150, "date" : "2016-12-30 12:10", "productid" : 1 }
{ "stddev_pop_amount" : 67.5,               "amount" : 15,  "date" : "2016-12-31 12:00", "productid" : 1 }
{ "stddev_pop_amount" : 63.63961030678928,  "amount" : 15,  "date" : "2016-12-31 15:30", "productid" : 1 }
{ "stddev_pop_amount" : 0,                  "amount" : 10,  "date" : "2016-12-31 11:00", "productid" : 2 }
{ "stddev_pop_amount" : 0,                  "amount" : 5,   "date" : "2016-12-30 12:10", "productid" : 3 }
{ "stddev_pop_amount" : 45,                 "amount" : 95,  "date" : "2016-12-30 17:00", "productid" : 3 }
{ "stddev_pop_amount" : 42.42640687119285,  "amount" : 5,   "date" : "2016-12-31 12:00", "productid" : 3 }
{ "stddev_pop_amount" : 36.99662146737185,  "amount" : 45,  "date" : "2017-01-01 11:30", "productid" : 3 }
{ "stddev_pop_amount" : 0,                  "amount" : 100, "date" : "2016-12-30 16:00", "productid" : 4 }
{ "stddev_pop_amount" : 25,                 "amount" : 50,  "date" : "2016-12-30 17:00", "productid" : 4 }
{ "stddev_pop_amount" : 23.570226039551585, "amount" : 50,  "date" : "2017-01-01 12:10", "productid" : 4 }

Data used for examples

$sum

Supports frames

The $sum window function returns the sum of the input expression values. The $sum function works with numeric values and ignores null values.

If the frame contains no document with non-null values, or the frame is empty, the calculation returns null.

Syntax:

$sum:[expr]

Example

Sum the total sales from the last 6 hours

Command:

db.sale.aggregate({
    "$window" : {
        "$analyze" : [
            {
                "sales_last_6_hours" : {
                    "$sum" : ["$totalUSD"]
                },
                "$over" : {
                    "$orderBy": ["$saledate", 1],
                    "$range": [ -6 * 60 * 60 * 1000 , "$currentRow"]
                }
            }
        ],
        "$project" : {
            "_id": 0,
            "date": {$dateToString: {"date": "$saledate", "format": "%Y-%m-%d %H:%M"}},
            "totalUSD" : 1
        }
    }
})

Results:

{ "sales_last_6_hours" : 1503500, "date" : "2016-12-30 12:10", "totalUSD" : 1500000 }
{ "sales_last_6_hours" : 1503500, "date" : "2016-12-30 12:10", "totalUSD" : 3500 }
{ "sales_last_6_hours" : 1703500, "date" : "2016-12-30 16:00", "totalUSD" : 200000 }
{ "sales_last_6_hours" : 1859500, "date" : "2016-12-30 17:00", "totalUSD" : 100000 }
{ "sales_last_6_hours" : 1859500, "date" : "2016-12-30 17:00", "totalUSD" : 56000 }
{ "sales_last_6_hours" : 25000,   "date" : "2016-12-31 11:00", "totalUSD" : 25000 }
{ "sales_last_6_hours" : 28500,   "date" : "2016-12-31 12:00", "totalUSD" : null }
{ "sales_last_6_hours" : 28500,   "date" : "2016-12-31 12:00", "totalUSD" : 3500 }
{ "sales_last_6_hours" : 178500,  "date" : "2016-12-31 15:30", "totalUSD" : 150000 }
{ "sales_last_6_hours" : 25000,   "date" : "2017-01-01 11:30", "totalUSD" : 25000 }
{ "sales_last_6_hours" : 25000,   "date" : "2017-01-01 12:10", "totalUSD" : null }

Data used for examples

$var

Supports frames

The $var window function returns the sample variance of the input expression values. The $var function works with numeric values and ignores null values.

If the frame contains less than two document with non-null values, the calculation returns null.

Syntax:

$var:[expr]

Example

Running sample variance of amount per product

Command:

db.sale.aggregate({
    "$window" : {
        "$analyze" : [
            {
                "var_amount" : {
                    "$var" : ["$amount"]
                },
                "$over" : {
                    "$partitionBy" : "$productid",
                    "$orderBy": ["$saledate", 1]
                }
            }
        ],
        "$project" : {
            "_id": 0,
            "date": {$dateToString: {"date": "$saledate", "format": "%Y-%m-%d %H:%M"}},
            "amount" : 1,
            "productid" : 1
        }
    }
})

Results:

{ "var_amount" : null,              "amount" : 150, "date" : "2016-12-30 12:10", "productid" : 1 }
{ "var_amount" : 9112.5,            "amount" : 15,  "date" : "2016-12-31 12:00", "productid" : 1 }
{ "var_amount" : 6075,              "amount" : 15,  "date" : "2016-12-31 15:30", "productid" : 1 }
{ "var_amount" : null,              "amount" : 10,  "date" : "2016-12-31 11:00", "productid" : 2 }
{ "var_amount" : null,              "amount" : 5,   "date" : "2016-12-30 12:10", "productid" : 3 }
{ "var_amount" : 4050,              "amount" : 95,  "date" : "2016-12-30 17:00", "productid" : 3 }
{ "var_amount" : 2700,              "amount" : 5,   "date" : "2016-12-31 12:00", "productid" : 3 }
{ "var_amount" : 1825,              "amount" : 45,  "date" : "2017-01-01 11:30", "productid" : 3 }
{ "var_amount" : null,              "amount" : 100, "date" : "2016-12-30 16:00", "productid" : 4 }
{ "var_amount" : 1250,              "amount" : 50,  "date" : "2016-12-30 17:00", "productid" : 4 }
{ "var_amount" : 833.3333333333334, "amount" : 50,  "date" : "2017-01-01 12:10", "productid" : 4 }

Data used for examples

$varPop

Supports frames

The $varPop window function returns the population variance of the input expression values. The $varPop function works with numeric values and ignores null values.

If the frame contains no document with non-null values, or the frame is empty, the calculation returns null.

Syntax:

$varPop:[expr]

Example

Running population variance of amount per product

Command:

db.sale.aggregate({
    "$window" : {
        "$analyze" : [
            {
                "var_pop_amount" : {
                    "$varPop" : ["$amount"]
                },
                "$over" : {
                    "$partitionBy" : "$productid",
                    "$orderBy": ["$saledate", 1]
                }
            }
        ],
        "$project" : {
            "_id": 0,
            "date": {$dateToString: {"date": "$saledate", "format": "%Y-%m-%d %H:%M"}},
            "amount" : 1,
            "productid" : 1
        }
    }
})

Results:

{ "var_pop_amount" : 0,                 "amount" : 150, "date" : "2016-12-30 12:10", "productid" : 1 }
{ "var_pop_amount" : 4556.25,           "amount" : 15,  "date" : "2016-12-31 12:00", "productid" : 1 }
{ "var_pop_amount" : 4050,              "amount" : 15,  "date" : "2016-12-31 15:30", "productid" : 1 }
{ "var_pop_amount" : 0,                 "amount" : 10,  "date" : "2016-12-31 11:00", "productid" : 2 }
{ "var_pop_amount" : 0,                 "amount" : 5,   "date" : "2016-12-30 12:10", "productid" : 3 }
{ "var_pop_amount" : 2025,              "amount" : 95,  "date" : "2016-12-30 17:00", "productid" : 3 }
{ "var_pop_amount" : 1800,              "amount" : 5,   "date" : "2016-12-31 12:00", "productid" : 3 }
{ "var_pop_amount" : 1368.75,           "amount" : 45,  "date" : "2017-01-01 11:30", "productid" : 3 }
{ "var_pop_amount" : 0,                 "amount" : 100, "date" : "2016-12-30 16:00", "productid" : 4 }
{ "var_pop_amount" : 625,               "amount" : 50,  "date" : "2016-12-30 17:00", "productid" : 4 }
{ "var_pop_amount" : 555.5555555555555, "amount" : 50,  "date" : "2017-01-01 12:10", "productid" : 4 }

Data used for examples

Data used for examples

Date Collection:

{ "saledate" : ISODate("2016-12-30T17:00:00Z"), "productid" : 4, "amount" : 50,  "totalUSD" : 100000  }
{ "saledate" : ISODate("2016-12-30T17:00:00Z"), "productid" : 3, "amount" : 95,  "totalUSD" : 56000   }
{ "saledate" : ISODate("2016-12-30T16:00:00Z"), "productid" : 4, "amount" : 100, "totalUSD" : 200000  }
{ "saledate" : ISODate("2016-12-31T11:00:00Z"), "productid" : 2, "amount" : 10,  "totalUSD" : 25000   }
{ "saledate" : ISODate("2016-12-31T12:00:00Z"), "productid" : 1, "amount" : 15,  "totalUSD" : null    }
{ "saledate" : ISODate("2016-12-31T15:30:00Z"), "productid" : 1, "amount" : 15,  "totalUSD" : 150000  }
{ "saledate" : ISODate("2017-01-01T11:30:00Z"), "productid" : 3, "amount" : 45,  "totalUSD" : 25000   }
{ "saledate" : ISODate("2017-01-01T12:10:00Z"), "productid" : 4, "amount" : 50,  "totalUSD" : null    }
{ "saledate" : ISODate("2016-12-30T12:10:00Z"), "productid" : 1, "amount" : 150, "totalUSD" : 1500000 }
{ "saledate" : ISODate("2016-12-30T12:10:00Z"), "productid" : 3, "amount" : 5,   "totalUSD" : 3500    }
{ "saledate" : ISODate("2016-12-31T12:00:00Z"), "productid" : 3, "amount" : 5,   "totalUSD" : 3500    }