Purge and Recompress

Purge

Purge deletes the parts of a collection that have not been modified within a given period. Its main use is when the data in SonarW is governed by a specific retention period and you want the system to remove expired data automatically.

Purge can be run as a one-off command or scheduled as a cron job.

Running a single purge: The purge command has 3 parameters:

  1. Collection name - the name of the collection in the “current” db

  2. Interval - the cutoff for purging: parts that have not been modified since this point are purged

    The interval can be defined in 2 ways:

    1. As an ISODate - a specific point in time
    2. As a string defining a time delta before “now” in seconds, hours, days, weeks, months or years (for example: ‘360 s’, ‘5 h’, ‘60 d’, ‘8 w’, ‘2 m’ or ‘1 y’). Note that ‘m’ denotes months, not minutes.
  3. Purge level - which copy of the data to purge; the default is ‘local’. A local collection has only one option, the default ‘local’ level. A cloud collection is stored both locally and on the cloud: purging with the default ‘local’ level deletes the local data but keeps the cloud copy, so queries after the purge return the same results as before, albeit a little slower. Purging with the ‘cloud’ level deletes both the local and the cloud data, so queries after the purge return fewer records.

The command syntax:

db.runCommand({"sonar_purge": <collection_name>,"interval":<time> [, "level": "cloud" or "local"] })

Examples:

  1. Remove parts older than 01 Jan 2017 using an ISODate: db.runCommand({"sonar_purge": "collection", "interval": ISODate("2017-01-01T00:00:00Z")})
  2. Remove parts older than one year using a string: db.runCommand({"sonar_purge": "collection", "interval": "1 y"})
  3. Remove parts older than 10 minutes using a string: db.runCommand({"sonar_purge": "collection", "interval": "600 s"})
  4. Remove parts older than 10 minutes using a string, on the cloud as well as locally: db.runCommand({"sonar_purge": "collection", "interval": "600 s", "level": "cloud"})
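
The parameter rules above can be checked before a command is sent. The following is a minimal sketch, not part of SonarW: buildPurgeCommand is a hypothetical helper, and the interval pattern it enforces (a number, one space, one unit letter) is inferred from the examples in this section, so it may be stricter than what the server actually accepts.

```javascript
// Hypothetical helper (not a SonarW API): builds a sonar_purge command document.
function buildPurgeCommand(collection, interval, level) {
  // The interval is either a Date (ISODate returns a Date in the mongo shell)
  // or a "<number> <unit>" string; the pattern is an assumption based on the examples.
  var isDate = interval instanceof Date;
  var isDelta = typeof interval === "string" && /^\d+ [shdwmy]$/.test(interval);
  if (!isDate && !isDelta) {
    throw new Error("interval must be a Date or a string such as '60 d'");
  }
  var cmd = { sonar_purge: collection, interval: interval };
  // level is optional; when omitted the server default ('local') applies.
  if (level !== undefined) {
    if (level !== "local" && level !== "cloud") {
      throw new Error("level must be 'local' or 'cloud'");
    }
    cmd.level = level;
  }
  return cmd;
}

// In the mongo shell:
//   db.runCommand(buildPurgeCommand("collection", "1 y"));
```

Rejecting a malformed interval on the client side avoids sending a purge whose cutoff was mistyped, which matters because purged data cannot be queried again.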

Note: The last part of a collection will never be purged.

Running db.runCommand({"partsInfo": "<collection name>"}) before and after the purge provides information on the collection's parts, which lets you verify the effect of the purge.
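
The before-and-after check can be wrapped in one function. This is a sketch under assumptions: purgeWithPartsInfo is a hypothetical name, and because the layout of the partsInfo reply is not described here, the function simply returns all three replies for inspection.

```javascript
// Hypothetical helper: runs partsInfo, then the purge, then partsInfo again.
// `db` is the mongo shell database object (or anything with a runCommand method).
function purgeWithPartsInfo(db, collection, interval) {
  var before = db.runCommand({ partsInfo: collection });
  var purge = db.runCommand({ sonar_purge: collection, interval: interval });
  var after = db.runCommand({ partsInfo: collection });
  // The reply formats are not assumed; the caller compares them as needed.
  return { before: before, purge: purge, after: after };
}

// In the mongo shell:
//   printjson(purgeWithPartsInfo(db, "collection", "1 y"));
```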

Recompress

Depending on the application, data can grow over time and consume a considerable amount of disk space. In normal operation, SonarW uses the LZ4 algorithm to compress column data; LZ4 offers a good balance between saving disk space and providing high-throughput compression and decompression. Sonar can recompress data that has not been modified for a specified amount of time using an algorithm with a higher compression ratio, such as BZ2. This reclaims disk space used by the older data while keeping it accessible within SonarW. Recompression does not change query results or the way queries execute.

Recompress is executed in a similar way to Purge. It can also be run as a one-off command or scheduled as a recurring job. The command syntax uses the same type of parameters:

  1. Collection name - the name of the collection in the “current” db
  2. Interval - the cutoff for recompression: parts that have not been modified since this point are recompressed

Both parameters are specified as described in the Purge section above.

Examples:

  1. Recompress data older than 01 Jan 2017 using an ISODate: db.runCommand({"sonar_recompress": "collection", "interval": ISODate("2017-01-01T00:00:00Z")})
  2. Recompress data older than 4 weeks using a string: db.runCommand({"sonar_recompress": "collection", "interval": "4 w"})
  3. Recompress data older than 6 months using a string: db.runCommand({"sonar_recompress": "collection", "interval": "6 m"})

Note: The last part of a collection will never be recompressed.
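
Purge and Recompress can be combined into a simple tiered retention policy, for example recompressing data after 4 weeks and purging it after a year. The sketch below only builds the two command documents; retentionCommands is a hypothetical helper, and issuing the purge first is a design choice made here (so that parts about to be deleted are not recompressed), not a documented requirement.

```javascript
// Hypothetical helper: returns the commands for a two-tier retention policy.
function retentionCommands(collection, recompressAfter, purgeAfter) {
  return [
    // Purge first so that expired parts are not recompressed needlessly.
    { sonar_purge: collection, interval: purgeAfter },
    { sonar_recompress: collection, interval: recompressAfter }
  ];
}

// In the mongo shell:
//   retentionCommands("collection", "4 w", "1 y").forEach(function (cmd) {
//     printjson(db.runCommand(cmd));
//   });
```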

Scheduling of Purge and Recompress

Sonar can schedule purge and/or recompress operations as recurring jobs.

Schedules can be added, changed and removed by any user who is allowed to drop the collections involved. Executing these recurring tasks is only allowed for users who have the storageAdmin role. This role is granted to “root” users by default.

Scheduling purge and recompress is done by defining “jobs” and “tasks”. A “task” is a collection of one or more jobs that are triggered by a single command.

A “job” is defined and represented as a document in one of 2 dedicated collections:

  • Purge job definitions are stored in the “admin” DB in the “system.purge” collection
  • Recompress job definitions are stored in the “admin” DB in the “system.recompress” collection
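
Because job definitions are ordinary documents, they can be inspected with regular shell queries. The sketch below is an assumption-laden example: listJobDefinitions is a hypothetical helper, it presumes the connected user may read the “admin” DB, and it makes no claim about the fields inside each job document, so it returns the documents unfiltered.

```javascript
// Hypothetical helper: returns all purge and recompress job definitions.
// `db` is the mongo shell database object.
function listJobDefinitions(db) {
  var admin = db.getSiblingDB("admin");
  return {
    purge: admin.getCollection("system.purge").find().toArray(),
    recompress: admin.getCollection("system.recompress").find().toArray()
  };
}

// In the mongo shell:
//   printjson(listJobDefinitions(db));
```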

Adding a recurring job requires 3 parameters:

  1. Task - the name of the task the job belongs to (String)
  2. Interval - the cutoff for the operation, defined in the same way as the interval of a one-off command (see above)
  3. Collection - the collection the job is executed on

When defining the collection there are 3 options:

  1. collection-name - In this case Sonar will assume the DB is the “current” DB.
  2. db.collection-name - this option identifies a collection on a specific DB.
  3. db - this option defines that the job is to be performed on all the collections in the specified DB.

Adding and removing jobs

Purge

Add:

db.runCommand({ addRecurringPurge: {task: <task_name>, interval: <time_interval>, collection: <collection_name> or ns: <db>.<collection> or ns: <db> } })

Remove:

db.runCommand({ removeRecurringPurge: {task:<task_name>, ns:'<db>.<collection>' or '<db>' or '<db>.*'}})

Note: <interval> is not needed in the remove command.

As described above there are 3 options to specify the collection parameter to be “removed”

  • ‘db.collection’ - this is a full path to a specified collection
  • ‘db’ - this is the case where a task is defined on the DB
  • ‘db.*’ - this option is for removing all tasks on the “DB” (like the previous one) and also all tasks defined on specific collections in the specified DB.

Example: Suppose the following 3 jobs are defined in the same task:

  1. db.runCommand({ addRecurringPurge: {task: "Task1", interval: "3 h", ns: "mydb.collection1" } })
  2. db.runCommand({ addRecurringPurge: {task: "Task1", interval: "1 d", ns: "mydb" } })
  3. db.runCommand({ addRecurringPurge: {task: "Task1", interval: "2 m", ns: "mydb.collection2" } })

Then when removing we have 3 options:

  1. Deleting only job #1: db.runCommand({ removeRecurringPurge: {task: "Task1", ns: "mydb.collection1"}})
  2. Deleting only job #2: db.runCommand({ removeRecurringPurge: {task: "Task1", ns: "mydb"}})
  3. Deleting all 3 jobs: db.runCommand({ removeRecurringPurge: {task: "Task1", ns: "mydb.*"}})
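
The removal rules in this example can be expressed as a small matching function. This is an illustrative model only, written from the description above; it is not how SonarW implements removal, and jobsRemovedBy and the job objects are hypothetical.

```javascript
// Illustrative model: which jobs would a removeRecurringPurge with this ns remove?
function jobsRemovedBy(jobs, task, ns) {
  return jobs.filter(function (job) {
    if (job.task !== task) { return false; }
    if (ns.slice(-2) === ".*") {
      // 'db.*' matches the DB-level job and every collection-level job in that DB.
      var dbName = ns.slice(0, -2);
      return job.ns === dbName || job.ns.indexOf(dbName + ".") === 0;
    }
    // 'db' or 'db.collection' matches only the job defined on exactly that namespace.
    return job.ns === ns;
  });
}

var jobs = [
  { task: "Task1", interval: "3 h", ns: "mydb.collection1" },
  { task: "Task1", interval: "1 d", ns: "mydb" },
  { task: "Task1", interval: "2 m", ns: "mydb.collection2" }
];

var onlyCollection1 = jobsRemovedBy(jobs, "Task1", "mydb.collection1"); // job #1
var onlyDbLevel = jobsRemovedBy(jobs, "Task1", "mydb");                 // job #2
var everything = jobsRemovedBy(jobs, "Task1", "mydb.*");                // all 3 jobs
```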

Note: Dropping a collection or database will also remove the recurring task corresponding to said collection or db.

Dropping collection ‘collection1’ in ‘mydb’ will run a command similar to: db.runCommand({ removeRecurringPurge: {task: "Task1", ns: "mydb.collection1"}})

Dropping database ‘mydb’ will run a command similar to: db.runCommand({ removeRecurringPurge: {task: "Task1", ns: "mydb.*"}})

Recompress

Uses similar syntax to Purge.

Add:

db.runCommand({ addRecurringRecompress: {task: <task_name>, interval: <time_interval>, collection: <collection_name> or ns: <db>.<collection> or ns: <db> } })

Remove:

db.runCommand({removeRecurringRecompress: {task:<task_name>, ns:'<db>.<collection>' or '<db>' or '<db>.*'}})
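
Since the four add and remove commands share one shape, a single builder can produce all of them. The following is a sketch only: recurringJobCommand and its spec argument are hypothetical, and the command and field names are taken from the syntax lines above.

```javascript
// Hypothetical helper: builds add/remove commands for recurring Purge or Recompress jobs.
// action: "add" or "remove"; kind: "Purge" or "Recompress";
// spec: { task, interval, collection or ns } (interval is ignored for "remove").
function recurringJobCommand(action, kind, spec) {
  if (action !== "add" && action !== "remove") {
    throw new Error("action must be 'add' or 'remove'");
  }
  if (kind !== "Purge" && kind !== "Recompress") {
    throw new Error("kind must be 'Purge' or 'Recompress'");
  }
  var body = { task: spec.task };
  if (action === "add") {
    body.interval = spec.interval;
    // 'collection' targets the current DB; 'ns' names a DB or a DB-qualified collection.
    if (spec.collection !== undefined) {
      body.collection = spec.collection;
    } else {
      body.ns = spec.ns;
    }
  } else {
    // The remove syntax above only takes 'ns'.
    body.ns = spec.ns;
  }
  var cmd = {};
  cmd[action + "Recurring" + kind] = body;
  return cmd;
}

// In the mongo shell:
//   db.runCommand(recurringJobCommand("add", "Recompress",
//     { task: "Task1", interval: "4 w", ns: "mydb.collection1" }));
```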

Removing whole tasks

To remove all the jobs in a specific task run the following command:

db.runCommand({ removeRecurringTask: <task_name> })

This will remove all the purge and recompress jobs that belong to that task.

Scheduling a cron job for recurring Purge/Recompress

Only users with the “storageAdmin” role are allowed to execute a “Task”. These users must be created in the “admin” DB.
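
A dedicated user for running tasks could be created as sketched below. This assumes that SonarW accepts the standard MongoDB shell helper createUser and the usual role-document form; createTaskRunner, the user name and the password are hypothetical and should be replaced with your own values.

```javascript
// Hypothetical helper: creates a user with the storageAdmin role in the "admin" DB.
// Assumes the standard MongoDB createUser shell helper is available.
function createTaskRunner(db, userName, password) {
  var admin = db.getSiblingDB("admin");
  admin.createUser({
    user: userName,
    pwd: password,
    roles: [{ role: "storageAdmin", db: "admin" }]
  });
}

// In the mongo shell, connected as a user allowed to manage users:
//   createTaskRunner(db, "StorAdmin", "<password>");
```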

The command for running a “Task”:

db.runCommand({ runRecurringTask: <task_name> })

The command can be executed manually from the shell.

Steps for creating a cron job

  1. Ensure the Linux user has the permissions needed to create and schedule the job on the server, as well as access to the mongo shell.
  2. Create a *.js file that will contain the task command. Save the file in a location that the user from the previous step can access, with appropriate permissions. (For example: create the file /home/user1/SonarTask1.js)
  3. In the js file add the command to be executed, for example:

db.runCommand({"runRecurringTask": "DailyPurge"})

  4. Open the crontab for editing:

$ crontab -e

  5. Using standard vi commands, add the full cron job definition:

<schedule> mongo localhost:27117/admin -u <user-name> -p <Password> <full-path-name-to-js-file>

  • <schedule> - a cron string
  • <user-name> - user with “storageAdmin” role
  • <Password> - password for the user
  • <full-path-name-to-js-file> - full path name for the js file

  6. Save and exit

Example:

0 15 03 * * mongo localhost:27117/admin -uStorAdmin -pStoreAdminPW /home/user1/SonarTask1.js
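
When the script runs unattended, it helps to log the reply and signal failure through the exit code. The following is a possible variant of the js file, offered as a sketch: runTask is a hypothetical helper, and the check on the reply assumes the usual MongoDB convention of an ok field equal to 1 on success.

```javascript
// Hypothetical helper: runs a task and reports whether the reply indicates success.
function runTask(db, taskName) {
  var result = db.runCommand({ runRecurringTask: taskName });
  // Assumes the MongoDB convention that a successful reply carries ok: 1.
  var ok = !!result && result.ok === 1;
  return { ok: ok, result: result };
}

// This part only executes inside the mongo shell, where db, printjson and quit are globals.
if (typeof db !== "undefined" && typeof printjson === "function" && typeof quit === "function") {
  var outcome = runTask(db, "DailyPurge");
  printjson(outcome.result); // cron captures this output
  if (!outcome.ok) {
    quit(1); // non-zero exit code so a wrapper script or monitor can detect the failure
  }
}
```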