Cloud Storage

Collections are “born” on mounted volumes but can also be stored on a cloud storage service. A collection’s lifecycle policy defines which data is stored locally and which data is stored on the cloud service. It also defines the required cloud storage class, such as hot, cold or offline storage.

For example, a collection can be defined to be backed by Amazon Web Services S3, using different storage classes as data ages. Its lifecycle policy can require that data older than 6 months be stored as Standard S3 objects, “colder” data older than a year be stored as S3-IA objects, data older than 2 years be stored “offline” using Amazon Glacier, and data older than 4 years be deleted completely.
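The example policy can be read as a mapping from data age to storage tier. The sketch below is only an illustration of that reading: the function name tierForAge, the tier labels, and the day thresholds (183, 365, 730 and 1460 days as rough stand-ins for 6 months, 1, 2 and 4 years) are assumptions and not part of SonarW.

```javascript
// Hypothetical illustration of the example policy; not a SonarW API.
// Steps are listed oldest-first so the first match is the right tier.
const EXAMPLE_POLICY = [
  { olderThanDays: 1460, tier: "deleted" },
  { olderThanDays: 730, tier: "offline (Glacier)" },
  { olderThanDays: 365, tier: "cold (S3-IA)" },
  { olderThanDays: 183, tier: "hot (S3 Standard)" },
];

// Returns the tier that holds data of the given age under the example policy.
function tierForAge(ageDays) {
  for (const step of EXAMPLE_POLICY) {
    if (ageDays > step.olderThanDays) {
      return step.tier;
    }
  }
  // Data younger than every threshold is still served from the mounted volume.
  return "local volume";
}
```

For instance, tierForAge(400) falls between the one-year and two-year thresholds and so maps to the cold tier.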

Regardless of where the data is stored, SonarW operations remain the same, with two exceptions. First, when data reaches the “offline” stage of its lifecycle, the user must ask for it to be brought online again before it is accessible to queries. Second, data access characteristics such as read speed and latency will match those provided by the cloud service instead of those provided by the mounted volume.

Once a collection is defined to have cloud-backed storage, all of its data is copied to the cloud service on an ongoing basis as the collection grows. When a block of data becomes old enough to meet the lifecycle policy definition, SonarW verifies that the data exists on the cloud service and then removes it from local storage.

If you have configured encryption for a database containing a cloud-backed collection, the data on the cloud service will also be encrypted using the same key. Note that this is different from using the cloud service itself to maintain encrypted data.

When a collection is dropped, its cloud-backed data is deleted.

SonarW supports the following cloud services, with the associated storage classes:

Service                  Hot                   Cold                     Offline
Amazon AWS S3            S3 Standard Storage   S3 IA-Standard Storage   Glacier
Microsoft Azure          Azure Blob Storage (Hot/Cool storage is set per storage account)
Google Object Storage    Region/Multi-Region   Nearline                 Coldline
SoftLayer Object Store   Standard              N/A                      N/A

To define a collection as cloud-backed, use the collMod admin command. If the collection does not exist, it will be created. If it does exist, all of its data will be sent to the cloud service. For large collections this may take a while.

For example, to convert a collection named “salary” in the “employees” database to be cloud-backed:

use employees
db.runCommand(
   {
     collMod: "salary",
     sonarStorage: {
       // Replace with the provider-specific object described under "URI By Cloud Provider"
       URI: <cloud-specific connection information>
     }
   }
)

The connection information is stored encrypted by SonarW. We recommend that you configure SonarW to use a KMIP appliance or at least set up a passphrase-protected key. Even if you do not want to encrypt your data, these keys will be used to encrypt the cloud service connection information.

If the connection information, such as a key or a password, has changed, issue the collMod command again with the new connection information.

To check if a collection is cloud-backed, run the db.<collection>.stats() command and check for the “sonarStorage” field. If it appears, the collection is cloud-backed.
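The check described above can be wrapped in a small helper. This is a minimal sketch: the function name isCloudBacked is hypothetical, and it assumes only that the document returned by stats() carries a “sonarStorage” field when the collection is cloud-backed, as stated in the text.

```javascript
// Hypothetical helper; `stats` is assumed to be the document returned by
// db.<collection>.stats().
function isCloudBacked(stats) {
  return stats !== null && typeof stats === "object" && "sonarStorage" in stats;
}

// In the shell this would be called as: isCloudBacked(db.salary.stats())
```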

URI By Cloud Provider

Amazon S3

The value for the URI in the collMod command is an object with one field, “fs”, which is a string of this format:

“s3://ACCESS_KEY_ID:SECRET_ACCESS_KEY”

ACCESS_KEY_ID is a 20-byte string and SECRET_ACCESS_KEY is a 40-byte string. Amazon issues both, as described at http://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys

For example:

{
"fs" : "s3://KEYIDVKEYIDGL12KEYID:SECRETKEY+nberMqrQqQvoR9R6ONj2-ZA42aH002"
}
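If you generate these commands from a script, a pair of small builders can catch malformed keys before they reach SonarW. The following is a sketch under stated assumptions: the names s3StorageUri and collModCommand are hypothetical, and the length checks simply mirror the 20 and 40 byte sizes given above (treating each character as one byte).

```javascript
// Hypothetical helpers; only the "s3://ID:SECRET" format and the collMod
// document shape come from the text.
function s3StorageUri(accessKeyId, secretAccessKey) {
  if (accessKeyId.length !== 20) {
    throw new Error("ACCESS_KEY_ID must be 20 characters long");
  }
  if (secretAccessKey.length !== 40) {
    throw new Error("SECRET_ACCESS_KEY must be 40 characters long");
  }
  return { fs: "s3://" + accessKeyId + ":" + secretAccessKey };
}

// Builds the command document passed to db.runCommand().
function collModCommand(collection, uri) {
  return { collMod: collection, sonarStorage: { URI: uri } };
}

// In the shell:
//   db.runCommand(collModCommand("salary", s3StorageUri(keyId, secret)))
```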

SoftLayer

The format is the same as for Amazon S3, except that the “fs” value starts with “sl” instead of “s3”. For example:

{
"fs" : "sl://KEYIDVKEYIDGL12KEYID:SECRETKEY+nberMqrQqQvoR9R6ONj2-ZA42aH002"
}

Microsoft Azure Blob Storage

SonarW uses Azure Blob Storage in a storage account. A storage account is associated with keys. A key provides full access to a specific storage account. We recommend using a separate storage account for SonarW data since this will provide better access control. It will also enable Azure metrics, usage and diagnostics for SonarW data separate from other data you may have on Azure.

The value for the URI in the collMod command is an object with one field, “fs”, which is a string of this format:

“ms://STORAGE_ACCOUNT//KEY”

Here STORAGE_ACCOUNT is the name of the storage account and KEY is either of the two keys furnished by Azure for the account. To see the keys, log in to the Azure portal, select the “Storage Account” icon on the left side of the portal, select the account, and then click “Access keys”.

For example:

{
 fs: "ms://jsonar//Jn3W9fQdzMW/oQMP49bBkMi43zS+YjhjsdExHDHexhKxS2j81yaMulc4LJKeyCiXX4IxGeOeTzKEYQ=="
}
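Because Azure keys are base64 strings that may themselves contain “/”, the double slash after the account name is what separates the two parts. The sketch below builds and parses such a value. The function names are hypothetical, and the account-name check reflects Azure's naming rule (3 to 24 lowercase letters and digits) rather than anything SonarW is known to enforce.

```javascript
// Hypothetical helpers for the "ms://STORAGE_ACCOUNT//KEY" format.
function azureStorageUri(storageAccount, key) {
  // Azure storage account names are 3-24 lowercase letters and digits.
  if (!/^[a-z0-9]{3,24}$/.test(storageAccount)) {
    throw new Error("invalid storage account name: " + storageAccount);
  }
  if (typeof key !== "string" || key === "") {
    throw new Error("an access key is required");
  }
  return { fs: "ms://" + storageAccount + "//" + key };
}

// Splits at the first "//" after the scheme; the account name cannot contain
// a slash, so anything after that separator belongs to the key.
function parseAzureStorageUri(fs) {
  const prefix = "ms://";
  if (!fs.startsWith(prefix)) {
    throw new Error("not an ms:// value");
  }
  const rest = fs.slice(prefix.length);
  const sep = rest.indexOf("//");
  if (sep < 0) {
    throw new Error("missing // between account and key");
  }
  return { storageAccount: rest.slice(0, sep), key: rest.slice(sep + 2) };
}
```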

Google Cloud Services (GCS)

To access Google Cloud Services from a service, you need to create a Service Account for that service in the GCS Console. The account should have the role “Storage Administrator” for the GCS project. We recommend using a separate project for storing SonarW data. Service Accounts are identified by special email addresses and are authenticated using private keys.

Once you create a Service Account, the GCS console will provide you with the email address for the account and a private-public key pair.

The value for the URI field in the collMod command is an object with four string fields:

{
    "fs" : "gcs://",
    "gc_project_id" : project_id,
    "gc_email" : client_email,
    "gc_private_key" : private_key
}

  • fs is always the string “gcs://”
  • gc_project_id is the project ID in GCS
  • gc_email is the GCS Service Account email
  • gc_private_key is the private key for that email address.
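If the Service Account key was downloaded as a JSON key file, its project_id, client_email and private_key fields line up with the placeholders above. The sketch below maps a parsed key file to the URI object. The function name gcsStorageUri is hypothetical, and passing the private key string through unchanged is an assumption about what SonarW expects.

```javascript
// Hypothetical helper; `serviceAccountKey` is assumed to be the parsed JSON
// key file downloaded for the Service Account.
function gcsStorageUri(serviceAccountKey) {
  const required = ["project_id", "client_email", "private_key"];
  for (const field of required) {
    const value = serviceAccountKey[field];
    if (typeof value !== "string" || value === "") {
      throw new Error("service account key is missing field: " + field);
    }
  }
  return {
    fs: "gcs://",
    gc_project_id: serviceAccountKey.project_id,
    gc_email: serviceAccountKey.client_email,
    gc_private_key: serviceAccountKey.private_key,
  };
}
```

The resulting object would be used as the value of URI inside sonarStorage in the collMod command.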

Setting Per-DB Defaults

You can set cloud defaults for new collections in a database. Each database can have a collection called “system.cloud” with a set of rules controlling the cloud-backed storage of new collections. Each rule has a numerical rule ID, and the rules are processed in order until a rule matches. A matching rule results either in the use of cloud storage or in a decision not to use cloud storage.

This mechanism will never cause any system collection to have cloud storage.

The user running the cloudDefaults command must have the ilmAction permission on the database.

For example, this command will make sure that the collection “salaries” is not backed in the cloud:

> db.runCommand({cloudDefaults:{ruleid:1, name:"salaries", sonarStorage: null}} )
{ "ok" : 1 }

This rule will make sure that the collection “addresses” will be backed by S3 storage:

> db.runCommand({cloudDefaults:{ruleid:2, name:"addresses", sonarStorage: { URI :{ fs: "s3://KEYIDVKEYIDGL12KEYID:SECRETKEY+nberMqrQqQvoR9R6ONj2-ZA42aH002" }} }})
{ "ok" : 1 }

This rule will make sure that any collection ending with “_monthly” will be backed in the cloud. Notice the use of “regex” instead of “name”:

> db.runCommand({cloudDefaults:{ruleid:3, regex:".*_monthly$", sonarStorage: { URI :{ fs: "s3://KEYIDVKEYIDGL12KEYID:SECRETKEY+nberMqrQqQvoR9R6ONj2-ZA42aH002" }} }})
{ "ok" : 1 }
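To make the processing order concrete, here is a small model of how such rules could be resolved for a new collection name. It is a sketch, not SonarW's implementation: the function name resolveCloudDefault is hypothetical, treating any name starting with “system.” as a system collection is an assumption, and so is the use of an unanchored JavaScript regular expression test for “regex” rules.

```javascript
// Hypothetical model of rule processing. Each rule has the shape used by the
// cloudDefaults command: { ruleid, name | regex, sonarStorage }.
function resolveCloudDefault(rules, collectionName) {
  // System collections never get cloud storage (prefix check is an assumption).
  if (collectionName.startsWith("system.")) {
    return null;
  }
  // Process rules in ascending ruleid order and stop at the first match.
  const ordered = [...rules].sort((a, b) => a.ruleid - b.ruleid);
  for (const rule of ordered) {
    const matches = rule.name !== undefined
      ? rule.name === collectionName
      : new RegExp(rule.regex).test(collectionName);
    if (matches) {
      // sonarStorage === null means "matched, do not use cloud storage".
      return { ruleid: rule.ruleid, sonarStorage: rule.sonarStorage };
    }
  }
  return null; // no rule matched
}
```

With the three example rules, “salaries” would stop at rule 1 with no cloud storage, while a name such as “sales_monthly” would fall through to rule 3.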

The overall syntax for the cloudDefaults admin command is:

'cloudDefaults': { ruleid: rule_number,
                           [ name: collection_name |  regex: collection_name_regex ],
                           sonarStorage :  { cloud_specification } | null
                         }

Note:

The ruleid field must be a unique number; it sets the order in which rules are processed. You must specify either the “name” field or the “regex” field, but not both. When a collection is created, its name is matched against this value to decide whether the collection should use cloud storage and, if so, with which cloud storage parameters. Set sonarStorage to null if you don’t want the collection to have cloud storage, or use the same syntax for the sonarStorage field as in the collMod command.
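These constraints can be checked on the client before the command is sent. The following is a minimal sketch assuming a hypothetical builder named cloudDefaultsCommand. It enforces only what the note states (a numeric ruleid, exactly one of name or regex, and an explicit sonarStorage value), and it cannot check uniqueness of ruleid, which depends on the rules already stored.

```javascript
// Hypothetical builder for the cloudDefaults command document.
function cloudDefaultsCommand({ ruleid, name, regex, sonarStorage }) {
  if (typeof ruleid !== "number" || Number.isNaN(ruleid)) {
    throw new Error("ruleid must be a number");
  }
  const hasName = name !== undefined;
  const hasRegex = regex !== undefined;
  if (hasName === hasRegex) {
    throw new Error("specify exactly one of name or regex");
  }
  if (sonarStorage === undefined) {
    throw new Error("sonarStorage must be a cloud specification or null");
  }
  const rule = { ruleid: ruleid };
  if (hasName) {
    rule.name = name;
  } else {
    rule.regex = regex;
  }
  rule.sonarStorage = sonarStorage;
  return { cloudDefaults: rule };
}

// In the shell:
//   db.runCommand(cloudDefaultsCommand({ ruleid: 1, name: "salaries", sonarStorage: null }))
```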

You can delete a single rule using db.system.cloud.remove({..}), or drop the entire collection to delete all rules. You can also update a rule using db.system.cloud.update(). Collections that already existed when a rule was changed are not affected by the change.

The stored rules look like this (note that the cloud parameters are encrypted):

> db.system.cloud.find()
{ "_id" : ObjectId("594455603848c33e0000002c"), "ruleid" : NumberLong(1), "name" : "salaries", "regex" : "", "defaults" : null }
{ "_id" : ObjectId("594455863848c33e0000002d"), "ruleid" : NumberLong(2), "name" : "addressed", "regex" : "", "defaults" : "Ofm6IQRLxZFJHap52elgUhjqUgQ/1ecWnHOAYh5A5MCN/SdwBexUaHN+q/ZirURv+xmBCtC4wclbMzXTXSbW77T5QgCG2DC2BxzvoNcxME0Xd/0jleutXWYkKykaRu9rbjoyXCbSkyd09bVfqzqTR4qHMj2t2Bans6Cu2kuYIWIN9fyKeutaQ+LxohpXEfszVaxbw3zUKXKTbdEgncrGOzcEdkS1n/nAw/LIU1pvPqHKuk3sttRNofuVq8qYttR5OURg/RgoTqOrcTWb3GA+YM0BtNrHuHo/2xzWXgV9bZpWWSt2LRzP9p+MkV8cdrxAL1ngsITr6CAK/FvbVkWjI5AodWgpr42lLaxRQG/IoYy4U4P/9gCvQBcdEy0CPIUangrFkIoRpNNZ6FuV99X/7w==" }
{ "_id" : ObjectId("594455943848c33e0000002f"), "ruleid" : NumberLong(3), "name" : "", "regex" : ".*_MONTHLY$", "defaults" : "y8ELkoVO0TLw8xC3kHCEVBKuDNnJlH4u45vAQ/OVa4r24bC3zlh9uM/RK9/4u0VC7VUmVu//TKUYa5TWtr8KevfW3LlieT2NckHd/PP3JUiczrWxOPgTafNfa1Tz96Ox7h7kJuz1/+DRQp7zbgecLk++iZXtzRknnNAp/OmRT74hlwj42L4Xh/NB17CQ9TlnvZz+HzhgZor207JvPR8kp/c3NLbCFPvW/L6zSVCBUG+Jrmvd+5grBfnI5gSeOKHqZq7bZnqIE4fOZVLA1v919ElAKj9SnETxcB0360aXv8+CUAHsBFStCc0RRrGKsBX1cgQ1MSU+b3/vVJJKA6R0gCG8tJbc3G9N0lqNjo1+/iHvHpl59h66eEX0QKsMHONeQY9QPh2G/rKguAw2E7Rh0g==" }

Lifecycle Policy Commands

The setLifecycle command asks the cloud service to change the storage class of data files once they reach a given age, specified in days. It works only if the cloud service supports such changes (currently Azure, Google, and AWS do). For example, the following command:

db.runCommand({ setLifecycle: { "salary": { cold: 30, offline: 60 }, tag: "my_policy" } })
{ "response" : "", "ok" : 1 }

will cause data to be moved to cold storage 30 days after it was inserted into SonarW and moved offline (Glacier or Coldline) after 60 days. A policy has a tag, which has no effect but helps in managing the various policies. In this case the tag is “my_policy”. If you work with the GUI, all collections that share the same policy will have the same tag.

The command will return an error if the cloud provider doesn’t support the cold/offline storage class or if the offline days are fewer than the cold days.
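The ordering constraint can also be checked before the command is issued. The sketch below assumes a hypothetical builder named setLifecycleCommand. Treating “cold” and “offline” as individually optional is an assumption, and the provider-support check is left to SonarW.

```javascript
// Hypothetical builder for the setLifecycle command document.
function setLifecycleCommand(collection, { cold, offline }, tag) {
  // Mirror the documented error: offline must not come before cold.
  if (cold !== undefined && offline !== undefined && offline < cold) {
    throw new Error("offline days must not be fewer than cold days");
  }
  const policy = {};
  if (cold !== undefined) {
    policy.cold = cold;
  }
  if (offline !== undefined) {
    policy.offline = offline;
  }
  // The collection name is the key of the policy, next to the tag.
  return { setLifecycle: { [collection]: policy, tag: tag } };
}

// In the shell:
//   db.runCommand(setLifecycleCommand("salary", { cold: 30, offline: 60 }, "my_policy"))
```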

To return the lifecycle policy, use:

> db.runCommand({ getLifecycle: "salary" })
{
        "salary" : [
                {
                        "tag" : "my_policy",
                        "cold" : 30,
                        "offline" : 60
                }
        ],
        "ok" : 1
}

SonarW does not save the policies internally. It queries the cloud service for the current policy, so the reported policy is the one actually in effect on the cloud service.

Default database policies are maintained in system.cloud. You can configure both purge and cloud policies in this collection. The collection may be specified by name or using a regex match. The policy may specify that collection(s) be included or excluded.