Is Cosmos DB an option for you?

Introduction

Cosmos DB is a complex Azure resource with a dense billing model that takes time to research and analyze. You should therefore understand the resource and its billing model accurately before considering Cosmos DB as an option for your solution, and you may need to read several documents to do so. Because of this complexity, I am writing this article to help Solution Architects and Developers make that decision. It links to very important documents that you should read and understand before deciding.

Solution Architects look to NoSQL databases for the advantages they can provide over traditional relational databases. Those advantages are:

  • To store large volumes of data that might have little to no structure.
  • To make the most of cloud computing and storage.
  • To speed up the development.
  • To boost horizontal scalability.

When considering Cosmos DB as a NoSQL database offered as PaaS, we gain a few more advantages from the features below.

  • Ability to elastically scale
  • Economies of scale
  • Optimized for the cloud
  • Doesn’t require provisioning VMs to scale
  • You automatically get high availability, with at least 10-20 fault domains by default

Things to be considered

When it comes to cloud computing, a cost-effective solution is critical for a customer, since adding any resource to the cloud has a direct impact on the monthly bill. Therefore, before moving to Cosmos DB, there are a few important factors to consider.

1. Cost

This is the first thing you must understand before moving to Cosmos DB, as it has been designed for performance-critical applications with a high volume of usage. To meet the throughput requirements of the application, it scales elastically, automatically adding more servers to your cluster. As it is horizontally scalable, the platform can keep adding servers. The cost of Cosmos DB is therefore directly tied to the maximum throughput that you used within an hour. For example, if you use 4000 RU/s (Request Units per second) at any point within an hour, you will be charged for 4000 RU/s for the complete hour. This billing model therefore suits applications that experience sustained traffic throughout the hour. The cost for 100 RU/s is $0.008 per hour, so in this case the cost for the hour is $0.32 (0.008 * 40). More examples of billing can be found here. This leads to the following questions.
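The hourly billing arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, assuming the $0.008 per 100 RU rate quoted in this article; the function names are hypothetical and actual prices vary by region and over time.

```python
# Rate quoted in this article: $0.008 per 100 RU per hour.
# This is an assumption for illustration; check current Azure pricing.
PRICE_PER_100_RU_HOUR = 0.008


def hourly_cost(peak_ru, price_per_100_ru=PRICE_PER_100_RU_HOUR):
    """Cost of one hour, billed on the highest throughput reached in that hour."""
    units = peak_ru / 100
    return round(units * price_per_100_ru, 4)


def total_cost(peaks_per_hour):
    """Sum the hourly cost over a list of per-hour peak throughput values."""
    return round(sum(hourly_cost(peak) for peak in peaks_per_hour), 4)


# One hour in which throughput peaked at 4000 RU.
print(hourly_cost(4000))

# Three hours with different peaks; each hour is billed on its own peak.
print(total_cost([400, 4000, 1000]))
```

Because each hour is billed on its peak, a single short burst to 4000 RU costs the same as an hour of sustained 4000 RU traffic.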

I. How many RUs should you allocate for your application?

It depends entirely on your application's traffic patterns. You can set the maximum throughput that your Cosmos DB account should support and let it autoscale up to that maximum when your application needs it. The minimum throughput that you can set is 400 RU/s, and the minimum throughput is always 10% of the maximum throughput, which means you should set the maximum throughput to 4000 RU/s or more. At 4000 RU/s, even if you use no resources in the Cosmos DB account, you pay $0.032/hour for the minimum throughput. If you set the maximum throughput to 10000 RU/s, your minimum throughput is 1000 RU/s. In this case, your minimum cost is $0.08/hour and your maximum cost is $0.8/hour.
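As a rough sketch of the autoscale bounds just described, the snippet below derives the minimum throughput and the hourly cost range from a chosen maximum. It assumes the figures given in this article (minimum is 10% of maximum, a 4000 RU floor on the maximum, and $0.008 per 100 RU per hour); the function name is hypothetical.

```python
# Figures taken from this article; treat them as assumptions, not current pricing.
PRICE_PER_100_RU_HOUR = 0.008
LOWEST_AUTOSCALE_MAX_RU = 4000


def autoscale_cost_range(max_ru):
    """Return (min_ru, min_cost_per_hour, max_cost_per_hour) for an autoscale maximum."""
    if max_ru < LOWEST_AUTOSCALE_MAX_RU:
        raise ValueError("autoscale maximum must be at least 4000 RU")
    min_ru = max_ru // 10  # the minimum is 10% of the maximum
    min_cost = round(min_ru / 100 * PRICE_PER_100_RU_HOUR, 4)
    max_cost = round(max_ru / 100 * PRICE_PER_100_RU_HOUR, 4)
    return min_ru, min_cost, max_cost


print(autoscale_cost_range(4000))
print(autoscale_cost_range(10000))
```

The first element of each result is the throughput you pay for even when the account is idle.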

Also note that the minimum throughput you can set depends on the number of containers in the database and the storage used by each container. Unlike SQL Server, as you add more containers to the database and as the containers use more GB, the baseline amount that you pay increases. Please refer to this document for more information.

You can also set provisioned throughput to manual and set it to 400 RU/s. In this case, your application can use at most 400 RU/s, and your requests may time out during even a small burst in traffic. Therefore, I would recommend an autoscale maximum of at least 4000 RU/s.

II. Is there any other billing model that suits applications with light traffic?

Yes, of course. When creating a Cosmos DB account, you set the capacity mode to either Provisioned Throughput or Serverless. Everything described above applies to Provisioned Throughput, which is the model to consider if the application is performance critical. If your application is not performance critical, Serverless mode may suit you, as it charges based on the Request Units consumed by the requests that you send to Cosmos DB. Microsoft recommends it for the types of applications below.

  1. Development
  2. Testing
  3. Prototyping
  4. Proof of concept
  5. Non-critical application with light traffic

Please note that it is in preview as of 5 September 2020. You can refer to this document for a comparison between Provisioned Throughput mode and Serverless mode.

2. Performance Requirements

Azure Cosmos DB is Microsoft’s globally distributed, high performance, multi-model database service. With a click of a button, Cosmos DB enables you to elastically and independently scale throughput and storage across any number of Azure regions worldwide. You could elastically scale throughput and storage and take advantage of fast, single-digit-millisecond data access. This means it is always best suited for performance critical applications. It transparently replicates your data wherever your users are, so your users can interact with a replica of the data that is closest to them.

Designed with transparent horizontal partitioning and multi-master replication, it offers unprecedented elastic scalability for your writes and reads, all around the globe. You can elastically scale up from thousands to hundreds of millions of requests/sec around the globe, with a single API call and pay only for the throughput (and storage) you need.

Considering all the capabilities of Cosmos DB above, it is best suited to performance-critical, always-on applications with a high volume of usage.

3. Data Structure

As it is a document database, it does not limit the types of data that can be stored together, and it lets you add new data types as your needs change. This speeds up your development and deployment process immensely. If your denormalized data can be held in a single document without making documents too big, query time improves drastically compared to querying the same amount of data from a relational database.

Even though NoSQL databases are said to be good for storing large volumes of data with little to no structure, the real-world use cases that we deal with are more or less relational. Therefore, you need to look into how to model and partition data in a Cosmos DB NoSQL database. If you have access to Pluralsight, I strongly recommend the course “Data Modelling and Partitioning in Azure Cosmos DB: What Every Relational Database User Needs to Know”. It explains in detail how you can model your relational data in a NoSQL database.

In Cosmos DB, there is a concept called partitioning. When JSON data is stored in Cosmos DB, it is stored in logical partitions. Each document is placed in a particular logical partition according to the value of the partition key that you specified when creating the Cosmos DB container.

In Cosmos DB, the maximum supported document size is 2MB, and a single logical partition can hold only 10GB of documents. A query against a single partition performs much better than a cross-partition query, and it also consumes fewer Request Units (RU). If you use lots of cross-partition queries, the cost of your queries may increase. If you cannot find a property with high cardinality in the JSON document to serve as a partition key, you can use a synthetic partition key.
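One common way to build a synthetic partition key is to concatenate several properties of the document into a new property and use that property as the partition key. The sketch below illustrates the idea in plain Python; the field names (customerId, orderMonth, partitionKey) and the helper function are hypothetical and chosen only for this example.

```python
def synthetic_partition_key(doc, fields, sep="-"):
    """Join the values of the given fields into one partition key value."""
    return sep.join(str(doc[field]) for field in fields)


# A hypothetical order document with no single high-cardinality property.
order = {"id": "o-1001", "customerId": "c-42", "orderMonth": "2020-09"}

# Store the combined value in its own property before writing the document;
# the container would then be created with this property as its partition key.
order["partitionKey"] = synthetic_partition_key(order, ["customerId", "orderMonth"])

print(order["partitionKey"])
```

Combining a customer identifier with a month spreads one customer's orders over several logical partitions, which helps stay under the per-partition size limit while still allowing single-partition queries for a given customer and month.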

You need to select the partition key carefully when creating a Cosmos DB container; otherwise, it may increase your monthly cost immensely.

4. Types of Queries You Are Going to Execute

Even though we store denormalized data in a container, we cannot hold lots of data in a single document. For example, if a particular customer has thousands of orders, we cannot hold all the orders in a single customer document. You may need two containers, one for Customers and one for Orders. If you want to get a list of orders with customer details in a single query, you cannot do that in Cosmos DB, as it does not allow a query across containers. In this case, you may need to send two queries, which could be an additional cost for you.
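The two-query pattern described above can be sketched as follows. To keep the example self-contained, the two containers are modelled as in-memory lists and `query` is a hypothetical stand-in for a single-container query, not a Cosmos DB SDK call; all names and data are made up for illustration.

```python
# Hypothetical in-memory stand-ins for the Customers and Orders containers.
customers = [
    {"id": "c-1", "name": "Alice"},
    {"id": "c-2", "name": "Bob"},
]
orders = [
    {"id": "o-1", "customerId": "c-1", "total": 30.0},
    {"id": "o-2", "customerId": "c-1", "total": 12.5},
    {"id": "o-3", "customerId": "c-2", "total": 99.0},
]


def query(container, predicate):
    """Stand-in for one single-container query; each call models one billed request."""
    return [doc for doc in container if predicate(doc)]


def orders_with_customer(customer_id):
    """Combine customer details with that customer's orders on the client side."""
    # Query 1: fetch the customer document.
    matches = query(customers, lambda d: d["id"] == customer_id)
    if not matches:
        return []
    customer = matches[0]
    # Query 2: fetch the customer's orders from the other container.
    customer_orders = query(orders, lambda d: d["customerId"] == customer_id)
    # The "join" happens in application code, not in the database.
    return [{**o, "customerName": customer["name"]} for o in customer_orders]


for row in orders_with_customer("c-1"):
    print(row)
```

Every result that needs data from both containers costs two requests plus the merge logic in the application, which is the overhead to weigh when an application relies heavily on this kind of query.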

Therefore, if your application requires lots of complex queries across containers, Cosmos DB would not be a good choice for you.

Conclusion

Cosmos DB was originally designed for performance-critical applications. Serverless capacity mode, which is still in preview, was introduced later for non-critical applications with light traffic. Cosmos DB removes the burden of managing your cluster according to application traffic. If you want to replicate data across regions, you can do that in a few easy steps; your Cosmos DB account handles all of that for you.

When you move to Cosmos DB, cost analysis is a must. Before that, you must analyze the performance requirements and traffic patterns of your application. If your application or product is performance critical, with a strong user base and a high volume of usage, Cosmos DB is a good option for you, provided that all four factors mentioned above fit your customer's needs.

References

https://docs.microsoft.com/en-us/azure/cosmos-db/
https://app.pluralsight.com/library/courses/microsoft-ignite-session-65/table-of-contents
https://support.rackspace.com/how-to/reasons-to-use-a-nosql-db/#:~:text=NoSQL%20databases%20do%20not%20limit,of%20cloud%20computing%20and%20storage
