CosmosDB vs SQL Server – Which is the right option for your data?

Introduction

I have been working with SQL Server for over 20 years now, and it is still going strong, having evolved from on-premises only to both on-premises and the cloud in the form of Azure SQL Database, Azure SQL Managed Instance and SQL Server on a VM.

With the massive growth in data over the last decade, one of the largest questions for a company remains its data journey. Are there certain types of data that do not require relational database structures? Examples could be game save data from my console or, if I go for a walk at lunchtime, the telematics from that walk. Should that data be stored in a SQL Server database, or is there a more suitable option?

CosmosDB/NOSQL database – Could this be the answer?

SQL Server is a relational database, which has tables, keys and foreign keys holding normalised data. ‘Not Only SQL’, commonly called NoSQL, is the opposite: it stores data in non-tabular formats and is de-normalised. NoSQL has been on the rise over the last couple of decades as an alternative to relational databases. One of the most popular options has been MongoDB (released in 2009; CosmosDB has an API for it).

Microsoft did not want to miss this growing market and so released its own NoSQL offering. It was initially known as DocumentDB and was relaunched as CosmosDB in 2017.

One of the reasons for the name change is that Microsoft now offers different flavours of this NoSQL database and wants to project global redundancy as a key selling point, hence Cosmos. These APIs are as follows:

NoSQL – Native SQL queries with automatic indexing and schema flexibility for document-based workloads. This is the default option.
MongoDB – Migrate existing MongoDB applications seamlessly with full wire protocol compatibility and familiar MongoDB tools.
Apache Cassandra – Build highly scalable applications using the Cassandra Query Language (CQL) with automatic global distribution.
Apache Gremlin – Create graph-based applications using the Gremlin graph traversal language for complex relationship queries.
Table – Modernize Azure Table Storage applications with premium capabilities and global distribution.
PostgreSQL – Build distributed relational applications using the familiar PostgreSQL wire protocol with horizontal scaling.

For this article, we are going to concentrate on SQL Server vs the NoSQL offering of CosmosDB, as this is the default and most widely used option. The other options are there to make migrations from these technologies into CosmosDB easier for customers. For more information on these, see https://learn.microsoft.com/en-us/azure/cosmos-db/

The JSON of it all

In its NoSQL version, CosmosDB stores the data as JSON (JavaScript Object Notation) documents. These JSON documents are platform independent and use human-readable text to store and transmit data. Where this really comes into its own is that, unlike SQL Server, there is no set structure. So let us look at some example JSON files:

File 1

{
  "employeeId": 1001,
  "firstName": "Brian",
  "lastName": "Thomas",
  "Age": 44,
  "Gender": "Male",
  "phoneNumbers": [
    {
      "type": "work",
      "number": "+44 7890123456"
    },
    {
      "type": "home",
      "number": "+44 7890123457"
    }
  ],
  "Address": {
    "streetAddress": "25",
    "city": "Plymouth",
    "County": "Devon",
    "postalArea": "PL"
  }
}

File 2

{
  "employeeId": 1002,
  "firstName": "Martina",
  "lastName": "Franks",
  "Age": 35,
  "Gender": "Female",
  "phoneNumbers": [
    {
      "type": "work",
      "number": "+44 7890123458"
    }
  ],
  "Address": {
    "streetAddress": "36",
    "city": "Taunton",
    "County": "Somerset"
  }
}

File 3

{
  "employeeId": 1003,
  "firstName": "Mark",
  "lastName": "Smith",
  "Age": 28,
  "Gender": "Male",
  "Hobbies": [
    "Walking",
    "Tennis"
  ],
  "phoneNumbers": [
    {
      "type": "Personal",
      "number": "+1 78901234569"
    }
  ],
  "Address": {
    "city": "San Francisco",
    "State": "California"
  }
}

We can see that all 3 files are slightly different. They all have employeeId, firstName, lastName, Age and Gender. After this the differences start to show: the employees have different numbers of phone numbers, and employeeId 1003 lives in the USA, as we can see from his address details and phone number, so one of his field names is State rather than County. We are also listing his hobbies, which are not included in the other two employee files. We could instead put these hobbies into a different file with the same ‘employeeId’. This choice is known as Embedding vs Referencing. For more information on this, see the following Microsoft article: https://learn.microsoft.com/en-us/azure/cosmos-db/modeling-data
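To make Embedding vs Referencing a little more concrete, here is a minimal Python sketch using plain dictionaries in place of CosmosDB documents. The helper names hobbies_embedded and hobbies_referenced are my own for illustration and are not part of any CosmosDB SDK.

```python
# Embedding: the hobbies live inside the employee document itself.
embedded_employee = {
    "employeeId": 1003,
    "firstName": "Mark",
    "Hobbies": ["Walking", "Tennis"],
}

# Referencing: the hobbies live in a separate document that shares the employeeId.
employee = {"employeeId": 1003, "firstName": "Mark"}
hobby_documents = [{"employeeId": 1003, "Hobbies": ["Walking", "Tennis"]}]


def hobbies_embedded(employee_doc):
    """One read is enough: the hobbies travel with the employee."""
    return employee_doc.get("Hobbies", [])


def hobbies_referenced(employee_doc, hobby_docs):
    """A second lookup is needed to find the document with a matching employeeId."""
    for doc in hobby_docs:
        if doc["employeeId"] == employee_doc["employeeId"]:
            return doc["Hobbies"]
    return []


print(hobbies_embedded(embedded_employee))
print(hobbies_referenced(employee, hobby_documents))
```

Both calls return the same hobbies. The difference is that the embedded version needs a single document, while the referenced version needs an extra lookup but keeps the employee document smaller.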

NoSQL databases cope with these kinds of differences with no issues, at least in terms of storage. When querying these files, though, any differences need to be considered. A relational database is more rigid here and would require much of this data to be normalised into different tables.
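As a rough illustration of what "considering the differences" can look like on the reading side, the Python sketch below uses cut-down copies of the three employee files and tolerates the County/State difference and the missing Hobbies field. The functions region and summarise are hypothetical helpers written for this example only.

```python
# Cut-down versions of the three employee files, keeping only the fields that differ.
employees = [
    {"employeeId": 1001, "Address": {"city": "Plymouth", "County": "Devon"}},
    {"employeeId": 1002, "Address": {"city": "Taunton", "County": "Somerset"}},
    {
        "employeeId": 1003,
        "Hobbies": ["Walking", "Tennis"],
        "Address": {"city": "San Francisco", "State": "California"},
    },
]


def region(doc):
    """Return the County if present, otherwise the State (None if neither exists)."""
    address = doc.get("Address", {})
    return address.get("County") or address.get("State")


def summarise(doc):
    """Build a uniform row, supplying defaults for fields a document may lack."""
    return {
        "employeeId": doc["employeeId"],
        "region": region(doc),
        "hobbies": doc.get("Hobbies", []),
    }


summaries = [summarise(doc) for doc in employees]
for row in summaries:
    print(row)
```

The storage layer accepted all three shapes without complaint; the work of reconciling them has simply moved into the code that reads them.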

Partitioning Your Data is key for NoSQL

What is particularly important in CosmosDB is to make sure you choose the right Partition Key. This key is used to spread your data across logical partitions (these are data stores that sit on physical partitions). Choose incorrectly and this can cause Hot Partitions, which can lead to expensive queries in terms of Request Unit (RU) usage. Looking at the above files, if we were to choose ‘city’ as the key, then San Francisco is likely to be a hotter partition than Taunton because of its much larger population. Or if we chose ‘lastName’, there will be more people called Smith than Franks. The ‘employeeId’ would be the best option here. A high number of logical partitions is not a problem for CosmosDB. You can also have documents with different structures that relate to the same key, e.g. ‘employeeId’, within the same logical partition.
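The effect of the partition key choice can be sketched in a few lines of Python. The head counts per city below are invented purely for illustration, and the function largest_partition_share is a hypothetical helper; it only counts documents per key value and does not reproduce how CosmosDB places logical partitions on physical ones.

```python
from collections import Counter


def largest_partition_share(documents, partition_key):
    """Fraction of all documents that land in the busiest logical partition."""
    counts = Counter(doc[partition_key] for doc in documents)
    return max(counts.values()) / len(documents)


# Hypothetical workforce of 1000 employees, heavily weighted towards San Francisco.
cities = ["San Francisco"] * 700 + ["Plymouth"] * 200 + ["Taunton"] * 100
employees = [
    {"employeeId": 1001 + index, "city": city} for index, city in enumerate(cities)
]

city_share = largest_partition_share(employees, "city")
id_share = largest_partition_share(employees, "employeeId")
print(f"city: {city_share:.1%} of documents in the busiest partition")
print(f"employeeId: {id_share:.1%} of documents in the busiest partition")
```

With ‘city’ as the key, a single logical partition holds most of the data in this made-up example, whereas ‘employeeId’ gives every employee their own small partition.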

Is CosmosDB better than SQL Server?

This is perhaps too binary a question. It depends on the types of data in question and what you want to do with them. Financial data, such as your bank account information and the related customer data, is what we could call High Value Transaction data, and for this, data integrity (ACID) is important. This is where SQL Server and relational databases are still king. However, NoSQL comes into its own for high-volume transactions or rapidly changing schemas, such as social media posts or IoT (Internet of Things) data.

It is important to remember that it does not have to be one or the other for an organisation/department. You can have both and tailor the right tool (database technology) for the right job. Here are some examples of when we should use either technology.

Choose SQL Server/Azure SQL Database if:

Your application requires a structured, predefined schema and strict data integrity e.g. as mentioned above for financial data.
You need robust support for complex queries, joins, and transactions that span multiple tables e.g. inventory systems.
Your team has existing expertise with the Microsoft SQL stack and tools. The retraining costs can be significant. It’s important to weigh up the cost of training and the costs of moving to CosmosDB.
Your workload is predictable and does not require global, low-latency distribution out of the box.

Choose Azure Cosmos DB if:

You need a schema-agnostic, flexible data model (e.g. for IoT data, social media feeds, or mobile apps).
Your application requires global distribution with multi-master writes and guaranteed low latency. Structure can slow the data down. If speed is of the essence CosmosDB could well be the answer.
You need elastic and automatic scaling to manage massive amounts of data and unpredictable traffic spikes. CosmosDB comes into its own here because the partition key lets it scale the data horizontally. It can also scale throughput up by adding available Request Units (RUs).
You are building modern cloud applications that benefit from a serverless approach.

CosmosDB – How is it different?

We can now see when to pick CosmosDB (and just as importantly, when not to); the main drivers will usually be performance and flexibility. We all know that Microsoft is not a charity, so how does this work in the real world? As mentioned above, CosmosDB is costed by Request Units (RUs); think DTUs in Azure SQL Database, and the closest comparison in an on-premises environment is CPU. Each query, insert, update, or delete that you run against your database will have an RU cost.

This is why an inefficient query against CosmosDB can be costly in terms of RU usage. On-premises, an inefficient query may simply return more slowly. Within CosmosDB, higher RU usage means a higher cost to the customer.

The maximum available RUs are set per database/container and can be scaled up/down via a process such as PowerShell or by autoscaling. With autoscaling, you set a maximum level of RUs, and CosmosDB scales between 10% of that maximum and the maximum itself; at busy times it will scale up for an hour, and after the hour, if the database is no longer busy, it will scale back down to the minimum. If, however, you know your specific busy periods, let’s say there is activity between 9.30am and 10.40am, a process (PowerShell) could be automated to scale the CosmosDB up and back down again, and in this way save on your Azure bill. With autoscaling it would be set at the maximum RUs for 2 hours. Using a schedule, as in this case, it would be at the maximum for 1 hour and 10 minutes.
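The arithmetic behind that comparison can be checked with a short Python sketch. It assumes, as described above, that autoscaling holds the maximum in whole-hour blocks while a schedule holds it only for the busy window; the function names and the date are placeholders of my own, and no actual Azure prices are involved.

```python
import math
from datetime import datetime


def minutes_at_max_autoscale(start, end):
    """Busy window rounded up to whole hours, per the hourly behaviour described above."""
    busy_minutes = (end - start).total_seconds() / 60
    return math.ceil(busy_minutes / 60) * 60


def minutes_at_max_scheduled(start, end):
    """A scheduled scale-up and scale-down covers exactly the busy window."""
    return int((end - start).total_seconds() // 60)


# The 9.30am to 10.40am busy period from the example (the date itself is arbitrary).
start = datetime(2024, 1, 1, 9, 30)
end = datetime(2024, 1, 1, 10, 40)

autoscale_minutes = minutes_at_max_autoscale(start, end)
scheduled_minutes = minutes_at_max_scheduled(start, end)
saving = (autoscale_minutes - scheduled_minutes) / autoscale_minutes

print(f"Autoscale: {autoscale_minutes} minutes at maximum RUs")
print(f"Scheduled: {scheduled_minutes} minutes at maximum RUs")
print(f"Time at maximum reduced by {saving:.0%}")
```

Under these assumptions the schedule spends 70 minutes at the maximum instead of 120, which is where the potential saving on the bill comes from.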

It is important to note that you should not choose CosmosDB to save money as it can be an expensive option, especially if you are not careful how you query it. However, if you need global distribution, fast writes and the other features as mentioned above then it can be a great option.

One final thing to note is that for SQL Server the consistency (the C in ACID) of the data is of utmost importance. There is flexibility of course, such as with Availability Groups (AGs), where you can choose a synchronisation mode for the data, but in general the data will all be in sync. With CosmosDB there are consistency choices to be made. These range from Strong consistency, where all data is synchronously replicated and so is similar to SQL Server, down to the weakest level, Eventual, which does not have any ordering guarantees for the data. For more information, including a great video, see the following Microsoft link: https://learn.microsoft.com/en-us/azure/cosmos-db/consistency-levels. Remember, the stronger the consistency, the slower your CosmosDB will be.
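The difference between the two ends of that range can be shown with a toy model in Python. This is not the CosmosDB API and it ignores the levels in between; ToyReplicatedStore is a made-up class with one write region and one read replica, intended only to show why a read can be stale under Eventual consistency and why Strong consistency has to wait for replication.

```python
class ToyReplicatedStore:
    """Toy model of one write region and one read replica (not the CosmosDB API)."""

    def __init__(self, strong):
        self.strong = strong
        self.primary = {}
        self.replica = {}
        self.pending = []  # writes acknowledged but not yet applied to the replica

    def write(self, key, value):
        self.primary[key] = value
        if self.strong:
            # Strong: the write is only acknowledged once the replica has it too.
            self.replica[key] = value
        else:
            # Eventual: acknowledge straight away and replicate later.
            self.pending.append((key, value))

    def replicate(self):
        """Apply outstanding writes to the replica, in the order they were made."""
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()

    def read_from_replica(self, key):
        return self.replica.get(key)


strong_store = ToyReplicatedStore(strong=True)
strong_store.write("status", "shipped")
strong_read = strong_store.read_from_replica("status")

eventual_store = ToyReplicatedStore(strong=False)
eventual_store.write("status", "shipped")
stale_read = eventual_store.read_from_replica("status")
eventual_store.replicate()
fresh_read = eventual_store.read_from_replica("status")

print(strong_read, stale_read, fresh_read)
```

In the strong case the replica always has the latest value, at the price of every write waiting for it. In the eventual case the write returns immediately, but a read from the replica can miss it until replication catches up.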

Coeo and how can we help you?

Where you have already implemented CosmosDB, we have our own bespoke Azure monitoring solution with which we can monitor Service Availability, RU consumption and Request Latency. In addition, we can help with RU configuration and Performance Tuning, Networking, Security, and Backups & Restores.

Coeo can also help you with your migration path to CosmosDB and help to improve the chances of a successful journey towards a NoSQL future.

Conclusions

How can I find out if CosmosDB is right for me? Well, there is a free tier of CosmosDB. Details can be found here: https://learn.microsoft.com/en-us/azure/cosmos-db/free-tier. This allows you to have up to 1000 RU/s of throughput (sounds a lot but can soon be used up) and up to 25GB of storage. It can also be combined with a free Azure account to get up to 1400 RU/s and 50GB of storage. If this is not enough, there is an option to pay for more on top. This is certainly a good option for developers to evaluate CosmosDB and see if it is the best option for your data. I have tried this out myself and it is easy to set up a test database and containers. There are other NoSQL options, e.g. MongoDB and Amazon DynamoDB, which could well be the right option for you.

SQL Server is not dead and is still thriving. However, NoSQL gives you another option for your data, and it may well be that, once you have done the initial work with your data in CosmosDB, the data ends up in SQL Server as a cold storage option. The days of one database technology to rule them all are over; both options have their strengths and weaknesses. So, choose the right option for your data’s journey, and it could well be that you choose both.

Author

Ben Huxtable-Smith
