In this article, we will look briefly at the Azure Cosmos DB service and how you can query JSON documents to store, retrieve and process data using the SQL Application Programming Interface (API).
Introduction
Azure Cosmos DB is a multi-model PaaS offering on Microsoft Azure. It is a NoSQL globally distributed highly available database system with less than 10ms of latency for both reads and writes.
It works on the concept of Request Units per second (RU/s) for managing the performance and cost of databases, in short, the amount of computation and I/O resources required to perform a simple read request. We can provision throughput on databases or containers. We will walk through how we can set it up later while creating a Cosmos DB account.
Azure Cosmos DB is great for IoT, retail, marketing, web, and mobile applications. Apart from these, it is also used for mission-critical applications running globally like Skype, Xbox, and Azure. Additionally, Azure Cosmos DB offers not just the Strong and Eventual consistency levels but also a range of other levels including Bounded Staleness, Session and Consistent Prefix. To read more information on Consistency levels, you can check this Microsoft documentation.
Azure Cosmos DB supports several APIs like Core (SQL), Cassandra (to access Columnar data), Gremlin (for Graph data), MongoDB API (for Document data), and Azure Table (Azure Table Storage for Key-value data) that can be used to access a variety of data. This makes the job of various data professionals easier to work with Cosmos DB. We will be looking at Core (SQL) API in this article.
Azure Cosmos DB is generally available in Documents. It allows us to access these documents using the above interfaces. You can consider a document as a collection of fields provided with a Key, and it uses a structured JSON format with no pre-defined schema. That means, the fields don’t have to be the same in every entity. Documents in an Azure Cosmos DB are organized into containers, they are further grouped into partitions in Containers. We will be creating these shortly in a lab.
Pre-requisites
To follow along with this article, you will either need to have an Azure subscription or Cosmos DB trial account. With the latter, you will have 30 days of free access with no subscription required.
Azure Cosmos DB account
Let’s quickly jump to the Azure portal and quickly create a Cosmos DB account. Search for Azure Cosmos DB and click Add. This will bring you to a new Create Azure Cosmos DB Account window. Provide in detail like the Resource Group, Account Name, API – Core (SQL), Location, etc. and click on Review + create.
Quick note: Core (SQL) API is the native API in Cosmos DB and is also called SQL API. It supports a SQL-like query language that allows retrieving documents using SELECT and other basic SQL commands.
If you want a step-by-step guide, I would recommend you go through this great article, Start your journey with Azure Cosmos DB, to create a Cosmos DB in detail.
You can choose between Provisioned throughput and Serverless (preview). We have selected the former that gives you 400 RU/s for free.
Once you click on Review + create, it will validate all the details, and then select Create button to create this resource.
Click on Go to resource button to navigate to this newly created service. You can see below, Azure Cosmos DB named, apisqlcosmosdb is created and different statistics can be seen on the Overview tab as shown below.
Adding a database and container to the Azure Cosmos DB account
Click Data Explorer blade on the left side and click on New Container and a new Add Container window gets opened on the right. We need to provide Database id (Employees), Container id (Personnel) and Partition key (/employeeId). If you notice below, we have an option to set Throughput at both the database or container level, the minimum throughput that can be allocated to a database or container is 400 RU/s.
You can scroll down to fill in other details.
There are two options (Autoscale and Manual) to provision Throughput. We are selecting 400 RU/s for now. In case, you predict to have large workloads in the future, you need to increase the throughput to avoid getting an HTTP 429 error (Request rate is too large).
Finally hit OK to create the Employees database and Personnel container in it.
Once done, expand DATA under your SQL API account, and you would find the Items option under the Personnel container. Click on it and you will see New Item in the top menu bar to add data to your container.
Since items are stored as JSON, we will add three records in JSON format that would include hypothetical employees’ personal information like name, age, gender, pets they own, phone numbers etc. Here, streetAddress, city and state attributes are all nested in the Address property.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 |
{ "employeeId": 101, "firstName": "Brian", "lastName": "Woods", "Age": 33, "Gender": "Male", "Pets": [ "Fish", "Dog", "Cat" ], "phoneNumbers": [ { "type": "work", "number": "5563459874" }, { "type": "home", "number": "1453459874" } ], "Address": { "streetAddress": "300", "city": "Jersey City", "state": "New Jersey", "postalCode": "07028" } } { "employeeId": 102, "firstName": "Maria", "lastName": "Torres", "Age": 37, "Gender": "Female", "Pets": [ "Dog", "Cat" ], "phoneNumbers": [{ "type": "work", "number": "8954598740" }], "Address": { "streetAddress": "568", "city": "California", "state": "San Francisco" } } { "employeeId": 103, "firstName": "Robin", "lastName": "Watson", "Age": 31, "Gender": "Male", "Pets": [ "Hamster", "Cat" ], "phoneNumbers": [ { "type": "Travel", "number": "7856945987" } ], "Address": { "state": "New Jersey", "postalCode": "07028" }} |
You can see three employee ids (3 items) added as shown below:
Querying JSON data using the SQL API
Now comes the main part of this article, i.e. learning to work with JSON data using SQL query language in an Azure Cosmos DB account. Click on the New SQL Query icon on the top menu bar to open a query window. We will start with the basic queries using SELECT, WHERE, ORDER BY, TOP, Between and IN clauses, and further understand Joins, aggregations, arrays, different formats in which we can retrieve the output.
SELECT clause
Type “Select * FROM Personnel” and click on Execute Query button on the top right-hand side. This will return all the records present in the container as shown below:
SELECT clause using dot(.) notation for Projections
In order to retrieve only selected properties from the documents, you can use (.) notation in the SELECT clause. Projection helps us limit the properties that are returned in the result.
- Quick note: The attribute name after (.) is also case sensitive
1 |
SELECT p.firstName, p.lastName, p.Age FROM Personnel p |
You can also use the quoted notation (“) in place of (.) and it will give the same results. One example is shown below:
1 |
SELECT p["firstName"], p["lastName"], p["Age"] FROM Personnel p |
FROM clause with Property names
You can use the expression <containername>.<propertyname> in the FROM clause to return the list of properties/fields. Below are some examples:
1 |
SELECT * FROM Personnel.firstName |
1 |
SELECT * FROM Personnel.Address |
WHERE and Between clauses
Just like SQL queries, the WHERE clause helps to apply a filter on the JSON data as well in order to retrieve selected data. For example, you want to return data based on the age of an employee. We have used BETWEEN clause to output the range of such data.
1 2 |
SELECT p.firstName, p.lastName, p.Gender, p.Address.city FROM Personnel p WHERE p.Age between 36 AND 38 |
TOP and ORDER BY clauses
The below query will show how you can use the TOP and ORDER BY clause to display the top 2 results in the descending order of employee ids.
1 2 3 4 |
SELECT Top 2 p.employeeId,p.firstName, p.Age, p.Gender FROM Personnel p WHERE p.Gender = 'Male' ORDER BY p.employeeId DESC |
The above clauses were straight-forward and the only point to keep in mind here is even though the language is SQL, the output is in JSON format.
Aliasing and JSON expressions
While working with data, data professionals often have to format data that is originally stored in Cosmos DB. One such example is shown below where the city is retrieved as CityName using the alias keyword, AS and since nothing was provided for Postal Code and Pets properties, the clause used implicit argument variable names $1 and $2 respectively for them. We are fetching results for only New Jersey state.
1 2 3 4 5 6 |
SELECT p.Address.city AS CityName, {"Postal Code": p.Address.postalCode}, {"Pets": p.Pets}, {"Emp Number":p.employeeId , "Gender":p.Gender, "Current Age" : p.Age} EmployeeInformationfromtheSameState FROM Personnel p WHERE p.Address.state = 'New Jersey |
Working with arrays in JSON using the SQL API
In the JSON items used in this article, we have Address property as the nested property and apart from that JSON also supports arrays. For example, property phoneNumbers is an array with type and number as its attributes. Like for any array, we access them by referencing its position [0],[1],[2].. and so on. The following query retrieves records for employees whose phone number type is Travel. [0] indicates, it scans the first record in the phoneNumbers array.
1 2 3 |
SELECT p.employeeId, p.firstName, p.Gender, p.phoneNumbers[0].number FROM Personnel p WHERE p.phoneNumbers[0].type = 'Travel' |
Using JOINS
In conventional relational database systems, Joins are usually applied on Keys like Primary Keys or Foreign Keys, however, containers in Azure Cosmos DB contain JSON items that are schema-free, so we are basically working with the denormalized data model and self-joins here. Below are a few JOINS examples using the SQL API.
Here, this query returns the selected properties with the age of employees more than 30 and ordered by lastName attribute.
1 2 3 4 5 |
SELECT p.employeeId,p.firstName,p.lastName, p.Pets FROM Personnel p JOIN p.Pets WHERE p.Age > 30 ORDER BY p.lastName |
The below query displays a more conventional form of Join using the IN keyword and helps to show results based on the type of phone numbers (Work/Travel/Home) available for employees.
1 2 3 |
SELECT p.employeeId, ph.type AS PhoneTypeAvailable FROM Personnel p JOIN ph IN p.phoneNumbers |
Another interesting example could be the below where we want to find ‘work’ phone numbers of employees in the company.
1 2 3 4 |
SELECT p.employeeId, ph.number FROM Personnel p JOIN ph in p.phoneNumbers where ph.type = "work" |
Aggregates in JSON
Aggregations play a crucial role in the data world. The SQL API supports various aggregation functions like SUM and AVG (for numeric properties) and COUNT, MIN, MAX (for strings and numbers). The following query shows an aggregation example that calculates the average age and count of employees.
1 |
SELECT AVG(p.Age) AverageAge, COUNT(p.employeeId) FROM Personnel p |
Using Group By clause
Like in SQL, Group By clause is used to arrange similar data in different groups, using various functions like Count, Min, Max etc. The following example grouped the employees based on the gender property i.e. Male or Female.
1 2 3 |
SELECT p.Gender, Count(1) AS NOofEmployees FROM Personnel p GROUP BY p.Gender |
However, there are some limitations for using the GROUP BY clause in Azure Cosmos DB, for example, the ORDER BY clause cannot be used along with it. For more information on this, check out this Microsoft documentation.
Conclusion
We began this article learning briefly about Azure Cosmos DB and the different facets associated with it. Along the way, we explored how an account can be created and added JSON data to a container. And finally, we learned how we can work with JSON documents to store, retrieve, and process data in the Cosmos DB account using the SQL API.
- Oracle Substring function overview with examples - June 19, 2024
- Introduction to the SQL Standard Deviation function - April 21, 2023
- A quick overview of MySQL foreign key with examples - February 7, 2023