Introduction
Though the Common Table Expressions (CTE) were introduced to SQL Server more than a decade ago with the SQL Server 2005 version, still this is not much utilized by database developers due to the unawareness. This article provides what and how you can utilize the CTE effectively and efficiently.
What is a Common Table Expression (CTE)
Common Table Expressions can be explained as a temporary view. However, unlike the view, common table expression is not physical. This means that CTE is valid only to the scope of the query. However, you can write a CTE inside a stored procedure or User Defined Functions (UDFs) or triggers or views. However, you cannot implement CTEs inside indexed views.
First, let us see the syntax of CTE. Let us use the Microsoft Sample database, AdventureWorks for the demonstrations.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
WITH OrdersIn2011 AS ( SELECT SOH.SalesOrderID ,SOH.SalesOrderNumber ,P.NAME ProductName ,SOD.OrderQty ,SOD.LineTotal ,SU.NAME SubCategory ,S.NAME Category FROM sales.SalesOrderDetail SOD INNER JOIN sales.SalesOrderHeader SOH ON SOD.SalesOrderID = SOH.SalesOrderID INNER JOIN Production.Product P ON P.ProductID = SOD.ProductID INNER JOIN Production.ProductSubcategory SU ON SU.ProductSubcategoryID = P.ProductSubcategoryID INNER JOIN Production.ProductCategory S ON S.ProductCategoryID = SU.ProductCategoryID WHERE Year(OrderDate) = 2011 ) SELECT * FROM OrdersIn2011 WHERE OrderQty > 2 |
In the above query, we have used Order details and the name of the CTE is OrdersIn2011. This is equal to creating a view named Ordersin2011.
Common Table Expressions vs Temp Tables vs Table Variables
There are a few other options to store temporary data in SQL Server. Those options are CTEs, Temp Tables and Table Variables.
- Temp tables and table variables need explicit inserts to populate those objects while CTE does not need separate insert statements to populate. CTE will be populated with the definition itself
- Table variable scope is for the batch while CTE’s scope is only for the query. In the case of temp tables, the table is available for the session. There are options in temp tables where you can create ## temporary tables that can be assessable within multiple sessions
- CTE always uses memory whereas temp tables always use the disk. Table variable uses both. Let us verify this by means of write transactions/sec counter which is shown in the below figure
As you can see, CTE is completely utilizing the memory while the other two objects are using the disk. This means that we should not use CTE for a large volume of data.
Implementation for CTEs
As a developer, you need to understand where you can implement CTE for different purposes.
Recursion Queries
Recursion is one of the popular implementations for CTEs. Manager – Employee recursive relationship is one of the common examples. Let us create a table and populate some data.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
IF EXISTS ( SELECT 1 FROM sys.tables WHERE NAME = 'Employees' ) DROP TABLE Employees CREATE TABLE Employees ( EmployeeID INT PRIMARY KEY CLUSTERED ,EmployeeName VARCHAR(50) ,ManagerID INT ) ALTER TABLE [dbo].[Employees] WITH CHECK ADD CONSTRAINT [FK_Employees_Employees] FOREIGN KEY ([ManagerID]) REFERENCES [dbo].[Employees]([EmployeeID]) ALTER TABLE [dbo].[Employees] CHECK CONSTRAINT [FK_Employees_Employees] |
Then use the following query to populate this table.
1 2 3 4 5 6 7 8 9 10 11 12 |
INSERT INTO Employees VALUES (1,'Baker Downs',NULL), (2,'Clio Skinner',1), (3,'Emery Ford',2), (4,'Indira Cain',2), (5,'Robert Becker',4), (6,'Tyrone Chang',1), (7,'Evan Beck',5), (8,'Barrett Moore',6), (9,'Cameron Parker',5), (10,'Nissim Rhodes',6) |
In the above data, Baker Downs is the boss of the company. Suppose, you want to find out who all employees are reporting to Clio Skinner (EmployeeID =2). This can be achieved by using the Common Table Expression as shown in the below script.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
WITH EmployeeRecursion([EmployeeName],EmployeeID,ManagerID) AS ( SELECT [EmployeeName],EmployeeID,ManagerID From Employees WHERE EmployeeID = 2 -- @EmployeeID UNION ALL SELECT e.[EmployeeName],e.EmployeeID,e.ManagerID From Employees e INNER JOIN EmployeeRecursion r ON e.ManagerID = r.EmployeeID ) SELECT R.EmployeeID,R.[EmployeeName],E.[EmployeeName] Manager FROM EmployeeRecursion R INNER JOIN Employees E ON R.ManagerID = E.EmployeeID |
Output for the above script is shown in the following figure that shows the entire list of recursive employees.
Now let us say, you do not want the entire list, but you want to limit the level to 2. Then you can add the MAXRECURSION option to the query as follows.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
WITH EmployeeRecursion([EmployeeName],EmployeeID,ManagerID) AS ( SELECT [EmployeeName],EmployeeID,ManagerID From Employees WHERE EmployeeID = 2 -- @EmployeeID UNION ALL SELECT e.[EmployeeName],e.EmployeeID,e.ManagerID From Employees e INNER JOIN EmployeeRecursion r ON e.ManagerID = r.EmployeeID ) SELECT R.EmployeeID,R.[EmployeeName],E.[EmployeeName] Manager FROM EmployeeRecursion R INNER JOIN Employees E ON R.ManagerID = E.EmployeeID OPTION (MAXRECURSION 2) |
Though the above query will generate the correct result, it will raise the following error.
Msg 530, Level 16, State 1, Line 1
The statement terminated. The maximum recursion 2 has been exhausted before statement completion.
Since this query is generating the error, as a developer you will run into the issue of raising an exception. In case this query is in a transaction, the transaction will fail. However, including an additional recursive column called Level and adding a filter to it will solve the issue.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
WITH EmployeeRecursion([EmployeeName],EmployeeID,ManagerID,Level) AS ( SELECT [EmployeeName],EmployeeID,ManagerID,0 From Employees WHERE EmployeeID = 2 -- @EmployeeID UNION ALL SELECT e.[EmployeeName],e.EmployeeID,e.ManagerID,Level + 1 From Employees e INNER JOIN EmployeeRecursion r ON e.ManagerID = r.EmployeeID ) SELECT R.EmployeeID,R.[EmployeeName],E.[EmployeeName] Manager ,Level FROM EmployeeRecursion R INNER JOIN Employees E ON R.ManagerID = E.EmployeeID WHERE Level <= 2 |
Finding Duplicates
Duplicates are another issue that developers have to fix in their databases. Let us see how we can use CTEs to identify duplicates in a table. Let us duplicate a record in the employee table and run the following common table expression.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
WITH DuplicateEmployees AS ( SELECT MIN(EmployeeID) AS EmployeeID ,[EmployeeName] FROM [dbo].[Employees] GROUP BY [EmployeeName] HAVING COUNT(1) > 1 ) SELECT E.EmployeeID ,E.[EmployeeName] ,cte.EmployeeID ,cte.[EmployeeName] FROM [Employees] E INNER JOIN DuplicateEmployees cte ON E.[EmployeeName] = cte.[EmployeeName] WHERE E.EmployeeID <> cte.EmployeeID |
This is the output for the above query.
As you can see that the duplicates were detected from the above query.
Multiple CTEs
You can use multiple CTEs in a single query as shown below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 |
WITH Sales2012 ( ProductNumber ,Amount ) AS ( SELECT Prod.ProductNumber ,SUM(LineTotal) FROM Sales.SalesOrderDetail SOD INNER JOIN Sales.SalesOrderHeader SOH ON SOD.SalesOrderID = SOH.SalesOrderID INNER JOIN Production.Product Prod ON Prod.ProductID = SOD.ProductID WHERE YEAR(OrderDate) = 2012 GROUP BY Prod.ProductNumber ) ,Sales2011 ( ProductNumber ,Amount ) AS ( SELECT Prod.ProductNumber ,SUM(LineTotal) FROM Sales.SalesOrderDetail SOD INNER JOIN Sales.SalesOrderHeader SOH ON SOD.SalesOrderID = SOH.SalesOrderID INNER JOIN Production.Product Prod ON Prod.ProductID = SOD.ProductID WHERE YEAR(OrderDate) = 2011 GROUP BY Prod.ProductNumber ) SELECT Sales2012.ProductNumber ,ISNULL(CAST(Sales2012.Amount AS INT), 0) Sales2012 ,ISNULL(CAST(Sales2011.Amount AS INT), 0) Sales2011 ,CAST(ISNULL(Sales2012.Amount, 0) - ISNULL(Sales2011.Amount, 0) AS INT) SalesIncrement ,(ISNULL(Sales2012.Amount, 0) - ISNULL(Sales2011.Amount, 0)) * 100 / ISNULL(Sales2012.Amount, 0) Percentage FROM Sales2011 FULL JOIN Sales2012 ON Sales2011.ProductNumber = Sales2012.ProductNumber WHERE Sales2012.Amount > 0 OR Sales2011.Amount > 0 |
You will see that, you join CTEs like you join tables and views. Here are the results for the above query.
Ranking by Aggregates
You may have to rank by the aggregates. This is something that can be achieved by the CTEs as shown below.
1 2 3 4 5 6 7 8 9 10 11 12 |
WITH CustomerAgg AS ( SELECT CustomerID ,SUM(TotalDue) AS TotalAmount FROM Sales.SalesOrderHeader GROUP BY CustomerID ) SELECT * ,RANK() OVER ( ORDER BY TotalAmount DESC ) AS AmountRank FROM CustomerAgg |
The below is the resultset for the above query.
Conclusion
Common Table Expressions or CTEs are one of the techniques that can be used to store intermediate results in memory. There are a few use cases for CTEs such as recursive queries, identification of duplicates, joining multiple CTEs, ranking by aggregate are the common use cases for the CTEs. It is important to note that, we should not use CTEs when the data volume is high.
- Testing Type 2 Slowly Changing Dimensions in a Data Warehouse - May 30, 2022
- Incremental Data Extraction for ETL using Database Snapshots - January 10, 2022
- Use Replication to improve the ETL process in SQL Server - November 4, 2021