This article intends to give some details about the query optimization process in SQL Server.
Introduction
The query optimization is very grinding work for the database professionals who desire to overcome performance issues of the queries. The reason for this problem is query optimization processes are a bit complicated and puzzling. In this context, understanding the query optimization architecture can help to resolve query performance issues more easily and we can also interpret the execution plans more logically.
- If you don’t have enough knowledge or you are a newbie about the execution plans, you can check out this article, How to read an execution plan with all the details
The query optimization process has formed a set of phases and each phase has different responsibilities. These phases are :
Parse -> Binding -> Simplification -> Trivial Plans -> Explore Search
In the following sections of the article, we will focus on these phases’ details.
Pre-requirements
In this article, we will use the AdventureWorks sample database.
Parse: Talk is cheap, Show me the code
When we submit a query in SQL Server, the first step will be to validate the query’s syntax. If the query is written correctly then the query parser produces a parse tree as output and the parse tree passes into the next stage of query processing. Otherwise, it will return a syntax error. The parse tree is an interior representation of the query.
Query binding: Everyone has a choice
The main responsibility of this step is validating the existence of the tables and columns and other metadata objects in the database that is used in the query. At the same time, the user permissions are checked in this step. In this step the aggregate and group bindings regulate. Another main responsibility of this phase is checking any cached plan exists for the submitted query. If any cached execution plan exists for the executed query, this query plan will be used and all the next steps will be skipped. After the query binding process, an input tree is produced as output. When we want to see this logical tree of a query, we can enable the trace flag 8605.
1 2 3 4 5 6 7 8 |
DBCC TRACEON (3604) SELECT SalesDetail.CarrierTrackingNumber FROM Sales.SalesOrderDetail SalesDetail WHERE SalesOrderID > 20 OPTION ( RECOMPILE, QUERYTRACEON 8605 ); |
Simplification: Autobots, Transform and Roll Out
In fact, this phase can be evaluated as the first step of query optimization. In this stage, the query optimizer analyzes the logical tree of the query and tries to eliminate the redundant parts of the logical tree that may cause to consume more resources. Simplification includes some sub-phases and the following are the major ones :
Constant folding helps to evaluate one or more expressions before the query is compiled and tries to simplify the possibility of them. In the following example, a complicated string expression will be evaluated and transformed into WHERE ProductNumber LIKE ‘AR%’.
1 2 3 4 |
SELECT Name,ProductNumber FROM [Production].[Product] WHERE ProductNumber LIKE CONCAT(SUBSTRING('AXXLPLR',1,1) , 'R') + '%' |
This expression transformation can be seen on the Predicate attribute of the clustered index scan operator.
Contradiction detection: In this sub-phase SQL Server detects the conflicts in the queries and removed these parts from the queries. For example, the query optimizer has information that there are not any matched rows for the where condition because of the declared check constraint on the SafetyStockLevel. This constraint prevents inserting a row with SafetyStockLevel value equals to 0 into the Product table. In this case, the optimizer does not read the Product table so that does not consume any I/O.
1 2 |
SELECT Name,ProductNumber FROM Production.Product WHERE SafetyStockLevel =0 |
As we can see clearly, no data read operation has been performed for this example query because contradiction detection eliminates this step.
The Predicate Pushdown: In this sub-phase, the optimizer tries to push down the filter specified in the where clauses. In the following execution plan example, we can see that the query reads only 270 rows.
The Domain Simplification: In this sub-phase, the optimizer tries to transform complex expressions into simple range expressions. For example, the following where clause will be evaluated as like WHERE ProductID BETWEEN 813 AND 1000
1 2 3 4 |
SELECT ProductID FROM Sales.SalesOrderDetail S WHERE (ProductID > 813 AND ProductID<950) OR (ProductID >= 950 AND ProductID<=1000) OR (ProductID=813) |
The Join Simplification: The goal of this sub-phase to remove redundant joins in the queries So that the optimizer saves up CPU, I/O, and memory resources for the query. For example, if we analyze the following query execution plan, we can not see any read operation related to the SalesOrderHeader table because the inner join statement will be removed. This is because there is a foreign key constraint between the two tables.
1 2 3 4 |
SELECT SD.CarrierTrackingNumber FROM Sales.SalesOrderHeader SO INNER JOIN Sales.SalesOrderDetail SD ON SO.SalesOrderID = SD.SalesOrderID WHERE SD.CarrierTrackingNumber = '4911-403C-98' |
Trivial Query Plans: Fast and Furious
Trivial query plans are used by the query optimizer to avoid cost-based optimization for simple queries. So that the query optimizer does not waste time for the full optimization step and at the same time bypasses this process. When we execute the following query, the optimizer will generate a trivial plan.
1 2 3 4 5 6 |
SELECT p.ProductID, p.Name, p.Color, p.Size FROM Production.Product p LEFT JOIN Production.ProductSubcategory s ON p.ProductSubcategoryID = s.ProductSubcategoryID WHERE P.Name ='Adjustable Race' |
This situation can be seen on the Optimization Level attribute of the select operator.
The trivial plans never generate parallel query plans and never suggest any missing indexes. We can disable the trivial plan generation with the help of the trace flag 8757. When we execute the same query with this query trace flag, the optimizer will complete all optimization cycles and generates an optimum plan.
1 2 3 4 5 6 7 |
SELECT p.ProductID, p.Name, p.Color, p.Size FROM Production.Product p LEFT JOIN Production.ProductSubcategory s ON p.ProductSubcategoryID = s.ProductSubcategoryID WHERE P.Name ='Adjustable Race' OPTION ( QUERYTRACEON 8757 ); |
Explore Search Phases: Show me the money
The query optimizer performs a cost-based optimization and it takes the tables, indexes, statistics, and constraints as input and then begins to calculate a cost for all the possible query plans. According to this cost, the optimizer decides the optimum execution plan. All these operations are performed in this phase and it is the most complicated stage of the query optimizer. Explore Search phase contains 3 sub-phases:
- Phase 0: In this sub-phase, the optimizer tries to explore basic optimizer rules and it also considers using the nested loop or hash join types
- Phase 1: In this sub-phase optimizer tries to find out more rules, and alternate join ordering
- Phase 2: This is the final sub-phase and all alternative plans are evaluated in this phase all effort is to find an optimum query plan
Now, let’s reinforce this theoretical information with some examples. The following query execution plan will be found before phase 2 because the Reason For Early Termination Of Statement Optimization attribute shows the Good Enough Plan Found. This means that before Phase 2 a good plan has been found by the optimizer and the query optimization process is terminated.
1 2 3 |
SELECT * FROM Sales.SalesOrderDetail WHERE ProductID = 776 OPTION (RECOMPILE) |
However, the exact phase information can be found only using the sys.dm_exec_query_optimizer_info view. This view store detailed statistics about the query optimizer.
1 2 3 4 5 |
select * from sys.dm_exec_query_optimizer_info where counter like 'search%' SELECT * FROM Sales.SalesOrderDetail WHERE ProductID = 776 OPTION (RECOMPILE) select * from sys.dm_exec_query_optimizer_info where counter like 'search%' |
As we can see, the search 1 occurrence value increases, and this case indicates that the query execution plan finds in phase 1.
On the other hand, sometimes the optimizer terminates exploring an optimum query plan for the complex queries because after all attempts it could not find an optimum execution plan. In this circumstance, the Reason For Early Termination Of Statement Optimization attribute shows the Time Out.
We can find out this case by using the sys.dm_exec_query_optimizer_info view.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
select * from sys.dm_exec_query_optimizer_info SELECT soh.[SalesPersonID] ,p.[FirstName] + ' ' + COALESCE(p.[MiddleName], '') + ' ' + p.[LastName] AS [FullName] ,e.[JobTitle] ,st.[Name] AS [SalesTerritory] ,soh.[SubTotal] ,YEAR(DATEADD(m, 6, soh.[OrderDate])) AS [FiscalYear] FROM [Sales].[SalesPerson] sp INNER JOIN [Sales].[SalesOrderHeader] soh ON sp.[BusinessEntityID] = soh.[SalesPersonID] INNER JOIN [Sales].[SalesTerritory] st ON sp.[TerritoryID] = st.[TerritoryID] INNER JOIN [HumanResources].[Employee] e ON soh.[SalesPersonID] = e.[BusinessEntityID] INNER JOIN [Person].[Person] p ON p.[BusinessEntityID] = sp.[BusinessEntityID] OPTION (RECOMPILE) select * from sys.dm_exec_query_optimizer_info |
In order to avoid these types of issues, we can use trace flag 8780, this flag can give more time to the optimizer to evaluate more alternative execution plans.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
SELECT soh.[SalesPersonID] ,p.[FirstName] + ' ' + COALESCE(p.[MiddleName], '') + ' ' + p.[LastName] AS [FullName] ,e.[JobTitle] ,st.[Name] AS [SalesTerritory] ,soh.[SubTotal] ,YEAR(DATEADD(m, 6, soh.[OrderDate])) AS [FiscalYear] FROM [Sales].[SalesPerson] sp INNER JOIN [Sales].[SalesOrderHeader] soh ON sp.[BusinessEntityID] = soh.[SalesPersonID] INNER JOIN [Sales].[SalesTerritory] st ON sp.[TerritoryID] = st.[TerritoryID] INNER JOIN [HumanResources].[Employee] e ON soh.[SalesPersonID] = e.[BusinessEntityID] INNER JOIN [Person].[Person] p ON p.[BusinessEntityID] = sp.[BusinessEntityID] OPTION (RECOMPILE, QUERYTRACEON 8780) |
Conclusion
In this article, we have talked about query optimization phase details. When we want to boost the query performance, we have to understand the execution plan with all details because lots of question answers are hidden under it. Understanding the query optimization process steps internals will be very helpful. In this way, we can interpret the created plans more meaningfully. Now let’s quickly recall these phases for the last time :
Parse -> Binding -> Simplification -> Trivial Plans -> Explore Search
- SQL Performance Tuning tips for newbies - April 15, 2024
- SQL Unit Testing reference guide for beginners - August 11, 2023
- SQL Cheat Sheet for Newbies - February 21, 2023