Depending on the cause of the performance problem, fixing poor SQL query design can be quick or time consuming. It’s recommended to follow some general recommendations and best practices while writing queries, so you don’t have to rewrite and optimize them later.
Factors that affect query performance
Query performance depends on data volume and transaction concurrency. Executing a query on a table with millions of records requires more time than performing the same operation on the same table with only thousands of records.
A lot of concurrent transactions can degrade SQL Server performance. The shorter the queue of transactions waiting to be processed, the better the performance.
Executing a SQL query that retrieves records from multiple joined tables is quick in a sandbox with small sets of data, but running the same query in production, with millions of rows in each joined table and multiple users accessing the same tables and data, can add significant pressure. That’s why developers are sometimes not fully aware of a query’s real performance.
To see how a SQL query performs in a real production environment, it’s necessary to test it under the same conditions as in production. Otherwise, the potential underperformance can be masked.
We’ll use STATISTICS TIME to show the number of milliseconds required to parse, compile, and execute a SELECT statement.
We’ll execute a simple SELECT statement:
SET STATISTICS TIME ON
SELECT [AddressID],
       [AddressLine1],
       [AddressLine2],
       [City]
FROM [Person].[Address]
SET STATISTICS TIME OFF
When executed on a million row table, it takes approximately 13 seconds:
SQL Server Execution Times:
CPU time = 1031 ms, elapsed time = 13263 ms.
When executed on a two million row table, the time is practically doubled:
(2000000 row(s) affected)
SQL Server Execution Times:
CPU time = 2140 ms, elapsed time = 26961 ms.
STATISTICS TIME shows two execution times. The first one, CPU time, is the number of milliseconds the processor spent processing the data. The second one, elapsed time, is the total time needed to return the results and show them in the grid. When working with a large set of records, showing the records in the grid takes much longer than retrieving them from the table.
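If you want to isolate the server-side cost from the time needed to render the grid, one option (a sketch of a common technique, not used for the measurements in this article) is to assign the selected columns to variables, so no result set is sent to the client:

```sql
-- Assigning columns to variables suppresses client output,
-- so elapsed time reflects mostly the retrieval cost
DECLARE @AddressID int, @AddressLine1 nvarchar(60)

SET STATISTICS TIME ON
SELECT @AddressID = AddressID,
       @AddressLine1 = AddressLine1
FROM Person.Address
SET STATISTICS TIME OFF
```

With the grid rendering removed, the gap between CPU time and elapsed time shrinks considerably.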
Don’t retrieve more data than necessary
When retrieving data from SQL Server tables, don’t retrieve more than you need. Obtaining excessive data is resource expensive and time consuming.
We’ll use the Person.Address table in the AdventureWorks database:
CREATE TABLE [Person].[Address](
	[AddressID] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
	[AddressLine1] [nvarchar](60) NOT NULL,
	[AddressLine2] [nvarchar](60) NULL,
	[City] [nvarchar](30) NOT NULL,
	[StateProvinceID] [int] NOT NULL,
	[PostalCode] [nvarchar](15) NOT NULL,
	[SpatialLocation] [geography] NULL,
	[rowguid] [uniqueidentifier] ROWGUIDCOL NOT NULL,
	[ModifiedDate] [datetime] NOT NULL,
	CONSTRAINT [PK_Address_AddressID] PRIMARY KEY CLUSTERED
	(
		[AddressID] ASC
	)
) ON [PRIMARY]
We’ll add random records to the table, so that it contains 1 million records:
DECLARE @i int
SET @i = 0
WHILE @i < 1000000
BEGIN
	INSERT INTO Person.Address(
		AddressLine1,
		AddressLine2,
		City,
		StateProvinceID,
		PostalCode,
		rowguid,
		ModifiedDate
	)
	VALUES(
		CONVERT(nvarchar(60), NEWID()),
		CONVERT(nvarchar(60), NEWID()),
		RAND() * 5,
		RAND(),
		RAND() * 7,
		NEWID(),
		DATEADD(day, (ABS(CHECKSUM(NEWID())) % 1625), GETDATE())
	)
	SET @i = @i + 1
END
If you need, for example, the IDs of the addresses modified before 2015, retrieve only that information. Don’t retrieve the addresses, cities, postal codes, or dates they were modified.
We’ll start with the worst case – selecting all columns and all rows:
SET STATISTICS TIME ON
SELECT *
FROM Person.Address
SET STATISTICS TIME OFF
The statistics show:
(1000000 row(s) affected)
SQL Server Execution Times:
CPU time = 2219 ms, elapsed time = 20917 ms.
Now, we’ll retrieve only two columns – the address ID and the modified date, the column the year condition will be applied to:
SET STATISTICS TIME ON
SELECT AddressID,
       ModifiedDate
FROM Person.Address
SET STATISTICS TIME OFF
The processor time is almost four times smaller:
(1000000 row(s) affected)
SQL Server Execution Times:
CPU time = 578 ms, elapsed time = 10219 ms.
Finally, we’ll retrieve only the records needed – IDs of the addresses modified before 2015:
SET STATISTICS TIME ON
SELECT AddressID
FROM Person.Address
WHERE YEAR(ModifiedDate) < 2015
SET STATISTICS TIME OFF
(158956 row(s) affected)
SQL Server Execution Times:
CPU time = 515 ms, elapsed time = 3286 ms.
The number of returned rows is approximately 6 times smaller, and fewer columns are retrieved than in the previous example, yet the CPU time is only about 10% shorter (515 ms vs. 578 ms). That’s due to the YEAR built-in function applied to a non-indexed column.
If possible, avoid using functions (both built-in, such as YEAR and LEFT, and user-defined) on columns in the WHERE clause, as they prevent SQL Server from using an index on that column:
SET STATISTICS TIME ON
SELECT AddressID
FROM Person.Address
WHERE ModifiedDate < '2015/01/01'
SET STATISTICS TIME OFF
(158956 row(s) affected)
SQL Server Execution Times:
CPU time = 298 ms, elapsed time = 2930 ms.
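When a predicate genuinely needs a function that can’t be rewritten as a plain comparison, one possible workaround (a sketch, not part of the measurements in this article; the column and index names are illustrative) is to add a persisted computed column for the expression and index it, making the filter sargable:

```sql
-- Hypothetical persisted computed column holding the year of modification
ALTER TABLE Person.Address
ADD ModifiedYear AS YEAR(ModifiedDate) PERSISTED;

-- Index the computed column so the filter below can use an index seek
CREATE INDEX IX_Address_ModifiedYear
ON Person.Address (ModifiedYear);

-- The function is evaluated once per row at write time,
-- not per row at query time
SELECT AddressID
FROM Person.Address
WHERE ModifiedYear < 2015;
```

The trade-off is extra storage and slightly slower inserts and updates, which pays off when the filtered query runs frequently.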
As shown, the narrower the results set is, the less time is needed to retrieve it. More resources are available for other operations and there will be fewer chances for bottlenecks.
However, pay attention to the column the function in the predicate (the WHERE condition) is applied to. In the example above, we used a non-indexed column. If a function is applied to the clustered index column, the index can’t be used for a seek, and the query takes even longer:
SET STATISTICS TIME ON
SELECT AddressID
FROM Person.Address
WHERE LEFT(AddressID, 6) < 158956
SET STATISTICS TIME OFF
(158956 row(s) affected)
SQL Server Execution Times:
CPU time = 845 ms, elapsed time = 3788 ms.
As shown, for the same number of returned records, the processor requires almost three times more time (845 ms vs. 298 ms).
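If the intent of the LEFT expression above is simply a key-range filter (an assumption on our part), the same rows can be retrieved with a sargable predicate that compares the integer key directly, letting SQL Server seek on the clustered index:

```sql
SET STATISTICS TIME ON
-- No function wraps AddressID, so the clustered index
-- PK_Address_AddressID can be used for an index seek
SELECT AddressID
FROM Person.Address
WHERE AddressID < 158956
SET STATISTICS TIME OFF
```

As a rule of thumb, keep the indexed column alone on one side of the comparison and move any computation to the constant side.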
Summary of the measured CPU times (the WHERE variants each retrieve 158,956 records):

Query                                         CPU time in ms
SELECT * (all 1,000,000 rows)                 2,219
SELECT <column list> (all 1,000,000 rows)     578
WHERE ModifiedDate < '2015/01/01'             298
WHERE YEAR(ModifiedDate) < 2015               515
WHERE LEFT(AddressID, 6) < 158956             845
In this article, we’ve presented the factors that affect SQL query performance and given some general guidelines for testing. We showed how much processor time is used for retrieving different amounts of records. In part 2 of this article, we’ll show how using indexed views instead of adding indexes to tables affects SQL performance.