SQL Server cursor performance problems

Introduction

In a previous article we discussed the how to set up a basic cursor. We explained that a cursor is a row-based operation that takes a given SELECT statement and breaks downs the processing of the data into looping executions. People that do not have the background in Database Administration or who did not enjoy database classes while studying will often find SQL Server cursors handy and fun to write. This is because they break free of the shackles of set-based logic that is the normal when writing most T-SQL scripts. It is for this reason that we so often find SQL Server cursors written into the business logic of an application and it is a real pity because they are real performance hogs. Yes, there are times when cursors are OK to use and they can be tuned slightly by playing with their different types but, as a general rule of thumb, one should try to avoid them at all costs.

Performance problems

SQL Server cursors are notoriously bad for performance. In any good development environment people will talk about cursors as if they were demons to be avoided at all costs. The reason for this is plain and simple; they are the best way to slow down an application. This is because SQL Server, like any good relational database management system (RDBMS), is optimized for set-based operations. Take this simple SELECT statement as an example:

SELECT *

FROM AdventureWorks2012.Sales.SalesOrderDetail

WHERE ModifiedDate BETWEEN '2008-07-15 00:00:00.000' AND '2008-07-31 00:00:00.000'

When you write a SELECT statement like this (that returns 1051 rows) to fetch a bunch of data from that database the system receives the statement and creates or uses an existing query plan, then it uses indexes to locate the data on the disk, fetches the data in one foul swoop and returns the data as a set. If your indexes are correctly placed the query can be sped up. In the case above if the ModifiedDate field was included in an index it would run faster.

When running this query and turning time statistics on (SET STATISTICS TIME ON) one can see that the entire process takes less than a second:

SQL Server Execution Times:
CPU time = 15 ms, elapsed time = 87 ms.

Now let’s say you wanted (for some reason) to replace your WHERE statement with a variable that can be used to call a single row each time but 1051 times you can use a cursor to do so:

DECLARE @rowguidVar UNIQUEIDENTIFIER -- prepare unique ID variable to use in the WHERE statement below

DECLARE test_cursor CURSOR FOR

SELECT rowguid

FROM AdventureWorks2012.Sales.SalesOrderDetail

WHERE ModifiedDate BETWEEN '2008-07-15 00:00:00.000' AND '2008-07-31 00:00:00.000'

--This is the same query as above except we SELECT only the ID for each row

OPEN test_cursor

FETCH NEXT FROM test_cursor INTO @rowguidVar

--This is the start of the cursor loop.

WHILE @@FETCH_STATUS = 0

BEGIN

SELECT *

FROM Sales.SalesOrderDetail

WHERE rowguid = @rowguidVar

-- Here we select on row and then move onto the next row ID and loop

FETCH NEXT FROM test_cursor INTO @rowguidVar

END

CLOSE test_cursor

DEALLOCATE test_cursor

-- Don't forget these statements which flush the cursor from memory

Results:

This cursor will fetch exactly the same set of data but it does it on a row by row basis, and it takes heck of a lot longer to do so, as a matter of fact 48 seconds as opposed to 87 milliseconds, that’s 55172% slower! This is because the set-based logic for which RDBMS systems like SQL Server are optimized is completely broken and the entire query process has to be repeated for each row.

So why do they exist?

SQL Server cursors and any other type of cursors date back to before procedural programming languages could handle sets of data and required to be split into rows (E.g. COBOL, FORTRAN, old style C etc.) So in that regard they are just plain old-fashioned. However, other than for backwards compatibility they can still serve us well in the right situations. One such time would be when you want to write a script to restore a bunch of databases from backup files on a disk. In this case you can write a cursor to collect the database names and run a RESTORE DATABASE command for each database in a one-by-one fashion. Another time this may be useful is when you need to update an entire column of a large table that is constantly being queried in a production environment. Doing this on a row-by-row basis would avoid locks and waits for other users and your UPDATE query while concurrent operations are happening on the same pages of data. However, even in this case it is usually preferable to write a WHILE loop to update sets of data (i.e. on a 1000 by 1000 row basis). This would also avoid too many locks and would do the job quicker.

To illustrate how a while loop works I have massaged the above query example to return the same data again but this time, instead of row-by-row or a full set, it does something in between and returns 100 rows at a time.

DECLARE @GUIDS TABLE (rowguid UNIQUEIDENTIFIER PRIMARY KEY)

--Here we create an indexed table variable to store all the GUIDs

INSERT INTO @GUIDS

SELECT rowguid

FROM AdventureWorks2012.Sales.SalesOrderDetail

WHERE ModifiedDate BETWEEN '2008-07-15 00:00:00.000' AND '2008-07-31 00:00:00.000'

--Here we insert all our GUIDs into the variable/temporary table

WHILE EXISTS (SELECT rowguid

FROM @GUIDS)

--This is a basic WHILE loop that runs as long as there is data in the variable table

BEGIN

SELECT *

FROM AdventureWorks2012.Sales.SalesOrderDetail

WHERE rowguid IN (SELECT TOP 100 *

FROM @GUIDS)

--We SELECT the top 100 rows that are in our variable table

DELETE TOP (100)

FROM @GUIDS

--This deletes the 100 rows that we have just selected\

END

--If there is still data in the variable table we return to the BEGIN point and process the next 100

Even this clumsy WHILE loop is blisteringly fast in comparison to the SQL Server cursor approach. It takes less than a second but is closer to 800ms than 87ms as is the case for the pure set-based query.

Conclusion

People are right to loath cursors. If it becomes normal for you to use cursors in your T-SQL whilst building the business logic of an application you are heading off down a path to disastrous performance. Imagine, for example, you wrote a stored procedure that returns results based on a cursor and then you write another stored procedure using a cursor that calls the first one. This is called nested cursors and it is a perfect way to bog down expensive/performant server equipment with sloppy, badly performing code.

So, avoid cursors more than you would avoid your mother-in-law and only use them when you have mastered set-based T-SQL and you know that a row-by-row approach is needed and only for a one off maintenance/patch-script operation.

Useful resources

See more

Check out ApexSQL Plan, to view SQL execution plans, including comparing plans, stored procedure performance profiling, missing index details, lazy profiling, wait times, plan execution history and more

Author
Recent Posts

Evan Barke

Having worked on highly transactional production systems and advanced corporate business intelligence, Evan is now using this experience make a difference in the eHealth world. He is driven by the love of technology and a desire to solve complex problems creatively.

View all posts by Evan Barke

Introduction

Performance problems

So why do they exist?

Conclusion

Useful resources

See more

Related posts: