Index strategies overview
This article covers techniques for optimizing your SQL Server index strategy. It is an appendix to the SQL index overview and strategy article, in which I covered what indexes actually do, how to create them, and some basic index design guidelines, and also presented an example of designing indexes by tuning and optimizing queries. That article tried to cover a lot of ground, but there is always more to say when it comes to SQL Server indexes.
Note: To see the best practices for improved performance tuning, please read the 11 SQL Server Index Best Practices for Improved Performance Tuning article.
So, this is going to be another brief article that goes over a number of important areas and takes a high-level look at the kind of strategy we should employ for each major kind of index.
If you head over to the article linked at the beginning and read it, you’ll notice that some of this material has already been covered, but here we’ll look at it again from a higher level and with some different examples, so we can get another perspective on how it all works.
Clustered indexes
To kick off, let’s see what clustered SQL Server indexes are and how they should be used. A clustered index is one that physically stores the data on disk, sorted by the key. Now, you have most likely heard of clustered indexes and have some understanding of what they are and how they work, but not everyone has heard about heaps. Heaps are tables that don’t have a clustered SQL Server index. They are just a pile of data, sorted neither logically nor physically in any order. That brings up the question: is there any situation where we should, or would, use a heap, given that it sounds obviously bad? In other words, is there any situation where we wouldn’t use a clustered index? The answer is: rarely. The best-case scenario is to put a clustered SQL Server index on every table, but there are cases when a table is so small that an index simply isn’t beneficial, because SQL Server would choose to do a table scan anyway, so the cost of having and maintaining the index would be greater than not having one.
The bottom line is that heaps are bad because they represent unorganized data. They cause a lot more I/O for SQL Server to handle, and the best way to organize this pile of data is simply to put a clustered index on it.
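As a minimal sketch of how you might act on this (the dbo.Orders table and its OrderID column are hypothetical), you can list the heaps in a database through sys.indexes and then build a clustered index on a suitable key:

```sql
-- Find heaps (tables without a clustered index) in the current database;
-- sys.indexes reports a heap as index type 0
SELECT OBJECT_SCHEMA_NAME(i.object_id) AS SchemaName,
       OBJECT_NAME(i.object_id)        AS TableName
FROM sys.indexes AS i
WHERE i.type = 0
  AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1;

-- Turn a hypothetical heap into a clustered table by indexing its key column
CREATE CLUSTERED INDEX CX_Orders_OrderID
    ON dbo.Orders (OrderID);
```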
Having a clustered SQL Server index on a table means the data will be physically stored on disk, organized in order by the key. The key is simply the field we choose when creating the clustered index, and it also forms the base for all non-clustered indexes, which essentially store pointers to where that data sits inside the clustered index. Therefore, it is important to ensure the key is the following:
- Static – First of all, it’s really important to choose a static key, in other words one that doesn’t change. If we choose a field that is often modified, then every time the clustered index key changes SQL Server has a lot of reorganizing to do at the physical level. In addition, all non-clustered indexes need to be updated too to reflect the change
- Narrow – Having a narrow key for the clustered index is equally important, because non-clustered indexes store the clustered index key as their pointer. If we have a wide key, every non-clustered index has to store that wide key, which requires a lot more data pages and ultimately more I/O, memory, CPU, and more work for SQL Server
- Unique – A unique key also makes for a good SQL Server index. If we choose a non-unique index key, SQL Server will make it unique anyway by appending four bytes to the end of it. How unique the key is directly relates to how narrow it is: if we don’t have a unique column, then we don’t have a very narrow column either
- Sequential – Finally, we always want a sequential key, one that always auto-increases. This guarantees that whenever new data comes in, it is placed in order at the end of the index. If the key is not sequential, SQL Server has a lot of shifting to do, which creates fragmentation and potential problems along the way
So, when choosing a key for a clustered SQL Server index, ask yourself these four questions: is the key static, narrow, unique, and sequential? If it has all of the above characteristics, then you have a very good clustered index key.
Also worth mentioning is that almost every table should have a clustered index, except for the very rare cases when a heap is okay.
Identity columns are good because they have all of the characteristics of an excellent clustered SQL Server index key. The only downside of an identity column is that it doesn’t really describe the data in any way. A good practice here is: if you can find a natural key that fits, that’s great, but if you can’t, don’t force it. Why? Because if you force it, there is a big chance it will not have the characteristics mentioned above.
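As a minimal sketch (the table and column names are hypothetical), an INT IDENTITY column used as the clustered primary key ticks all four boxes:

```sql
-- Hypothetical table: the IDENTITY column is static, narrow (4 bytes),
-- unique, and ever-increasing, so the primary key (clustered by default)
-- makes a good clustered index key
CREATE TABLE dbo.Customer
(
    CustomerID   INT IDENTITY(1, 1) NOT NULL,
    CustomerName NVARCHAR(100)      NOT NULL,
    CreatedAt    DATETIME2          NOT NULL DEFAULT SYSUTCDATETIME(),
    CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED (CustomerID)
);
```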
GUID columns are also acceptable, but the big problem with those is that they are not narrow – on the contrary, they are very wide. They will most likely solve the problem of generating unique IDs almost anywhere, but they do not make a great clustered key, and in addition they are not sequential by default. The general rule of thumb is to only use GUID columns as the clustered key if you have a bigger problem that requires them. Sure, you can use NEWSEQUENTIALID() to generate sequential values, but there’s a catch: if you reboot SQL Server, there’s a chance it will start generating GUIDs from a range lower than the ones generated before the reboot.
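If a GUID key really is required, a sketch along these lines (hypothetical dbo.Document table) shows how NEWSEQUENTIALID() is typically wired up, keeping in mind that the key is still 16 bytes wide:

```sql
-- Hypothetical table: NEWSEQUENTIALID() may only be used in a DEFAULT
-- constraint on a uniqueidentifier column; values are sequential while the
-- instance stays up, but can restart from a lower range after a reboot
CREATE TABLE dbo.Document
(
    DocumentID UNIQUEIDENTIFIER NOT NULL
        CONSTRAINT DF_Document_DocumentID DEFAULT NEWSEQUENTIALID(),
    Title      NVARCHAR(200)    NOT NULL,
    CONSTRAINT PK_Document PRIMARY KEY CLUSTERED (DocumentID)
);
```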
Multiple-column keys are also generally bad for SQL Server indexes; like GUID columns, they are not narrow, so we run into the same problem with the characteristics above.
The bottom line is to try to stay away from composite keys. The one use case where they are okay is intermediate tables: when we create an intermediate table to represent a many-to-many relationship, it will usually have a couple of keys in it, and a lot of the time those are integers. A couple of integers as a composite key is okay because it is still pretty narrow. But if you choose a composite key made up of natural keys, you can get into trouble. So, the ultimate advice is simply to pay attention to those four characteristics, as this is definitely the best strategy.
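A minimal sketch of the acceptable case (the link table and the parent tables dbo.Student and dbo.Course are hypothetical): two narrow integer foreign keys forming the composite clustered key of a many-to-many table.

```sql
-- Hypothetical many-to-many link table: two narrow INT columns form an
-- acceptable composite clustered key (still only 8 bytes wide)
CREATE TABLE dbo.StudentCourse
(
    StudentID INT NOT NULL,
    CourseID  INT NOT NULL,
    CONSTRAINT PK_StudentCourse PRIMARY KEY CLUSTERED (StudentID, CourseID),
    CONSTRAINT FK_StudentCourse_Student FOREIGN KEY (StudentID) REFERENCES dbo.Student (StudentID),
    CONSTRAINT FK_StudentCourse_Course  FOREIGN KEY (CourseID)  REFERENCES dbo.Course (CourseID)
);
```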
Non-clustered indexes
Non-clustered SQL Server indexes reliably do the heavy lifting when it comes to the performance of our databases. They are a separate structure that is logically sorted and points to the physical data. This means a non-clustered index can point to either a heap or a clustered index: if the table has a clustered index, the non-clustered index uses the clustered key as its pointer, and if the table is a heap, it uses the row ID as its pointer.
Non-clustered SQL Server indexes are most commonly used for searched columns. A good practice is to analyze our queries every time we define an index: look at the predicates (the logical conditions applied to rows in a table), look at all the filters in the WHERE clause, and try to create indexes that cover the columns used in the WHERE clause.
If we do this right, it increases the chances that SQL Server can pull all the data from the index itself rather than having to go to the data pages, which ultimately gives us a huge performance gain.
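As a hedged sketch of this idea (the dbo.Orders table and its Status, OrderDate, CustomerID, and TotalDue columns are hypothetical), the index keys match the WHERE clause and the INCLUDE list carries the selected columns, so the query can be answered entirely from the index:

```sql
-- Hypothetical query: the predicate filters on Status and OrderDate
SELECT OrderID, CustomerID, TotalDue
FROM dbo.Orders
WHERE Status = 'Open'
  AND OrderDate >= '2020-01-01';

-- Index the predicate columns and INCLUDE the selected columns so the
-- query never has to touch the underlying data pages (a covering index)
CREATE NONCLUSTERED INDEX IX_Orders_Status_OrderDate
    ON dbo.Orders (Status, OrderDate)
    INCLUDE (CustomerID, TotalDue);
```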
Filtered SQL Server indexes are also useful because they allow us to create an index on a subset of the data; this is accomplished by putting a WHERE clause inside the index definition. These are great for sparse columns (ordinary columns that have optimized storage for null values) and for popular subsets of data.
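Sticking with the same hypothetical dbo.Orders table, a filtered index on a popular subset might look like this sketch:

```sql
-- Filtered index: only 'Open' orders are indexed, keeping the index small
-- and cheap to maintain (table and column names are hypothetical)
CREATE NONCLUSTERED INDEX IX_Orders_Open
    ON dbo.Orders (OrderDate)
    WHERE Status = 'Open';
```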
Last but not least, one of the greatest strategies when it comes to non-clustered indexes is indexing foreign keys. This is pretty much the best practice and you should always do it.
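A minimal sketch of that practice, again using the hypothetical dbo.Orders table and its CustomerID foreign key:

```sql
-- Index the foreign key column so joins back to the parent table and
-- referential-integrity checks don't have to scan the child table
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON dbo.Orders (CustomerID);
```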
Columnstore indexes
Columnstore SQL Server indexes are used to speed up access to very large amounts of data in large tables. A columnstore index is basically a vertical, column-oriented index (read-only in the earliest versions of SQL Server that supported it, updatable in newer versions). To understand what this means, picture data stored horizontally on data pages: this is how most relational engines store data, at the record level. Field by field, column by column, the entire row is stored in a data page.
On the other hand, a vertical index stores just a column. Performance-wise, when SQL Server loads row-based data pages into memory, it can load a lot of unnecessary data just to extract what it needs. When only the required column’s data is loaded, memory is used efficiently and SQL Server has a lot less to do, because it loads only what it needs into memory.
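As a sketch (the dbo.Sales fact table and its columns are hypothetical), a non-clustered columnstore index can be created over just the columns an analytical query aggregates:

```sql
-- Nonclustered columnstore index over the columns analytical queries read;
-- SQL Server can then scan compressed column segments instead of whole rows
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Sales_Analysis
    ON dbo.Sales (SaleDate, ProductID, Quantity, Amount);
```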
Strategy guidelines
How many SQL Server indexes should a table have is the most frequently asked question. The rule of thumb is to keep it under 10 for OLTP environments. Depending on the type of table, the number can vary below or above 10. For example, if a table is rarely hit with inserts, updates, and deletes, then we can go above 10. On the other hand, if a table is extremely busy, it’s a good idea to go with far fewer than 10. In general, we should stick to under 10 indexes in OLTP environments.
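To see where you stand against that rule of thumb, a quick sketch like this counts the indexes on each user table:

```sql
-- Count indexes per user table to spot over-indexed OLTP tables
SELECT OBJECT_SCHEMA_NAME(i.object_id) AS SchemaName,
       OBJECT_NAME(i.object_id)        AS TableName,
       COUNT(*)                        AS IndexCount
FROM sys.indexes AS i
WHERE i.type > 0                       -- skip heaps
  AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
GROUP BY i.object_id
ORDER BY IndexCount DESC;
```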
In the world of data warehouses, AKA OLAP, the above rule is reversed. OLAP is characterized by a relatively low volume of transactions, which means there should be little or no write activity in our data warehouses. The main thing to consider there is read performance. Therefore, we can index OLAP environments much more generously.
Furthermore, don’t forget three important things that help decrease fragmentation. Try to specify a fill factor for a SQL Server index: if a table is hit very often and shows high fragmentation, a lot of page splitting is going on under the hood, and lowering the fill factor leaves empty space on the pages, which ultimately decreases fragmentation. Also, don’t forget to choose the right data types and to specify defaults where you can, because these also lower fragmentation.
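A minimal sketch of the fill factor idea, reusing the hypothetical index from earlier (the right percentage depends on how heavily the table is modified):

```sql
-- Rebuild a busy index leaving 10% free space on each leaf page to absorb
-- inserts and updates without page splits (names are hypothetical)
ALTER INDEX IX_Orders_Status_OrderDate
    ON dbo.Orders
    REBUILD WITH (FILLFACTOR = 90);
```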
Conclusion
I’ll wrap things up here and recommend a few other articles I wrote on the indexing subject, with the goal of boosting SQL Server performance. If you’re interested, go ahead and read the following articles:
I hope this article on optimizing SQL Server index strategies has been informative and I thank you for reading.