Background
We recently inherited a database environment where we’re facing significant data growth with limits on the sizes we can allow our databases to grow. Since we maintain multiple development, QA and production environments and these environments must be sized appropriately. We set a few standards about tables that exceed certain sizes – rows, data or both, or have certain growth patterns and our standards force compression at thresholds we set for our different environments (including using page, row, or clustered columnstore index compression) We’re looking for how we can get information about data and compression in tables and options we have so that we can quickly determine candidates that don’t currently match our best practices design we’ve set for our environments.
Discussion
In the below few images, we see a few properties of a table, such as the row count, the compression information, and the index storage. If we want to get information about one table and we manage one server, we could use the below properties to get this information. What if we manage hundreds of servers and want to get this information for a few tables, or a set of tables that are a part of a database?
We want to create a script that will return the information we need for any table based on the server and database that we pass to our function. In some cases, we may find tables without compression that are a significant amount of space, even though we seldom use them (even if we need the data). We may also discover tables with large space in index use relative to the size of the table; it’s unfortunately common for developers to add indexes to tables without considering that more indexes do not necessarily mean higher performance. We also may administer an environment with many developers and use this script to identify new objects that don’t conform to what we require in our environment.
We’ll be using the SQL management objects library (SMO library) for obtaining information for a table and we’ll begin our script by returning the row count before adding the option for other details. Returning table row counts can be used for many other purposes, like comparing two environments, so we’ll want the functionality to be separate if the use case calls for us to only use one feature in this function.
Since the location of the SMO library varies by SQL Server version, we’ll first create a switch statement and a parameter that requires input from a set of values that we pre-determine. The below locations reflect where different SMO library versions are stored, and we may want to verify this with our setup as well. I also have the library version written out before it’s added, using the Add-Type function; it’s possible that we may not need this, so we can remove since it’s not necessary. During troubleshooting, it can be helpful to have.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 |
Function Get-TableInfo { Param( [ValidateSet("SQL Server 2008R2","SQL Server 2012","SQL Server 2014","SQL Server 2016")][string]$smodllversion ) Process { switch ($smodllversion) { "SQL Server 2008R2" { Add-Type -Path "C:\Program Files (x86)\Microsoft SQL Server\100\SDK\Assemblies\Microsoft.SqlServer.Smo.dll" } "SQL Server 2012" { Add-Type -Path "C:\Program Files (x86)\Microsoft SQL Server\110\SDK\Assemblies\Microsoft.SqlServer.Smo.dll" } "SQL Server 2014" { Add-Type -Path "C:\Program Files (x86)\Microsoft SQL Server\120\SDK\Assemblies\Microsoft.SqlServer.Smo.dll" } "SQL Server 2016" { Add-Type -Path "C:\Program Files (x86)\Microsoft SQL Server\130\SDK\Assemblies\Microsoft.SqlServer.Smo.dll" } } } } Get-TableInfo -smodllversion |
Provided that we enter the correct version in our function, we’ll be able to use this information to get the table information. If the highest version we have installed in 2014, we’ll want to make sure to select this in the dropdown option.
Since we want to get the row count, storage and compression information using PowerShell, we will use another switch statement with a set of options to allow users to choose what they want. If users want storage information, they’ll pass in “storage”, for row count, they’ll pass in “rowcount”, etc. Within each choice (the parameter name), we’ll have different functions to returning the information for the user.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 |
Function Get-TableInfo { Param( [ValidateSet("SQL Server 2012","SQL Server 2014","SQL Server 2016")][string]$smodllversion , [ValidateSet("rowcount","storage","compression")][string]$choice ) Process { switch ($smodllversion) { "SQL Server 2012" { Add-Type -Path "C:\Program Files (x86)\Microsoft SQL Server\110\SDK\Assemblies\Microsoft.SqlServer.Smo.dll" } "SQL Server 2014" { Add-Type -Path "C:\Program Files (x86)\Microsoft SQL Server\120\SDK\Assemblies\Microsoft.SqlServer.Smo.dll" } "SQL Server 2016" { Add-Type -Path "C:\Program Files (x86)\Microsoft SQL Server\130\SDK\Assemblies\Microsoft.SqlServer.Smo.dll" } } switch ($choice) { "rowcount" { } "storage" { } "compression" { } } } } |
Next, we’ll add the connection information for our script and we’ll add the table information, since the row count, storage and compression information involve a table for this script. Since each detail is a part of a table, we’ll add the table information outside of our choice switch statement, rather than create the table object within each part of the switch statement. This is cleaner and more efficient and also allows us to return all information, if we choose to add that choice (the all option seen in the below script).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 |
Function Get-TableInfo { Param( [ValidateSet("SQL Server 2012","SQL Server 2014","SQL Server 2016")][string]$smodllversion , [ValidateSet("rowcount","storage","compression","all")][string]$choice , [Parameter(Mandatory=$true)]$server , [Parameter(Mandatory=$true)]$database , [Parameter(Mandatory=$true)]$table ) Process { switch ($smodllversion) { "SQL Server 2012" { Add-Type -Path "C:\Program Files (x86)\Microsoft SQL Server\110\SDK\Assemblies\Microsoft.SqlServer.Smo.dll" } "SQL Server 2014" { Add-Type -Path "C:\Program Files (x86)\Microsoft SQL Server\120\SDK\Assemblies\Microsoft.SqlServer.Smo.dll" } "SQL Server 2016" { Add-Type -Path "C:\Program Files (x86)\Microsoft SQL Server\130\SDK\Assemblies\Microsoft.SqlServer.Smo.dll" } } $smoscon = New-Object Microsoft.SqlServer.Management.SMO.Server($server) $gettableinfo = $smoscon.Databases["$database"].Tables["$table"] switch ($choice) { "rowcount" { $gettableinfo.RowCount } "storage" { ### The "SpaceUsed" reports in kilobytes, so we convert to megabytes ($gettableinfo.DataSpaceUsed + $gettableinfo.IndexSpaceUsed)/(1024) } "compression" { $gettableinfo.HasClusteredColumnStoreIndex $gettableinfo.HasCompressedPartitions } "all" { $all += "Rowcount: " + $gettableinfo.RowCount + ", SpaceUsed: " + ($gettableinfo.DataSpaceUsed + $gettableinfo.IndexSpaceUsed)/(1024) + ", ClusteredColumnStoreIndex: " + $gettableinfo.HasClusteredColumnStoreIndex + ", CompressedPartitions: " + $gettableinfo.HasCompressedPartitions $all } } } } Get-TableInfo -smodllversion 'SQL Server 2014' -choice all -server "OurServer" -database "OurDatabase" -table "OurTable" |
With the script, we can select our choice and experiment with tables. When we return the row count information, this mirrors checking the row count properties of a table within the interface. A few DBAs and developers debate about the best approach of checking row counts, and this varies based on use case. In most cases, when I check a row count, I do not want to interrupt data flow. However, this method may be inappropriate if you must know the exact count and prefer a “hard count” query. The space information combines the total space of a table by aggregating the data and index space. According to Microsoft, both of these space properties within the table class report details in kilobytes, so in the script we convert this to megabytes. If we’re in larger data environments, we can convert to an appropriate measure such as megabytes, gigabytes, petabytes, etc. Finally, the compression information reports whether we have a clustered columnstore index (which uses compression) and (or) whether we have compressed partitions.
Since we may want all this information, we can add one more piece to our switch statement:
1 2 3 4 5 |
"all" { $all += "Rowcount: " + $gettableinfo.RowCount + ", SpaceUsed: " + ($gettableinfo.DataSpaceUsed + $gettableinfo.IndexSpaceUsed)/(1024) + ", ClusteredColumnStoreIndex: " + $gettableinfo.HasClusteredColumnStoreIndex + ", CompressedPartitions: " + $gettableinfo.HasCompressedPartitions $all } |
If we want to return everything from our script, we can add one more option in our choice parameter – all – and calling this will return all the information we have about the table. In the below example, we look at a couple of tables in the returned output:
While this script will return the information, if we want to take it a step further, we can retain this information in files or within a table by adding the output to a file or SQL insert statement. Getting information with a service account or SQL user with lower level permissions is one thing, though as we do more – such as save information, we’ll need the account executing the script to have more permissions.
Final thoughts
In this tip, we looked at a PowerShell script which can return the row count, space used, and compression information about a table – with the option of also returning all these details. Depending on the standards we set along with the thresholds, we can get the information that we need quickly and determine the next course of action, or add that course of action within the script with additional permissions.
- Data Masking or Altering Behavioral Information - June 26, 2020
- Security Testing with extreme data volume ranges - June 19, 2020
- SQL Server performance tuning – RESOURCE_SEMAPHORE waits - June 16, 2020