Introducing the Problem
Gaps existence in automatic sequenced columns occurs all the time. Missing identity values (or other sequencing values) occur for a variety of Reasons.
The most common reasons include: roll backed transactions, failed inserts and Deletes, large row deletes after delete commands that occur after many inserts to a sequenced table and so forth.
The sequence gaps appear in all kinds of sequences, IDENTITY, generic integer Columns that act as a sequenced primary key of a table and also dedicated sequence Objects.
In all of these cases, the sequenced number does not contain any duplicate values.
The sequence gaps scenarios problem can cause some of business questions since the values are not sequenced properly and values are missing for no apparent Reason.
In order to reduce gaps in sequences, I have constructed a stored procedure that analyze the gaps in a given table and sequence column.
The procedure gives back a new sequence number from one of four selected Methods (strategies):
1 – (The first fit gap strategy) – The first number that fit in the first gap
2 – (The last fit gap strategy) – The first number that fit in the last gap
3 – (The Largest fit gap strategy) – The first number that fit in the largest gap.
4 – (The smallest fit gap Strategy) – The first number that fit in the smallest gap.
Advantage and weakness
The advantage of this procedure is clear, it saves space. Rather than using a big integer that is 8 bytes for a large sequence column that advances all the time, we keep a shorter int column (only 4 bytes) and adjust the sequence value from time to time.
This scenario is useful when there are many deletes and not only inserts to the table.
The weakness of this method is clearly performance, executing the procedure takes time, and the execution time gets larger when the table is larger. Using an indexed (usually clustered index on the sequenced column), helps in this case.
How does the solution work?
I have created a stored procedure called GetNextSequence that acts as an advisor to the next value the sequence should have. The stored procedure uses a dynamic TSQL SELECT statement that is constructed inside the stored procedure. The statement uses the LEAD window function.
Starting from the SQL 2012 SQL Server edition, The LAG and LEAD window functions have been introduced as functions for accessing the prior or subsequent rows along with the current one.
If the table that is given as the parameter, has gaps in the given column, the table is then queried, comparing the Values that exist in the current row compared to the value in the next row.
If the difference between the two values is greater than 1, then a gap exists, and therefore, will be returned in the result set.
If there are no gaps in the table (that means that there are no adjacent values with difference greater than 1, no rows are returned from the query and the procedure returns -1)
The assumption of the procedure is that the sequence column is a positive integer (n integer and > 0) then a positive integer value is returned.
The Procedure gets three parameters and outputs one value.
The input parameters
The Table name that has gaps in the sequenced column (i.e @table parameter)
The name of the sequence column (i.e @col parameter)
The method, as a string that has four valid codes (@i.e. the @method parameter)
The codes are:
- FIRST_FIT – The very first , minimal value for a sequence that fit in the first found gap
- LAST_FIT – The first value of the sequence that fits in the last gap found
- LARGEST_GAP – The first value of the sequence that fit in the largest gap found
- SMALLEST_GAP – The first value of the sequence that fit in the smallest gap found
An Output parameter (The @sequence parameter):
- If a sequence is found according to any method then a positive Integer value is returned according to the method selected
- If -1 is returned it stands for – "No gaps found", so NEXT number should be Max (sequence) + 1
Note:
Note that it is up to the DBA that executes the procedure, to reseed the identity column on to alter the sequence back to the value suggested by the procedure.
The Procedure TSQL Code
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 |
CREATE PROC GetNextSequence ( @table VARCHAR(40) ,@col VARCHAR(30) ,@method VARCHAR(20) ,@seq BIGINT OUTPUT ) AS BEGIN SET nocount on DECLARE @dyntsql VARCHAR(1000) CREATE TABLE ##Sequences ( start_seq BIGINT ,end_seq BIGINT ) SET @dyntsql = 'SELECT t.CurrRow + 1 AS [StartGap],t.NextRow - 1 AS [EndGap] FROM ( SELECT ' SET @dyntsql += @col + ' AS CurrRow,LEAD(' + @col + ', 1, NULL) OVER (ORDER BY ' + @col + ') AS NextRow FROM ' SET @dyntsql += @table + ' ) as t WHERE t.NextRow - t.CurrRow > 1' INSERT ##Sequences EXEC (@dyntsql) IF (@method = 'FIRST_FIT') BEGIN SELECT @seq = start_seq FROM ##Sequences WHERE start_seq = ( SELECT min(start_seq) FROM ##Sequences ) END IF (@method = 'LAST_FIT') BEGIN SELECT @seq = start_seq FROM ##Sequences WHERE start_seq = ( SELECT max(start_seq) FROM ##Sequences ) END IF (@method = 'SMALLEST_GAP') BEGIN SELECT @seq = start_seq FROM ##Sequences WHERE end_seq - start_seq = ( SELECT min(end_seq - start_seq) FROM ##Sequences ) END IF (@method = 'LARGEST_GAP') BEGIN SELECT @seq = start_seq FROM ##Sequences WHERE end_seq - start_seq = ( SELECT max(end_seq - start_seq) FROM ##Sequences ) END IF (@seq IS NULL) SET @seq = - 1 DROP TABLE ##Sequences END GO |
Some Explanations for the code:
- The procedure gets the table name , column name and method input parameters and constructs a dynamic TSQL statement in the following algorithmic format :
1 2 3 4 5 6 7 8 |
SELECT t.currRow + 1 as [Start Value In Gap], t.nextRow – 1as [Last Value in Gap] From (SELECT column as currRow , LEAD (column, 1 , NULL) OVER ( ORDER BY column) From table) as t Where nextRow – currRow > 1 |
In The dynamic TSQL execution, the table and column are replaced by the corresponding parameters of the procedure.
- The result of this dynamic TSQL statement is entered into a temporary, global table called ##sequences.
- This temporary table is then queried, according to the method parameter.
- If method is first-fit then the minimal value of the starting gap sequence is returned
- If method is last-fit then the maximal value of the starting gap sequence is returned
- If method is smallest-gap then the value of the starting gap sequence
- Where the Gap is the smallest is returned.
- If method is largest-gap then the value of the starting gap sequence
- Where the Gap is the largest is returned
Here is an example for procedure execution with some explanations:
- A test case table called testTB is created with some test values
As you can see there are gaps in the table:
1234567891011121314151617181920212223CREATE TABLE testTB (col INT)GOINSERT INTO testTB (col)VALUES (10),(11),(12),(50),(51),(52),(53),(54),(55),(90),(91),(92),(400),(850),(851),(852),(1054); Then, we execute the procedure and test it with all the methods available:
1234567891011DECLARE @sq bigintexec GetNextSequence 'testTB','col','FIRST_FIT',@sq OUTPUTprint @sqexec GetNextSequence 'testTB','col','LAST_FIT',@sq OUTPUTprint @sqexec GetNextSequence 'testTB','col','SMALLEST_GAP',@sq OUTPUTprint @sqexec GetNextSequence 'testTB','col','LARGEST_GAP',@sq OUTPUTprint @sq
We get the following output values, after this code execution:
13
853
56
401
Explanation for the above example:
- The first gap is between 13 and 49, so according to the first fit method 13 is returned
- The last gap is between 853 and 1053, so according to the last fit method 853 is returned
- The smallest gap is between 56 and 89, so according to the smallest gap method 56 is returned.
- The largest gap is between 401 and 849, so according to the largest gap method 401 is returned.
- Creating a stored procedure to fix orphaned database users - January 25, 2016
- Creating a gap in sequences – TSQL Stored Procedure advisor - January 6, 2016
- Construct a special multi-statement table function for checking SQL Server’s health - December 24, 2015