The requirement of data refactoring is very common and vital in data mining operations. In the previous article SQL string functions for Data Munging (Wrangling), you’ll learn the tips for getting started with SQL string functions, including the substring function for data munging with SQL Server. As we all agree that the data stored in one form sometimes require a transformation, we’ll take a look at some common functions or tasks for changing the case of a string, converting a value into a different type, trimming a value, and replacing a particular string in a field and so on.
After reading this article, you’ll understand more about:
- SQL String functions
- Understand the SQL Server SUBSTRING function
- How to handle data using it
- How to use the SQL Server SUBSTRING function in the where clause
- How to dynamically locate the starting and end character position
- How to work with date-time string using the SQL Server SUBSTRING function
- How to Create a simple sub-select using the T-SQL SUBSTRING function
- And more…
We’ll deep dive on above points in this article.. It allows us to truncate the length of string value that is varchar data-type, when we select data from the input string or tables.
It takes three arguments.
- The first one is the field that we want to query on.
- The second argument is the starting character,
- and the third argument is the ending character
The SQL Server SUBSTRING function syntax is as follows:
SUBSTRING (expression, position, length)
Parameters:
- expression: Input source string
- position: Is an integer value that specifies the initial position from which the characters can be extracted from the given expression. The first position of an expression is always starting with 1. The initial position can be a negative integer.
- length: Is a positive integer value. It specifies the ending limit and determines how many characters are going to be extracted from the given expression.
Note: The SQL Substring function traversal is always from left to right.
Examples
In this section, we are going to deal with some real-world scenarios using SQL string functions. For few demos, the Adventureworks2016 database is used and for some other, SQL data is generated manually. Let’s get our hands dirty and see more action.
- Simple data handling with the SQL Server Substring function
Let’s start with a basic SQL query. The following example returns a portion of a character string starting at an initial position 1 and extracts 5 characters from the starting position. The T-SQL SUBSTRING function is very useful when you want to make sure that the string values returned from a query will be restricted to a certain length.
1 |
SELECT FirstName, substring(firstname,1,5), lastname FROM Person.Person |
In the following output, by using the SQL Server SUBSTRING function and specifying the ‘firstname’ column, the starting position of one, and the length of five characters, executing this SQL query will truncate the length of the strings that are returned from the table to five characters long. It doesn’t matter if the value itself in the table is longer than five characters.
In the following, the 3rd parameter, the length, defined as 15. This will ensure that it doesn’t matter the length of the data stored in the table column itself, the query will only return the first 15 characters. This can be helpful to make sure that the output of query data is formatted according to our expectations or what our application requires.
1 |
SELECT FirstName, substring(firstname,1,15), lastname FROM Person.Person |
In the following, change the starting position as well as the length parameter, the length, defined as 10.
1 |
SELECT FirstName, substring(FirstName,2,5), lastname FROM Person.Person |
Extending the length to 10 characters will show longer string values. And changing the starting position to 2 will start counting characters from 2 through the string. In this case, substring function extracts 10 characters of the string starting at the second position. The SUBSTRING SQL function is very useful when you want to make sure that the string values returned from a query will be restricted to a certain length.
So you’re getting an idea of how the SQL SUBSTRING function works. The field that we want to act on, we start at which character, and we end at which character
- Use of the SQL Server SUBSTRING function in the where clause
In the following example, using the ‘firstname’ column, the last two characters are matched with the word ‘on’ using the SQL SUBSTRING function in the where clause.
1 2 3 4 5 |
SELECT DISTINCT FirstName, lastname FROM Person.Person WHERE SUBSTRING(FirstName, LEN(FirstName)-1,2) = 'on' |
- Dynamically locate the starting and end character
In the following example, the input string has alpha-numeric characters. Using the SQL Server Substring function, the numeric sub-set of a string from the column col is transformed using CHARINDEX. You can also use the SQL string function PATINDEX to find the initial position and length parameter of the SQL SUBSTRING function.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
DROP TABLE IF EXISTS Dummy; CREATE TABLE Dummy (col varchar(20)); INSERT INTO Dummy (col) VALUES ('NY-123 US'), ('AZ-456 GB'), ('MI-789 MO'); select substring (col, charindex('-',col,1)+1, charindex(' ',col,1)-charindex('-',col,1) ) from Dummy; |
OR
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
DROP TABLE IF EXISTS Dummy; CREATE TABLE Dummy (col varchar(20)); INSERT INTO Dummy (col) VALUES ('NY-123 US'), ('AZ-456 GB'), ('MI-789 MO'); select substring (col, PATINDEX('%-%',col)+1, PATINDEX('% %',col)- PATINDEX (%-%,col) ) from Dummy; |
In this example, using the SQL PATINDEX function, the initial position the string ‘-‘ is found. But the numeric value only starts in the next position so ‘1’ is added to initial position. Similarly, length is calculated by searching the next position ‘ ‘(space) and subtracting its value with the initial position gives the length. Now, we have values for all the arguments. Run the T-SQL statement.
- Working with DateTime strings
In the following example, you can see that the column col has a data-set and it is a datetime string. Using the SQL Server SUBSTRING function, the input values are truncated using CHARINDEX or PATINDEX function to get the date-time value. And then the derived string is type-casted to DateTime so that it can be used to compare with other DateTime values. In this case, it’s compared against the SQL GETDATE() function.
You can easily find the initial position and convert the data to required data-type (valid values) using the convert or cast functions. Using CHARINDEX, search for the position of ‘/’ of the input column. After finding the position, the value is subtracted by ‘3’ to get an initial value ‘12’ for the SQL SUBSTRING function. Similarly, the search is made to find a position for the character ’,’(comma). In this way, subtracting value with the initial position will yield the length of the string.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
DROP TABLE IF EXISTS Dummy; CREATE TABLE Dummy (col varchar(100)); INSERT INTO Dummy VALUES ('The Date is 04/18/2015 08:00:00, a Saturday'), ('The Date is 02/20/2016 07:00:00, a Sunday'), ('The Date is 03/13/2017 10:00:00, a Monday'), ('The Date is 06/07/2018 09:00:00, a Tuesday') GO select col,charindex('/',col,1)-3,charindex(',',col,1)-charindex('/',col,1) , substring (col, charindex('/',col)-3,charindex(',',col)-charindex('/',col)+3 ) from Dummy |
1 2 3 4 5 6 7 8 9 |
select col,patindex('%is%',col)+4,patindex('%,%',col),patindex('%,%',col)-patindex('%is%',col)-5, substring (col, patindex('%is%',col)+4,patindex('%,%',col)-patindex('%is%',col)-4 ) from Dummy where getdate()-300>=cast(substring (col, patindex('%is%',col)+4,patindex('%,%',col)-patindex('%is%',col)-4) as datetime) |
- Creating a simple sub-select
A Sub-select, in SQL Server, is effectively a nested select statement. In SQL, the result of a select statement is effectively a table. It usually just exists in memory but it can always be used, as you would use a table. Because of this, a select statement may be used as a data source for another select statement
In the following example, you can see how the columns are transformed using the SQL Server SUBSTRING function and used as a table for the SQL join statement.
If you see the temp table values, the first two characters of the first column represent state and next four characters represents the state-code. Similarly, the second column, the first two characters represents the country and rest four characters form the country-code. Using the SQL SUBSTRING function, the two columns are effectively parsed and transformed as four new columns. These columns can be used just as if they were a table in a database. In the select statement, it’s joined with the country table so that, we can actually find the name from the country.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 |
DROP TABLE IF EXISTS Country; CREATE TABLE Country ( CCode int, CNAME VARCHAR(20)) INSERT INTO Country VALUES ( 1234,'Great Britain'),(5678, 'UNITED STATES'), (4567, 'FRANCE' ) SELECT * FROM Country DROP TABLE IF EXISTS temp; CREATE TABLE temp ( Id1 VARCHAR(6), Id2 VARCHAR(6) ) INSERT INTO temp VALUES ( 'NY1234', 'US5678' ), ( 'AZ5678', 'GB1234' ), ( 'CA9012', 'FR4567' ) SELECT * FROM temp; SELECT SUBSTRING(Id1, 1, 2) AS State, SUBSTRING(Id1, 3,len(ID1)) AS SCode, SUBSTRING(Id2, 1, 2) AS Country, SUBSTRING(Id2, 3,len(id2)) AS CCode FROM temp; SELECT co.CName, ss.CCode FROM Country co INNER JOIN ( SELECT SUBSTRING(Id1, 1, 2) AS State, SUBSTRING(Id1, 3,len(ID1)) AS SCode, SUBSTRING(Id2, 1, 2) AS Country, SUBSTRING (Id2, 3, len(id2)) AS CCode FROM temp ) AS ss ON co.CCode = ss.CCode ; |
Note: You can also use RIGHT and LEFT string functions. You can refer to the SQL string functions for Data Munging (Wrangling) article for more information.
1 |
SELECT left(Id1,2) AS State, right(Id1, 4) AS SCode, left(Id2,2) AS Country,right(Id2, 4) AS CCode FROM temp; |
Summary
So far, we’ve seen several examples of the SUBSTRING function in SQL Server, the character functions that SQL server makes readily available for use, and how you can use them to manipulate string values in your database and in your result set. In this way it is helpful to make sure that the output of SQL query data is formatted according to the expectations or business requirement.
We also need to understand the importance of the data-set. It is always recommended to thoroughly validate the input value. There are multiple ways to transform the data using the T-SQL SUBSTRING function. In a few cases, it is possible to transform using other SQL string functions. In some cases, volume of data, performance, and SQL Server version defines the options one over the other.
That’s all for now. I hope you enjoyed this article on SQL string functions and the SQL Server SUBSTRING function in particular. Feel free to ask any questions in the comments below.
- Stairway to SQL essentials - April 7, 2021
- A quick overview of database audit in SQL - January 28, 2021
- How to set up Azure Data Sync between Azure SQL databases and on-premises SQL Server - January 20, 2021