This article explores T-SQL RegEx commands in SQL Server for performing data search using various conditions.
Introduction
We store data in multiple formats or data types in SQL Server tables. Suppose you have a data column that contains string data in alphanumeric format. We use LIKE logical operator to search specific character in the string and retrieve the result. For example, in the Employee table, we want to filter results and get the only employee whose name starts with character A.
We use regular expressions to define specific patterns in T-SQL in a LIKE operator and filter results based on specific conditions. We also call these regular expressions as T-SQL RegEx functions. In this article, we will use the term T-SQL RegEx functions for regular expressions.
We can have multiple types of regular expressions:
- Alphabetic RegEx
- Numeric RegEx
- Case Sensitivity RegEx
- Special Characters RegEx
- RegEx to Exclude Characters
Pre-requisite
In this article, we will use the AdventureWorks sample database. Execute the following query, and we get all product descriptions:
1 2 |
SELECT [Description] FROM [AdventureWorks].[Production].[ProductDescription]; |
Let’s explore T-SQL RegEx in the following examples.
Example 1: Filter results for description starting with character A or L
Suppose we want to get product description starting with character A or L. We can use format [XY]% in the Like function.
Execute the following query and observe the output contains rows with first character A or L:
1 2 3 |
SELECT [Description] FROM [AdventureWorks].[Production].[ProductDescription] where [Description] like '[AL]%' |
Example 2: Filter results for description with first character A and second character L
In the previous example, we filtered results for starting character A, or L. Suppose we want starting characters of descriptions AL. We can use T-SQL RegEx [X][Y]% in the Like operator.
1 2 3 |
SELECT [Description] FROM [AdventureWorks].[Production].[ProductDescription] where [Description] like '[A][L]%' |
In the output, you can we get only records with first character A and second characters L.
We can specify multiple characters as well to filter records. The following query gives results for starting characters [All] together:
1 2 3 |
SELECT [Description] FROM [AdventureWorks].[Production].[ProductDescription] where [Description] like '[A][L][L]%' |
Example 3: Filter results for description and starting character between A and D
In the previous example, we specified a particular starting character to filter the results. We can specify character range using [X-Z]% functions.
The following query gives results for description starting character from A and D:
1 2 3 |
SELECT [Description] FROM [AdventureWorks].[Production].[ProductDescription] where [Description] like '[A-D]%' |
Similarly, we can specify multiple conditions for each character. For example, the below query does the following searches:
- The first character should be from A and D alphabets
- The second character should be from F and L alphabet
1 2 3 |
SELECT [Description] FROM [AdventureWorks].[Production].[ProductDescription] where [Description] like '[A-D][F-I]%' |
In the output, you can see that both result set satisfies both conditions.
Example 4: Filter results for description and ending character between A and D
In the previous examples, we filtered the data for the starting characters. We might want to filter for the end position character as well.
In the previous examples, note the position of percentage (%) operator. We specified a percentage character at the end of search characters.
1 2 3 |
SELECT [Description] FROM [AdventureWorks].[Production].[ProductDescription] where [Description] like '[A-D][F-I]%' |
In the following query, we changed the position of percentage character at the beginning on search character. It looks for the characters with the following condition:
- Ending character should be from G and S
1 2 3 |
SELECT [Description] FROM [AdventureWorks].[Production].[ProductDescription] where [Description] like '%[G-S]' |
In the output, we get the character that satisfies our search condition.
Example 5: Filter results for description starting letters AF and ending character between S
Let’s make it a bit complex. We want to search using the following conditions:
- Starting character should be A (first) and F (second)
- Ending character should be S
Execute the following query, and in the output, we can see it satisfies our requirement:
1 2 3 |
SELECT [Description] FROM [AdventureWorks].[Production].[ProductDescription] where [Description] like '[A][F]%[S]' |
Example 6: Filter results for description starting letters excluding A to T
In the following example, we do not want the first character of output rows from A to T. We can exclude characters using [^X-Y] format in Like operator.
1 2 3 |
SELECT [Description] FROM [AdventureWorks].[Production].[ProductDescription] where [Description] like '[^A-T]%' |
In the output, we do not have any first characters from A to T.
Example 7: Filter results for description with a specific pattern
In the example below, we want to filter records using the following conditions:
- The first character should be from R and S character – [R-S]
- We can have any combination after the first character – %
- We require the P character – [P]
- It should be followed by either an [P] or [I] – [PI]
- It can have any other character after previous condition- %
1 2 3 |
SELECT [Description] FROM [AdventureWorks].[Production].[ProductDescription] where [Description] like '[R-S]%[P][I]%' |
Example 8: Case sensitive search using T-SQL RegEx functions
By default, we do not get case sensitive results. For example, the following queries return the same result set:
1 2 3 4 5 6 7 |
SELECT [Description] FROM [AdventureWorks].[Production].[ProductDescription] where [Description] like '[r-s]%[P][i]%' SELECT [Description] FROM [AdventureWorks].[Production].[ProductDescription] where [Description] like '[R-S]%[P][I]%' |
We can perform case sensitive search using the following two ways:
-
Database collation setting: Each database in SQL Server have a collation. Right-click on the database and in the properties page, you can see the collation
We have SQL_Latin1_General_CP1_CI_AS performs case insensitive behaviour for the database. We can change this collation to case sensitive collation. It is not a simple solution. It might create issues for your queries. It is not a recommended way unless you explicitly require case sensitive collation.
We can use Column Collation with T-SQL RegEx functions to perform case sensitive search.
1234567Create table Characters(Alphabet char(1))GoInsert into Characters values ('A')Insert into Characters values ('a')GoIn the table, we have letter A in upper and lowercase. If we run the following select statement, it returns both uppercase and lowercase:
12SELECT * from Characterswhere Alphabet like '[A]%'Suppose we want to filter the uppercase letter in the result. We can use column collation as per the following query:
12select * from Characterswhere Alphabet COLLATE Latin1_General_BIN like '[A]%'It returns uppercase letter A in the output.
Similarly, the following query returns lowercase letter in the output:
12select * from Characterswhere Alphabet COLLATE Latin1_General_BIN like '[a]%' -
We can use T-SQL RegEx function to find both upper and lowercase characters in the output.
We want the following output:
- The first character should be uppercase character C
- The second character should be lowercase character h
- Rest of the characters can be in any letter case
123SELECT [Description]FROM [AdventureWorks].[Production].[ProductDescription]where [Description] COLLATE Latin1_General_BIN like '[C][h]%'
Example 9: Use T-SQL Regex to Find Text Rows that Contain a Number
We can find a row that contains the number as well in the text. For example, we want to filter the results with rows that contain number 0 to 9 in the beginning.
1 2 3 4 5 6 7 8 9 10 11 12 |
SELECT [Description] FROM [AdventureWorks].[Production].[ProductDescription] where [Description] like '[0-9]%'</p> <p> <img style="margin: 0px auto; display: block;" src="/wp-content/uploads/2019/09/use-t-sql-regex-to-find-text-rows-that-contain-a-n.png" alt="Use T-SQL Regex to Find Text Rows that Contain a Number" /> </p> <p> Similar to the characters, we can also specify the numbers for different positions. In the following example, we want the first digit from 1 to 5. The second digit should be in between 0 to 9. </p> <p><pre lang="tsql">SELECT [Description] FROM [AdventureWorks].[Production].[ProductDescription] where [Description] like '[1-5][0-9]%' |
Example 10: Use T-SQL Regex to Find valid email ID’s
Let’s explore a practical scenario of the RegEX function. We have a customer table, and it holds the customer email address. We want to identify valid email address from the user data. Sometimes, users make typo mistake and enter @@ instead of @ character.
First, create the sample table and insert some email address into it in different formats.
1 2 3 4 5 6 7 8 9 |
CREATE TABLE TSQLREGEX( Email VARCHAR(1000) ) Insert into TSQLREGEX values('raj@gmail.com') Insert into TSQLREGEX values('HSDFX@gmail.com') Insert into TSQLREGEX values('JHKHKO.PVS@gmail.com') Insert into TSQLREGEX values('ABC@@gmail.com') Insert into TSQLREGEX values('ABC.DFG.LKF#@gmail.com') |
Execute the following select statement with the T-SQL RegEx function and it eliminates invalid email addresses.
1 2 |
Select * from TSQLREGEX where email LIKE '%[A-Z0-9][@][A-Z0-9]%[.][A-Z0-9]%' |
We do not have following invalid email address in the list.
- ABC@@gmail.com
- ABC.DFG.LKF#@gmail.com
Conclusion
In this article, we explored T-SQL RegEx functions to perform a search using various conditions. You should be aware of these to search based on specific requirements.
- Understanding PostgreSQL SUBSTRING function - September 21, 2024
- How to install PostgreSQL on Ubuntu - July 13, 2023
- How to use the CROSSTAB function in PostgreSQL - February 17, 2023