Understanding PostgreSQL SUBSTRING function

September 21, 2024 by Rajendra Gupta

PostgreSQL is an open-source object-relational database system known for its reliability, performance, and robustness. It supports both SQL (relational) and JSON (non-relational) queries. PostgreSQL’s popular features include Multi-Version Concurrency Control (MVCC), point-in-time recovery, granular access controls, tablespaces, and asynchronous replication.

Data refactoring is a common requirement in data mining operations. Data stored in a table often needs to be transformed to suit the task at hand. Let’s say your table stores the customer address in a single column. The address contains the house number, street name, and postal code. You want to extract the postal code so that you can sort customers by location. Similarly, you might wish to extract the data before or after a particular character, such as a comma or semicolon.

PostgreSQL includes several built-in functions for manipulating string data, such as CONCAT(), FORMAT(), LENGTH(), POSITION(), LTRIM(), RTRIM(), REPLACE(), and SUBSTRING().

In this article, we explore how the PostgreSQL SUBSTRING function extracts specific data from a string.

This article covers the following topics.

  • Substring function in PostgreSQL
  • Examples of Substring functions
  • Dynamically locating the starting and end character positions

Environment details

This article uses PostgreSQL 15.1 on Ubuntu with the pgAdmin GUI tool.
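
To confirm which version you are connected to, you can run the built-in version() function from the pgAdmin query tool. The exact output string depends on your installation.

```sql
-- Returns a single row describing the server version and build
SELECT version();
```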

You can use the following links to set up the PostgreSQL environment.

PostgreSQL: https://www.postgresql.org/download/

pgAdmin: https://www.pgadmin.org/download/

PostgreSQL substring function

The PostgreSQL SUBSTRING function returns part of the input string, based on the supplied starting position and length. You can use it to extract data either by fixed position and length or by matching a regular expression.

Syntax:

SUBSTRING (String, Start_Position, length)

  • String: The input string, with a data type such as char, varchar, or text. Values of other types, such as dates or timestamps, must be cast to text first.
  • Start_Position: An integer that specifies the character position where extraction begins. The first character in the string has position 1. If you omit the start position (possible in the alternative keyword syntax, SUBSTRING (String FOR length)), PostgreSQL starts from the first character.
  • Length: An optional non-negative integer that defines the number of characters to extract, beginning at the start position. If you do not specify the length, the function returns the rest of the string starting from the start position.
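
A minimal sketch of two points from the parameter list. The string 'PostgreSQL' is an assumption chosen for illustration: a non-text value is cast to text before extraction, and the start position can be left out in the keyword form.

```sql
-- Cast a date to text first; with the default ISO DateStyle,
-- the first four characters are the year
SELECT SUBSTRING(CURRENT_DATE::text, 1, 4) AS year_part;

-- No start position given: extraction begins at the first character
SELECT SUBSTRING('PostgreSQL' FOR 4) AS first_four;  -- Post
```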

Let’s explore the substring function with various examples.

  • Example: SUBSTRING() function with a string

The following SUBSTRING call extracts eight characters, beginning at position 1. The first character in a string always has position 1.
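
A minimal sketch of this call, assuming the input string 'PostgreSQL' (chosen here for illustration):

```sql
-- Start at position 1 and take eight characters
SELECT SUBSTRING('PostgreSQL', 1, 8) AS result;  -- PostgreS
```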

The length parameter is optional in the SUBSTRING function. If you do not specify it, all characters from the starting position onward are returned. For example, the following code returns the string from the 8th character to the end.
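
As a sketch with the same assumed input string, 'PostgreSQL', leaving out the length returns everything from position 8 onward:

```sql
-- No length: return the rest of the string from position 8
SELECT SUBSTRING('PostgreSQL', 8) AS result;  -- SQL
```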

  • Example: SUBSTRING function with an alternative syntax

In the previous example, we specified the string, start_position, and length (optional) parameters separated by commas. Alternatively, you can use the following format to get the same output.

SUBSTRING (String FROM Start_Position FOR length)

This format specifies the parameters in a more readable way. For example, the following code instructs Postgres to extract three characters from the string, beginning at position 1.
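
A short sketch of the keyword form, again assuming the illustrative string 'PostgreSQL':

```sql
-- FROM gives the start position, FOR gives the length
SELECT SUBSTRING('PostgreSQL' FROM 1 FOR 3) AS result;  -- Pos
```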

Similarly, if you do not require a length parameter, specify the starting position per the query below.
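
For instance, with the assumed string 'PostgreSQL', dropping the FOR clause might look like this:

```sql
-- Only FROM is given, so the rest of the string is returned
SELECT SUBSTRING('PostgreSQL' FROM 8) AS result;  -- SQL
```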

  • Example: SUBSTRING function with table data

We have a table named actor in the sample database. It stores the first and last name of each actor.

We can use the substring function with the columns to extract a specific string portion. The following query extracts the first character of the first and last name from the actor table.
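
The query could be sketched as follows. The column names first_name and last_name and the three rows are assumptions made so that the example runs on its own. Against the real table, you would drop the WITH clause.

```sql
-- Hypothetical sample rows standing in for the actor table
WITH actor (first_name, last_name) AS (
    VALUES ('Penelope', 'Guiness'),
           ('Nick', 'Wahlberg'),
           ('Ed', 'Chase')
)
SELECT first_name,
       last_name,
       SUBSTRING(first_name, 1, 1) AS first_initial,  -- first character of first_name
       SUBSTRING(last_name, 1, 1)  AS last_initial    -- first character of last_name
FROM actor;
```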

  • Example: SUBSTRING function with the ORDER BY clause

This example uses the SUBSTRING() function with the ORDER BY clause to sort data by the extracted string.

Here, ORDER BY 1 sorts the result by the first column in the SELECT list.
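
A minimal sketch of such a query, using hypothetical rows and an assumed first_name column in place of the real actor table:

```sql
-- Hypothetical sample rows standing in for the actor table
WITH actor (first_name, last_name) AS (
    VALUES ('Penelope', 'Guiness'),
           ('Nick', 'Wahlberg'),
           ('Ed', 'Chase')
)
SELECT SUBSTRING(first_name, 1, 3) AS name_prefix,
       first_name
FROM actor
ORDER BY 1;  -- sort by the first output column (name_prefix), ascending
```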

If you need data sorted in descending order, add the DESC keyword with the ORDER BY clause.
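
With the same hypothetical rows and assumed column names, the descending variant might look like this:

```sql
-- Hypothetical sample rows standing in for the actor table
WITH actor (first_name, last_name) AS (
    VALUES ('Penelope', 'Guiness'),
           ('Nick', 'Wahlberg'),
           ('Ed', 'Chase')
)
SELECT SUBSTRING(first_name, 1, 3) AS name_prefix,
       first_name
FROM actor
ORDER BY 1 DESC;  -- same sort key, reversed order
```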

Extract matching Substrings with SQL Regular Expression

The Postgres SUBSTRING function can also extract the part of a string that matches a regular expression. Regular expressions are patterns used to match character combinations in strings. These expressions can use the following operators:

Bracket [ ]: A bracket expression matches a single character from the set or range specified in the brackets. For example, [a-z] matches any lowercase letter from “a” to “z”. Similarly, [1-9] matches any digit from 1 to 9.

^: It matches the start of the string.

$: It matches the end of the string.

[^ ]: It matches a single character that is not listed within the brackets.

Let’s look at the following code that uses a regular expression. In this code, we have the following:

String: This is my 600th article

Regular expression: [0-9]+

Expected match: 600

With a regular expression, SUBSTRING takes no starting position or length. It returns the first part of the string that matches the pattern. The pattern [0-9]+ matches one or more consecutive digits, so the code returns 600. Note that [0-9] without the + matches a single digit and would return only 6.
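
A sketch of the regular-expression form. The quantifier + is needed to capture the whole number, because a bare [0-9] matches only one digit; the column aliases are illustrative.

```sql
-- '[0-9]+' matches one or more consecutive digits
SELECT SUBSTRING('This is my 600th article' FROM '[0-9]+') AS article_number;  -- 600

-- '[0-9]' matches a single digit, so only the first digit comes back
SELECT SUBSTRING('This is my 600th article' FROM '[0-9]') AS first_digit;      -- 6
```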

If we change the character range in the regular expression to [7-9], we get a NULL value in the output because the string does not contain any digit between 7 and 9.
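
For example, the following sketch finds no match and therefore returns NULL (the column alias is illustrative):

```sql
-- The string contains only the digits 6 and 0, so nothing matches [7-9]
SELECT SUBSTRING('This is my 600th article' FROM '[7-9]') AS article_number;  -- NULL
```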

  • Example: Work with the email addresses using the SUBSTRING function

Suppose your company uses email addresses in which the part before the @ character is the user name that the employee uses to log in to the web portal. How do you extract the user name from the email address in this case?

The POSITION function returns the numeric position of the @ character. We do not want the @ itself in the extracted string, so we subtract 1 from the value that POSITION returns and use the result as the length.
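
A minimal sketch of this approach, using a hypothetical address, jane.doe@example.com:

```sql
-- POSITION finds the @ at character 9; subtracting 1 gives a length of 8
SELECT SUBSTRING('jane.doe@example.com',
                 1,
                 POSITION('@' IN 'jane.doe@example.com') - 1) AS user_name;  -- jane.doe
```

Against a table, the string literal would be replaced by the column that holds the email address.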

Use a Negative number as starting position in the SUBSTRING function

Suppose you specify a negative number for the start_position of the SUBSTRING function. The first character of the string is still at position 1, so positions 0 and below lie before the string and contain no characters. With a start position of -5, the function counts the requested length from position -5 and returns only the characters in that range that fall at position 1 or later.
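
As a sketch with the assumed string 'PostgreSQL': a start of -5 and a length of 8 cover positions -5 through 2, and only positions 1 and 2 hold characters.

```sql
-- Positions -5..0 are empty; positions 1 and 2 contribute 'P' and 'o'
SELECT SUBSTRING('PostgreSQL', -5, 8) AS result;  -- Po
```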

The following code returns an empty string because it starts at position -5 and takes a length of four characters. That range ends before position 1, so it contains no characters from the string, and you get an empty string as the output.
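
A sketch of this case, again assuming the string 'PostgreSQL'. The LENGTH call is added only to make the empty result visible.

```sql
-- The range -5..-2 lies entirely before the string
SELECT SUBSTRING('PostgreSQL', -5, 4)         AS result,         -- empty string
       LENGTH(SUBSTRING('PostgreSQL', -5, 4)) AS result_length;  -- 0
```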

However, if you specify the length as a negative value, you get the error: “negative substring length not allowed, SQL State: 22011”
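
A statement that triggers this error could look like the following, with 'PostgreSQL' as an assumed input string:

```sql
-- Raises: ERROR: negative substring length not allowed (SQLSTATE 22011)
SELECT SUBSTRING('PostgreSQL', 1, -4);
```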

Conclusion

This article explored the PostgreSQL SUBSTRING() function, which extracts a substring from the specified string based on the starting position and an optional length parameter. It is commonly used whenever you work with strings and need to extract data from them. Explore this useful function in your PostgreSQL environment.

The function can be helpful in various scenarios, such as extracting parts of names, addresses, or product codes. Its regular expression matching is case-sensitive, so it distinguishes between uppercase and lowercase letters. Using the SUBSTRING function, you can manipulate and process string data more efficiently in your database, which makes it a valuable tool for data analysis and management.
