This article will explore the use of the CROSSTAB function in PostgreSQL.
Introduction
PostgreSQL is an open-source,multi-operating system supported (Windows, Linux, and Unix), object-oriented relational database with reliability, performance, and feature robustness. It uses multi-version concurrency control (MVCC) for managing the concurrency.
A pivot table is a popular tool in Microsoft Excel that shows summarized data and helps you analyze it in various ways. Pivot tables collect and organize data from different rows, columns, and tables. Pivot tables are a great way to summarise data, and a handy tool for analyzing sales revenue, products sold, sales performance, etc.
Relational database tables store data in multiple rows and columns. You can calculate data using various functions such as count, sum, and average. SQL Server provides the PIVOT and UNPIVOT functions for working with pivot tables. How do we create the pivot tables in PostgreSQL? Let’s find it out.
This article will explore the crosstab function and its different use cases.
Pre-requisites
You need the following requirements to work with this article.
- In this article, I am using PostgreSQL 14.5 on Ubuntu. If you do not have Postgres installed on your system, refer to the URL https://www.postgresql.org for downloading software and installation guide. The SELECT Version() command returns the Postgres version, as shown below.
- PgAdmin4:The PgAdmin4 is an open-source tool with a graphical interface for interacting with the Postgres database. The PgAdmin tool is suitable for developing, maintaining, and performing administration activities. To use the latest PgAdmin4 tool, download it from the URL https://www.pgadmin.org.
- Create a new (sample) database for executing the queries. I used the Postgres database [DemoDb] for this article, as shown below.
- I would recommend you be familiar with existing articles on PostgreSQL.
PostgreSQL CrossTab function
Crosstab function
The Crosstab function generates the Pivot tables in the PostgreSQL database. The crosstab gets the select statement as a parameter, and this select should be compliant with the following conditions:
-
The Select statement output must have three columns.
- The first column identifies each row of the result table.
- The Second column identifies the category of the pivot table
- The third column represents values assigned to each cell.
Let’s create a sample table to explore the crosstab function in SQL Server. The query creates a table [ProductSales]. It inserts data for a few product sale counts in different years.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
create Table ProductSales ( Productname varchar(50), Year int, Sales int ); Insert into ProductSales values ('A',2017,100), ('A',2018,150), ('A',2019,300), ('A',2020,500), ('A',2021,450), ('A',2022,675), ('B',2017,0), ('B',2018,900), ('B',2019,1120), ('B',2020,750), ('B',2021,1500), ('B',2022,1980); Select * from ProductSales; |
The query returns data in the tabular format, as shown below.
Let’s look at the following crosstab query. It has the following sections:
- Select columns output: The query uses a select statement that retrieves three columns, [Productname],[year], and [sales], from the [ProductSales] table. The query is enclosed in the single quotes. You will get an error if the query is not specified in the single quotes.
- The crosstab function is invoked for the select statement using the FROM clause – Select * from crosstab()
- We need to define the columns for the pivot table and their data types. For this example, we require the following columns in the output:
Column | Data type |
Productname | Varchar(50) |
Year1 | Int |
Year2 | Int |
Year3 | Int |
Year4 | Int |
Year5 | Int |
Year6 | int |
1 2 3 4 5 |
Select * from crosstab('select Productname,year,sales from ProductSales order by 1,2') as ProductSales(Productname varchar(50),year1 int,y2 int, y3 int,y4 int,y5 int,y6 int); |
Before running the crosstab query, we need to enable the crosstab function. The crosstab function is part of the PostgreSQL extension called tablefunc. The tablefunc module contains various functions such as crosstab, normal_rand, connectby.
Execute the following query to enable the tablfunc module.
1 2 3 |
create extension tablefunc; |
After you enable the tablefunc extension, it will be visible in the Postgres database under the extensions group.
Now, execute the crosstab query and view the result. The pivot table shows the product sales for different years. It makes it easy for you to understand the sales journey of a product over various years in a single row.
Advanced Crosstab function
In the advanced form of the crosstab function, it can take two input parameters. The multiple (two) parameter helps to handle the null entries. To understand the advanced form better, let’s create three tables, [Departmentstores],[products], and [sales], using the below scripts.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 |
CREATE TABLE Departmentstores ( id SERIAL PRIMARY KEY NOT NULL, storename varchar(50) NOT NULL, city varchar(50) NOT NULL ); CREATE TABLE products ( id SERIAL PRIMARY KEY NOT NULL, Productname varchar(50) NOT NULL, Productcost int NOT NULL ); CREATE TABLE sales ( id SERIAL PRIMARY KEY, storeid int NOT NULL, productid int NOT NULL, numberofunitssold int NOT NULL ); insert into Departmentstores (id, storename, city) values (1, 'Agarwal Departments', 'Gurugram'), (2, 'Gupta Store', 'Noida'), (3, 'Balaji Shopee LLC', 'Rohtak') ; insert into sales (id, storeid, productid, numberofunitssold) values (1, 1, 2, 99),(2, 2, 3, 658),(3, 3, 1, 236), (4, 2, 3, 122),(5, 2, 1, 122),(6, 1, 1, 58), (7, 3, 3, 103),(8, 3, 2, 587),(9, 2, 2, 697), (10, 3, 3, 34),(11, 3, 3, 475),(12, 1, 1, 457), (13, 3, 2, 555),(14, 2, 3, 951),(15, 3, 2, 328), (16, 2, 3, 47),(17, 3, 2, 357),(18, 3, 3, 147), (19, 3, 1, 99),(20, 3, 2, 336); insert into products (id, Productname,productcost) values (1, 'candle',10), (2, 'sandlewood stick',20), (3, 'soap',30); |
The advanced crosstab function has the following syntax:
Crosstab(source_sql,category_sql)
-
The source_sql contains the select statement that returns the source set of data. As stated earlier, the crosstab must have three columns. The source_sql first column is used as the row_name. The last two columns are the category and value columns. For the source_sql, we will use the following select statement that has four columns as output:
- Storename: Row name in pivot table
- Productname: category column
- Total_units: Value column
1 2 3 4 5 6 7 8 |
SELECT str.storename as store_name, prd.Productname as product_name, sum(sl.numberofunitssold) as total_units FROM sales sl LEFT JOIN DepartmentStores str ON sl.storeid = str.id LEFT JOIN products prd ON sl.productid = prd.id GROUP BY storename, productname ORDER BY storename ASC, productname ASC |
- The category_sql is another select statement that returns the unique product names from the products table. It contains the DISTINCT to produce unique values.
1 2 3 |
SELECT DISTINCT Productname FROM products ORDER BY Productname ASC |
We can apply the crosstab function using the source_sql and category_sql. We defined pivot columns and their data types explicitly.
1 2 3 4 5 6 7 8 9 10 11 12 |
SELECT * FROM crosstab(' SELECT str.storename as store_name, prd.Productname as product_name, sum(sl.numberofunitssold) as total_units FROM sales sl LEFT JOIN DepartmentStores str ON sl.storeid = str.id LEFT JOIN products prd ON sl.productid = prd.id GROUP BY storename, productname ORDER BY storename ASC, productname ASC ', 'SELECT DISTINCT Productname FROM products ORDER BY Productname ASC') AS productsales(storename varchar, candle int, sandlewood_stick int, soap int); |
The query generates the pivot table as shown below. We get a null value for row 1 and column (soap) because there is no matching data for the product sold in the specified store. For the remaining column, it returns the value as the respective stores sold those products.
crosstabN(text)
You might have noticed that we need to define the Pivot columns and their data types in the crosstab function. If your query returns text data types for the row_name and value columns, you can use the crosstabN function. The crosstabN function is a wrapper for the crosstab function. It works the same as the crosstab function, except that you do not need to determine the pivot columns.
Let’s create another productsales_t table with the text data types for this example.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
create Table ProductSales_t ( Productname text, Year text, Sales text ); Insert into ProductSales_t values ('A',2017,100), ('A',2018,150), ('A',2019,300), ('A',2020,500), ('A',2021,450), ('A',2022,675), ('B',2017,0), ('B',2018,900), ('B',2019,1120), ('B',2020,750), ('B',2021,1500), ('B',2022,1980); |
As we have all text columns, we can use the crosstab3 function without specifying the pivot columns’ name and their data types.
1 2 3 |
Select * from crosstab3('select Productname,year,sales from ProductSales_t order by 1,2') |
The query returns the pivot table as shown below. It shows the column names as row_name, category_1, category_2, and category_3.
What if you do not have text columns in the source tables? Can you use the crosstabN function? Yes, we can use that. We can create custom types and functions for the underlying crosstab() function.
- Create the composite type with the desired output columns and data types. The following query creates a composite type my_crosstab_custom and defines the columns and data type similar to a regular table.
1 2 3 4 5 6 7 |
CREATE TYPE my_crosstab_custom AS ( productname varchar(50), Year int, sales int ); |
- Create the function that accepts one text parameter and returns the desired output columns. The function crosstab_custom below uses the type my_crosstab_custom for the crosstab function.
1 2 3 4 5 |
CREATE OR REPLACE FUNCTION crosstab_custom(text) RETURNS setof my_crosstab_custom AS '$libdir/tablefunc','crosstab' LANGUAGE C STABLE STRICT; |
Instead of the crosstab function, use the new function crosstab_custom without specifying the Pivot table columns and their data types.
1 2 3 |
Select * from crosstab_custom('select Productname,year,sales from ProductSales order by 1,2') |
Here is the output of the crosstab_custom function. You can see the column names as you define in type my_crosstab_custom.
Conclusion
This blog post was an introduction to the CROSSTAB function in PostgreSQL. It’s a powerful function for creating pivot tables in PostgreSQL that can be used in various ways. If you have any comments feel free to leave them below.
- Understanding PostgreSQL SUBSTRING function - September 21, 2024
- How to install PostgreSQL on Ubuntu - July 13, 2023
- How to use the CROSSTAB function in PostgreSQL - February 17, 2023