SQL PARTITION BY Clause overview

This article will cover the SQL PARTITION BY clause and, in particular, the difference with GROUP BY in a select statement. We will also explore various use cases of SQL PARTITION BY.

We use SQL PARTITION BY to divide the result set into partitions and perform computation on each subset of partitioned data.

Preparing Sample Data

Let us create an Orders table in my sample database SQLShackDemo and insert records to write further queries.

Use SQLShackDemo

CREATE TABLE [dbo].[Orders]

(

[orderid] INT,

[Orderdate] DATE,

[CustomerName] VARCHAR(100),

[Customercity] VARCHAR(100),

[Orderamount] MONEY

)

I use ApexSQL Generate to insert sample data into this article. Right click on the Orders table and Generate test data.

It launches the ApexSQL Generate. I generated a script to insert data into the Orders table. Execute this script to insert 100 records in the Orders table.

USE [SQLShackDemo]

INSERT [dbo].[Orders] VALUES (216090, CAST(N'1826-12-19' AS Date), N'Edward', N'Phoenix', 4713.8900)

INSERT [dbo].[Orders] VALUES (508220, CAST(N'1826-12-09' AS Date), N'Aria', N'San Francisco', 9832.7200)

…

Once we execute insert statements, we can see the data in the Orders table in the following image.

We use SQL GROUP BY clause to group results by specified column and use aggregate functions such as Avg(), Min(), Max() to calculate required values.

Group By function syntax

SELECT expression, aggregate function ()

FROM tables

WHERE conditions

GROUP BY expression

Suppose we want to find the following values in the Orders table

Minimum order value in a city
Maximum order value in a city
Average order value in a city

Execute the following query with GROUP BY clause to calculate these values.

SELECT Customercity,

AVG(Orderamount) AS AvgOrderAmount,

MIN(OrderAmount) AS MinOrderAmount,

SUM(Orderamount) TotalOrderAmount

FROM [dbo].[Orders]

GROUP BY Customercity;

In the following screenshot, we can see Average, Minimum and maximum values grouped by CustomerCity.

Now, we want to add CustomerName and OrderAmount column as well in the output. Let’s add these columns in the select statement and execute the following code.

SELECT Customercity, CustomerName ,OrderAmount,

AVG(Orderamount) AS AvgOrderAmount,

MIN(OrderAmount) AS MinOrderAmount,

SUM(Orderamount) TotalOrderAmount

FROM [dbo].[Orders]

GROUP BY Customercity;

Once we execute this query, we get an error message. In the SQL GROUP BY clause, we can use a column in the select statement if it is used in Group by clause as well. It does not allow any column in the select clause that is not part of GROUP BY clause.

We can use the SQL PARTITION BY clause to resolve this issue. Let us explore it further in the next section.

SQL PARTITION BY

We can use the SQL PARTITION BY clause with the OVER clause to specify the column on which we need to perform aggregation. In the previous example, we used Group By with CustomerCity column and calculated average, minimum and maximum values.

Let us rerun this scenario with the SQL PARTITION BY clause using the following query.

SELECT Customercity,

AVG(Orderamount) OVER(PARTITION BY Customercity) AS AvgOrderAmount,

MIN(OrderAmount) OVER(PARTITION BY Customercity) AS MinOrderAmount,

SUM(Orderamount) OVER(PARTITION BY Customercity) TotalOrderAmount

FROM [dbo].[Orders];

In the output, we get aggregated values similar to a GROUP By clause. You might notice a difference in output of the SQL PARTITION BY and GROUP BY clause output.

Group By

SQL PARTITION BY

We get a limited number of records using the Group By clause

We get all records in a table using the PARTITION BY clause.

It gives one row per group in result set. For example, we get a result for each group of CustomerCity in the GROUP BY clause.

It gives aggregated columns with each record in the specified table.

We have 15 records in the Orders table. In the query output of SQL PARTITION BY, we also get 15 rows along with Min, Max and average values.

In the previous example, we get an error message if we try to add a column that is not a part of the GROUP BY clause.

We can add required columns in a select statement with the SQL PARTITION BY clause. Let us add CustomerName and OrderAmount columns and execute the following query.

SELECT Customercity,

CustomerName,

OrderAmount,

AVG(Orderamount) OVER(PARTITION BY Customercity) AS AvgOrderAmount,

MIN(OrderAmount) OVER(PARTITION BY Customercity) AS MinOrderAmount,

SUM(Orderamount) OVER(PARTITION BY Customercity) TotalOrderAmount

FROM [dbo].[Orders];

We get CustomerName and OrderAmount column along with the output of the aggregated function. We also get all rows available in the Orders table.

In the following screenshot, you can for CustomerCity Chicago, it performs aggregations (Avg, Min and Max) and gives values in respective columns.

Similarly, we can use other aggregate functions such as count to find out total no of orders in a particular city with the SQL PARTITION BY clause.

SELECT Customercity,

CustomerName,

OrderAmount,

COUNT(OrderID) OVER(PARTITION BY Customercity) AS CountOfOrders,

AVG(Orderamount) OVER(PARTITION BY Customercity) AS AvgOrderAmount,

MIN(OrderAmount) OVER(PARTITION BY Customercity) AS MinOrderAmount,

SUM(Orderamount) OVER(PARTITION BY Customercity) TotalOrderAmount

FROM [dbo].[Orders];

We can see order counts for a particular city. For example, we have two orders from Austin city therefore; it shows value 2 in CountofOrders column.

PARTITION BY clause with ROW_NUMBER()

We can use the SQL PARTITION BY clause with ROW_NUMBER() function to have a row number of each row. We define the following parameters to use ROW_NUMBER with the SQL PARTITION BY clause.

PARTITION BY column – In this example, we want to partition data on CustomerCity column
Order By: In the ORDER BY column, we define a column or condition that defines row number. In this example, we want to sort data on the OrderAmount column

SELECT Customercity,

CustomerName,

ROW_NUMBER() OVER(PARTITION BY Customercity

ORDER BY OrderAmount DESC) AS "Row Number",

OrderAmount,

COUNT(OrderID) OVER(PARTITION BY Customercity) AS CountOfOrders,

AVG(Orderamount) OVER(PARTITION BY Customercity) AS AvgOrderAmount,

MIN(OrderAmount) OVER(PARTITION BY Customercity) AS MinOrderAmount,

SUM(Orderamount) OVER(PARTITION BY Customercity) TotalOrderAmount

FROM [dbo].[Orders];

In the following screenshot, we get see for CustomerCity Chicago, we have Row number 1 for order with highest amount 7577.90. it provides row number with descending OrderAmount.

PARTITION BY clause with Cumulative total value

Suppose we want to get a cumulative total for the orders in a partition. Cumulative total should be of the current row and the following row in the partition.

For example, in the Chicago city, we have four orders.

CustomerCity	CustomerName	Rank	OrderAmount	Cumulative Total Rows	Cumulative Total
Chicago	Marvin	1	7577.9	Rank 1 +2	14777.51
Chicago	Lawrence	2	7199.61	Rank 2+3	14047.21
Chicago	Alex	3	6847.66	Rank 3+4	8691.49
Chicago	Jerome	4	1843.83	Rank 4	1843.83

In the following query, we the specified ROWS clause to select the current row (using CURRENT ROW) and next row (using 1 FOLLOWING). It further calculates sum on those rows using sum(Orderamount) with a partition on CustomerCity ( using OVER(PARTITION BY Customercity ORDER BY OrderAmount DESC).

SELECT Customercity,

CustomerName,

OrderAmount,

ROW_NUMBER() OVER(PARTITION BY Customercity

ORDER BY OrderAmount DESC) AS "Row Number",

CONVERT(VARCHAR(20), SUM(orderamount) OVER(PARTITION BY Customercity

ORDER BY OrderAmount DESC ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING), 1) AS CumulativeTotal,

Similarly, we can calculate the cumulative average using the following query with the SQL PARTITION BY clause.

SELECT Customercity,

CustomerName,

OrderAmount,

ROW_NUMBER() OVER(PARTITION BY Customercity

ORDER BY OrderAmount DESC) AS "Row Number",

CONVERT(VARCHAR(20), AVG(orderamount) OVER(PARTITION BY Customercity

ORDER BY OrderAmount DESC ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING), 1) AS CumulativeAVG

ROWS UNBOUNDED PRECEDING with the PARTITION BY clause

We can use ROWS UNBOUNDED PRECEDING with the SQL PARTITION BY clause to select a row in a partition before the current row and the highest value row after current row.

In the following table, we can see for row 1; it does not have any row with a high value in this partition. Therefore, Cumulative average value is the same as of row 1 OrderAmount.

For Row2, It looks for current row value (7199.61) and highest value row 1(7577.9). It calculates the average for these two amounts.

For Row 3, it looks for current value (6847.66) and higher amount value than this value that is 7199.61 and 7577.90. It calculates the average of these and returns.

CustomerCity	CustomerName	Rank	OrderAmount	Cumulative Average Rows	Cumulative Average
Chicago	Marvin	1	7577.9	Rank 1	7577.90
Chicago	Lawrence	2	7199.61	Rank 1+2	7388.76
Chicago	Alex	3	6847.66	Rank 1+2+3	7208.39
Chicago	Jerome	4	1843.83	Rank 1+2+3+4	5867.25

Execute the following query to get this result with our sample data.

SELECT Customercity,

CustomerName,

OrderAmount,

ROW_NUMBER() OVER(PARTITION BY Customercity

ORDER BY OrderAmount DESC) AS "Row Number",

CONVERT(VARCHAR(20), AVG(orderamount) OVER(PARTITION BY Customercity

ORDER BY OrderAmount DESC ROWS UNBOUNDED PRECEDING), 1) AS CumulativeAvg

FROM [dbo].[Orders];

Conclusion

In this article, we explored the SQL PARTIION BY clause and its comparison with GROUP BY clause. We also learned its usage with a few examples. I hope you find this article useful and feel free to ask any questions in the comments below

Author
Recent Posts

Rajendra Gupta

Hi! I am Rajendra Gupta, Database Specialist and Architect, helping organizations implement Microsoft SQL Server, Azure, Couchbase, AWS solutions fast and efficiently, fix related issues, and Performance Tuning with over 14 years of experience.

I am the author of the book "DP-300 Administering Relational Database on Microsoft Azure". I published more than 650 technical articles on MSSQLTips, SQLShack, Quest, CodingSight, and SeveralNines.

I am the creator of one of the biggest free online collections of articles on a single topic, with his 50-part series on SQL Server Always On Availability Groups.

Based on my contribution to the SQL Server community, I have been recognized as the prestigious Best Author of the Year continuously in 2019, 2020, and 2021 (2nd Rank) at SQLShack and the MSSQLTIPS champions award in 2020.

Personal Blog: https://www.dbblogger.com
I am always interested in new challenges so if you need consulting help, reach me at rajendra.gupta16@gmail.com

View all posts by Rajendra Gupta

Latest posts by Rajendra Gupta (see all)

Understanding PostgreSQL SUBSTRING function - September 21, 2024
How to install PostgreSQL on Ubuntu - July 13, 2023
How to use the CROSSTAB function in PostgreSQL - February 17, 2023

Preparing Sample Data

Group By function syntax

SQL PARTITION BY

PARTITION BY clause with ROW_NUMBER()

PARTITION BY clause with Cumulative total value

ROWS UNBOUNDED PRECEDING with the PARTITION BY clause

Conclusion

Related posts: