This article will provide a deep dive into the SQL Union operator, describing its many uses along with examples and explore some common questions like the differences between Union vs Union All.
To address real-world data requirements, we may need to combine result sets from multiple data sources so that we could do data analysis or create new datasets. The datasets may be identical but there are chances that they reference different tables. Is there a way to combine the data in a single query? Are Set Operators a viable option? Let’s get started and see how some of the existing operators can be used to help us address these common challenges.
In this article, we’ll review:
- What a Set operator is
- Union vs Union All and how they work
- Discuss the rules for using Union vs Union All
- SQL Operator Syntax
- How to use simple SQL Union clause in the select statement
- How to use SQL Union with the queries that have the WHERE clause
- How to use the SELECT INTO clause with Union
- How to use SQL Union with the queries that have a WHERE clause and order by clause
- How to use SQL Union and SQL Pivot
- How to use SQL Union with GROUP and HAVING clauses
Operators
An operator is a symbol or a keyword defines an action that is performed on one or more expressions in the Select statement.
Set Operator
Let’s get into the details of Set Operators in SQL Server, and how to use them
There are four basic Set Operators in SQL Server:
- Union
- Union All
- EXCEPT
- INTERSECT
Union
The Union operator combines the results of two or more queries into a distinct single result set that includes all the rows that belong to all queries in the Union. In this operation, it combines two more queries and removes the duplicates.
For example, the table ‘A’ has 1,2, and 3 and the table ‘B’ has 3,4,5.
The SQL equivalent of the above data is given below
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
( SELECT 1 ID UNION SELECT 2 UNION SELECT 3 ) UNION ( SELECT 3 UNION SELECT 4 UNION SELECT 5 ); |
In the output, you can see a distinct list of the records from the two result sets
Union All
When looking at Union vs Union All we find they are quite similar, but they have some important differences from a performance results perspective.
The Union operator combines the results of two or more queries into a single result set that includes all the rows that belong to all queries in the Union. In simple terms, it combines the two or more row sets and keeps duplicates.
For example, the table ‘A’ has 1,2, and 3 and the table ‘B’ has 3,4,5.
The SQL equivalent of the above data is given below
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
( SELECT 1 ID UNION SELECT 2 UNION SELECT 3 ) UNION ALL ( SELECT 3 UNION SELECT 4 UNION SELECT 5 ); |
In the output, you can see all the rows that include repeating records as well.
INTERSECT
The interest operator keeps the rows that are common to all the queries
For the same dataset from the aforementioned example, the intersect operator output is given below
The SQL Representation of the above tables
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
( SELECT 1 ID UNION SELECT 2 UNION SELECT 3 ) INTERSECT ( SELECT 3 UNION SELECT 4 UNION SELECT 5 ); |
The row ‘3’ is common between the two result sets.
EXCEPT
The EXCEPT operator lists the rows in the first that are not in the second.
For the same dataset from the aforementioned example, the Except operator output is given below
The SQL representation of the above tables with EXCEPT operator is given below
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
( SELECT 1 [Non-Common from only A ] UNION SELECT 2 UNION SELECT 3 ) EXCEPT ( SELECT 3 B UNION SELECT 4 UNION SELECT 5 ); |
List the non-common rows from the first set.
Note: It is very easy to visualize a set operator using a Venn diagram, where each of the tables is represented by intersecting shapes. The intersections of the shapes, where the tables overlap, are the rows where a condition is met.
Syntax:
The syntax for the Union vs Union All operators in SQL is as follows:
SELECT Column1, Column2, … ColumnN
FROM <table>
[WHERE conditions]
[GROUP BY Column(s]]
[HAVING condition(s)]
UNION
SELECT Column1, Column2, … ColumnN
FROM table
[WHERE condition(s)];
ORDER BY Column1,Column2…
Rules:
There are a few rules that apply to all set operators:
- Expressions in each row or the number of columns that are defined in each query must have the same order
- Subsequent SQL statement row sets must match the data type of the first query
- Parentheses are allowed to construct other set operators in the same statement
- It possible to have an ORDER BY clause, but that should be the last statement of the SQL
- GROUP BY and HAVING clauses can be applied to the individual query
Note:
- All of these Set operators remove duplicates, except for the Union All operator
- The output column names are referred from the first query i.e. when we run the SELECT statements with any of the Set operators and result set of each of the queries may have different column names, so the result of the SELECT statement refers the column names from the first query in the operation.
- SQL JOIN is more often used combine columns from multiple related tables whereas SET Operators combines rows from multiple tables.
- When the expression types are the same but differ in precision, scale, or length, the result is determined based on the same rules for combining expressions
Examples:
The following T-SQL queries are prepared and run against the Adventureworks2014 database. You can download the sample AdventireWorks2014 database here
How to use simple SQL Union clause in select statement
In this example, the result set includes a distinct set of rows from the first set and second set.The following example is based on the rule 1, 3 & 5.
123456789101112131415161718192021222324252627SELECT *FROM((SELECT 1 AUNIONSELECT 2UNIONSELECT 3)UNION(SELECT 3 BUNIONSELECT 4UNIONSELECT 5)UNION ALL(SELECT 8 cUNIONSELECT 9UNIONSELECT 1)) T;The output is a combination of Union and Union All operators using parenthesis.
.
How to use SQL Union with the queries that have the WHERE clause
The following example shows the use of Union in two SELECT statements with a WHERE clause and ORDER BY clause.
The following example is based on the rule 1,2 and 3
12345678910SELECT P1.ProductModelID,P1.NameFROM Production.ProductModel P1WHERE ProductModelID IN(3, 4)UNIONSELECT P2.ProductModelID,P2.NameFROM Production.ProductModel P2WHERE P2.ProductModelID IN(3, 4)ORDER BY P1.Name;
How to use SELECT INTO clause with SQL Union
The following example creates a new dbo.dummy table using the INTO clause in the first SELECT statement which holds the final result set of the Union of the columns ProductModel and name from two different result sets. In this case, it is derived from the same table but in a real-world situation, this can also be two different tables. The following example is based on the rule 1 , 2 and 4.
12345678910111213141516DROP TABLE IF EXISTS dbo.dummy;SELECT P1.ProductModelID,P1.NameINTO dummyFROM Production.ProductModel P1WHERE ProductModelID IN(3, 4)UNIONSELECT P2.ProductModelID,P2.NameFROM Production.ProductModel P2WHERE P2.ProductModelID IN(3, 4)ORDER BY P1.Name;GOSELECT *FROM dbo.Dummy;
How to use SQL Union with the queries that have a WHERE clause and ORDER BY clause
This is only possible when we use TOP or aggregate functions in every select statement of the Union operator. In this case, top 10 rows are listed from each result set and combined the rows using Union clause to get a final result. You also see that the order by clause is placed in all the select statement.
123456789101112131415161718192021SELECT a.ModelID,a.NameFROM(SELECT TOP 10 ProductModelID ModelID,NameFROM Production.ProductModelWHERE ProductModelID NOT IN(3, 4)ORDER BY Name DESC) aUNIONSELECT b.ProductModelID,b.NameFROM(SELECT TOP 10 ProductModelID,NameFROM Production.ProductModelWHERE ProductModelID IN(5, 6)ORDER BY Name DESC) b;
How to use of SQL Union and SQL Pivot
In the following example, we’re trying to combine more than one set of results. In a real-world situation, you may have financial numbers from various regions or departments and the tables might have the same columns and data types, but you want to put them in one set of rows and in a single report. In such a scenario, you would use the Union clause and it is very easy to combine results and transform the data into a more meaningful report.
In this example, the ProductModel is categorized into Top10, Top100, Top 100 and transforming the rows as an aggregated set of the values into the respective columns. The following example is based on the rule 2.
123456789101112131415161718192021222324SELECT MAX(Top10) Top10,MAX(Top100) Top100,MAX(Top1000) Top100FROM(SELECT COUNT(*) Top10,0 Top100,0 Top1000FROM Production.ProductModelWHERE ProductModelID < 10UNIONSELECT 0,COUNT(*),0FROM Production.ProductModelWHERE ProductModelID > 11AND ProductModelID < 100UNIONSELECT 0,0,COUNT(*)FROM Production.ProductModelWHERE ProductModelID > 101) T;NULL values are very important with the Set Operators and are treated as second-class database citizens. Because NULLs are considered unique and if two rows have a NULL in the same column, then they would be considered identical, so in that case, you are actually comparing a NULL to a NULL and getting equality. In the following example, you see the use of NULL values. In this case, it is operated with the aggregate function, max.
123456789101112131415161718192021222324SELECT MAX(Top10) Top10,MAX(Top100) Top100,MAX(Top1000) Top100FROM(SELECT COUNT(*) Top10,NULL Top100,NULL Top1000FROM Production.ProductModelWHERE ProductModelID < 10UNIONSELECT NULL,COUNT(*),NULLFROM Production.ProductModelWHERE ProductModelID > 11AND ProductModelID < 100UNIONSELECT NULL,NULL,COUNT(*)FROM Production.ProductModelWHERE ProductModelID > 101) T;
How to use SQL Union with Group and Having clauses
The following examples use the Union operator to combine the result of the table that all have the conditional clause defined using Group by and Having clause.
The lastname is parsed by specifying the conditions in the having clause.
The following example is based on rule 5.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
SELECT pp.lastname, COUNT(*) repeatedtwice, 0 Repeatedthrice FROM Person.Person AS pp JOIN HumanResources.Employee AS e ON e.BusinessEntityID = pp.BusinessEntityID GROUP BY pp.lastname HAVING COUNT(*) = 2 UNION SELECT pp.LastName, 0, COUNT(*) NtoZRange FROM Person.Person AS pp JOIN HumanResources.Employee AS e ON e.BusinessEntityID = pp.BusinessEntityID GROUP BY pp.LastName HAVING COUNT(*) > 2; |
We can see that the last names are derived into two different columns using the Union operator
That’s all for now…
Summary
Thus far, we addressed various available options and rules to use Set operators and understand when to use them. When determining whether to use Union vs Union All there are a few points to consider. Use Union All if you know that the result sets from multiple queries don’t overlap or generate duplicates and remember, if you need to use parentheses, you can do that. You can also pivot and transform the output.
Just make sure your data types match up and if you need to do an ORDER BY, do it at the end after you’ve run all those set queries. By nature, when it comes to Union vs Union All, Union All is faster than Union; as Union operators incur additional overhead to eliminate duplicates.
I hope you enjoyed this article on the SQL Union operator. Feel free to ask any questions in the comments below.
- Stairway to SQL essentials - April 7, 2021
- A quick overview of database audit in SQL - January 28, 2021
- How to set up Azure Data Sync between Azure SQL databases and on-premises SQL Server - January 20, 2021