In this article, I am going to give a detailed explanation of how to use the SQL MERGE statement in SQL Server. The MERGE statement in SQL is a very popular clause that can handle inserts, updates, and deletes all in a single transaction without having to write separate logic for each of these. You can specify conditions on which you expect the MERGE statement to insert, update, or delete, etc.
Using the MERGE statement in SQL gives you better flexibility in customizing your complex SQL scripts and also enhances the readability of your scripts. The MERGE statement basically modifies an existing table based on the result of comparison between the key fields with another table in the context.
Figure 1 – MERGE Illustration
The above illustration depicts how a SQL MERGE statement basically works. As you can see, there are two circles that represent two tables and can be considered as Source and a Target. The MERGE statement tries to compare the source table with the target table based on a key field and then do some of the processing. The MERGE statement actually combines the INSERT, UPDATE, and the DELETE operations altogether. Although the MERGE statement is a little complex than the simple INSERTs or UPDATEs, once you are able to master the underlying concept, you can easily use this SQL MERGE more often than using the individual INSERTs or UPDATEs.
Applications of the SQL MERGE statement
In a typical SQL Data warehouse solution, it is often essential to maintain a history of data in the warehouse with a reference to the source data that is being fed to the ETL tool. A most common use case is while trying to maintain Slowly Changing Dimensions (SCD) in the data warehouse. In such cases, you need to insert new records into the data warehouse, remove or flag records from the warehouse which are not in the source anymore, and update the values of those in the warehouse which have been updated in the source.
The SQL MERGE statement was introduced in the SQL Server 2008 edition which allowed great flexibility to the database programmers to simplify their messy code around the INSERT, UPDATE and DELETE statements while applying the logic to implement SCD in ETL.
Optimizing the performance of the SQL MERGE statement
There are a few aspects using which you can optimize the performance of your MERGE statements. Having said that, it means now you can write all your DML statements (INSERT, UPDATE, and DELETE) combined in a single statement. From a data processing perspective, this is quite helpful as it reduces the I/O operations from the disk for each of the three statements individually and now data is being read from the source only once.
Also, the performance of the MERGE statement greatly depends on the proper indexes being used to match both the source and the target tables. Apart from indexes, it is also essential that the join conditions are optimized as well. We should also try to filter the source table so that only necessary records are being fetched by the statement to do the necessary operations.
Hands-on with the MERGE statement
Now that we have gathered enough information regarding how the MERGE statement works, lets us go ahead and try to implement the same practically. For the purpose of this tutorial, I am going to create a simple table and insert a few records in it. You can use the following SQL script to create the database and tables on your machine.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 |
CREATE DATABASE SqlShackMergeDemo GO USE SqlShackMergeDemo GO CREATE TABLE SourceProducts( ProductID INT, ProductName VARCHAR(50), Price DECIMAL(9,2) ) GO INSERT INTO SourceProducts(ProductID,ProductName, Price) VALUES(1,'Table',100) INSERT INTO SourceProducts(ProductID,ProductName, Price) VALUES(2,'Desk',80) INSERT INTO SourceProducts(ProductID,ProductName, Price) VALUES(3,'Chair',50) INSERT INTO SourceProducts(ProductID,ProductName, Price) VALUES(4,'Computer',300) GO CREATE TABLE TargetProducts( ProductID INT, ProductName VARCHAR(50), Price DECIMAL(9,2) ) GO INSERT INTO TargetProducts(ProductID,ProductName, Price) VALUES(1,'Table',100) INSERT INTO TargetProducts(ProductID,ProductName, Price) VALUES(2,'Desk',180) INSERT INTO TargetProducts(ProductID,ProductName, Price) VALUES(5,'Bed',50) INSERT INTO TargetProducts(ProductID,ProductName, Price) VALUES(6,'Cupboard',300) GO SELECT * FROM SourceProducts SELECT * FROM TargetProducts |
Figure 2 – Sample Data inserted
Now that the database is ready, the next step I am going to perform is to apply the MERGE statement and try to get both the tables to synchronize with each other. The first operation that we are trying to see is how to manage the INSERTs. You can copy and paste the below SQL code to merge the new data from the source to the target table.
1 2 3 4 5 6 7 8 9 |
USE SqlShackMergeDemo GO MERGE TargetProducts AS Target USING SourceProducts AS Source ON Source.ProductID = Target.ProductID WHEN NOT MATCHED BY Target THEN INSERT (ProductID,ProductName, Price) VALUES (Source.ProductID,Source.ProductName, Source.Price); |
Figure 3 – MERGE operation performed on the source and target tables
As you can see, the two records with ProductID 3 and 4, which were not present in the target table are now inserted. This operation is done by matching the source and the target tables based on the ProductID field.
Now that we have learned how to insert records using the SQL MERGE statement, let us learn how to update the values in the same statement. In order to update the values, the ProductID field must have a common value in both the source and the target tables. Only then the database engine will be able to match the records and the update operation can be performed on the columns that have been specified.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
USE SqlShackMergeDemo GO MERGE TargetProducts AS Target USING SourceProducts AS Source ON Source.ProductID = Target.ProductID -- For Inserts WHEN NOT MATCHED BY Target THEN INSERT (ProductID,ProductName, Price) VALUES (Source.ProductID,Source.ProductName, Source.Price) -- For Updates WHEN MATCHED THEN UPDATE SET Target.ProductName = Source.ProductName, Target.Price = Source.Price; |
Figure 4 – Record updated using the MERGE statement
As you can see in the figure above, the initial value for the product “Desk” in the target table was mentioned as “180.00”. When the SQL MERGE statement was executed, it updated the values for all the matched records that had an entry in the source. Also, if you notice the SQL script now, you can see that I have just added the update script after the insert statement, and that means all the inserts and the updates are being executed in the same script itself.
Let us now see how to delete or remove records from the target table in the same script itself.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
USE SqlShackMergeDemo GO MERGE TargetProducts AS Target USING SourceProducts AS Source ON Source.ProductID = Target.ProductID -- For Inserts WHEN NOT MATCHED BY Target THEN INSERT (ProductID,ProductName, Price) VALUES (Source.ProductID,Source.ProductName, Source.Price) -- For Updates WHEN MATCHED THEN UPDATE SET Target.ProductName = Source.ProductName, Target.Price = Source.Price -- For Deletes WHEN NOT MATCHED BY Source THEN DELETE; |
Figure 5 – Records deleted using the MERGE statement
Now, if you see, all records with ProductID 5 and 6 are being deleted from the target table since these records are not available in the source. In this way, you can implement a SQL MERGE statement in a very simple yet powerful way and can handle complex business requirements.
If you would like to see a summary of all the actions that have been performed by the MERGE statement, then you may modify your existing script and include the following output actions. It will return us a list of records on which we have performed the merge and what operation has been executed on that particular record.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 |
USE SqlShackMergeDemo GO MERGE TargetProducts AS Target USING SourceProducts AS Source ON Source.ProductID = Target.ProductID -- For Inserts WHEN NOT MATCHED BY Target THEN INSERT (ProductID,ProductName, Price) VALUES (Source.ProductID,Source.ProductName, Source.Price) -- For Updates WHEN MATCHED THEN UPDATE SET Target.ProductName = Source.ProductName, Target.Price = Source.Price -- For Deletes WHEN NOT MATCHED BY Source THEN DELETE -- Checking the actions by MERGE statement OUTPUT $action, DELETED.ProductID AS TargetProductID, DELETED.ProductName AS TargetProductName, DELETED.Price AS TargetPrice, INSERTED.ProductID AS SourceProductID, INSERTED.ProductName AS SourceProductName, INSERTED.Price AS SourcePrice; |
Figure 6 – Checking output actions by the merge statement
Important things to remember while implementing SQL MERGE
Although we have now understood how to write the MERGE statement from scratch and how to modify the script to include logic for handling inserts, updates and deletes, there are also some other key important points that we should keep in mind while preparing the scripts.
- Every MERGE statement must end with a semi-colon. If a semi-colon is not present at the end of the MERGE statement, then an error will be thrown
- You can use SELECT @@RowCount after writing the MERGE statement which will return the number of records that have been modified by the transaction
- It is mandatory that one of the MATCHED clauses is provided in order for the MERGE statement to operate
Conclusion
In this article, I have explained in detail about the SQL MERGE statement. This MERGE statement has been introduced in the SQL Server 2008 which brought a great revolution in writing simpler and maintainable code in SQL. The MERGE statement takes in two tables – a source and a target and compares the records based on a key column, often the index column, and then performs an operation on it. Being a database developer, I would definitely advise all young programmers to start using the SQL MERGE statement more frequently while using complex stored procedures in SQL.
- Getting started with PostgreSQL on Docker - August 12, 2022
- Getting started with Spatial Data in PostgreSQL - January 13, 2022
- An overview of Power BI Incremental Refresh - December 6, 2021