A data warehouse has to be historically correct. This becomes an issue when data like the Product List Price for a previous year needs to be saved historically. Dimensional Modeling methodologies provide a solution for the situation. The Slowly Changing method integrated with components from SQL Server Integration Services solves the issue. This article will look at updating a product dimension table using the Slowly Changing Type 2 Dimension while maintaining the Type 1 columns.
Slowly Changing Type 1 (SC1) refers to columns in a dimension table that are overwritten with new data. Say the color was entered incorrectly. The dimension process will need to update the incorrect value. The historical reporting will change but the business wants this. Slowly Changing Type 2 (SC2) refers to the example of the ListPrice changing from year to year. The reports from the previous year will need to include the List Price for that year. The dimension table will track multiple rows for the products with historical data in the previous rows based on a date range.
The regular product dimension table used in this article appears in Code Block 1.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
CREATE TABLE [dbo].[DimProduct]( [ProductSKey] [int] IDENTITY(1,1) NOT NULL, [ProductID] [nvarchar](25) NULL, [ProductName] [nvarchar](50) NOT NULL, [Color] [nvarchar](15) NOT NULL, [ListPrice] [money] NULL, [Class] [nchar](2) NULL, [ProductSubcategorySKey] [int] NULL CONSTRAINT [PK_DimProduct_ProductSKey] PRIMARY KEY CLUSTERED ( [ProductSKey] ASC ) ON [PRIMARY]) ON [PRIMARY] GO ALTER TABLE [dbo].[DimProduct] WITH CHECK ADD CONSTRAINT [FK_DimProduct_DimProductSubcategory] FOREIGN KEY([ProductSubcategorySKey]) REFERENCES [dbo].[DimProductSubcategory] ([ProductSubcategorySKey]) GO ALTER TABLE [dbo].[DimProduct] CHECK CONSTRAINT [FK_DimProduct_DimProductSubcategory] GO |
The following columns will be added to the product dimension table.
- StartDate (DateTime) – Start date for this product to being active and related to new sales
- Status (Boolean) – 0 means not active, 1 means active
- Enddate (DateTime) – End date for this row to be inactive but still related to historical sales
The data would look like the following for Slowly Change Type 2 (SC2) products.
Key | ID | Name | Color | ListPrice | SubcatSKey | StartDate | EndDate | Status |
243 | FR-R92R-44 | HL Road Frame – Red, 44 | Red | 1431.50 | 14 | 2016-07-01 | NULL | 1 |
242 | FR-R92R-44 | HL Road Frame – Red, 44 | Red | 1263.4598 | 14 | 2014-07-01 | 2015-06-30 | 0 |
241 | FR-R92R-44 | HL Road Frame – Red, 44 | Red | 1301.3636 | 14 | 2015-07-01 | 2016-06-30 | 0 |
246 | FR-R92R-48 | HL Road Frame – Red, 48 | Red | 1431.50 | 14 | 2016-07-01 | NULL | 1 |
245 | FR-R92R-48 | HL Road Frame – Red, 48 | Red | 1263.4598 | 14 | 2014-07-01 | 2015-06-30 | 0 |
244 | FR-R92R-48 | HL Road Frame – Red, 48 | Red | 1301.3636 | 14 | 2015-07-01 | 2016-06-30 | 0 |
In Table 1, the Product ID FR-R92R-44 (HL Road Frame – Red 44) has 3 rows. The data seems to be duplicated in the table, but the StartDate, EndDate, and Status do label each row for when it is valid. The Null in the EndDate along with the Status of 1, shows the Key row 243 to be the active row for this product.
SQL Server Integration Services provides a Slowly Changing Dimension component (it is actually a wizard), but sometimes it is better to build it with other components. This gives the package more flexibility when updating the dimension table with additional columns. The package will look like any dimension table import. Figure 2 shows the Control Flow for the product dimension.
The additional Execute SQL Task, named Update Product Changes, allows a set base update on the DimProduct table instead of using an OLEDB Command component in a Data Flow Task. An OLEDB Command used to execute a T-SQL UPDATE might be a performance problem when there are many rows to be updated. This is referred to as “Rows by Agonizing Rows” or RBAR.
Figure 2 shows the Data Flow Task for the Product Dimension. The part that needs to be modified is the Conditional Split. Here, a Multicast needs to be added to insert a new row for the Slowly Changing Type 2 (SC2) data in the Product table plus a pipe to a check for Slowly Changing Type 1 (SC1) changes. The code for SC2 needs to check for ListPrice changes. If the ListPrice changes, then a new row will be added to the product dimension table and the existing current product row needs the EndDate to be updated to the end of yesterday and the Status changed to 0 (Inactive).
Figure 3 shows the Change to the Lookup to see if an existing Product is in the Dimension table. The Status = 1 WHERE clause will only return active Products in the Data Warehouse. The columns selected (ListPrice, Color, etc.) will be compared to the existing dimension table for changes – SC1 and SC2.
The Multicast will be placed before the Conditional Split and this Conditional Split (SC1) will not look at ListPrice, only Color, and Class. These 2 columns have been determined to be Slowly Changing Type 1 which means overwrite existing Product Dimension rows whether active or inactive.
An additional Conditional Split (Figure 4) will be added after the Multicast to compare the ListPrice. A Union ALL component (Figure 5) will be added to merge existing new rows from the transactional database and the new SC2 rows from the new Conditional Split. Another Multicast will pipe the new SC2 rows to Union All and a Staging table.
Figure 4 shows the Conditional Split comparison for ListPrice column. Note here that there is no check for Nulls. This Data Warehouse does not allow Nulls and the source query will return NA or Unknown for Null values in the source data like the code in Code Block 2.
1 2 3 4 5 6 7 8 9 10 |
SELECT p.ProductNumber AS ProductID, p.Name AS ProductName , IsNull( p.Color, 'NA') AS Color, IsNull( p.Class, 'NA') AS Class , IsNull( p.ListPrice, 0) AS ListPrice , ISNULL(p.ProductSubcategoryID, -1) AS ProductSubcategoryID , CAST( CAST( GETDATE() AS DATE) AS DATETIME) AS StartDate , DATEADD( SECOND, -1, CAST( CAST( GETDATE() AS DATE) AS DATETIME) ) AS EndDate , 1 AS ProductActive FROM Production.Product p |
The Data Flow Task (Figure 5) is now a little more complicated, but manageable.
Going back to the Control Flow, there are 2 Execute T-SQL Statement components rather than one as seen in Figure 6.
The Update statements for the SC1 changes are in Table 2 and the SC2 updates are in Code Blocks 3 and 4. They are updated based on a join to the staging table for the changed data.
1 2 3 4 5 6 7 8 |
UPDATE dbo.DimProduct SET Color = s1.Color, Class = s1.Class FROM dbo.DimProduct dp INNER JOIN StageDW.dbo.StageDimproduct sdp ON sdp.matchProductSKey = dp.ProductSKey |
1 2 3 4 5 6 7 8 9 |
UPDATE dbo.DimProduct SET EndDate = s1.EndDate , [Status] = 0 FROM dbo.DimProduct dp INNER JOIN StageDW.dbo.StageProduct_SC2 s2 ON sdp.matchProductSKey = dp.ProductSKey |
SQL Server Integration Services can handle almost any data import situation that is given it. There are so many options to accomplish things like Slowly Changing Dimensions. Starting with an opportunity like this helps individuals dig deeper into the functionality of SQL Server. As newer releases become available, it is noticeable that Microsoft is improving the tools that make Data Management more attainable.
Useful links
- Kimball Group Slowly Changing Dimension
- Kimball Group Slowly Changing Dimension Part II
- Conditional Split
- Performance tuning – Nested and Merge SQL Loop with Execution Plans - April 2, 2018
- Time Intelligence in Analysis Services (SSAS) Tabular Models - March 20, 2018
- How to create Intermediate Measures in Analysis Services (SSAS) - February 19, 2018