Introduction
Slowly Changing Dimensions in Data Warehouse is an important concept that is used to enable the historic aspect of data in an analytical system. As you know, the data warehouse is used to analyze historical data, it is essential to store the different states of data.
In data warehousing, we have fact and dimension tables to store the data. Dimensional tables are used to analyze the measures in the fact tables. In a data environment, data is initiated at operational databases and data will be extracted-transformed-loaded (ETL) to the data warehouse to suit the analytical environment.
Customer, Product are examples for Dimensional tables. These dimension attributes are modified over time and in the data warehouse, we need to maintain the history. In operational systems, we may overwrite the modified attributes as we may not need the historical aspects of data. Since our primary target in data warehousing is to analyze data with the perspective of history, we may not be able to simply overwrite the data and we need to implement special techniques to maintain the history considering analytical and volume aspects of the data warehouse. This implementation is done using Slowly Changing Dimensions in Data Warehouse.
What are Slowly Changing Dimensions
Before discussing the details of Slowly Changing Dimensions (SCDs), let us list the different slowly changing dimensions as shown in the below table.
SCD Type |
Summary |
Type 0 |
Ignore any changes and audit the changes. |
Type 1 |
Overwrite the changes |
Type 2 |
History will be added as a new row. |
Type 3 |
History will be added as a new column. |
Type 4 |
A new dimension will be added |
Type 6 |
Combination of Type 2 and Type 3 |
Now let us look at each type of slowly changing dimension.
SCD Type 0
There are situations where you ignore any changes. For example, when an employee joined an organization, there are joined related attributes such as joined Designation and JoinedDate, etc. that should not change over time.
The following is the example for Type 0 of Slowly Changing Dimensions in Data Warehouse.
In the above Customer Dimension, FirstDesignation, JoinedDate and DateFirstPurchase are the attributes that will not be updated which is Type 0 SCD.
SCD Type 1
In the Type 1 SCD, you simply overwrite data in dimensions. There can be situations where you don’t have the entire data when the record is initiated in the dimension. For example, when the customer record is initiated, you may not get all attributes. Therefore, when the customer record is initiated at the operational database, there will be empty or null records in the customer records. Once the ETL is executed, those empty records will be created in the data warehouse. Once these attributes are filled in the operational databases, that has to be updated in the data warehouse.
Type 1 SCDs are identifying if the existing attributes are null and you are receiving a value from the operational table.
In the above Customer Dimension table, the AnnualIncome of customers CustomerKey 11015 and 11019 are NULL. When these records are updated in the operational database, those values should be updated in the data warehouse without considering those are historical values.
SCD Type 2
Type 2 Slowly Changing Dimensions in Data warehouse is the most popular dimension that is used in the data warehouse. As we discussed data warehouse is used for data analysis. If you need to analyze data, you need to accommodate historical aspects of data. Let us see how we can implement SCD Type 2.
For the SCD Type 2, we need to include three more attributes such as StartDate, EndDate and IsCurrent as shown below.
In the above customer dimension, there are two records and let us say that customer whose CustomerCode is AW00011012, has been promoted to Senior Management. However, if you simply update the record with the new value, you will not see the previous records. Therefore, a new record will be created with a new CustomerKey and a new Designation. However, other attributes will be remaining the same.
As you can see in the above figure, CustomerCode AW00011012 has a new record with 11013. All the new transactions will be related to CustomerKey 11013 while previous transactional are related to CustomerKey 11012. This mechanism helps to preserve the historic aspect of the customer as shown in the below query.
1 2 3 4 |
SELECT C.Designation,SUM(SalesAmount) SalesAmount,SUM(TotalProductCost) TotalProductCost FROM FactInternetSales F INNER JOIN Dim_Customer C ON F.CustomerKey = C.CustomerKey GROUP BY C.Designation |
Once the query is executed, the following results will be observed.
As you can see Management designation can be seen in the above result which means that it has covered the historical aspects. Type 2 SCD is one of the implementations where you cannot avoid surrogate keys in dimensional tables in the data warehouse.
SCD Type 3
Type 3 Slowly Changing Dimension in Data warehouse is a simple implementation where history will be kept in the additional column. If we relate the same scenario that we discussed under Type 2 SCD to Type 3 SCD, the customer dimension would look like below.
As you can see, historical aspects of the data are preserved as a different column. However, this method will not be scalable if you want to preserve history. Further, this technique will allow only to keep the last version of the history, unlike Type 2 SCD.
Typically, this would be better suited to implement name changes of an employee. In some cases, female employees will change their names after their marriage. In such situations, you can use Type 3 SCD since these types of changes will not occur rapidly.
SCD Type 4
As we discussed in SCD type 2, we maintain the history by adding a different version of the row to the dimension. However, if the changes are rapid in nature Type 2 SCD will not be scalable.
For example, let us assume we want to keep the customer risk type depending on his previous payment. Since this is an attribute related to the customer, it should be stored in a customer dimension. This means every month there will be a new version of the customer record. If you have 1000 customers, you are looking at 12,000 records per month. As you can imagine this Slowly Changing Dimensions in Data Warehouse is not scalable.
Following is the relationship between the Fact and the Customer Dimension table.
SCD Type 4 is introduced in order to fix this issue. In this technique, a rapidly changing column is moved out of the dimension and is moved to a new dimension table. This new dimension is linked to the fact table as shown in the below diagram.
With the above implementation of Type 4 Slowly Changing Dimensions in Data Warehouse, you are eliminating the unnecessary volume in the main dimension. However, still you have the capabilities of performing the required analysis.
SCD Type 6
Type 6 Slowly Changing Dimensions in Data Warehouse is a combination of Type 2 and Type 3 SCDs. This means that Type 6 SCD has both columns are rows in its implementation.
With this implementation, you can further improve the analytical capabilities in the data warehouse. If you want to find out an analysis between current and historical occupation, you can use the following query.
1 2 3 4 5 |
SELECT C.Occupation,C.CurrentOccupation, SUM(SalesAmount) SalesAmount,SUM(TotalProductCost) TotalProductCost FROM FactInternetSales F INNER JOIN Dim_Customer C ON F.CustomerKey = C.CustomerKey GROUP BY C.Occupation,C.CurrentOccupation |
Above query will provide the following result:
Without Type 6, Slowly Changing Dimensions in Data Warehouse, complex queries have to be used.
In the Type 6 SCD, not only the current occupation, you can use the first occupation as well in order to provide more analysis.
Conclusion
Slowly Changing Dimensions in Data Warehouse are used to perform different analyses. This article provides details of how to implement Different types of Slowly Changing Dimensions such as Type 0, Type 1, Type 2, Type 3, Type 4 and Type 6. Type 2 and Type 6 are the most commonly used dimension in a data warehouse.
- Testing Type 2 Slowly Changing Dimensions in a Data Warehouse - May 30, 2022
- Incremental Data Extraction for ETL using Database Snapshots - January 10, 2022
- Use Replication to improve the ETL process in SQL Server - November 4, 2021