Introduction
A big part of the technical debt in my organization’s data warehouse (DW) and business intelligence (BI) environments relates to hardcoded lookup data. This is data required by the business to make sense of transactional data but was never planned for in the underlying source system and consequently get injected into DW and BI solutions. Inevitably, it is only a matter of time before DW and BI team lose track of the places wherein the hardcoded data reside thus making it difficult to maintain. Furthermore, due to lack of documentation or staff retention, anyone who subsequently takes over these DW/BI solutions can unknowingly create duplicate lookup data. In this article, I explain how we reduced such technical debt in my organization by moving most of the hardcoded lookups into SQL Server Master Data Services (MDS).
The Challenges of Hardcoding Your Lookups
Figure 1 shows a sample QlikView dashboard that contains a breakdown of a total number of policy claims received by claimant’s gender. The problem with the information on this dashboard is that whilst it looks like gender ID 3 has submitted more policy claims compared to 1 and 2, consumers of this information may not readily understand what gender IDs 1, 2 and 3 actually stand for.
In fact, at first glance, you might easily accuse the BI team of delivering a poorly designed dashboard, but actually the representation of claimant’s gender as 1, 2 and 3 in the dashboard can be traced back to the input dataset received from source system, which is shown in Table 1.
ID | Gender | Date | Volumes |
1 | 1 | 31-Jan-17 | 111 |
2 | 2 | 28-Feb-17 | 86 |
3 | 3 | 31-Mar-17 | 325 |
4 | 1 | 31-Jan-17 | 201 |
5 | 2 | 28-Feb-17 | 199 |
6 | 3 | 31-Mar-17 | 303 |
7 | 1 | 31-Jan-17 | 217 |
8 | 2 | 28-Feb-17 | 280 |
9 | 3 | 31-Mar-17 | 319 |
Ideally, one would prefer that a data lookup for gender 1, 2 and 3 values should be handled by the source system development team but there could be circumstances that prevent or delay the implementation of such a lookup in the source system. So, reluctantly, the DW and BI teams could be asked (sometimes instructed) to temporarily implement a lookup solution within their platforms. The DW and BI teams could implement the lookup in the following manner:
-
DW Team’s Implementation of Hardcoded Lookup
The data warehouse team can create a derived dimension in a form of a SQL Server view that employs the UNION function to join several hardcoded SELECT statements as shown in Script 1.
12345678910111213CREATE VIEW [dbo].[DimGender]ASSELECT 1 AS [ID],'Male' AS [Description]UNIONSELECT 2 AS [ID],'Female' AS [Description]UNIONSELECT 3 AS [ID],'Other' AS [Description];GOScript 1: Hardcoded Lookup Dimension The BI team can then simply reference the newly created view as a lookup for gender in their dashboard. The updated dashboard which now includes gender description instead id gender ID is shown in Figure 2.
Figure 2: Updated QlikView Dashboard with Gender description -
BI Team’s Implementation of Hardcoded Lookup
Instead of relying on the DW team to first add a lookup dimension before it can be available for QlikView, the BI team can apply the hardcoding of gender values in their QlikView data model. This can be done by firstly capturing the hardcoded values in some excel document and then loading that excel document into QlikView as shown in Figure 3.
Figure 3: Loading of excel file into QlikView
As mentioned above, the main challenge with relying on hardcoded lookups is that whenever new lookup IDs are introduced in the transactional data, DW and BI solutions will remain outdated until you manually edit the lookup dataset again. To demonstrate such a scenario, I have altered the source dataset presented in Table 1 to include a 10th transaction that now includes gender ID 4 as highlighted in yellow in Table 2.
ID | Gender | Date | Volumes |
1 | 1 | 31-Jan-17 | 111 |
2 | 2 | 28-Feb-17 | 86 |
3 | 3 | 31-Mar-17 | 325 |
4 | 1 | 31-Jan-17 | 201 |
5 | 2 | 28-Feb-17 | 199 |
6 | 3 | 31-Mar-17 | 303 |
7 | 1 | 31-Jan-17 | 217 |
8 | 2 | 28-Feb-17 | 280 |
9 | 3 | 31-Mar-17 | 319 |
10 | 4 | 30-Apr-17 | 108 |
Once the data in QlikView is reloaded, you will immediately notice that the last vertical bar in the dashboard is missing gender description as shown Figure 4.
The gender description for the 4th bar is missing because the excel document used as a lookup for gender IDs hasn’t been updated to include the description for gender ID 4. Once gender ID 4 is captured into the excel lookup document and QlikView data model is reloaded, the missing gender description will then be resolved. The challenge with this kind of manual intervention depends on the number of files needed to be updated. Therefore, while the updating of a single lookup document could seem trivial, having to update several multiple lookup files that are stored in various locations can become very tedious. Thus, a more robust and dynamic mechanism is required to address the issue of dealing with hardcoded lookup data.
Solution
One such robust and dynamic mechanism involves implementing a master data management solution using SQL Server Master Data Services (MDS). MDS provides several features such as the ability to enable business users to define, capture and maintain master data (or lookups). It also reduces unnecessary time spent by IT resources (i.e. DW and BI team members) conducting data fixes. MDS ultimately provides a consolidated single view of domain data that can be referenced by different systems within an organization.
To demonstrate the power of MDS, we have imported the gender lookup data from an excel document into an MDS entity called Gender Entity which is shown in Figure 5. Through this web-based interface (or using an excel add-in for MDS), business users can manage descriptions for all gender IDs.
MDS further provides subscription views, which can be used by systems such as QlikView and Data Warehouse to access MDS data via the backend. Figure 6 shows a sample subscription view – GenderLookUp – that has been created to export Gender Entity data.
As soon as the subscription view is created, it will appear under the mdm schema in the Views sub-node of a SQL Server database configured to store MDS information, as shown in Figure 7.
Finally, just remember that any structural changes to MDS entities will not be propagated into subscription views unless the subscription view is regenerated.
Summary
Hardcoded lookup data can contribute to an organization’s technical debt if not handled correctly. Using applications such as SQL Server Master Data Services both data ware and business intelligence teams can take advantage of moving the responsibility for maintaining the data back to business users. The DW and BI teams can use subscription views to access data captured in the back-end.
See more
For BI documentation, consider ApexSQL Doc, a tool that documents SQL Server instances, databases, objects, SSIS packages, SSAS cubes, SSRS reports, Tableau server sites and SharePoint Server farms.
References
- Master Data Services Overview (MDS)
- Create a Subscription View to Export Data (Master Data Services)
- SQL UNION Operator