Introduction
In the article, Multiple Options to Transposing Rows into Columns, I covered various options available in SQL Server to rotating a given row into columns. One of the options included the use of a PIVOT relational operator. The mandatory requirement of the operator is that you must supply the aggregate function with only a numeric data type. Such a mandatory requirement is usually not an issue as most aggregations and subsequent pivoting is performed against fields of numeric data type. However, sometimes the nature of business reporting requests may be such that you are required to cater for pivoting against non-numeric data types. In this article we take a look at how you can deal with such requirements by introducing a workaround to pivoting on non-numeric fields.
Similarly to the aforementioned article, a denormalized relational representation of our case study looks as shown in Table 1:
RecKey | Policy | PolType | Effective Date | DocID | DocName | Submitted | Outstanding | Submitted Date | Captured By ID |
1 | Pol002 | Hospital Cover | 2007/10/01 | 1 | Doc A | NULL | 1 | NULL | NULL |
2 | Pol002 | Hospital Cover | 2007/10/01 | 4 | Doc B | NULL | 1 | NULL | NULL |
3 | Pol002 | Hospital Cover | 2007/10/01 | 5 | Doc C | 1 | NULL | 2016/01/13 | 1 |
4 | Pol002 | Hospital Cover | 2007/10/01 | 7 | Doc D | 1 | NULL | 2016/01/14 | 2 |
5 | Pol002 | Hospital Cover | 2007/10/01 | 10 | Doc E | 1 | NULL | 2016/01/15 | 1 |
Workaround #1: Pivot on Dates
A simple business case that can be used to address this workaround is that business may be trying to determine a breakdown of documents captured by date. Ultimately, the aim of such a business case would to have a view of data that looks as shown in Table 2:
Policy | PolType | Effective Date | Doc A | Doc B | Doc C | Doc D | Doc E |
Pol002 | Hospital Cover | 01-Oct-07 | 0 | 0 | 13 Jan 2016 | 14 Jan 2016 | 15 Jan 2016 |
The first step to pivoting on date fields is to convert your date field to an integer value. Figure 1 shows how we have achieved this through a derived integer field, [Submitted Date INT], which is based off the [Submitted Date] field.
The complete pivot script is depicted in Figure 2:
The execution of the script in Figure 3 result into the data shown in Table 3:
Policy | PolType | Effective Date | Doc A | Doc B | Doc C | Doc D | Doc E |
Pol002 | Hospital Cover | 01-Oct-07 | 0 | 0 | 20160113 | 20160114 | 20160115 |
However, often business people prefer to view data related dates in a familiar format (i.e. dd/mm/yyyy, dd-mm-yyyy etc.). So we can further extend our script to format the dates according to business friendly format.
There are two ways to achieve this:
1. Formatting SQL Server Date Using Date Dimension
If you write your scripts within a data warehouse or have access to a data warehouse or have a database environment that has a date dimension, then you can make use of the dimension to format the date. Date dimension is a database object that is setup according to a Ralph Kimball’s Data Warehouse design methodology. The dimension usually contain attributes such as date name, quarter, year, and calendar months. Date dimension allows for the breakdown of report data by period over time.
Figure 3 shows some of the content contained in a date dimension.
One of the common ways to lookup data in the date dimension is to use a numeric date key. In our case, we already have derived date column ([Submitted Date INT]) which is in an integer format. Figure 4, shows our revised pivoted script joined to the date dimension.
As it can be seen there are several instances of the dim date to the fields that we are using as pivots. For instance, the first condition is to join dim date to doc a and so on. As argued in the aforementioned article, this way of doing this only suffer from one thing: it is not dynamic – meaning, if business later introduce Doc F as a document that should be submitted in order to process a claim, than a developer would have to refactor the script to include the new doc column and another join condition to dim date to display a custom date format for the newly added column.
The results of the script depicted in Figure 4 are shown in Table 4:
Policy | PolType | Effective Date | Doc A | Doc B | Doc C | Doc D | Doc E |
Pol002 | Hospital Cover | 01 Oct 2007 | 0 | 0 | 13 Jan 2016 | 14 Jan 2016 | 15 Jan 2016 |
2. Convert Integer to SQL Server Date
An alternative to converting a SQL Server date is not to use a join to date dimension instead you convert the numeric values back to business friendly date using T-SQL Convert function. This is done through a nested convert function in which you first convert the numeric value to variable character (varchar), then convert varchar value to date (which is usually in the format yyyy-mm-dd) and finally convert the format of the date to dd mmm yyyy. The complete syntax looks as shown in Figure 5:
The complete script is shown in Figure 6.
Workaround #2: Pivot on Varchar Data Types
In my experience working within an insurance industry, I have noticed that there are often business users whose job is solely to receive and capture claim documents submitted by clients. Thus, another fitting business case would be to have a breakdown of claim documents captured by users as shown in Table 5.
Policy PolType Effective Date Doc A Doc B Doc C Doc D Doc E | |||||||
Pol002 | Hospital Cover | 01 Oct 2007 | 0 | 0 | Jane Doe | John Doe | Jane Doe |
Similar to pivoting on date data types, the key to pivoting on varchar is identifying a field that can be used as an input parameter in the aggregate function. In our case, every user is assigned a numeric user ID as shown in Figure 7 (this user id can then be used to look-up user full name).
The revised script that now pivots on captured by ID is shown in Figure 8. It can also be seen that after the pivoting of data, we then join back on user id to the User dimension to retrieve associated user full name.
Conclusion
In this article we’ve had a look at available workarounds to pivoting using non-numeric fields. The key to pivoting using non-numeric fields is that you need to find a way to trick the aggregate function as it strictly takes in numeric values. Thus you ought to find a way to either find a numeric surrogate field (i.e. corresponding user id of user field) or derive such a field (i.e. converting date to integer) to be used as input parameters in the aggregate function.