In this article, we will first give an overview of SSIS derived column transformation, then we will run an experiment to check if there is any difference between adding multiple expressions within one derived column transformation and adding a derived column transformation for each expression.
This article is the sixth article in the SSIS feature series which aims to remove confusion and to illustrate some of the differences between similar features provided by SQL Server Integration Services.
Derived Column Transformation
SSIS Derived Column Transformation is one of the SQL Server Integration Services that can be added within a Data Flow Task, it is used to add a new column to the data pipeline by applying SSIS expressions. The developer can choose whether to add a new derived column or to replace an existing column.
SSIS expressions are a combination of literals, functions, operators that yields a single data value. An expression can be composed of a single value (“abc”) or a variable (@[User::Variable1]) or a function (GETDATE()), or it can be more complex and contains some conditionals (CASE statement) or some mathematical operations or some string manipulation. To learn more about SSIS expressions you can refer to the following article from earlier in the series: SSIS Expression Tasks Vs Evaluating variables as expressions
SSIS Derived Column Transformation can be added from the SSIS toolbox.
Figure 1 – SSIS Derived Column description from SSIS toolbox
The SSIS Derived Column editor is composed of 4 parts as shown in the image below:
- Columns and Variables List
- Expressions built-in functions
- Derived Columns Grid
- Error Handling
Figure 2 – SSIS Derived Column Editor
The first two parts are explained in the previous article that we mentioned above. In this section, we will describe the other parts and we will try to give some of the most known expressions and use cases.
Derived Columns Grid
The Derived Columns Grid contains the following configurations:
- Derived Column Name: Specify the derived column name
- Derived Column: Select between <add new column> and <replace an existing column> options
- Expression: Write the SSIS expression that produce the derived column
- Data Type: This column is read-only and it shows the data type of the expression result
- Length: This column is read =-only and it shows the column length based on the expression result
- Scale: This column is read-only and it automatically sets the scale for numeric data based on the data type
- Precision: This column is read-only and it automatically sets the precision for numeric data based on the data type
- Code Page: This column can be edited and it automatically sets code page for the DT_STR (string) data type
The Derived Columns Grid contains an empty line that is used to add a new derived column. When start filling the derived column properties within this line a new empty line is added to allow the user to add more columns.
If the expression cannot be applied, its color is changed to red as shown in the image below:
Figure 3 – Invalid expression example
When you put the mouse cursor over the expression, a tooltip appears with the error description:
Figure 4 – Expression error message tooltip
To learn more about SSIS Derived Column Transformation you can refer to the following official documentation:
Error Handling
In SSIS Derived column, errors are mostly caused by expression evaluation errors, which occur if expressions that are evaluated at a run time perform invalid operations or become syntactically incorrect because of missing or incorrect data values.
Error output configuration is an option that is included in most of the Data Flow Task components. In this configuration you can select how this component behaves when an evaluation error or a truncation occurs:
- Fail component
- Ignore Failure
- Redirect Row (Error output Flow)
Figure 5 – Error output configuration
In this article, we have considered error output configuration as a part of SSIS Derived Column Transformation since it is required to implement many popular use cases such as TRY_PARSE SQL function (will be described in the next section).
To learn more about Error output configuration in SSIS, you can refer to the following documentation:
Examples and Use cases
In this section, we will list some of the most popular SSIS Derived Column examples and use cases and provide an expression or an external link that contains an example:
-
Concatenating first name and last name into one column:
[FirstName] + “ “ + [LastName]
-
Extract a part of a string:
SUBSTRING([Address],1,5)
-
Change date to integer in yyyyMMdd format:
(DT_I4)((DT_WSTR,4)YEAR([DateColumn]) +
RIGHT( “00” + (DT_WSTR,2)MONTH([DateColumn]),2) +
RIGHT( “00” + (DT_WSTR,2)DAY([DateColumn]),2))
- Extract file name from full file path:
- Implementing Try parse logic:
-
Replace Nulls:
REPLACENULL([Address],”Unknown”)
-
Remove Leading and Trailing spaces from string:
LTRIM(RTRIM([Address]))
Multiple Expressions Vs Multiple Derived Column Transformation
After describing the SSIS Derived Column Transformation component, now we will try to make a comparison between adding multiple expressions into one Derived Column Transformation or adding multiple Derived Column Transformations.
Performance
To compare both approaches from a performance perspective we have run the following experiment; we created two packages each one contains one Data Flow Task. We have used the [Person].[Person] table from the AdventureWorks2017 database as source. And we applied the following derived columns expressions in each package:
-
Name:
[FirstName] + ” ” + [LastName]
-
Full Name:
[FirstName] + ” ” + [MiddleName] + ” ” + [LastName]
-
Father Full Name:
[MiddleName] + ” ” + [LastName]
-
NumericDate:
(DT_I4)((DT_WSTR,4)YEAR(ModifiedDate) + RIGHT(“00” + (DT_WSTR,2)MONTH(ModifiedDate),2) + RIGHT(“00” + (DT_WSTR,2)DAY(ModifiedDate),2))
In the first package we added only one Derived column transformation as shown in the images below:
Figure 6 – First package derived column editor
Figure 7 – First package data flow task
And in the second package, each expression was added in a separate SSIS derived column transformation as shown in the image below:
Figure 8 – Second package data flow task
After executing both packages, the results show that adding multiple expressions into one SSIS derived column transformation lasted about 02.797 seconds while adding multiple SSIS derived column transformation lasted 03.329 seconds.
The main cause was that each component has its own execution phases (Validation, Pre Execute, Execute, Post Execute) which requires more time.
Dependency
Note that, we can put all derived columns within one transformation component if they are independent, because if one derived column expression used another one, we must add a component for each one.
Error Handling
The last thing to mention is that using one Transformation component with multiple expression will make the error handling process more difficult when we need to redirect erroneous rows since the error is caused by one of the expressions and it is harder to know which expression caused the error. While when using multiple transformation components, error handling is easier.
Conclusion
In this article, we have described the Derived Column Transformation in SSIS, and we ran an experiment to illustrate the difference between adding multiple expression within one derived column transformation and adding one derived column transformation component for each expression. The results showed that from a performance perspective using the first approach is more efficient while the second provides easier error handling.
Table of contents
- An overview of SQL Server monitoring tools - December 12, 2023
- Different methods for monitoring MongoDB databases - June 14, 2023
- Learn SQL: Insert multiple rows commands - March 6, 2023