Introduction
SSIS Script component is one data transformation tasks in SQL Server Integration Services (SSIS). SSIS is an integration tool in the Microsoft BI family to extract data from heterogeneous data sources and transform it to your need. Apart from the standard data sources such as databases, text files, excel files, and web services, there can be instances where you need to retrieve non-traditional data sources. For example, let us say you want to extract the details of text files such as file sizes, created date, etc. In these types of scenarios, traditional data sources cannot be used.
This article explains how the SSIS Script component can be used as a data source in such instances. In simple terms, this component will use the functions and commands of the C# or VB.Net languages.
Before we start the discussion, it is important to note that there are two script related components in SSIS. One is the Script Task in the control flow, whereas the other is the script component in the Data flow task. We will be discussing the script component data flow task in this article.
Let us look at how the SSIS Script component can be used with a real-world example. Let us say we have a set of different files in multiple folders. Now we need to capture data such as file name, size, created date, and other details, as shown in the below screenshot.
Let us start the SSIS solution and drag and drop a data flow task to the control flow, and your screen should look like below.
Next, double click the data flow task and drag and drop a script component flow to the data flow task. As soon as you drag and drop the script component, the following screen will appear for you to choose.
This will decide what the type of the script component is. In this example, we will be using a Source Script Component Type. It is important to note that this configuration cannot be changed later. If you need to change the script component type, you need to drag and drop another component and reconfigure again. Therefore, before configuring the script component type, make sure you are selecting the correct configuration as per your requirement.
Let us see the configuration of the SSIS Script Component as a data source. First, we will configure the Script tab.
We will be using Microsoft Visual C# 2017 as the scripting language, whereas you have the option of writing the script in Visual VB Script as well. However, once you start to write the script, you are not allowed to change the script language.
You can pass the read-only variable into the script so that there is higher maintainability. For example, we can configure the file path into an SSIS variable, and that can be modified from the package. If not, this has to be hardcoded into the script component that will become difficult to manage. This SSIS variable is configured in ReadOnlyVariables, as shown in the above screenshot. Further, you can use the ReadWriteVariables to write a value that was generated inside the script component. We will not be using the ReadWriteVariables option as it is not relevant to the example that we are discussing.
Next, we will configure the Inputs and Outputs tabs. Since the SSIS Script Component is configured as a source, we will have only output to be configured.
In this configuration, output columns are defined along with the data type and data length. The following table shows the data type and length for each output column.
Output Column Name | Data Type | Length |
FileName | String | 1,500 |
FileType | String | 50 |
DateModifed | Database timestamp | |
IsReadOnly | Boolean | |
Size | Eight bytes signed integer | |
DateCreated | Database timestamp |
Next is to create the script of the SSIS script component, which is the important configuration. By clicking the Edit Script button, you will be taken to the script editor, which is the visual studio editor. The following code is included.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
public override void CreateNewOutputRows() { // Lock the variable for write VariableDispenser variableDispenser = (VariableDispenser)this.VariableDispenser; variableDispenser.LockForRead("User::FilePath"); IDTSVariables100 vars; variableDispenser.GetVariables(out vars); string filePath = vars["User::FilePath"].Value.ToString(); // Unlock the variable vars.Unlock(); foreach (string file in Directory.EnumerateFiles(filePath,"*.*",SearchOption.AllDirectories)) { FileInfo oFileInfo = new FileInfo(file); OutputBuffer.AddRow(); OutputBuffer.FileName = oFileInfo.FullName; OutputBuffer.DateCreated = oFileInfo.CreationTime; OutputBuffer.DateModified = oFileInfo.LastWriteTime; OutputBuffer.FileType = oFileInfo.Extension; OutputBuffer.Size = oFileInfo.Length; OutputBuffer.IsReadOnly = oFileInfo.IsReadOnly; } } |
In the above script, FilePath is the SSIS variable that is used to store the file path. To implement the recursive search of files, SearchOption.AllDirectories is selected. OutputBuffer is the buffer that will be used to add the records for the source. Since this C#, you have the luxury of using many C# language features.
Now let us create a table as shown in the below script, and data will be written to the database after the package is executed.
1 2 3 4 5 6 7 8 |
CREATE TABLE [FileList] ( [FileName] varchar(1500), [FileType] varchar(50), [DateModified] datetime, [IsReadOnly] bit, [Size] bigint, [DateCreated] datetime ) |
Following is the file SSIS data flow after SQL Server destination is included.
When the package is executed, data will be stored into the FileList table, as shown in the below screenshot.
Additional Configurations in SSIS Script Components
After the script component is configured, typical data flow tasks can be used. For example, let us say we want to split the data set, depending on the size of the file.
Let us use SSIS Conditional Split to split the above data stream.
The Conditional Split data flow task can be configured, as shown in the following screenshot.
As shown in the above screen, files that have a size of more than 2KB will be sent to one output, and the rest of the data will be into another output name.
Following is the SSIS data flow task after the inclusion of Conditional Split.
In this configuration, both Morethan2KB and LessThan2KB output have the same attributes.
Similarly, all the data flow tasks can be used to the Output of the script component task.
Multiple Output paths in the SSIS Script Component
The above implementation can be done differently inside the script component by defining multiple outputs.
Following is the way to configure multiple outputs in the SSIS script component.
In the above configuration, two outputs, FileSizeLess2KB and FileSizeMore2KB, were added with the same columns and the same data types.
Next, we will be adding the script as shown below:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
foreach (string file in Directory.EnumerateFiles(filePath,"*.*",SearchOption.AllDirectories)) { FileInfo oFileInfo = new FileInfo(file); if (oFileInfo.Length > 2048) { FileSizeMore2KBBuffer.AddRow(); FileSizeMore2KBBuffer.FileName = oFileInfo.FullName; FileSizeMore2KBBuffer.DateCreated = oFileInfo.CreationTime; FileSizeMore2KBBuffer.DateModified = oFileInfo.LastWriteTime; FileSizeMore2KBBuffer.FileType = oFileInfo.Extension; FileSizeMore2KBBuffer.Size = oFileInfo.Length; FileSizeMore2KBBuffer.IsReadOnly = oFileInfo.IsReadOnly; } else { FileSizeLess2KBBuffer.AddRow(); FileSizeLess2KBBuffer.FileName = oFileInfo.FullName; FileSizeLess2KBBuffer.DateCreated = oFileInfo.CreationTime; FileSizeLess2KBBuffer.DateModified = oFileInfo.LastWriteTime; FileSizeLess2KBBuffer.FileType = oFileInfo.Extension; FileSizeLess2KBBuffer.Size = oFileInfo.Length; FileSizeLess2KBBuffer.IsReadOnly = oFileInfo.IsReadOnly; } } |
In the above script, two buffers were added, depending on the size of the file. After both outputs were added, there will be two outputs from the SSIS Script component, as shown in the bellow SSIS package.
Similar to the SSIS conditional split, you will see the same numbers for the two outputs. Similar to the above example, there are similar instances where traditional data sources cannot be used.
In the script component, you have the ability to create two different outputs with different attributes, which was not possible with the SSIS Conditional Split data flow task. For example, for the image files, you might need width and height that are not required for the other files. In this example, we can add height and width attributes to the required output buffers.
However, during the development of the SSIS script component, it is important to make a copy of the script as a backup option.
Conclusion
In this article, we discussed the use of the SSIS Script component in order to generate non-traditional data sources apart from the traditional data sources. Since C# and vb.net has a rich set of commands, developers can use it for different purposes. This script component can be used to generate multiple outputs. In these multiple outputs, you can add different parameters to the different buffer outputs so that users have higher flexibility that the SSIS Conditional split.
See more
For an SSIS and database documentation tool, consider ApexSQL Doc, which can document SSIS packages from file system, SSIS package store, SQL Server, SSISDB catalog and SSIS project files in different output formats e.g. HTML, PDF, Word etc.
- Testing Type 2 Slowly Changing Dimensions in a Data Warehouse - May 30, 2022
- Incremental Data Extraction for ETL using Database Snapshots - January 10, 2022
- Use Replication to improve the ETL process in SQL Server - November 4, 2021