Every production ETL (Extract, Transform, Load) solution is often intrinsically linked to a scheduling mechanism that is used to execute that ETL solution. In a SQL Server-based environment, SQL Server Agent is one of the scheduling mechanism that can be utilized to schedule an execution of ETL solutions such as SQL Server Integration Service (SSIS) packages. In the organization that I currently work for, we’ve had several instances (for various reasons) whereby as the data team we’ve been required to provide a platform for business users to execute an ETL at their own convenience (i.e. on-demand). In this article, we will demonstrate on how we went about delivering self-service ETL execution requirement.
The Notion of Self-Service ETL Execution
At the outset, leaving it to business users to run ETL solutions should not be the preferred way of doings. Instead, business users should only be given control of manipulating and reporting on the data. This is where the notion of self-service reporting becomes useful as power users from the business can connect to a data model and slice and dice according to their requirement. Nevertheless, there could be viable circumstances that force data teams to extend business users control of reporting to executing ETL solutions.
Whatever your reasons for allowing business to execute ETL solutions, you should try to keep it simple for business users to do so. In our case, we devised a way for self-service ETL execution using SQL Server Reporting Services (SSRS) as most business users are already familiar with the tool.
In the following sections, we will take a look at three steps required to setup self-service ETL execution using SSRS as a business interface.
Step 1: Setup a SQL Agent Job
The first step involves setting up a SQL Server agent job that will execute an ETL solution. You can skip this step if you already have a SQL Server Agent job that executes some form of an ETL solution. For the purposes of this discussion, I set up a sample SQL Server agent job, referred to as My Dummy Job. The properties of this sample job are shown in Figure 1.
Figure 1: Properties of My Dummy Job
The complete T-SQL script for My Dummy Job can be found in the Downloads section at the bottom of this article.
Step 2: Configuring SSRS Dataset
Now that we have configured out SQL Server agent job we switch our focus to SSRS development and deployment. In this step, we set up a sample stored procedure – RunMyJob – that will be embedded in SSRS datasets. The definition of this sample stored procedure is shown in Script 1.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
CREATE PROCEDURE [dbo].[RunMyDummyJob] (@RunOption VARCHAR(100)) WITH EXECUTE AS CALLER AS BEGIN SET NOCOUNT ON; --Part A DECLARE @ConditionCheck INT = ( SELECT TOP 1 CASE WHEN stop_execution_date IS NULL AND start_execution_date IS NOT NULL THEN 1 ELSE 0 END FROM msdb.dbo.sysjobactivity WHERE job_id in ( SELECT job_id FROM msdb.dbo.sysjobs WHERE name like 'My Dummy Job' ) ) --Part B IF @ConditionCheck = 0 BEGIN IF (@RunOption='Load') BEGIN EXEC msdb.dbo.sp_start_job 'My Dummy Job' END END --Part C EXEC msdb.dbo.sp_help_jobactivity @job_name = 'My Dummy Job' END GO |
Script 1: Definition of RunMyDummyJob Stored Procedure
The stored procedure accepts a string parameter that has two possible values:
- Refresh – retrieves My Dummy Job latest execution history
- Load – executes My Dummy Job
As it can be seen, the stored procedure is divided into three parts. Part A declares and uses local variable @ConditionCheck variable to determine whether or not My Dummy Job is currently being executed. This is done to prevent executing a job that is already executing. Part B is used to execute My Dummy Job provided the @ConditionCheck variable value is 0 and the @RunOption variable value is Load. The last part of the stored procedure is executed regardless of the value of the @ConditionCheck variable. This last part retrieves latest activities relating to My Dummy Job.
Step 3: Report Development and Deployment
Now that we have set up the Job and the stored procedure that will execute that job, we move on to developing the actual SSRS report that will be used as the interface for business users to execute an ETL on their own. As I go through some of the items in the design view of the report, you can always access the complete report .rdl file in the Downloads section, located at the bottom of this article.
Visual Studio was used an IDE to develop this report and Figure 2 shows an SSDT 2013 design view of the report which consists of a single tablix control with 7 columns.
The tablix is populated by dataset – RunMyDummyJob. Figure 3 shows the properties of RunMyDummyJob dataset and you will notice that the dataset is in turn based off the RunMyDummyJob stored procedure which was created in the previous step (Step 2).
Figure 3: Dataset Properties
The dataset has a single parameter which is mapped to a report parameter – @Operation, as shown in Figure 4.
Figure 4: Report Parameter Mapping
As shown in Figure 5, the @Operation parameter has two hard-coded available values, namely, Refresh as well as Load. Refresh is set as the default value as a means of precaution so as not to execute the report every time it is opened/previewed.
By default, a preview of the report displays job activity information as shown in Figure 6.
For a business users to execute the ETL (MyDummyJob), they will have to change the Operation parameter value from Refresh to Load and then click View Report button. Once the View Report button has been clicked, the report will display latest job activity details with the run status column indicating that the job is running as shown in Figure 7.
In order to get the progress of the job execution, you will have to change the Operation parameter value from Load back to Refresh and click View Report button again. When the job has completed running, the report will display information similar to what is shown in Figure 8.
Another way of confirming that the job history displayed in the SSRS report is accurate is by viewing SQL Server Management Studio’s Job Activity Monitor and the Last run column should match to job start date field in the SSRS report.
Figure 9
Conclusion
Unlike self-service reporting, self-service ETL execution should not be a preferred way of running your ETL solutions as there are several steps and checks involved in getting it set up. However, in circumstances where a requirement for self-service ETL execution is justified, you can use the steps described in this article as a guide to assist you in setting up such a process.