This is a conceptual article consisting of two parts with enough supported material for any data professional or enthusiast with databases or data warehouse development background willing to go a step ahead by using an industry-recognized SQL unit testing framework called tSQLt.
This article also highlights the importance of a not so well known innovative development strategy called Proof of Concept used by specialists (including programmers) to test a new idea or concept by developing the model rather than the actual solution.
Additionally, the readers of this article are going to get familiar with the method of mapping existing database development practices to a basic data warehouse business intelligence solution using tSQLt.
The Concept (Proof of Concept)
Let us begin our journey with the concept or the proof of concept.
What is proof of concept
A proof of concept is like a pilot project, which is all about developing (testing) an idea or concept to check its feasibility or potential to use it as a base to begin actual work if all goes well.
Example 1
One of the most interesting examples of proof of concept is a concept car that is specifically designed to test new technology or design, which may or may not become part of commercial manufacturing to be available for the general public.
Example 2
In the software world, proof of concept is whether the new idea of developing your software is workable or not.
Example 3
In the database or data warehouse world, we can say the proof of concept is to check if the newly proposed way of developing a database or data warehouse is feasible or not.
Example 4
The proof of concept can also be whether the new database development strategy works or test the new way of using existing database development and testing tools (such as tSQLt) to see if this is more productive for the development and testing team and business or not.
Proof of Concept strategies
There are a number of ways called strategies that can be used as a basis to develop or test the proof of concept. Let us have a look at a few of them.
Reverse Engineering Strategy
This strategy is based on reverse engineering of an existing process to improve it further.
Mapping Strategy
This is the strategy used in most common scenarios where the developers map the existing process to a newly proposed process in proof of concept.
Mapping Strategy Example 1
One of the examples of mapping strategies, in general, is, think of some database development scenarios when we map requirements to the stored procedures such that a business requirement received by a development team is generally met by writing a stored procedure to be tested by SQL unit testing framework tSQLt which then satisfies the business requirement when it runs successfully.
Mapping Strategy Example 2
Another good example of mapping strategy is in the context of Data Warehouse business intelligence solutions when business requirements are mapped to data models, which are then exposed to internal and external business users to be used for analysis and reporting.
Point of Interest
An interesting point to note is that we are using a highly productive and commercially in use SQL unit testing framework tSQLt to work on our conceptual model, thereby leaving enough room for implementation, as mentioned in the introductory passage of this article.
Test-Driven Data Warehouse Development (TDWD)
Let us now focus on the test-driven data warehouse development with tSQLt proof of concept by first designing a toolkit also serving as pre-requisites.
Test-driven data warehouse development Tool Kit
The following are required to work on test-driven data warehouse development using tSQLt proof of concept:
-
SQL Database and Data Warehouse concepts and understanding
Please refer to SQL Unit Testing Data Warehouse Extracts with tSQLt article
- Strong T-SQL skills
-
SQL unit testing using tSQLt skills
Please refer to tSQLt – A Forgotten Treasure in Database Unit Testing article
-
Test-driven database development concepts and implementation
Please refer to Fundamentals of Test-Driven Database Development (TDDD) with tSQLt unit testing article
Mission Statement
Evaluating Test-driven data warehouse development with tSQLt proof of concept is based on Test-driven database development methodology.
Scope
It is very important to define the scope of the work, and in our case, the scope of the proof of concept is limited to the following essential processes of a traditional data warehouse:
- Staging extracts
- Transform-Loads
They can also be represented by ETL workflows where E stands for extract(s), and TL stands for Transform-Load(s).
If you would like to know more staging extracts, please refer to SQL Unit Testing Data Warehouse Extracts with tSQLt.
In simple words, a typical data warehouse environment from the data point of view mainly consists of two phases:
- Staging environment
- BI (business intelligence) environment
In the first phase, data is copied from source to staging environment and then from the staging environment to the BI environment.
A polite reminder for the beginners to please at this point not to confuse ETL (Extract Transform Load) with ELT (Extract Load Transform) since they are both data movement strategies, and both can work equally with data warehouse solution.
Mapping test-driven database development
We are going to use test-driven database development as a reference to work on test-driven data warehouse development with tSQLt proof of concept.
Set up a sandbox (environment)
The first thing in order to start working on the proof of concept is to set up your sandbox environment, which is going to serve as your mini-research lab.
Set up sample source database
We need to set up a sample database called SQLDevArticlesV5 which is going to serve as a source by running the following T-SQL script against any dev environment:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 |
-- 1 Create SQLDevArticlesV5 source database CREATE DATABASE SQLDevArticlesV5; GO USE SQLDevArticlesV5; GO -- 2 Create author table CREATE TABLE [dbo].[Author] ( [AuthorId] INT IDENTITY (1, 1) NOT NULL, [Name] VARCHAR (40) NOT NULL, [RegistrationDate] DATETIME2 (7) NULL ); -- 3 Create article tables CREATE TABLE [dbo].[Article] ( [ArticleId] INT IDENTITY (1, 1) NOT NULL, [Title] VARCHAR (300) NOT NULL, [Published_Date] DATETIME2 (7) NOT NULL ); -- 4 Populate author table SET IDENTITY_INSERT [dbo].[Author] ON INSERT INTO [dbo].[Author] ([AuthorId], [Name], [RegistrationDate]) VALUES (1, N'Abid', N'2019-01-01 00:00:00') INSERT INTO [dbo].[Author] ([AuthorId], [Name], [RegistrationDate]) VALUES (2, N'Zia', N'2019-02-01 00:00:00') INSERT INTO [dbo].[Author] ([AuthorId], [Name], [RegistrationDate]) VALUES (3, N'Robin', N'2019-03-02 00:00:00') INSERT INTO [dbo].[Author] ([AuthorId], [Name], [RegistrationDate]) VALUES (4, N'Mehta', N'2019-04-02 00:00:00') INSERT INTO [dbo].[Author] ([AuthorId], [Name], [RegistrationDate]) VALUES (5, N'Florence', N'2020-01-01 00:00:00') SET IDENTITY_INSERT [dbo].[Author] OFF -- 5 Populate article table SET IDENTITY_INSERT [dbo].[Article] ON INSERT INTO [dbo].[Article] ([ArticleId], [Title], [Published_Date]) VALUES (1, N'Database Analysis with T-SQL', N'2018-01-02 00:00:00') INSERT INTO [dbo].[Article] ([ArticleId], [Title], [Published_Date]) VALUES (2, N'Designing Data Models for Reporting', N'2018-01-03 00:00:00') INSERT INTO [dbo].[Article] ([ArticleId], [Title], [Published_Date]) VALUES (3, N'SQL Database Development Guide', N'2018-02-02 00:00:00') INSERT INTO [dbo].[Article] ([ArticleId], [Title], [Published_Date]) VALUES (4, N'Build your first Azure SQL Database', N'2018-03-02 00:00:00') INSERT INTO [dbo].[Article] ([ArticleId], [Title], [Published_Date]) VALUES (5, N'Fundamentals of T-SQL', N'2019-01-02 00:00:00') SET IDENTITY_INSERT [dbo].[Article] OFF; GO |
Please run the following script to check your source sample database:
1 2 |
-- View all authors (table) SELECT a.AuthorId,a.Name,a.RegistrationDate FROM Author a |
Set up a sample data warehouse
Create a sample database warehouse database called SQLArticlesV5DW as follows:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
-- 1 Create SQLDevArticlesV5DW database CREATE DATABASE SQLDevArticlesV5DW; GO USE SQLDevArticlesV5DW; GO -- 2 Create Staging schema CREATE SCHEMA Staging AUTHORIZATION dbo; GO -- 2 Create staging author table CREATE TABLE [Staging].[Author] ( [AuthorId] INT IDENTITY (1, 1) NOT NULL, [Name] VARCHAR (40) NOT NULL, [RegistrationDate] DATETIME2 (7) NULL ); -- 3 Create article tables CREATE TABLE [Staging].[Article] ( [ArticleId] INT IDENTITY (1, 1) NOT NULL, [Title] VARCHAR (300) NOT NULL, [Published_Date] DATETIME2 (7) NOT NULL ); |
Add tSQLt to the source database and data warehouse
The next thing is to download and install the tSQLt (SQL unit testing framework) for both source and data warehouse databases from the tsqlt.org.
Please follow the instructions in the article Conventional SQL Unit Testing with tSQLt in Simple Words to install tSQLt if you have not installed it before.
The following output upon running the tSQLt.class.sql script indicates that tSQLt has been installed successfully onto your desired database(s):
The sandbox environment to begin the proof of concept with tSQLt is ready to be used if all the above requirements are met.
Multiphase Requirements
The requirements from the business point of view can be as follows:
As a business user, I would like to view strictly tested authors report preferably from a system optimized for reporting and analysis so that further reports can be added to it
Your business analyst receives the requirements and modifies them for you as follows:
The clients are in need of a data warehouse which must be thoroughly tested to be eventually used for multiple reports and analysis beginning with authors report.
You have had a quick chat with your team and come up with a plan as follows:
We must try to adopt a test-driven approach to meet these requirements, which means a proof of concept to see test-driven data warehouse development in operation if approved can be used to meet these specific business requirements considering the available time and effort.
At this point, if you are not already familiar with, then please review the test-driven database development concept by going through the article Fundamentals of Test-Driven Database Development (TDDD) with tSQLt unit testing.
Mapping potential object TDDD
Just like test-driven database development where we assume a database object is going to meet the requirements and testing that object with a SQL unit testing framework such as tSQLt can help us even create that object properly since it has to pass to prove that it can meet the requirements.
However, the data warehouse is a bit more complex as compared to a database where we can start with a potential object by creating a SQL unit test to check its functionality, which has to meet the specification, and this effort ultimately helps us to build the requirement specific object without over-engineering.
Yes, we can think of an object called AuthorsReport as in the case of test-driven database development, but then how does it fit in a data warehouse scenario and establishing the proof of concept is the most important question to move forward.
Choosing AuthorsReport Object
There is no problem with choosing the same object in test-driven data warehouse development as we would choose in test-driven database development; however, the logic required to build this object has to go through a couple of more steps.
Create SQL unit test Class
Once tSQLt is installed please create a SQL unit test class called AuthorReportTests in data warehouse database as follows:
1 2 3 4 5 6 7 8 |
USE SQLDevArticlesV5DW; GO -- Creating unit test class AuthorsReportTests CREATE SCHEMA [AuthorsReportTests] Authorization dbo GO EXECUTE sp_addextendedproperty @name='tSQLt.TestClass',@value=1,@level0type='SCHEMA',@level0name='AuthorsReportTests' |
Create the first SQL unit test
Create the first SQL unit test for the data warehouse database SQLDevArticlesV5DW to check if an object exists as follows:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
USE SQLDevArticlesV5dw GO CREATE PROCEDURE AuthorsReportTests.[test to check AuthorsReport exists] AS BEGIN --Assemble --Act --Assert EXEC tSQLt.AssertObjectExists @ObjectName = N'AuthorsReport' END; GO |
Run SQL unit test(s)
Now please run all SQL unit tests by issuing the following tSQLt command:
1 2 3 4 5 |
USE SQLDevArticlesV5DW GO -- Run all unit tests tSQLt.RunAll |
As per test-driven database development principles that test must fail since the object is not yet created as follows:
Create Object (AuthorsReport)
Now we are going to create the desired object in the same data warehouse database SQLDevArticlesV5DW so the basic test to check its presence must pass:
1 2 3 4 5 6 7 8 |
USE SQLDevArticlesV5dw GO -- Creating database object AuthorsReport stub (placeholder) CREATE View AuthorsReport AS SELECT 1 AS Stub ; |
Rerun SQL unit test(s)
Please run the SQL unit tests for the sample data warehouse:
1 2 3 4 5 |
USE SQLDevArticlesV5DW GO -- Run all unit tests tSQLt.RunAll |
The test output should be as follows:
So far, so good, the proof of concept with tSQLt is on the run, but it is a slightly long way before we see it working more robustly.
Conclusion
Congratulations, you have just gone through the halfway, and things seem to be fine, but please be prepared to do some serious work in the next part of this article to get the things on track since this trial must show that it is worth the effort. Stay tuned
Table of contents
- How to create and query the Python PostgreSQL database - August 15, 2024
- SQL Machine Learning in simple words - May 15, 2023
- MySQL Cluster in simple words - February 23, 2023