In this article, we will guide you through tracking data changes on the Azure SQL database and introduce common use case scenarios and learn how to get the most out of system-versioned tables.
Data is always in motion, learning about data movement, changing trends is an important requirement for some application to be effective, and data change tracking provides that ability while logging database activities metadata. Besides data analytics, the history of changes is also a valuable piece of information for auditing and data recovery.
As it is mentioned in the Azure SQL Database auditing article, tracking data value changes were challenging because the data auditing does not provide such information by default. Therefore, this article is meant to bridge that challenge and introduce some of the options we have for data changes tracking in a database.
Temporal tables
Temporal tables as a database feature are introduced with the SQL Server 2016 version and provide the in-time data storage overview. This feature is also known as system-versioned tables that are designed to keep a full history of changes and enable creating result-sets based on a period or any specific moment in time.
System-versioned tables are handy Azure SQL Database feature, and for each DML transaction, the row version is saved in a dedicated history table. Each row version has the own period of data validity defined via the two dedicated columns of datetime2 type. These two columns indicate the lifetime of a record which helps the database engine to know about the time when data is being effective while generating the result.
Enabling temporal tables and usage examples
Before we dive deeper into usage scenarios here are some important implications:
- History of data changes will be stored in a dedicated table which is automatically created upon enabling this feature
- To enable system-versioning on an existing table, the history table has to be schema-aligned in terms of column names, ordering, data types, and the number of columns with the candidate table
- System versioned tables are required to employ two columns of datetime2 type as period columns that will indicate data time effectiveness
- The candidate table get some extra decorative elements, while also losing some options in the SQL Server Management Studio GUI
Let us quickly create a sample table in an Azure SQL database:
1 2 3 4 5 |
CREATE TABLE [dbo].[Products]( [ProductID] INT NOT NULL PRIMARY KEY CLUSTERED, [ProductName] VARCHAR(50) NULL, [Price] DECIMAL(4, 2) NULL, [quantity] SMALLINT NULL,); |
To enable the temporal tables feature, the candidate table is required to structurally support it by having the period columns that will indicate the data validity period:
1 2 3 4 5 6 7 |
ALTER TABLE Products ADD ValidFrom datetime2 (2) GENERATED ALWAYS AS ROW START HIDDEN constraint DF_ValidFrom DEFAULT GETUTCDATE() ,ValidTo datetime2 (2) GENERATED ALWAYS AS ROW END HIDDEN constraint DF_ValidTo DEFAULT '9999.12.31 23:59:59.99' , PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo); |
Now when the table meets structural readiness, we can step into enabling the feature, and specify the history table name via the below command:
1 2 |
ALTER TABLE Products SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.Products_History)); |
Organizations more often rely on data evaluation insights when making decisions, while the real data sources are dynamic and constantly changing. Temporal tables are the feature that is efficient in the following use cases…
Data auditing
Temporal tables are a unique feature that can be adopted in an early development stage, but also in any given time when auditing data is required. This feature is handy when you are required to keep track of what was changed in a database, when the change took effect, and to perform the forensic analysis in-time.
Querying for history data can be achieved by utilizing the specialized sub-clauses and point in a specific time or time range. As it is indicated in the diagram below, the history of data changes in an Azure SQL database can be reconstructed from the logs that are created over time which allows in-time forensic analysis. Audit data changes logs can be reconstructed via the following sub-clauses:
- ALL – Retrieves complete history of data changes including the actual data
- AS OF <date_time> – Extracts result in specific point in time
- FROM <start_date_time> TO <end_date_time> – Create result for a specific period including start and end dates
- BETWEEN <start_date_time> AND <end_date_time> – Provides the data overview in specific time range between start and end dates
- CONTAINED IN (<start_date_time> , <end_date_time>) – Return rows that were opened and closed in a given time range
The below diagram explains a Product table scenario with two rows being historically changed and how the specialized clauses would reflect on this specific scenario and create result sets:
After enabling system-versioning audit data review for an Azure SQL database is one query ahead. System-versioned tables store data values for period columns in UTC format, which is an important detail to consider when querying for the results. Below query seeks for rows being active for products with ID = 10, within a period from 1st of November to 10th of November 2020:
1 2 3 4 5 |
SELECT *, ValdiFrom, Valid To FROM dbo.Products FOR SYSTEM_TIME FROM '2020-11-01' TO '2020-11-10' WHERE ProductID = 10 ORDER BY ValidFrom |
In terms of auditing usage scenario, system-versioned tables provide a subset of options when opting for historical data changes logs, to learn more on each specialized sub-clauses and in-depth specifics visit this query data in the system-versioned temporal table article.
In-time data analysis
In the data audit scenario, the main focus is on the context of the change that occurred to individual database records, while for the in-time data analysis, the user aspiration is to see how data changes over time and learn about changing trends in an Azure SQL database.
Time travel analysis may include multiple system-versioned tables with independent individual changing trends for which user desire to analyze:
- Differences between specific periods with one another (e.g. last year vs three years ago)
- Return results on exact time in the past (e.g. “as of” 15th of July 2012)
- Changing trends and important data indicators
There are a lot of different usages in real-world scenarios using point-in-time analysis. In the below diagram, we have used the data from the history of changes for a particular product to render a quick overview of quantity change over time using Power BI:
Detection of unusual changing trends (Anomaly detection)
Anomaly detection is another usage scenario that users can benefit from the system-versioned tables and easily identify any item that does not conform to the expected pattern and changing trend over time.
It is a great mechanism to inspect what you inspect to happen in an Azure SQL database, depending on the data types and expectations on how it should change, anomaly detection does always mean “spikes” in the graphs, but let us assume that the product quantity change has significantly dropped in unreasonable time accordingly to prior changing trend and recognize it as an outlier, as illustrated in below graph:
Recovery from data corruption
Historical data changes can also be used to reconstruct the data of individual rows to any previous row state. Temporal tables in Azure SQL databases are efficient when the undesired data change time is known, or the change can be easily tracked in the history records, and there’s no need to deal with transactional logs review or backups.
Using system-versioned tables to recover from data corruption has few advantages:
- Data repairing can be done very precisely without affecting the records that need to remain as of the latest record state. This requirement is often critical
- No downtimes. The database is operative and online for all workloads throughout the course of recovery
- The recovery operation is trailed and versioned so it can be analyzed and investigate what happened if necessary
The data recovery can be easily automated and become a part of data bulk loading and act as a corrector whenever the updated data is incorrect. As an example, the below store procedure is built to revert the latest state to a row version from a specific time in the Products table:
1 2 3 4 5 6 7 8 9 10 11 12 |
DROP PROCEDURE IF EXISTS sp_RepairProductRow; GO CREATE PROCEDURE RepairProductRow @ProductID INT, @asOf datetime2 AS UPDATE dbo.Products SET [ProductName] = H.[ProductName], [Price] = H.Price, [Quantity] = H.[Quantity] FROM Products AS P JOIN Products FOR SYSTEM_TIME AS OF @asOf AS H ON P.ProductID = H.ProductID WHERE P.ProductID = @ProductID |
We used the same Products table sample to build this procedure, and illustrate the repair scenario based on a specific time in the past that is highlighted in the following screenshot:
Summary
In this article, we have introduced the Temporal tables feature as a suitable mechanism to track data changes in an Azure SQL database and provided insight into the most important implications to consider, how to configure and use the feature in multiple different aspects which makes it a sweet tool to achieve many results.
- How to track data changes on Azure SQL Database - December 21, 2020
- Getting started with Azure SQL Database auditing using Blob storage - November 16, 2020