This article on Change Data Capture will provide an overview, notes on installation, architecture, enabling and disabling, auditing DML statements and limitations
Before reading this article, I recommend that you to go through the previous articles of this series (see the TOC at the bottom), to understand the concept of the SQL Server Audit, the reasons behind auditing the SQL Server instances and databases, and the different methods that can be used to audit the SQL Server databases. In this particular article, we will discuss how to perform SQL Server database tables audit using Change Data Capture.
Overview
Change Data Capture, also known as CDC, introduced the first time in SQL Server 2008 version, as a helpful feature to track and capture the changes that are performed on the SQL Server database tables, with no additional programming efforts. Before SQL Server 2016, Change Data Capture could be enabled on a SQL Server database only under the SQL Server Enterprise edition, which is not required starting from SQL Server 2016.
Change Data Capture tracks the INSERT, UPDATE and DELETE operations on the database table, and records detailed information about these changes in a mirrored table, with the same columns structure of the source tables, and additional columns to record the description of these changes. SQL Server writes one record for each INSERT statement showing the inserted values, on record for each DELETE statement showing the deleted data and two records for each UPDATE statement, the first one showing the data before the change and the second one showing the data after performing the change.
- Note: See Monitoring changes in SQL Server using change data capture for more information about CDC.
The additional columns include
- __$start_lsn and __$end_lsn that show the commit log sequence number (LSN) assigned by the SQL Server Engine to the recorded change
- __$seqval that shows the order of that change related to other changes in the same transaction, __$operation that shows the operation type of the change, where 1 = delete, 2 = insert, 3 = update (before change), and 4 = update (after change)
- __$update_mask that is a bit mask defined for each captured column, identifying the updating columns
This detailed information makes it easier to monitor the database changes for security or auditing purposes, or incrementally load these changes from the OLTP source to the target OLAP data warehouse, using T-SQL or ETL methods.
Installation and architecture
Change Data Capture requires that a SQL Server Agent is running on a SQL Server instance. When the feature is enabled on a SQL Server database table, two SQL Server Agent jobs will be created for that database; the first job is responsible for populating database change tables with the changes information and the second job is responsible for the cleaning up the change tables by deleting the records older than the configurable retention period of 3 days.
Change Data Capture depends on the SQL Server Transaction Log as the source of the data changes. When a change is performed, this change will be written to the Transaction Log file.
- Check SQL Server Transaction Log Architecture for more information.
If CDC feature is enabled on that table, the transaction log replication Log reader agent, which acts as the capture process for CDC feature, will read the change logs from the Transaction Log file, adds the metadata information about these changes and write it to the associated CDC change tables, as shown below:
Change Data Capture as an Audit solution
Change Data Capture can be used as an asynchronous SQL Server Audit solution, to track and audit the table’s DML changes, such as INSERT, UPDATE or DELETE operations, with no option to track the SELECT statements.
What makes Change Data Capture a good SQL Server Audit solution is that, it can be configured easily using few T-SQL commands, it provides historical information about the values before the modification process and provides detailed information about the data modification process.
Let us see how we can use the Change Data Capture to audit the SQL Server DML changes.
Enabling CDC
In order to enable the Change Data capture on a specific table, for auditing purposes, CDC should be enabled first at the database level first, by a member of the SYSADMIN fixed server role, using the sys.sp_cdc_enable_db system stored procedure, as shown below:
1 2 3 4 |
USE [CDCAudit] GO EXEC sys.sp_cdc_enable_db GO |
To make sure that CDC is enabled on that database, we will query the sys.databases DMV for the list of databases, with CDC enabled, as below:
After enabling CDC at the database level, we can enable it to track and audit the DML changes on the database tables, by a member of db_owner fixed database role, using the sys.sp_cdc_enable_table system stored procedure, with the ability to enable it to track changes on list of columns specified by the @captured_column_list parameter, and create change tables on a separate filegroup specified by the @filegroup_name parameter, as shown below:
1 2 3 4 5 6 7 8 9 |
USE [CDCAudit] GO EXEC sys.sp_cdc_enable_table @source_schema = N'dbo', @source_name = N'Employee_Main', @role_name = NULL, @filegroup_name = NULL, @supports_net_changes = 0 GO |
To check that CDC is enabled on that table, we will query the sys.tables DMV for all tables under the current database with CDC enabled as below:
Once CDC is enabled on the table, a number of system tables will be created under the CDC schema of the database to store the CDC related information. These tables include the following
- CDC.captured_columns table that contains the list of captured column
- CDC.change_tables table that contains the list of tables that are enabled for capture
- CDC.ddl_history table that records the history of all the DDL changes since capture data enabled
- CDC.index_columns table that contains all the indexes that are associated with change table
- CDC.lsn_time_mapping table that is used to map the LSN number with the time and finally one change table for each CDC enabled table that is used to capture the DML changes on the source table, as shown below:
… and the SQL Agent jobs associated to CDC enabled table, the capture and cleanup jobs, will be created like below:
Disabling CDC
The Change Data Capture can be easily disabled on a specific table using the sys.sp_cdc_disable_table system stored procedure, as shown below:
… or disabled completely at the database level, without the need to disable it on CDC enabled tables one by one, using the sys.sp_cdc_disable_db system stored procedure, as shown below:
Auditing DML Changes
After enabling CDC on the database table, let us perform the below three database DML changes, INSERT, UPDATE then DELETE, on that table and check how these changes will be audited using CDC feature, as below:
As mentioned previously, the data modifications will be written in the change table associated to CDC enabled table, which is the [cdc].[dbo_Employee_Main_CT] table in our case here. To view all records inserted recently to the source table, we can query the change table for all operations with type 2, and complete information about the INSERT operation, including the inserted values, will be displayed as shown below:
Querying the change table for all operations with type 1, will return all records deleted recently from the source table, with the values of the deleted records, will be displayed as shown below:
And finally, you can query the change table for the operations with types 3 and 4 to track the UPDATE statement, that will display the values before the update, under the operation type 3 and the value after the change, under the operation type 4, as shown below:
Querying of the change tables is not recommended by Microsoft. Instead, you can query CDC.fn_cdc_get_all_changes system function associated to CDC enabled table as below:
CDC.fn_cdc_get_all_changes function can be queried by providing the @from_lsn, @to_lsn, @row_filter_option parameters, that will retrieve all the DML changes information, as shown below:
Limitations
Change Data Capture can be easily used to audit only the database DML changes, albeit it with no option to monitor SELECT statement, with the negligible configuration effort. On the other hand, to consider CDC as a SQL Server Audit solution, it requires significant maintenance and administration effort. This includes automating an archiving mechanism, as the tracking data will be kept in the change table for a configurable number of days and will be stored in the same or different data file, that should be also monitored and maintained.
In addition, the change tables will be stored under each database, and a function will be created for each tracked table. This makes it cumbersome and requires significant programming effort to create a consolidated auditing report that reads the DML changes information from all tables under the same database, from all databases under the same instance, or cross multiple instance.
Another limitation for CDC feature as a SQL Server Audit solution is difficult process that is required to handle the DDL changes on CDC enabled table, as having the Change Data Capture enabled on the source table will not prevent performing DDL changes on that table.
Also, if the SQL Server Agent service is not running, CDC capture job will not work, and the database log file will grow rapidly, even if the database recovery model is Simple, as the log truncation will not advance, even if a CHECKPOINT is performed, till all the changes that are waiting for capture will be gathered by CDC capture process.
In the next article, we will discuss how to audit the SQL Server instance using the Change Tracking feature. Stay tuned!
Table of contents
- Azure Data Factory Interview Questions and Answers - February 11, 2021
- How to monitor Azure Data Factory - January 15, 2021
- Using Source Control in Azure Data Factory - January 12, 2021