Sometimes we require tracking data change activity (Insert, update and deletes) in SQL Server tables. SQL Server 2008 introduced Change Data Capture (CDC) to track these changes in the user-defined tables. SQL Server tracks the defined table with a mirrored table with same column structure; however; it adds additional metadata fields to track these changes. We can use table-valued function to access this changed data.
In the series of articles on SQL Server FILESTREAM (see TOC at bottom), we are continuously exploring the SQL FILESTREAM feature along with its combination with other SQL Server features. Using SQL FILESTREAM, we can store the objects into the file system instead of SQL Server relational tables. We can also perform changes to the SQL FILESTREAM tables using T-SQL. In this article, we will explore the change data capture on the SQL FILESTREAM tables.
Prerequisites
- A SQL Server instance with FILESTREAM feature enabled
- A SQL FILESTREAM database
- A SQL FILESTREAM table and data
In this article, we will use CDC along with SQL Server FILESTREAM.
There are two steps to configure CDC.
Enabling CDC at the database level
We need to enable the CDC at the database level first.
Execute the below query in the FILESTREAM database. It enables the CDC in the FILESTREAM database.
1 2 3 4 |
USE FileStreamDemoDB_test GO EXEC sys.sp_cdc_enable_db GO |
We can use the column ‘is_CDC_enabled’ in the sys.databases to check about the status of CDC on a particular database. In the following query, we checked the CDC on the FILESTREAM database.
1 2 3 4 5 6 |
USE master GO SELECT [name], database_id, is_cdc_enabled FROM sys.databases where name='FileStreamDemoDB_test' GO |
In the output, Value ‘1’ shows that we have successfully enabled the CDC on FILESTREAM database.
.
Once we have enabled CDC, it creates a new schema ‘CDC’ in the FILESTREAM database as shown below.
It creates a few system tables as well in the CDC schema.
Enabling CDC at the table level
We need to enable the CDC on the FILESTREAM table for which we want to track the changes. We can check it first using ‘sys.tables.’ whether any existing table is CDC enabled or not.
1 2 3 4 5 |
USE FileStreamDemoDB_test GO SELECT [name], is_tracked_by_cdc FROM sys.tables where name not like 'sys%' and name not like 'MS%' |
In the following screenshot, we can see that currently CDC is not enabled for our FILESTREAM table ‘ Tbl_Support_Documents.’
We can enable CDC on the FILESTREAM table using the below query. The SQL Server Agent service should be running before we execute this command. We also need to execute this query with the db_owner permission in the FILESTREAM table. We might have the requirement to track a few columns only from the FILESTREAM table. We can also specify a specific column name in the command with ‘captured_column_list’. In the below example, we want to track each column of the FILESTREAM table.
1 2 3 4 5 6 7 |
USE FileStreamDemoDB_test GO EXEC sys.sp_cdc_enable_table @source_schema = N'dbo', @source_name = N'Tbl_Support_Documents, @role_name = NULL GO |
We observed here that the CDC enable command created the ‘CDC.FileStreamDemoDB_test_cleanup’ job and started it. It should create the below two jobs.
- cdc.FileStreamDemoDB_test_capture: This job is responsible for tracking the changes and it runs the stored procedure ‘sys.sp_MScdc_capture_job’
- ‘CDC.FileStreamDemoDB_test_cleanup’: this job cleans up the CDC tables by running the stored procedure sys.sp_MScdc_cleanup_job. By default, it clears any data at least 3 days old from the CDC tracked tables
In our case, we do not see the capture job created. We did not receive any error message either. Let’s now try to create this capture job manually using the system procedure sys.sp_cdc_add_job.
Execute the below query in the FILESTREAM database
1 2 3 4 |
USE FileStreamDemoDB_test GO EXEC sys.sp_cdc_add_job @job_type = N'capture'; GO |
We get an error message this time. It shows that capture job cannot be used by CDC to extract the changes from the log because transaction replication is enabled on the FILESTREAM database. In the last article ‘ SQL Server FILESTREAM and Replication’, we configured the SQL Server replication on the FILESTREAM database. We cannot use CDC and Replication together because Replication also tracks the changes from the transaction log.
Expand the Replication folder in the SSMS and we can the replication is still configured for the FILESTREAM database. We need to drop it to use the CDC feature on the SQL Server FILESTREAM database.
Right click on the publication and Delete the publication
We need to confirm deleting the publication in the SQL FILESTREAM database. It will also delete the subscriptions configured for the particular publication as well. Click on ‘Yes’.
In the following screenshot, we can notice that it automatically creates the CDC capture job now and it starts the job as well. If the replication is configured on the CDC database, it does not create the CDC capture job.
We can verify that both the jobs are created in the SQL Server instance. The Capture job is scheduled to run continuously to track all the changes in the CDC enabled tables.
Once we have configured the CDC on a particular table, let us query that CDC table and we can see it showing 0 rows as of now. It does not show any record because we did not perform any changes to the SQL FILESTREAM table after we enabled this feature.
We can get the information about the CDC source table, capture instance, schema from the stored procedure ‘sys.sp_cdc_help_change_data_capture’. Execute the below query
1 2 |
EXEC sys.sp_cdc_help_change_data_capture GO |
In the output, we can get the list of the entire CDC enabled table, schema and captured instance.
If we are not sure of which are the columns being tracked using CDC, we can check it using the sys.sp_cdc_get_captured_columns stored procedure. We need to the pass the @capture_instance parameter value from the ‘sys.sp_cdc_help_change_data_capture’.
1 2 3 |
EXECUTE sys.sp_cdc_get_captured_columns @capture_instance = 'dbo_Tbl_Support_Documents'; GO |
In the below screenshot, we get the list of the all FILESTREAM table columns because we did not specify the column level filter initially.
We will now perform DML operations on the SQL Server FILESTREAM table and view the changes using CDC.
Tracking Insert activity on SQL FILESTREAM tables
Let us insert one object into the FILESTREAM table using the following query.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
DECLARE @File varbinary(MAX); SELECT @File = CAST( bulkcolumn as varbinary(max) ) FROM OPENROWSET(BULK 'C:\sqlshack\Draft articles\Products.txt', SINGLE_BLOB) as MyData; INSERT INTO [Tbl_Support_Documents] VALUES ( NEWID(), 'Products', '.txt', @File ) |
Once the insert is successful, query the CDC system table and we get one row in it. In this record, we can see the document_name and document_type values are similar to the insert statement values.
In this output, we have _$operation value as 2. This column shows the type of activity performed on the SQL FILESTREAM enabled CDC table.
We can see the following values in the column
Tracking Update activity on SQL FILESTREAM tabless
Let’s try to update the record in the FILESTREAM table. In this, we will replace the existing file in the FILESTREAM table using below query.
1 2 3 4 5 6 7 |
UPDATE [FileStreamDemoDB_test].[dbo].[Tbl_Support_Documents] SET [DocumentBin] = (SELECT * FROM OPENROWSET( BULK 'C:\sqlshack\Draft articles\filestream.txt', SINGLE_BLOB) AS Document) WHERE [Document_ID] = '5A4F3A0E-DE39-4B89-92F8-0DA4523D0D70' GO |
In the CDC table, you get two rows this time. If you look at the _$operation column values, we have value ‘3’ (old update values) and ‘4’ (update after values). In the above update query, we just updated the SQL FILESTREAM object; therefore, we do not see any changes in the document_name column. We always get two rows for the update statement in the CDC tracking table.
Let’s now update the document name as well in the SQL FILESTREAM table using below query.
1 2 3 4 |
UPDATE [FileStreamDemoDB_test].[dbo].[Tbl_Support_Documents] SET [Document_Name] = 'filestream' WHERE [Document_ID] = '5A4F3A0E-DE39-4B89-92F8-0DA4523D0D70' GO |
In the CDC system table, we can see the old and new value for the document_name column as well.
In the SQL FILESTREAM enabled database, it creates the table-valued functions as well. In the below screenshot, you can see the table-valued function.
- Cdc.fn_cdc_get_all_changes_dbo_Tbl_Support_Documents
- Cdc.fn_cdc_get_net_changes_dbo_Tbl_Support_Documents
We can use the function Cdc.fn_cdc_get_all_changes_dbo_Tbl_Support_Documents to get events in the CDC tables over a particular period.
In the below query, we used the function sys.fm_cdc_map_time_lsn to get the LSN information for a time period. We can have following values in this function.
- largest less than
- largest less than or equal
- smallest greater than
- smallest greater than or equal
1 2 3 4 5 6 7 8 9 10 11 12 |
DECLARE @LSNFrom BINARY(10) = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal',GETDATE() - 1) , @LSNTo BINARY(10) = sys.fn_cdc_map_time_to_lsn('largest less than or equal',GETDATE()) SELECT CASE [__$operation] WHEN 1 THEN 'Delete' WHEN 2 THEN 'Insert' WHEN 3 THEN 'Old Values before update' WHEN 4 THEN 'New Values after update' END AS Operation , Document_ID , Document_Name , Document_Type, DATALENGTH(DocumentBin) AS ImageSize FROM CDC.fn_cdc_get_all_changes_dbo_Tbl_Support_Documents(@LSNFrom, @LSNTo,'all update old') ; GO |
Execute the query above, and you can see all the operation performed in the CDC enabled table. We can see the old values and new values after the update as well occurred during a specific period.
If we do not want to get the old values before the update, execute the below query with ‘all’ parameter.
1 2 3 4 5 6 7 8 9 10 11 12 |
DECLARE @LSNFrom BINARY(10) = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal',GETDATE() - 1) , @LSNTo BINARY(10) = sys.fn_cdc_map_time_to_lsn('largest less than or equal',GETDATE()) SELECT CASE [__$operation] WHEN 1 THEN 'Delete' WHEN 2 THEN 'Insert' WHEN 3 THEN 'Old Values before update' WHEN 4 THEN 'New Values after update' END AS Operation , Document_ID , Document_Name , Document_Type, DATALENGTH(DocumentBin) AS ImageSize FROM CDC.fn_cdc_get_all_changes_dbo_Tbl_Support_Documents(@LSNFrom, @LSNTo,'all') ; GO |
This time you get only the updated values after the successful update statement execution on the FILESTREAM table.
Tracking Delete activity on SQL FILESTREAM tables
Let us delete the row from the FILESTREAM table now.
1 2 3 |
Delete from [FileStreamDemoDB_test].[dbo].[Tbl_Support_Documents] WHERE [Document_ID] = '5A4F3A0E-DE39-4B89-92F8-0DA4523D0D70' GO |
In the CDC output, you can see we can track the deleted object from the CDC tables as well. Document_ID is similar in the CDC table as of the delete statement.
Disabling CDC on SQL FILESTREAM tables
We can disable the CDC for the SQL FILESTREAM table using the system procedure ‘sys.sp_cdc_disable_table’. You need to pass the @capture_instance value using the stored procedure ‘sys.sp_cdc_help_change_data_capture’
1 2 3 4 5 |
EXECUTE sys.sp_cdc_disable_table @source_schema = N'dbo', @source_name = N'Tbl_Support_Documents', @capture_instance = N'dbo_Tbl_Support_Documents'; GO |
Once we execute the command it drops the CDC tables, functions, catalog views associated with the CDC table.
We can disable CDC on the database level as well using the below command. It removes all the data, jobs created during the CDC enable process. It might take a few seconds to complete depending upon the CDC data and objects.
1 2 |
EXEC sys.sp_cdc_disable_db GO |
Table of contents
- Understanding PostgreSQL SUBSTRING function - September 21, 2024
- How to install PostgreSQL on Ubuntu - July 13, 2023
- How to use the CROSSTAB function in PostgreSQL - February 17, 2023