This is the 8th article in the series for SQL Server Always On Availability Groups.
Introduction
SQL Server Always On Availability Groups create a group of databases that fails over together in case the primary replica is unavailable. These availability group databases failover manually or automatically depending upon your AG configuration. By default, SQL Server fails over all databases from the primary replica to the secondary replica if the primary replica is down or SQL Services stopped in the primary instance.
You can create multiple SQL Server Always On Availability Group in SQL Server configuration. Usually, we group the application dependent databases in an availability group. These databases must be failed over together for smooth working on the application.
Suppose you have two databases in an AG group. Your primary instance is online, but one of the AG databases becomes unhealthy on the primary replica. Your application reports issue because it could not access a database in the AG group. SQL Server does not perform a failover of the AG group to the secondary replica. It requires manual intervention to fix the issues at the primary replica or perform a manual failover to the secondary replica. Sometimes we want that SQL Server should check the database health as well in an AG group and perform automatic failover if any of the databases is not healthy.
Do we have the option to enable the database-level health detection in SQL Server Always ON? Let’s explore it in this article.
Prerequisites
You should configure SQL Server Always On Availability Groups in synchronized mode. You can refer to this article, Add a new node into existing SQL Server Always On Availability Groups for further insights.
In the earlier articles, we have created an availability group with the following configuration.
- Current primary replica: SQLNode1
- Secondary replica in Synchronous mode: SQLNode2
- Secondary replica in Asynchronous mode: SQLNode3
- Two availability group databases [MyNewDB] and [SQLShackDemo]
An Overview of the database-level health detection for SQL Server Always On Availability Groups
SQL Server 2016 introduced Enhanced database failover to monitor the database health and trigger the failover for that availability group. This feature is known as Database Level Health Detection in an availability group.
We can enable this feature at the availability group; therefore, it monitors all databases corresponding to that availability group. It cannot be enabled for an individual database in an AG group. Your database should be in synchronized data commit with automatic failover mode so that SQL Server can failover it in case of any issues.
SQL Server performs the following checks for the database-level health detection. If the database status is online and SQL Server could not write in the transaction log of an AG database, it initiates AG failover. It also checks for the following error ids as part of the database health check.
Suppose you have the following AG architecture with one secondary replica. Your AG database stores data files in D drive and log file in the L drive.
Now, on your primary replica, log drive L fails. SQL Server is not able to write in the AG database transaction log. It fails a successful database health detection. SQL Server availability group fails over from primary to the secondary replica.
In another scenario, your data drive fails on the primary replica, but your log drive is intact. In this case, no AG failover occurs because SQL Server can write in the transaction log.
Database-level health detection does not monitor the idle database or disk uptime. It does not directly monitor the database files. In the above image, we saw that disk failure causes the availability group automatic failover. Suppose your AG database is idle and does not have any active transactions. In the idle state, SQL Server does not perform any physical activity to the disk. Now, suppose, you have other databases as well in the disk and disk gets full due to those database transactions. Due to disk space issues, SQL Server might not be able to perform any transactions for the databases lying in that disk. Later. You perform a transaction and SQL Server not able to write database checkpoints and write the data into the disk. At that time, SQL Server finds unhealthy database status and perform a database failover as a result of the database health change.
Similarly, if your database can satisfy the data page from the memory, SQL Server does not require to perform a physical read from the disk. It gets the data from the buffer pool memory. It might also not cause your database to failover for SQL Server always-on availability group.
Enhanced database-level failover and SQL Server event ids
SQL Server monitors the following event ids for the enhanced database-level failover.
Event ID |
Description |
605 |
Database or page corruption |
823 |
Database checkpoint failure |
829 |
Disk corruption issues |
832 |
Hardware or memory-related issues |
1101,1105 |
Disk space issues in the filegroup. In this case, SQL Server not able to allocate a new page in the filegroup. |
5102 |
Missing filegroup ID requests |
5180 |
Wrong file ID requests |
5515 |
SQL Filestream container issues |
5534 |
Filestream log corruption |
5535 |
Filestream data container corruption |
9004 |
Transaction log corruption |
SQL Server logs get error id 41653 once availability group failovers due to database-level health detection.
Configure enhanced database failover in SQL Server Always On Availability Groups
You can configure the enhanced database failover using both the GUI and T-SQL. It is an optional feature. If you have an existing availability group, you can use the dynamic management view sys.availability_groups to check the enhanced database failover is enabled or not.
Execute the below database management view (DMV) in the primary replica instance to check database-level health detection status for your existing availability group. The DMV sys.availability_groups has a column [db_failover] to check the configuration. If this DMV returns value 1 for the [db_failover] column, it means database-level health detection is enabled else zero value shows the feature is not enabled for all databases in the availability group.
1 2 3 4 5 6 |
SELECT NAME, CASE [db_failover] WHEN 0 THEN 'Disabled' ELSE 'Enabled' END 'Database level health detection' FROM sys.availability_groups |
As shown below, currently, we are not using this feature.
We can enable this feature while configuring a new availability group or in an existing availability group.
Enable enhanced database failover using GUI in an existing availability group
To enable the enhanced database failover, connect to the primary replica using SSMS. In the Always On Availability Group, right-click on the availability group and open its properties.
In the AG property, you get an option as Database-level health detection.
Put a tick for the Database-level health detection and generate the T-SQL script.
It uses the Alter Availability Group script with the DB_FAILOVER argument.
1 2 3 4 |
USE [master] GO ALTER AVAILABILITY GROUP [SQLAG2019] SET(DB_FAILOVER = ON); GO |
Create a new SQL Server Always On Availability Group with enhanced database failover
To enable the database-level health detection while creating a new availability group, enable it in the Specify option page, as shown below.
Extended event to track the database health detection
SQL Server 2016 onwards we have a new extended event availability_replica_database_fault_reporting to track the replica restart in case of the unhealthy database. You can add this event to the extended event session on the primary replica.
Disable enhanced database failover for SQL Server Always On Availability Groups
You can disable the enhanced database failover without restarting SQL Services. To disable the feature, either remove the tick from the check from the database-level health detection or use the following query.
1 2 3 4 |
USE [master] GO ALTER AVAILABILITY GROUP [SQLAG2019] SET(DB_FAILOVER = OFF); GO |
Conclusion
In this article, we explored the useful enhanced database-level failover feature in an availability group. You should enable it for the production databases. You should also group the database appropriately in an AG group so that it gets failover in case any database reports fails the database-level health detection. You can create multiple AG groups and group databases as per their availability and application requirements.
Table of contents
- Understanding PostgreSQL SUBSTRING function - September 21, 2024
- How to install PostgreSQL on Ubuntu - July 13, 2023
- How to use the CROSSTAB function in PostgreSQL - February 17, 2023