Database-level health detection in SQL Server Always On Availability Groups

This is the 8^th article in the series for SQL Server Always On Availability Groups.

Introduction

SQL Server Always On Availability Groups create a group of databases that fails over together in case the primary replica is unavailable. These availability group databases failover manually or automatically depending upon your AG configuration. By default, SQL Server fails over all databases from the primary replica to the secondary replica if the primary replica is down or SQL Services stopped in the primary instance.

You can create multiple SQL Server Always On Availability Group in SQL Server configuration. Usually, we group the application dependent databases in an availability group. These databases must be failed over together for smooth working on the application.

Suppose you have two databases in an AG group. Your primary instance is online, but one of the AG databases becomes unhealthy on the primary replica. Your application reports issue because it could not access a database in the AG group. SQL Server does not perform a failover of the AG group to the secondary replica. It requires manual intervention to fix the issues at the primary replica or perform a manual failover to the secondary replica. Sometimes we want that SQL Server should check the database health as well in an AG group and perform automatic failover if any of the databases is not healthy.

Do we have the option to enable the database-level health detection in SQL Server Always ON? Let’s explore it in this article.

Prerequisites

You should configure SQL Server Always On Availability Groups in synchronized mode. You can refer to this article, Add a new node into existing SQL Server Always On Availability Groups for further insights.

In the earlier articles, we have created an availability group with the following configuration.

Current primary replica: SQLNode1
Secondary replica in Synchronous mode: SQLNode2
Secondary replica in Asynchronous mode: SQLNode3
Two availability group databases [MyNewDB] and [SQLShackDemo]

An Overview of the database-level health detection for SQL Server Always On Availability Groups

SQL Server 2016 introduced Enhanced database failover to monitor the database health and trigger the failover for that availability group. This feature is known as Database Level Health Detection in an availability group.

We can enable this feature at the availability group; therefore, it monitors all databases corresponding to that availability group. It cannot be enabled for an individual database in an AG group. Your database should be in synchronized data commit with automatic failover mode so that SQL Server can failover it in case of any issues.

SQL Server performs the following checks for the database-level health detection. If the database status is online and SQL Server could not write in the transaction log of an AG database, it initiates AG failover. It also checks for the following error ids as part of the database health check.

Suppose you have the following AG architecture with one secondary replica. Your AG database stores data files in D drive and log file in the L drive.

Now, on your primary replica, log drive L fails. SQL Server is not able to write in the AG database transaction log. It fails a successful database health detection. SQL Server availability group fails over from primary to the secondary replica.

In another scenario, your data drive fails on the primary replica, but your log drive is intact. In this case, no AG failover occurs because SQL Server can write in the transaction log.

Database-level health detection does not monitor the idle database or disk uptime. It does not directly monitor the database files. In the above image, we saw that disk failure causes the availability group automatic failover. Suppose your AG database is idle and does not have any active transactions. In the idle state, SQL Server does not perform any physical activity to the disk. Now, suppose, you have other databases as well in the disk and disk gets full due to those database transactions. Due to disk space issues, SQL Server might not be able to perform any transactions for the databases lying in that disk. Later. You perform a transaction and SQL Server not able to write database checkpoints and write the data into the disk. At that time, SQL Server finds unhealthy database status and perform a database failover as a result of the database health change.

Similarly, if your database can satisfy the data page from the memory, SQL Server does not require to perform a physical read from the disk. It gets the data from the buffer pool memory. It might also not cause your database to failover for SQL Server always-on availability group.

Enhanced database-level failover and SQL Server event ids

SQL Server monitors the following event ids for the enhanced database-level failover.

Event ID	Description
605	Database or page corruption
823	Database checkpoint failure
829	Disk corruption issues
832	Hardware or memory-related issues
1101,1105	Disk space issues in the filegroup. In this case, SQL Server not able to allocate a new page in the filegroup.
5102	Missing filegroup ID requests
5180	Wrong file ID requests
5515	SQL Filestream container issues
5534	Filestream log corruption
5535	Filestream data container corruption
9004	Transaction log corruption

SQL Server logs get error id 41653 once availability group failovers due to database-level health detection.

Configure enhanced database failover in SQL Server Always On Availability Groups

You can configure the enhanced database failover using both the GUI and T-SQL. It is an optional feature. If you have an existing availability group, you can use the dynamic management view sys.availability_groups to check the enhanced database failover is enabled or not.

Execute the below database management view (DMV) in the primary replica instance to check database-level health detection status for your existing availability group. The DMV sys.availability_groups has a column [db_failover] to check the configuration. If this DMV returns value 1 for the [db_failover] column, it means database-level health detection is enabled else zero value shows the feature is not enabled for all databases in the availability group.

SELECT NAME,

CASE [db_failover]

WHEN 0 THEN 'Disabled'

ELSE 'Enabled'

END 'Database level health detection'

FROM sys.availability_groups

As shown below, currently, we are not using this feature.

We can enable this feature while configuring a new availability group or in an existing availability group.

Enable enhanced database failover using GUI in an existing availability group

To enable the enhanced database failover, connect to the primary replica using SSMS. In the Always On Availability Group, right-click on the availability group and open its properties.

In the AG property, you get an option as Database-level health detection.

Put a tick for the Database-level health detection and generate the T-SQL script.

It uses the Alter Availability Group script with the DB_FAILOVER argument.

USE [master]

ALTER AVAILABILITY GROUP [SQLAG2019] SET(DB_FAILOVER = ON);

Create a new SQL Server Always On Availability Group with enhanced database failover

To enable the database-level health detection while creating a new availability group, enable it in the Specify option page, as shown below.

Extended event to track the database health detection

SQL Server 2016 onwards we have a new extended event availability_replica_database_fault_reporting to track the replica restart in case of the unhealthy database. You can add this event to the extended event session on the primary replica.

Disable enhanced database failover for SQL Server Always On Availability Groups

You can disable the enhanced database failover without restarting SQL Services. To disable the feature, either remove the tick from the check from the database-level health detection or use the following query.

USE [master]

ALTER AVAILABILITY GROUP [SQLAG2019] SET(DB_FAILOVER = OFF);

Conclusion

In this article, we explored the useful enhanced database-level failover feature in an availability group. You should enable it for the production databases. You should also group the database appropriately in an AG group so that it gets failover in case any database reports fails the database-level health detection. You can create multiple AG groups and group databases as per their availability and application requirements.

A comprehensive guide to SQL Server Always On Availability Groups on Windows Server 2016

Configure Domain Controller and Active Directory for SQL Server Always On Availability Groups

Configure failover clusters, storage controllers and quorum configurations for SQL Server Always On Availability Groups

Install SQL Server 2019 on Windows Server 2016 with SQL Server Always On Availability Groups

Add a new node into existing SQL Server Always On Availability Groups

Configure Managed Service Accounts for SQL Server Always On Availability Groups

Add or remove a node from SQL Server Always On Availability Groups using T-SQL scripts

Database-level health detection in SQL Server Always On Availability Groups

Automatic Page Repair in SQL Server Always On Availability Groups

Deploy a domain-independent Windows Failover Cluster for SQL Server Always On Availability Groups

Configure a SQL Server Always On Availability Group on the domain-independent Failover Cluster

An overview of distributed SQL Server Always On Availability Groups

Deploy a distributed SQL Server Always On Availability Group

Monitor and failover a Distributed SQL Server Always On Availability Group

Transparent Data Encryption for SQL Server Always On Availability Groups

Configure SQL Server replication for a database in SQL Server Always On Availability Groups

Configuring SQL Server replication for distribution databases in SQL Server Always On Availability Groups

Explore Cross-database MSDTC for distributed transactions in SQL Server Always On Availability Groups

Deploy MSDTC for distributed transactions in SQL Server Always On Availability Groups

Restore an existing availability group database participating in SQL Server Always On Availability Groups

Exploring AG dashboards for monitoring SQL Server Always On Availability Groups

Backup compression in TDE enabled databases in SQL Server Always On Availability Groups

Impact of dropping a login in the active directory tied to SQL Server Always On Availability Groups

Refresh SQL Server Always On Availability Group databases using DBATools PowerShell

Configure SQL Server Reporting Services databases in SQL Server Always On Availability Groups

Suspend and Resume Data Movement in SQL Server Always On Availability Groups

Explore failover types in SQL Server Always On Availability Groups

Explore SQL Server PolyBase External Tables in SQL Server Always On Availability Groups

SQL Server Always On Availability Groups for SQL Server Linux instances

Column-level SQL Server encryption with SQL Server Always On Availability Groups

Make the most of secondary replicas in SQL Server Always On Availability Groups

Apply SQL Server patches or cumulative updates in SQL Server Always On Availability Groups

Monitor SQL Server Always On Availability groups using extended events

The Hub and Spoke model of policy-based management for SQL Server Always On Availability Groups

Custom policies for AG dashboards of SQL Server Always On Availability Groups

Explore dynamic management views for monitoring SQL Server Always On Availability Groups

Dynamic management views for monitoring availability replicas and databases for SQL Server Always On Availability

Configure SQL Server Always On Availability Groups using Windows PowerShell scripts

Configure Integration Services Catalog Database SSISDB in SQL Server Always On Availability Groups

Synchronize logins between Availability replicas in SQL Server Always On Availability Group

Session timeouts in SQL Server Always On Availability Groups

Lease Timeouts and Health Checks in SQL Server Always On Availability Groups

Author
Recent Posts

Rajendra Gupta

Hi! I am Rajendra Gupta, Database Specialist and Architect, helping organizations implement Microsoft SQL Server, Azure, Couchbase, AWS solutions fast and efficiently, fix related issues, and Performance Tuning with over 14 years of experience.

I am the author of the book "DP-300 Administering Relational Database on Microsoft Azure". I published more than 650 technical articles on MSSQLTips, SQLShack, Quest, CodingSight, and SeveralNines.

I am the creator of one of the biggest free online collections of articles on a single topic, with his 50-part series on SQL Server Always On Availability Groups.

Based on my contribution to the SQL Server community, I have been recognized as the prestigious Best Author of the Year continuously in 2019, 2020, and 2021 (2nd Rank) at SQLShack and the MSSQLTIPS champions award in 2020.

Personal Blog: https://www.dbblogger.com
I am always interested in new challenges so if you need consulting help, reach me at rajendra.gupta16@gmail.com

View all posts by Rajendra Gupta

Latest posts by Rajendra Gupta (see all)

Understanding PostgreSQL SUBSTRING function - September 21, 2024
How to install PostgreSQL on Ubuntu - July 13, 2023
How to use the CROSSTAB function in PostgreSQL - February 17, 2023