Introduction
SQL Server Always On Availability Groups provides HADR solutions for the SQL databases. Here HA refers to high availability and DR refers to disaster recovery. The priority of this feature is to keep the database highly available and then provide Disaster recovery. Due to this reason, if the secondary replica goes down in a synchronous data commit mode, SQL Server changes commit mode to Asynchronous so that users can continue run the transactions and a secondary replica can be in sync later once it bought up. SQL Listener also points to the primary replica and continues redirects connection to the primary replica.
Suppose for a two-node synchronous data commit mode availability group, Secondary replica is down due to some hardware or power failure issues, and it might take longer for the respective team to fix it.
However, users will continue to use the primary replica database, but SQL Server holds the transaction log records on the primary replica. It cannot clear the transaction log despite the regular transaction log backups. It will cause transaction log growth, and we might face disk space-related issues if we do not have sufficient free space in the drive.
In this case, it is good to remove the secondary replica (unhealthy) from the SQL Server Always On Availability Group. Once we remove the unhealthy replica, it is not part of the AG group, and SQL Server does not need to hold the transaction log for late usage.
Remove a SQL Server Always On Availability Group
To remove an availability group, connect to primary replica and go to Availability Replicas. Right-click on the replica we want to remove and click on Remove from Availability Group.
Alternatively, you can execute the following query on the primary replica in SQL Server Always On Availability Group.
1 2 |
ALTER AVAILABILITY GROUP [TestAG] REMOVE REPLICA ON N'SQL2017Test'; GO |
Once you remove a replica from the availability replicas, all database status changes to Not Synchronizing.
Once the secondary database instance becomes available, connect to the secondary replica, you can drop an availability group.
1 |
DROP AVAILABILITY GROUP [TestAG]; |
It changes the database status to Restoring mode.
Add the database back in the SQL Server Always On Availability Group
Now we want to add the database back to the SQL Server Always On Availability Group. We can see the following scenarios to add this database into the AG group.
Scenario 1: No log backup occurred after the secondary replica is down
Suppose you have not taken any log backup after you removed the replica from the availability group replicas. Once the replica is up, we want to add it again in the availability group.
In the primary replica, expand Availability Groups and right-click on the availability replicas. Now click on Add replica. In the initial data synchronization page of SQL Server Always On Availability Group, select the JOIN ONLY data synchronization method. Primary replica holds all the transaction logs, and it should bring the secondary database to the state in sync with the primary replica.
It performs availability group validations and skips unwanted validations. We can ignore the warning for the listener because you can create SQL listener at any point after adding the replica and database in the AG group as well.
In the next step, we can see that it adds both the TestAG and AdventureWorks2014 database as part of the availability group.
We can verify the status of AG synchronization using the following query.
Scenario 2: Regular log backups occurred after the secondary replica is down
To demonstrate this scenario, I configured a maintenance plan to take regular log backups on a one-minute interval on the primary replica. Let’s replicate the scenario again by removing the secondary replica from the Availability replica group.
At this point, you only have one node SQL Server Always On Availability Group replica and regular backups happening the primary replica for AdventureWorks2014 database. We do not need transaction log backups for the TestDB database.
Suppose secondary replica is up again and you try to add the replica by using the JOIN ONLY method.
In the below screenshot, we can see the followings:
- Joining TestDB to the availability group is successful. You can recall that we have not taken any log backups for this database after removing the secondary replica instance
Joining AdventureWorks2014 database to the availability group is failed
Click on hyperlink Error in front of the failed entry. You get the following detailed error message.
The remote copy of database “Adventureworks2014” has not been rolled forward to a point in time that is encompassed in the local copy of the database log. (Microsoft SQL Server, Error: 1412)
Execute the query to check the synchronization status of both the databases between the primary and secondary replica in SQL Server Always On Availability replica. It shows the Synchronized and Healthy status for the TestDB database.
1 2 3 4 5 6 7 8 9 10 11 12 |
SELECT sadc.database_name, ag.name AS ag_name, dhrs.is_local, dhrs.is_primary_replica, dhrs.synchronization_state_desc, dhrs.is_commit_participant, dhrs.synchronization_health_desc FROM sys.dm_hadr_database_replica_states AS dhrs INNER JOIN sys.availability_databases_cluster AS sadc ON dhrs.group_id = sadc.group_id AND dhrs.group_database_id = sadc.group_database_id INNER JOIN sys.availability_groups AS ag ON ag.group_id = dhrs.group_id INNER JOIN sys.availability_replicas AS sar ON dhrs.group_id = sar.group_id AND dhrs.replica_id = sar.replica_id; |
The status for the adventureworks2014 database is still Not synchronizing and Not healthy for the secondary replica.
In the Availability databases, we can see a warning message in front of the adventureworks2014 database.
You can plan the following approaches to fix this issue.
Approach 1: Backup and Restore to sync the primary and secondary replica
- Take a Full database back of the database and corresponding log backups
- Restore database into each secondary replica ( in case you have multiple secondary replicas)
- Join the database into the availability group
Approach 2: Apply the transaction log backup and use JOIN ONLY method
In the secondary replica of SQL Server Always On Availability Group, check the SQL Server error logs. You can find the entry for the adventureworks2014 database similar to below.
You can note the last hardened LSN 97:5609:1 in this error logs. Execute the following query on MSDB database of the primary replica to check the LSN backup range.
1 2 3 4 5 6 7 8 9 |
SELECT name, backup_set_id, backup_start_date, backup_finish_date, first_lsn, last_lsn FROM msdb..backupset WHERE first_lsn < '97000000560900001' AND last_lsn > '97000000560900001'; |
We need to apply all transaction log backup that occurred after this log backup. We can find the list of all required log backups using the following query.
1 2 |
select name, backup_set_id, backup_start_date, backup_finish_date, first_lsn, last_lsn from msdb..backupset where last_lsn>'97000000560900001' |
We get a list of required transaction log backups that needs to apply on a secondary replica database before adding a database to the replica.
Let’s apply these database log backups on the secondary replica; database status should be NORECOVERY after all log backups restore.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
RESTORE LOG [adventureworks2014] FROM DISK = N'E:\TestDB\adventureworks2014_backup_2019_07_08_142702_1276318.trn' WITH FILE = 1, NORECOVERY, NOUNLOAD, STATS = 10 GO RESTORE LOG [adventureworks2014] FROM DISK = N'E:\TestDB\adventureworks2014_backup_2019_07_08_142801_2372113.trn' WITH FILE = 1, NORECOVERY, NOUNLOAD, STATS = 10 GO RESTORE LOG [adventureworks2014] FROM DISK = N'E:\TestDB\adventureworks2014_backup_2019_07_08_142901_7571708.trn' WITH FILE = 1, NORECOVERY, NOUNLOAD, STATS = 10 GO RESTORE LOG [adventureworks2014] FROM DISK = N'E:\TestDB\adventureworks2014_backup_2019_07_08_143002_0538116.trn' WITH FILE = 1, NORECOVERY, NOUNLOAD, STATS = 10 GO RESTORE LOG [adventureworks2014] FROM DISK = N'E:\TestDB\adventureworks2014_backup_2019_07_08_143101_6397792.trn' WITH FILE = 1, NORECOVERY, NOUNLOAD, STATS = 10 GO RESTORE LOG [adventureworks2014] FROM DISK = N'E:\TestDB\adventureworks2014_backup_2019_07_08_143202_1396004.trn' WITH FILE = 1, NORECOVERY, NOUNLOAD, STATS = 10 GO RESTORE LOG [adventureworks2014] FROM DISK = N'E:\TestDB\adventureworks2014_backup_2019_07_08_143301_4363776.trn' WITH FILE = 1, NORECOVERY, NOUNLOAD, STATS = 10 GO RESTORE LOG [adventureworks2014] FROM DISK = N'E:\TestDB\adventureworks2014_backup_2019_07_08_143401_9456439.trn' WITH FILE = 1, NORECOVERY, NOUNLOAD, STATS = 10 GO RESTORE LOG [adventureworks2014] FROM DISK = N'E:\TestDB\adventureworks2014_backup_2019_07_08_143501_5423424.trn' WITH FILE = 1, NORECOVERY, NOUNLOAD, STATS = 10 GO RESTORE LOG [adventureworks2014] FROM DISK = N'E:\TestDB\adventureworks2014_backup_2019_07_08_143601_9364591.trn' WITH FILE = 1, NORECOVERY, NOUNLOAD, STATS = 10 GO RESTORE LOG [adventureworks2014] FROM DISK = N'E:\TestDB\adventureworks2014_backup_2019_07_08_143701_4953797.trn' WITH FILE = 1, NORECOVERY, NOUNLOAD, STATS = 10 GO RESTORE LOG [adventureworks2014] FROM DISK = N'E:\TestDB\adventureworks2014_backup_2019_07_08_143802_0194152.trn' WITH FILE = 1, NORECOVERY, NOUNLOAD, STATS = 10 GO RESTORE LOG [adventureworks2014] FROM DISK = N'E:\TestDB\adventureworks2014_backup_2019_07_08_143901_3786076.trn' WITH FILE = 1, NORECOVERY, NOUNLOAD, STATS = 10 GO |
You get the output of each log backup restore similar to the below.
Once we restored all log backup, connect to the secondary replica and right-click on the adventureworks2014 database in SQL Server Always On Availability Group. Click on the Join to Availability Group.
It opens the following wizard to join the database to an existing availability group. In the error option, by default selected option is – Continue executing after error.
Click Ok and it completes the wizard. You can see that the warning message icon turns into a green icon.
Verify the database synchronization status for the adventureworks2014 database.
Conclusion
In this article, we walked through the process to resolve issues when the secondary replica is down. We also learned to bring the secondary database in sync in case it lags from the primary replica. You should be aware of these approaches. You need to be quick on resolving the issues, especially in the production environment. If you have any comments or questions, feel free to leave them in the comments below.
- Understanding PostgreSQL SUBSTRING function - September 21, 2024
- How to install PostgreSQL on Ubuntu - July 13, 2023
- How to use the CROSSTAB function in PostgreSQL - February 17, 2023