Tuesday, February 15, 2022

SQL Server fast failover in High Availability Groups

 I recently had the somewhat dubious honor of helping migrate a company from a physical data center to the Azure cloud.  I say dubious because for as many positives as there are about Azure, there are at least the same number of negatives.  The only feature Azure has that makes it worth migrating, in my opinion, for most smaller companies is the ability to work and see the environment the same way a data center would see it; so much more visibility, not good visibility, just visibility.

However, my feelings about Azure are not the point of this post.  As a part of this migration we went from a traditional failover SQL cluster to an Always on High Availability group; primarily because the dba doing the migration could not figure out how to do a failover cluster in Azure.

In my short exposure to the high availability option it is both good and bad.  The automated failover in SQL server is faster than it was using the old method, however the visibility into the individual server databases is worse.

The DBA handling the migration was unable to figure out how to manually fail back the system in a reasonable time frame.  He was using a SQL script that failed back one availability group at a time.  This approach would leave our system down for up to a minute, a time frame that would increase as we added more databases to the system, and this was a big problem.

To solve this problem I decided to go with a multi-threaded approach so I could fail back every availability group simultaneously.  Unfortunately SQL does not seem to have any form of reasonable multi-threaded capability available for a script to use.  So I turned to scripting languages.

While this script could be written in most scripting languages, I chose the older VBscript this time.  The general idea is that the script loops through all found availability groups, and spins up an instance of itself in a separate thread to start a failover for each one.


Const sqlUser = "<username>" ' add your server admin login username here
Const sqlPass = "<password>" ' add the password for the username here
Const Server = "<ip address>" ' enter the ip address of the DESTINATION server to fail to here

Const adOpenStatic = 3
Const adLockOptimistic = 3

'
' Here is where we fail over the current AG
'
If (WScript.Arguments.Count > 0) Then
    'Wscript.Echo WScript.Arguments.Item(0)
Set failConnection = CreateObject("ADODB.Connection")
    failConnection.Open _
        "Provider=SQLOLEDB;Data Source=" + Server + ";" & _
            "Trusted_Connection=No;Initial Catalog=master;" & _
                 "User ID=" + sqlUser + ";Password=" + sqlPass + ";"
failConnection.Execute "ALTER AVAILABILITY GROUP " + WScript.Arguments.Item(0) + " FAILOVER;"

failConnection.Close
Set failConnection = Nothing
    WScript.Quit 0
End If



'
' Below is where we get the list of AGs to fail over
'


Dim FSO
Set FSO = CreateObject("Scripting.FileSystemObject")
GetCurrentFolder = FSO.GetAbsolutePathName(".")

Set objConnection = CreateObject("ADODB.Connection")
Set WshShell = WScript.CreateObject("WScript.Shell")


objConnection.Open _
    "Provider=SQLOLEDB;Data Source=" + Server + ";" & _
        "Trusted_Connection=No;Initial Catalog=master;" & _
             "User ID=" + sqlUser + ";Password=" + sqlPass + ";"

Set objRecordSet = objConnection.Execute("SELECT name FROM master.sys.availability_groups")

objRecordSet.MoveFirst

Do Until objRecordSet.EOF
    'Wscript.Echo objRecordSet.Fields("name")
WshShell.Run Wscript.ScriptFullName + " " + objRecordSet.Fields("name")
objRecordSet.MoveNext
Loop

objRecordSet.Close
objConnection.Close
Set objConnection = Nothing

Wscript.Echo "All failover commands fired"


With this approach the entire fail back is reduced to a couple of seconds, and the time does not increase noticeably as additional groups get added in the future.

No comments: