When disks have different sector sizes for primary and secondary replica log files in an AlwaysOn Availability Group Config


Recently I was analyzing a stress test result for an AG configuration, and as usual I was reading to SQL error logs to see if there is anything out of normal happened during the stress window.

In one of the asynchronous replicas, I started noticing a series of messages related to log IOs:

disksec1

The keyword misaligned tempted me to check the sector size of the log file involved, and this is where things started interesting.

Log file on the primary:

disksec2

Log file on the secondary:

disksec3

From the above screenshots, you can clearly see that the disks are not aligned.

So, what’s a big deal about this?  When disks for primary and secondary are not aligned, then the AG synchronization process can run slow. This is not something which you would like to see in a Production server.

To ensure that you don’t have slow AG sync process,all the disks involved(specifically log file disks) in an AG configuration should have same sector size(Recommended).

Microsoft did released a hotfix to fix slow synchronization issues with misaligned disks, however I would still prefer setting the disks correctly rather than opting for this hotfix.

You can refer the hotfix details here – https://support.microsoft.com/en-us/kb/3009974

Trace flag 1800 has to be enabled, for this hotfix to work, and it’s applicable for SQL 2016 too.

Conclusion: 

AlwaysOn AG has multiple dependencies(like WSFC, Networks, Storage Subsystem etc) and setting some of the best practices for these dependencies will ensure that your AG is healthy and running at optimum levels.

Thanks for reading, and keep watching this space for more!

AlwaysOn Availability Group Database Level Health Detection – SQL Server 2016


SQL Server 2016 AlwaysOn AG got many improvements, and I’m pretty impressed with the log throughput improvements and redo improvements based on some of my test cases(A detailed blog post on that will follow soon). In this blog post we will look at another key improvement, ie Database Level Health Detection.

SQL Server 2016 enhances the AlwaysOn health diagnostics with database health detection. If AlwaysOn availability group database health detection has been selected for your availability group, and an availability database transitions out of the ONLINE state (sys.databases.state_desc), then entire availability group will failover automatically.

Let’s look at a demo to demonstrate the new behaviour –

I have enabled Database Level Health Detection for my AG as shown below –

AG20161Now, in theory if one of the disks which is hosting my database tpcc is gone(both data/log file), then it should trigger a failover.

Currently node 001 is the primary replica. Let’s see, if theory matches with reality.

I went ahead and inserted data continuously to one of the tables in the database, and took the disk(which hosts both data/log files) offline via Disk Management –

AG20162

That caused the database to go into Recovery Pending State –

AG20163

Note – This controlled failure simulation was not a straightforward process. Initially the database was not going to a degraded state. After my interaction with some of the top SQL Server Experts out there in our community, I was able to force the DB go into a degraded state.

Ref this twitter thread for more details – https://twitter.com/AnupWarrier/status/748364014682578949

This immediately caused a failover, and now 002 is the primary replica.

AG20164

Conclusion – 

This proves that Database level health detection works seamlessly and a failover is triggered when the state of a database changes from Online to other degraded states.

Thanks for reading, and keep watching this space for more!