How many times have you followed the best practices check list whenever you are deploying a mission critical solution like Database Mirroring,Log shipping or even Failover cluster instances ?
If the answer is No,”Have never followed any best practices checklist ! “,then its the right time to change that perspective.
Lets look at Wikipedia definition of best practice –
A best practice is a method or technique that has consistently shown results superior to those achieved with other means, and that is used as a benchmark. In addition, a “best” practice can evolve to become better as improvements are discovered. Best practice is considered by some as a business buzzword, used to describe the process of developing and following a standard way of doing things that multiple organizations can use.
This definition makes perfect sense with SQL Server too.We do have great set of information out there which clearly talks about best practices when you are dealing with SQL Server deployments.
A simple example would be – Your data and log file should not reside on the same drive.
There will always be a reason for a best practice and best practice is always not constant.It can evolve over time and can become better.(I like to stress this sentence a lot !)
Recently I have observed a question related to Database Mirroring(DBM) where in the user was getting an event logged in the sql error log
The mirroring connection to “TCP://abc.xyz.com:5022” has timed out for database “DB” after 10 seconds without a response. Check the service and network connections.
This is directly related to a best practice within DBM.The failover mechanism within database mirroring is designed such a way that the instances sends across a Ping within each other to understand if they are alive.
A simple pictorial representation of Principal Instance and Witness instance sending a ping to the Mirrored instance is shown below.This is a one way representation only.
A ping is sent every 1 second.
The mirrored/witness instance will also send across pings to check if the principal is alive.In the event of Principal not being available,Mirrored instance will talk to Witness and will get confirmation that Principal is indeed unavailable. Next course of action is initiated from then.
This course of action depends on something called as Partner Time out.The default is 10 seconds and you can change this value.As per best practices,the value should be 10 seconds and above.
Lets assume that your DBM environment is configured such a way that it utilizes synchronous mirroring with automatic failover.If the Mirrorred instance doesnt not receive any responses to Ping for 10 seconds from Principal,then a failover will be initiated.[Under default Partner Time out settings]
Lets go back to the error log message which was discussed earlier.In this situation the user was not using an Automatic failover(No witness) and the ping was between the Principal and Mirrored instance.
There were network latency between the servers,hence the Ping from Principal did not received any responses for 10 seconds and an event was logged.As there is no witness an automatic failover is not triggered.
If there was a witness for automatic failover,then it would have been pretty dangerous to initiate a failover because of network latency.
The solution for the above user is to set an upper limit for the Partner Time Out value.
ALTER DATABASE <dbname> SET PARTNER TIMEOUT 20
This command has to be ran at the Principal server end.
Lowering this number for achieving quick failure responses is a Myth ! The default value of 10 should never be changed to a lower value.Upper values can be set upon doing proper analysis of the DBM environment and failover requirements.
I am curious to learn if you have any upper limits for this value in your environment and if Yes,then what is the reason.Please share your thoughts as comments here.
Thanks for reading.
60 secs and still thinking of increasing it. The app DB keeps on reporting a swap eventhough the principal is actually up and running.