Automatic Page Repair – Smart fellow who does its work behind the scenes!


Automatic Page Repair is one feature which is not really famous, however pretty much known to most of us.

I wanted to write about Automatic Page Repair since a very long time. Today I decided to test this feature when I had to deploy and do some test cases with DBM.

In simple words Auto Page Repair feature will replace the corrupt page by requesting a readable page from the partner mirrored database.

We also need to take into consideration that not all pages can be repaired. Read more about this feature here.

Let’s now do some corruptions and see if this feature is smart enough to repair them!

Note  - Don’t try this at home (Production Environment) !

Stage 1

For the purpose of the demo we have a Mirroring Setup for the database AdventureWorks2012 (Downloaded from codeplex)

For the purpose of corrupting the page we will need to make the database offline first, hence will need to remove mirroring for a while as the database which is taking part in a mirroring session cannot be taken offline. We will re-establish DBM once corruption is completed.

We will pick one table from the database and will corrupt the index page of the same. The table which we are going to choose here is HumanResources.Employee

We will now pick one Index Page for the table

The page which we are going to corrupt is Page with ID 875

Note - I have noted to take required backups to ensure that proper rollback is possible.

 Using HexEdit tool we can enforce corruptions for page ID 875,and this action requires the database to be offline.

Once the corruptions are made(Ref to this post to understand how can we corrupt a page) we can bring the database Online and identify the corruptions using DBCC CHECKDB command

DBCC CHECKDB(AdventureWorks2012) WITH NO_INFOMSGS
 
Msg 8980, Level 16, State 1, Line 1
Table error: Object ID 1237579447, index ID 1, partition ID 72057594045136896, 
alloc unit ID 72057594050838528 (type In-row data). 
Index node page (0:0), slot 0 refers to child page (1:875) 
and previous child (0:0), but they were not encountered.

The above mentioned error message is taken from the DBCC result set.

Stage 2

Now that we have a corrupted page, we will proceed and re-establish mirroring.

After mirroring is re-established, we can try running DBCC CHECKDB once again on the same corrupted database.

DBCC CHECKDB(AdventureWorks2012) WITH NO_INFOMSGS

This time we will get a magical confirmation as below

Command(s) completed successfully.

Wow!!! Where did that corruption gone?

Yes, you guessed it right. Auto Page Repair is so smart that it replaced a clean page to the Principal Database from the Mirroring Partner.

No restore or what so ever, everything happened behind the scenes. Very neat, very smart !

Principal and Mirrored database is in sync and they help each other’s too.

We have a view called [sys].[dm_db_mirroring_auto_page_repair] which keeps track of all repair attempts which done behind the scenes.

Let’s quickly query it and see what’s in there

SELECT * FROM [sys].[dm_db_mirroring_auto_page_repair]

Bingo!!! The result came as

 The result is very clear. It says that page 875 was corrupted and it was replaced/repaired.The action was successful and the page is reusable

BOL talks about the below page_status possible values  -

2 = Queued for request from partner.

3 = Request sent to partner.

4 = Queued for automatic page repair (response received from partner).

5 = Automatic page repair succeeded and the page should be usable.

6 = Irreparable. This indicates that an error occurred during page-repair attempt, for example, because the page is also corrupted on the partner, the partner is disconnected, or a network problem occurred. This state is not terminal; if corruption is encountered again on the page, the page will be requested again from the partner.

In SQL 2012 we have an additional view for Always ON AG’s and it is

[sys].[dm_hadr_auto_page_repair]

Conclusion

In case you have a mirroring setup,then its worth querying the view and see how much the Auto Page Repair feature have helped you.

Thanks for reading.

Mirroring connection timeout – Looking back at the best practices !


How many times have you followed the best practices check list whenever you are deploying a mission critical solution like Database Mirroring,Log shipping or even Failover cluster instances ?

If the answer is No,”Have never followed any best practices checklist ! “,then its the right time to change that perspective.

Lets look at  Wikipedia definition of  best practice -

best practice is a method or technique that has consistently shown results superior to those achieved with other means, and that is used as a benchmark. In addition, a “best” practice can evolve to become better as improvements are discovered. Best practice is considered by some as a business buzzword, used to describe the process of developing and following a standard way of doing things that multiple organizations can use.

This definition makes perfect sense with SQL Server too.We do have great set of information out there which clearly talks about best practices when you are dealing with SQL Server deployments.

A simple example would be  - Your data and log file should not reside on the same drive.

There will always be a reason for a best practice and best practice is always not constant.It can evolve over time and can become better.(I like to stress this sentence a lot !)

Recently I have observed a question related to Database Mirroring(DBM) where in the user was getting an event logged in the sql error log

The mirroring connection to “TCP://abc.xyz.com:5022″ has timed out for database “DB” after 10 seconds without a response.  Check the service and network connections.

This is directly related to a best practice within DBM.The failover mechanism within database mirroring is designed such a way that the instances sends across a Ping within each other to understand if they are alive.

A simple pictorial representation of Principal Instance and Witness instance sending a ping to the Mirrored instance is shown below.This is a one way representation only.

A ping is sent every 1 second.

The mirrored/witness instance will also send across pings to check if the principal is alive.In the event of Principal not being available,Mirrored instance will talk to Witness and will get confirmation that Principal is indeed unavailable. Next course of action is initiated from then.

This course of action depends on something called as Partner Time out.The default is 10 seconds and you can change this value.As per best practices,the value should be 10 seconds and above.

Lets assume that your DBM environment is configured such a way that it utilizes synchronous mirroring with automatic failover.If the Mirrorred instance doesnt not receive any responses to Ping for 10 seconds from Principal,then a failover will be initiated.[Under default Partner Time out settings]

Lets go back to the error log message which was discussed earlier.In this situation the user was not using an Automatic failover(No witness) and the ping was between the Principal and Mirrored instance.

There were network latency between the servers,hence the Ping from Principal did not received any responses for 10 seconds and an event was logged.As there is no witness an automatic failover is not triggered.

If there was a witness for automatic failover,then it would have been pretty dangerous to initiate a failover because of network latency.

The solution for the above user is to set an upper limit for the Partner Time Out value.

ALTER DATABASE <dbname> SET PARTNER TIMEOUT 20

This command has to be ran at the Principal server end.

Conclusion

Lowering this number for achieving quick failure responses is a Myth ! The default value of 10 should never be changed to a lower value.Upper values can be set upon doing proper analysis of the DBM environment and failover requirements.

I am curious to learn if you have any upper limits for this value in your environment and if Yes,then what is the reason.Please share your thoughts as comments here.

Thanks for reading.

Dropping a mirrored database – It just wont ‘allow’ at all!


I have been doing some research with DBM(Database Mirroring) repair features and today I came across this little tricky situation and decided to write about it.Scenario is something like below  -

1. I tried to setup mirroring on a virtual environment and the whole setup was failing because of Port issues.

2. This means,Mirroring was never established and the database which I wanted to be the mirrored partner was in Recovering state.

3.I had to delete the Principal database for some testing.

4.I removed endpoints of both Principal and Mirrored Instance.

5.After a while I decided to just remove the ‘supposed to be’ mirrored database and fired the command

DROP DATABASE [CorruptMe]

Strange database name isnt? Yes,thats because I was really simulating some real bad page corruptions!

As soon as the command was executed,there was an error

Msg 3743, Level 16, State 1, Line 1
The database ‘CorruptMe’ is enabled for database mirroring. Database mirroring must be removed before you drop the database.

Wow!!! The mirroring session was not established because of issues,however its still enabled and somewhere there is a check which is why we are getting this error.

What ever option you try to Delete wont help here,and as the title says  - It just wont ‘allow’ at all!

I was completely aware of turning OFF Partner,however under normal circumstances this is done when Principal Server/Database is down for any reason and we need to bring Mirrored database manually(Assuming we dont have a witness server)

I decided to go ahead and try to wordaround this issue by using the command.

ALTER DATABASE [CorruptMe] SET PARTNER OFF

This ran without any issues.I wanted to recover the database before I proceed and delete it,so used

RESTORE DATABASE [CorruptMe] WITH RECOVERY

This was also successful and database was now completely Online.Drop action just worked as expected.

DROP DATABASE [CorruptMe]
Command(s) completed successfully.

Conclusion - Even though a mirroring session was not established,its still required to remove the partners for a scenario like this.

Thanks for reading.

Automatic Page Repair,works behind the scene and a wonderful friend of DBA


In a SQL Server 2008 mirroring environment when the primary server detected a corrupt page on disk,it asks the mirror to provide the copy of the page and then correct the damage.

This works behind the scene and no intervention is required.

DMV sys.dm_db_mirroring_auto_page_repair can be used to detect any repair attempts which have been made.

Database Mirroring – some misunderstood concepts


Database mirroring is a wonderful concept and I have observed that there are some misunderstood concepts too within Mirroring.Below are couple of them :

Witness server – There is a misconception that witness server is responsible for a failover, however that is not correct. Witness server will help the principal server and mirrored server form what is called quorum and it agrees to Mirrored server to become Principal during an event of failure (One of the failure scenario’s)

High performance mode is available in all the version – This is available only in Enterprise version.

Setup 2005 version as principal – 2008 version as mirror is not possible – Yes, this setup is possible.

What should be the version of Witness server, do we need this to be the same version as rest 2 servers – Witness can be of any edition, including free Express edition.

Failure detection is spontaneous – This depends on how much transaction log on mirror is to be replayed.

We can create a snapshot in mirror server and can use it without any issues interms with license – When you create a snapshot for the mirror the license is no longer free.