Max Text Replication Size – When you might have to care about this number!


One of the advanced server-level options for SQL Server is Max Text Replication Size, and it is not one that is commonly touched. In most cases the default value is never changed.

This configuration specifies the maximum size of data that can be added to a replicated column in a single INSERT, UPDATE, WRITETEXT, or UPDATETEXT statement. It applies to the data types text, ntext, varchar(max), nvarchar(max), varbinary(max), xml, and image.

The default value for this configuration is 65536 bytes, i.e. 64 KB (0.0625 MB).

Why do we care about this value? Here is a case where we might have to carefully change it according to our requirements.

Recently I set up a database for a Citrix XenApp 6.5 farm data store and configured transactional replication as part of the DR requirement.

After replication was configured, the Citrix team was able to read data out of the database, however nothing was getting published. They were consistently getting errors like “Unknown error occurred: error code 0x82060035”.

Things were working great before, and this error started popping up only after replication was set up. The only change made was enabling replication for this database.

Upon checking the database closely, I found a table called dbo.KEYTABLE with a column called data of type varbinary(max).

As this column was part of transactional replication, the Max Text Replication Size value came into the picture: anything above 65536 bytes in a single insert was not allowed, and Citrix faced issues while publishing new apps.

Carefully choosing the best possible value for this configuration setting fixed the problem.

The setting has a maximum value of 2147483647 bytes, which is 2 GB.

I really don’t recommend jumping straight to 2147483647 bytes; I would always test and settle on a value that works well for the environment.

A high value allows huge inserts/updates in a single statement and can introduce network latency while replicating.
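For reference, here is a quick sketch of how the setting can be inspected and changed via sp_configure. The 131072 (128 KB) below is just an illustrative number; always test and pick a value that suits your own workload:

--Check the current value (config_value/run_value are in bytes)
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max text repl size';

--Change it to an illustrative 128 KB and apply the change
EXEC sp_configure 'max text repl size', 131072;
RECONFIGURE;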

I had a Twitter discussion with SQL Server expert Robert L Davis (B/T) to double-check whether changing this server-level value has any other direct impact, and he confirmed that there isn’t any. Many thanks to Robert!

Thanks for reading.


Why AD level permissions are important – The cluster resource ‘SQL Server’ could not be brought online!


When your cluster install fails, there is a lot to learn!!!

Today I am writing about a very recent experience working on a clustering deployment. It was a two-node cluster with a single SQL instance.

I stopped using Active/Passive terminology long ago as it is not the right usage. Clustering MVP Allan Hirt (B/T) has pointed this out many times via his blog posts and on SQL forums.

There were no errors returned during the initial stages (rule checks) of the SQL cluster install. The setup gave the below error at one point during the final configuration process, and the Database Engine install failed.

The cluster resource ‘SQL Server’ could not be brought online.  

Error: The resource failed to come online due to the failure of one or more provider resources.

(Exception from HRESULT: 0x80071736)

There were no specific details in the SQL setup logs (available under the Setup Bootstrap folder) that would lead me to the reason for the error.

I kept checking the Windows event logs and hit the relevant event right away.


The reason for the error was that the CNO (cluster computer account) did not have the Create Computer Objects permission at the OU level.

We can test this by creating a simple Client Access Point.

We provide a name and an IP (which gets picked automatically). This creates a computer object in the same way the SQL Server setup does.

In some cases the cluster service account is blocked from creating computer objects. In that situation you will need to work with the domain administrators: they should pre-create the virtual server computer object and then grant the required access rights on it to the cluster service account.

In my case the domain services team created the computer object manually and then granted the cluster account full permissions on it.

Conclusion

Domain-level permissions are really important during cluster deployments. The person responsible for setting up the SQL cluster should interact closely with both the Windows team and the domain services team (in many cases both are handled by a single team) to understand what level of permissions is required, and work together to isolate and fix potential problems.

Automatic Page Repair – Smart fellow who does its work behind the scenes!


Automatic Page Repair is a feature that is not really famous, yet most of us have heard of it.

I have wanted to write about Automatic Page Repair for a very long time. Today I decided to test the feature while deploying and running some test cases with database mirroring (DBM).

In simple words, the Auto Page Repair feature replaces a corrupt page by requesting a readable copy of the page from the mirroring partner.

We also need to keep in mind that not all pages can be repaired. Read more about this feature here.
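As a quick side note, SQL Server records pages it suspects of corruption in msdb.dbo.suspect_pages, and that is a handy place to look before and after a repair:

--Pages flagged as suspect, along with the type of event and when the row was last updated
SELECT database_id,
       file_id,
       page_id,
       event_type,
       error_count,
       last_update_date
FROM msdb.dbo.suspect_pages;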

Let’s now do some corruptions and see if this feature is smart enough to repair them!

Note – Don’t try this at home (i.e., in a production environment)!

Stage 1

For the purpose of the demo, we have a mirroring setup for the database AdventureWorks2012 (downloaded from CodePlex).

To corrupt a page, we need to take the database offline first, which means removing mirroring for a while, because a database taking part in a mirroring session cannot be taken offline. We will re-establish DBM once the corruption is done.
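Here is a rough sketch of the commands involved in that pause (shown purely as an illustration; adjust for your own setup):

--Remove the mirroring session so the database can be taken offline (run on the principal)
ALTER DATABASE AdventureWorks2012 SET PARTNER OFF;

--Take the database offline before touching the data file with the hex editor
ALTER DATABASE AdventureWorks2012 SET OFFLINE WITH ROLLBACK IMMEDIATE;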

We will pick one table from the database and corrupt an index page belonging to it. The table chosen here is HumanResources.Employee.

We will now pick an index page of the table.
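One common way to list the table's pages and pick an index page is the undocumented DBCC IND command (shown here only as an illustration of how such a page can be located):

--Undocumented but widely used; rows with PageType = 2 are index pages, PageType = 1 are data pages
DBCC IND ('AdventureWorks2012', 'HumanResources.Employee', -1);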

The page we are going to corrupt is the page with ID 875.

Note – I took the required backups to ensure that a proper rollback is possible.

Using a hex editor tool we can corrupt page ID 875; this action requires the database to be offline.

Once the corruption is made (refer to this post to understand how we can corrupt a page), we can bring the database online and identify the corruption using the DBCC CHECKDB command.

DBCC CHECKDB(AdventureWorks2012) WITH NO_INFOMSGS
 
Msg 8980, Level 16, State 1, Line 1
Table error: Object ID 1237579447, index ID 1, partition ID 72057594045136896, 
alloc unit ID 72057594050838528 (type In-row data). 
Index node page (0:0), slot 0 refers to child page (1:875) 
and previous child (0:0), but they were not encountered.

The above mentioned error message is taken from the DBCC result set.

Stage 2

Now that we have a corrupted page, we will proceed and re-establish mirroring.

After mirroring is re-established, we can try running DBCC CHECKDB once again on the same corrupted database.

DBCC CHECKDB(AdventureWorks2012) WITH NO_INFOMSGS

This time we will get a magical confirmation as below

Command(s) completed successfully.

Wow!!! Where did that corruption go?

Yes, you guessed it right. Auto Page Repair is smart enough to have copied a clean page from the mirroring partner back to the principal database.

No restore whatsoever; everything happened behind the scenes. Very neat, very smart!

The principal and mirror databases are in sync, and they help each other out too.

We have a view called [sys].[dm_db_mirroring_auto_page_repair] which keeps track of all repair attempts done behind the scenes.

Let’s quickly query it and see what’s in there

SELECT * FROM [sys].[dm_db_mirroring_auto_page_repair]

Bingo!!! The result is very clear: it shows that page 875 was corrupted and was replaced/repaired. The action was successful and the page is usable again.
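A slightly more targeted version of the same query, using the columns documented in BOL, looks like this:

--Repair history for the page we corrupted
SELECT DB_NAME(database_id) AS database_name,
       file_id,
       page_id,
       error_type,
       page_status,
       modification_time
FROM [sys].[dm_db_mirroring_auto_page_repair]
WHERE page_id = 875;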

BOL documents the following possible page_status values –

2 = Queued for request from partner.

3 = Request sent to partner.

4 = Queued for automatic page repair (response received from partner).

5 = Automatic page repair succeeded and the page should be usable.

6 = Irreparable. This indicates that an error occurred during page-repair attempt, for example, because the page is also corrupted on the partner, the partner is disconnected, or a network problem occurred. This state is not terminal; if corruption is encountered again on the page, the page will be requested again from the partner.

In SQL 2012 we have an additional view for AlwaysOn Availability Groups, and it is

[sys].[dm_hadr_auto_page_repair]

Conclusion

If you have a mirroring setup, it's worth querying the view to see how much the Auto Page Repair feature has helped you.

Thanks for reading.

Distribution clean up: distribution job failing with error – Could not remove directory!


Today I came across an error for the Distribution clean up: distribution job in one of the environments

Message – 
Executed as user: Domain\Sqlagentaccount. Could not remove directory ‘\\MachineName\ReplData\unc\MachineName_Replica_Replica10_PUB\20120403180393\’. Check the security context of xp_cmdshell and close other processes that may be accessing the directory. [SQLSTATE 42000] (Error 20015). The step failed.

[Note – I have altered the error message contents for security reasons]

This job had been running fine and then suddenly started giving issues.

Troubleshooting steps – 

1. As a first step I checked whether xp_cmdshell was configured. It turned out that xp_cmdshell was indeed enabled.
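A quick way to verify this via sp_configure is shown below; a run_value of 1 means xp_cmdshell is enabled:

--xp_cmdshell is an advanced option, so expose advanced options first
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'xp_cmdshell';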

2. I started to dig into the job to see what it runs. The job runs a stored procedure:

EXEC dbo.sp_MSdistribution_cleanup @min_distretention = 0, @max_distretention = 72

3. When this is triggered from a job the SQL Agent account is used, hence I decided to run the procedure from an SSMS query window.

I got the same error message as mentioned above, along with this:

Replication-@rowcount_only parameter must be the value 0,1, or 2. 0=7.0 compatible checksum. 1=only check rowcou: agent distribution@rowcount_only parameter must be the value 0,1, or 2. 0=7.0 compatible checksum. 1=only  scheduled for retry. Could not clean up the distribution transaction tables.

4. I was pretty sure that these error messages were a little misleading, so I decided to go ahead and verify the security permissions of the UNC share:

\\MachineName\ReplData

5. Initially I focused on the security permissions of the UNC share and assigned both the SQL Agent account and the Database Engine service account Full Control.

6. Ran the job again and it failed yet again.

7. I decided to do some research on the web and found a blog post from the SQL Server support team. The post pointed out that the SQL account should also have Full Control on the UNC share.

8. I went ahead and granted the SQL account Full Control on the UNC share, and the issue was resolved.

This was indeed strange behavior, because the job had been running fine earlier without the SQL account being in the UNC share's Full Control list.

The only change that had happened in the environment was an SP4 upgrade, and that should not have caused this trouble.

As a test case I removed the SQL account's permission on the UNC share once again and ran the job. It succeeded, which was yet again strange behaviour.

Conclusion

This particular behavior is not documented anywhere, nor has it been noticed by many people in the SQL family. If you face the same situation, double-check the permissions on the UNC share to isolate the issue and get a quick solution.

Thanks for reading.

Deleted the TUF file!!! Boy, that’s trouble


Just two days back I wrote a post on TUF files related to log shipping. You can read the post here.

Today we will see what happens if someone deletes the TUF file accidentally, or it otherwise goes missing.

I tried to simulate this on my test machine which had log shipping configured. Below are the steps which I followed –

1. Deleted the TUF file which was available in the secondary server.

2. The delete operation was successful.

3. Checked the log shipping status and found that the health was ‘Good’ (one quick way to verify the standby state is sketched right after these steps).

4. Both primary and secondary databases were in sync and had the same set of data, row by row, column by column.
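For reference, here is a quick sketch of one way to confirm that the secondary is still restoring in standby mode (adjust the database name for your own setup):

--is_in_standby = 1 means the database was restored WITH STANDBY and is readable
SELECT name,
       state_desc,
       is_in_standby
FROM sys.databases
WHERE name = 'XenDevDS';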

Note – Ideally, deleting the TUF file should also cause issues for the log shipping restores on the secondary; however, my simulation did not hit that behavior.

All looks good, and you might be thinking that deleting a TUF file is harmless and not going to hurt much!!!

Now, let's assume that we lost our primary database server to a memory burn (short circuit) and we are in need of the secondary database.

The RTO and RPO targets are quite okay, and we are allowed 30 minutes to bring the secondary database up. A walk in the park, right? We just have to bring the database online; the users/jobs/other objects are already taken care of.

Let's write this simple one-line T-SQL to bring our database up.

RESTORE DATABASE [XenDevDS] WITH RECOVERY

XenDevDS is my test database available on the secondary server; its primary copy resided on the server that just went on a trip (memory burn!).

As soon as we execute this command with a big smile, assuming the database will come up, we get this message –

Msg 3013, Level 16, State 1, Line 1
RESTORE DATABASE is terminating abnormally.
Msg 3441, Level 17, State 1, Line 1
During startup of warm standby database ‘XenDevDS’ (database ID 7), its standby file (‘C:\Program Files\Microsoft SQL Server\MSSQL11.SERVER2012B\MSSQL\DATA\XenDevDS_20120112191505.tuf’) was inaccessible to the RESTORE statement. The operating system error was ‘2(The system cannot find the file specified.)’. Diagnose the operating system error, correct the problem, and retry startup.

What does it mean? It simply means: you have done a good job deleting the TUF file, now please bring it back.

The TUF file is required for the standby database to recover, and we will not be able to bring the database up without it.

As the simulation was in a very controlled environment, I brought back the TUF file and ran the restore command once again.

RESTORE DATABASE [XenDevDS] WITH RECOVERY

RESTORE DATABASE successfully processed 0 pages in 0.908 seconds (0.000 MB/sec).

The database was recovered and was accepting new connections.

Conclusion – The TUF file is a very important part of recovering a standby database. We have to educate the server ops team, or anyone responsible for cleaning up files, to make sure it stays untouched.

Do you have any way to recover a standby database on a log shipping secondary without the TUF file? If yes, please share your experience in the comments section of this post.

Thanks for reading.

TUF File – Not a very famous member, but it does its job pretty well!


I have seen various questions related to TUF files, and one of the discussions was interesting; it went something like this –

<Start>

John  – I don’t understand why we need this TUF file in SQL Server, what does it do? I have been looking around for more information, but seems there is no great information around the same.

Kim – Are you talking about .TRN files?

John – No, I am talking about .TUF files. Trust me it’s there!

Kim – Oh, then I am missing something. Let me check that out.

</End of discussion>

So what is this TUF file all about?

I was also not very sure what the TUF file does; after some research I was able to understand the concept and decided to write this post.

A TUF file, or Transaction Undo File, is created when log shipping to a server that is in standby mode. It contains information about the transactions that were still open (uncommitted) at the point the restored log backup was taken.

This file matters in the standby mode of log shipping, where you can read from the secondary database. In standby mode, recovery is run on the database each time a log backup is restored.
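For context, the TUF file is simply the undo file named in the STANDBY clause when a log backup is restored. A rough sketch with made-up names and paths:

--Restoring WITH STANDBY creates/uses the transaction undo (TUF) file
RESTORE LOG LSDemoDB
FROM DISK = N'D:\LSCopy\LSDemoDB_20120112.trn'
WITH STANDBY = N'D:\LSCopy\LSDemoDB_20120112.tuf';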

While restoring a log backup, uncommitted transactions are recorded in the undo file and only committed transactions are written to disk, which is what allows users to read the database. When the next T-log backup is restored, SQL Server fetches the uncommitted transactions from the undo file and checks against the new T-log backup whether they have since committed. If they have, those transactions are written to disk; otherwise they stay in the undo file until they are committed or rolled back.

A small graphical representation of the above statement is shown below –

I configured log shipping to test the TUF file and created a scenario as below –

1. Created a primary database.

2. Configured log shipping to another Instance within the same box.

3. Backup, Copy and Restore to happen every 15 minutes.

4. Continuously inserted data to the primary database to simulate TUF creation.

5. I was able to find the TUF file created under the same path where I had placed my system database files.

There seem to have been changes in the path where the TUF files are found. It is available at the location mentioned above for SQL Server 2008 and later, and used to be in the LS_Copy folder for earlier versions.

 

 

Coming up next – What happens when I delete this file? So please stay tuned my friends.

Thanks for reading.

Partial database availability – A walk through


Partial database availability is an exciting feature, and I decided to write this blog post after observing many doubts about it in the forums.

Let's assume a situation like the one below –

We have a database with multiple filegroups, and the data files reside separately in their respective filegroups. Now assume a severe disk failure corrupts the drive on which one of the .ndf files resides!

This will make the database inaccessible. We have multiple options to recover from this situation, one of them being a restore of the database from backup sets. Think of a situation where our database is super large and a restore will take around 30–45 minutes.

Do we really want our database users to wait until we complete the restore? What if we give them a portion of the database online while we work on recovery and bring everything else online gradually?

Wow!!! (The business will just love this idea as soon as I tell them.) However, this solution is not so simple: it requires a lot of planning and testing, and the application must be able to work without a portion of the data.

Let's do a demo of this situation and understand how we can achieve partial database availability –

1. We will create a demo database

--Created a Database
CREATE DATABASE TEST_FILEGROUP

2. Create a new file group

--Create a new FileGroup
ALTER DATABASE TEST_FILEGROUP
ADD FILEGROUP ADDITIONAL

3. Add one additional data file to the database

--Add an additional data file to the database
ALTER DATABASE TEST_FILEGROUP
ADD FILE (NAME='NEW_DATA_FILE',
FILENAME='D:\Program Files\Microsoft SQL Server\MSSQL10_50.SQL2008R2RD\MSSQL\DATA\NEW_DATA_FILE.ndf')
TO FILEGROUP ADDITIONAL

4. Validate them

sp_helpdb TEST_FILEGROUP
Name
------------------
TEST_FILEGROUP
 TEST_FILEGROUP_log
 NEW_DATA_FILE

5. Create a table [Employee] on primary file group and insert some data

--Create a Table on primary file group and Insert some data rows
 USE [TEST_FILEGROUP]
 CREATE TABLE Employee(ID Int Identity(1000,1),Name Varchar(20))
USE [TEST_FILEGROUP]
 INSERT INTO Employee (Name)
 SELECT 'John'
 UNION ALL
 SELECT 'Tim'
 UNION ALL
 SELECT 'Tracy'
 UNION ALL
 SELECT 'Jim'
 UNION ALL
 SELECT 'Ancy'

6. Create another table [HRRECORDS] on the additional file group and Insert some data

--Create another table on the additional file group and Insert some data
 USE [TEST_FILEGROUP]
 CREATE TABLE HRRECORDS(ID Int Identity(1000,1),Description Varchar(20))
 ON ADDITIONAL
 INSERT INTO HRRECORDS (Description)
 VALUES ('IT Spec'),('DBA'),('Developer'),('Java Guy'),('.NetSpec')
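To double-check which filegroup each table actually landed on, a quick catalog query like the one below can be used (just a sanity check; index_id 0 means a heap, 1 means a clustered index):

--Map each table to the filegroup it resides on
SELECT o.name AS table_name, fg.name AS filegroup_name
FROM sys.indexes AS i
JOIN sys.objects AS o ON o.object_id = i.object_id
JOIN sys.filegroups AS fg ON fg.data_space_id = i.data_space_id
WHERE o.name IN ('Employee','HRRECORDS') AND i.index_id IN (0,1);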

7. Now we will proceed to take a file group backup for the purpose of this demo

--Take an ADDITIONAL filegroup backup for the purpose of this demo
 BACKUP DATABASE TEST_FILEGROUP
 FILEGROUP='ADDITIONAL'
 TO DISK='C:\TestBackup\Additional_FileGroup.bak'
Processed 16 pages for database 'TEST_FILEGROUP', file 'NEW_DATA_FILE' on file 1.
 Processed 5 pages for database 'TEST_FILEGROUP', file 'TEST_FILEGROUP_log' on file 1.
 BACKUP DATABASE...FILE=<name> successfully processed 21 pages in 0.318 seconds (0.495 MB/sec).

Now, this is the really interesting part of this demo. We are going to simulate an error situation –

We are going to stop the SQL Server engine service so we can delete the additional data file (the .ndf file). Once the service is stopped, we will be able to delete the .ndf file.

Note – This is just for demo purposes and should not be simulated in a real production environment. [A word of caution before the CTO/manager gives you surprises!]

Once the .ndf file is deleted, start the engine; you will hit an error straight away if you try to access our demo database.

This was expected and simply means that deleting the .ndf file caused the database to fail. What are we going to do now to bring this database up and running?

We can definitely restore from backup to bring this up; however, think about the situation where the database backups are huge (we might be dealing with a very large database) and users have to wait until the whole backup set is restored.

Do we have an RTO of around 45 minutes to 1 hour? Do we really need to wait for the whole restore to complete, just to fix an issue with another filegroup, before users can connect to the database and access tables residing in the primary filegroup?

The short and sweet answer is – NO. Starting with SQL 2005, a database can be made available to users as soon as its primary filegroup is online.
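As an aside, the same idea underpins piecemeal restore, where the primary filegroup is restored and brought online first and the remaining filegroups follow later. A rough sketch with made-up backup paths (not part of this demo):

--PARTIAL restores just the primary filegroup and lets the database come online first
RESTORE DATABASE TEST_FILEGROUP
FILEGROUP = 'PRIMARY'
FROM DISK = 'C:\TestBackup\Full_Primary.bak'
WITH PARTIAL, NORECOVERY;

--Roll forward and recover; the remaining filegroups can be restored afterwards
RESTORE LOG TEST_FILEGROUP
FROM DISK = 'C:\TestBackup\Log.trn'
WITH RECOVERY;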

Now let's go back to our situation where the database is offline because of a corrupted/missing .ndf file.

The users agreed (this should actually be part of the DR strategy and not decided at the last minute) that they can work without the HRRECORDS table, which resided in the additional filegroup that just failed. They only need the Employee table to continue their work.

Wow!!! Now we can breathe some fresh air.

8. We can achieve this by taking the additional data file offline.

Note – If we take this file offline, we can bring it back only by using a file/filegroup backup or a regular full database backup.

--Taking additional file offline
ALTER DATABASE TEST_FILEGROUP
MODIFY FILE (NAME='NEW_DATA_FILE', OFFLINE);

We will need to recycle the service once again for the change to take effect, and we can verify it by checking the sys.database_files catalog view:

SELECT name, state_desc FROM sys.database_files

Name                    state_desc
TEST_FILEGROUP          ONLINE
TEST_FILEGROUP_log      ONLINE
NEW_DATA_FILE           OFFLINE

9. Now, as the file is offline, the database is accessible.

Our query against the Employee table returns the expected rows:

SELECT TOP 100 [ID]
 ,[Name]
 FROM [TEST_FILEGROUP].[dbo].[Employee]
ID    Name
1000 John
1001 Tim
1002 Tracy
1003 Jim
1004 Ancy

If we attempt to query the table on the additional data file, we get an error like:

SELECT TOP 100 [ID]
 ,[Description]
 FROM [TEST_FILEGROUP].[dbo].[HRRECORDS]

Msg 8653, Level 16, State 1, Line 1
The query processor is unable to produce a plan for the table or view ‘HRRECORDS’ because the table resides in a filegroup which is not online.

This is partial database availability, where the database as a whole is available minus some tables, and we have achieved it using filegroups/files.

Now, how can we bring this .ndf file back from the backup? Here is the process –

1. We will restore the filegroup from the backup we took earlier:

RESTORE DATABASE TEST_FILEGROUP
FILEGROUP = 'ADDITIONAL' FROM DISK = 'C:\TestBackup\Additional_FileGroup.bak' WITH RECOVERY

Oops we are missing something here –

/*Msg 3159, Level 16, State 1, Line 1
The tail of the log for the database “TEST_FILEGROUP” has not been backed up. Use BACKUP LOG WITH NORECOVERY to backup the log if it contains work you do not want to lose. Use the WITH REPLACE or WITH STOPAT clause of the RESTORE statement to just overwrite the contents of the log.
Msg 3013, Level 16, State 1, Line 1
RESTORE DATABASE is terminating abnormally. */

What does the message say? It says that the tail of the log has not been backed up, and that needs to be done before we restore the filegroup.

2. So let's go ahead and back up the tail of the log:

BACKUP LOG TEST_FILEGROUP TO DISK='C:\TestBackup\Tail.trn' WITH NORECOVERY
/*Processed 10 pages for database 'TEST_FILEGROUP', file 'TEST_FILEGROUP_log' on file 1.
BACKUP LOG successfully processed 10 pages in 0.223 seconds (0.345 MB/sec).*/

3. Let's try to restore the filegroup now.

RESTORE DATABASE TEST_FILEGROUP
FILEGROUP = 'ADDITIONAL' FROM DISK = 'C:\TestBackup\Additional_FileGroup.bak' WITH RECOVERY

I deliberately used WITH RECOVERY here in the restore command to surface these messages and show why the tail-of-log backup is needed.

As soon as we run the above command, we get another message:

Processed 16 pages for database 'TEST_FILEGROUP', file 'NEW_DATA_FILE' on file 1.
Processed 5 pages for database 'TEST_FILEGROUP', file 'TEST_FILEGROUP_log' on file 1.
The roll forward start point is now at log sequence number (LSN) 21000000019100001. 
Additional roll forward past LSN 21000000024300001 is required to complete the restore sequence.
This RESTORE statement successfully performed some actions, 
but the database could not be brought online because one or more RESTORE steps are needed. 
Previous messages indicate reasons why recovery cannot occur at this point.
RESTORE DATABASE ... FILE=<name> successfully processed 21 pages in 0.214 seconds (0.736 MB/sec).

4. Finally, we bring the database up by restoring the tail-of-log backup:

RESTORE DATABASE TEST_FILEGROUP
FROM DISK = 'C:\TestBackup\Tail.trn' WITH RECOVERY
Processed 0 pages for database 'TEST_FILEGROUP', file 'TEST_FILEGROUP' on file 1.
Processed 0 pages for database 'TEST_FILEGROUP', file 'NEW_DATA_FILE' on file 1.
Processed 7 pages for database 'TEST_FILEGROUP', file 'TEST_FILEGROUP_log' on file 1.
RESTORE LOG successfully processed 7 pages in 0.117 seconds (0.463 MB/sec).

Now our database is completely available with both tables, and as a test case we can query the HRRECORDS table to validate the data:

SELECT TOP 100 [ID]
 ,[Description]
 FROM [TEST_FILEGROUP].[dbo].[HRRECORDS]
ID   Description
1000 IT Spec
1001 DBA
1002 Developer
1003 Java Guy
1004 .NetSpec

Conclusion – Partial database availability is very useful for huge databases where the secondary filegroups store historical data and the primary filegroup holds the business-critical data.

Backup and restore of files/filegroups is a very interesting topic; I will simulate this feature in SQL 2012 to see if there are any changes and will come back with more details.

I would love to hear about your experiences dealing with filegroups. Thanks for reading.