Partial database availability – A walk through

Partial database availability is an exciting feature, and I decided to write this blog post after seeing many questions about it in forums.

Let's assume a situation like the one described below:

We have a database with multiple file groups, and the data files reside separately in their respective file groups. Now assume we have a severe disk failure and the drive holding one of the .ndf files is corrupted!

This will make the database inaccessible. We have multiple options to recover from this situation, and one of them is to restore the database from backup sets. Now think of a situation where our database is very large and a restore will take around 30 to 45 minutes.

Do we really want our database users to wait until we complete the restore? What if we give them a portion of the database online while we work on the recovery and bring everything else online gradually?

Wow!!! (The business will just love this idea as soon as I tell them.) However, this solution is not so simple: it requires a lot of planning and testing, and the application must be able to work without a portion of the data.

Let's do a demo of this situation and understand how we can achieve partial database availability:

1. We will create a demo database

--Created a Database
CREATE DATABASE TEST_FILEGROUP

2. Create a new file group

--Create a new FileGroup
ALTER DATABASE TEST_FILEGROUP
ADD FILEGROUP ADDITIONAL

3. Add one additional data file to the database

--Add an additional data file to the database
ALTER DATABASE TEST_FILEGROUP
ADD FILE (NAME='NEW_DATA_FILE',
FILENAME='D:\Program Files\Microsoft SQL Server\MSSQL10_50.SQL2008R2RD\MSSQL\DATA\NEW_DATA_FILE.ndf')
TO FILEGROUP ADDITIONAL

4. Validate them

sp_helpdb TEST_FILEGROUP
Name
------------------
TEST_FILEGROUP
 TEST_FILEGROUP_log
 NEW_DATA_FILE

5. Create a table [Employee] on primary file group and insert some data

--Create a table on the primary file group and insert some data rows
USE [TEST_FILEGROUP]
CREATE TABLE Employee(ID Int Identity(1000,1),Name Varchar(20))
INSERT INTO Employee (Name)
SELECT 'John'
UNION ALL
SELECT 'Tim'
UNION ALL
SELECT 'Tracy'
UNION ALL
SELECT 'Jim'
UNION ALL
SELECT 'Ancy'

6. Create another table [HRRECORDS] on the additional file group and Insert some data

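--Create another table on the additional file group and insert some data
USE [TEST_FILEGROUP]
CREATE TABLE HRRECORDS(ID Int Identity(1000,1),Description Varchar(20))
ON ADDITIONAL
--The INSERT was not shown in the original listing; these rows are taken from the HRRECORDS query output at the end of this post
INSERT INTO HRRECORDS (Description)
VALUES ('IT Spec'),('DBA'),('Developer'),('Java Guy'),('.NetSpec')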
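
Before going further, it can be worth confirming that each table really landed on the intended file group. The query below is only a sketch using the standard catalog views; it assumes the tables are not partitioned (a partitioned table maps to a partition scheme rather than directly to a file group).

```sql
-- Sketch: list each user table with the file group holding its heap or clustered index.
-- Assumes non-partitioned tables, as in this demo.
USE [TEST_FILEGROUP];

SELECT t.name  AS table_name,
       fg.name AS filegroup_name
FROM sys.tables AS t
JOIN sys.indexes AS i
    ON i.object_id = t.object_id
   AND i.index_id IN (0, 1)          -- 0 = heap, 1 = clustered index
JOIN sys.filegroups AS fg
    ON fg.data_space_id = i.data_space_id
ORDER BY t.name;
```

For this demo we would expect Employee to be reported on PRIMARY and HRRECORDS on ADDITIONAL.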

7. Now we will proceed to take a file group backup for the purpose of this demo

--Take a backup of the ADDITIONAL filegroup for this demo
 BACKUP DATABASE TEST_FILEGROUP
 FILEGROUP='ADDITIONAL'
 TO DISK='C:\TestBackup\Additional_FileGroup.bak'
Processed 16 pages for database 'TEST_FILEGROUP', file 'NEW_DATA_FILE' on file 1.
 Processed 5 pages for database 'TEST_FILEGROUP', file 'TEST_FILEGROUP_log' on file 1.
 BACKUP DATABASE...FILE=<name> successfully processed 21 pages in 0.318 seconds (0.495 MB/sec).
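
Since the whole recovery later depends on this backup file, a quick sanity check may be useful. The statements below are a sketch that simply inspects and verifies the backup taken above; the path is the same one used in the BACKUP command.

```sql
-- Sketch: inspect and verify the file group backup without restoring anything.

-- Lists the data and log files contained in the backup set
RESTORE FILELISTONLY
FROM DISK = 'C:\TestBackup\Additional_FileGroup.bak';

-- Shows backup set metadata such as backup type, dates and LSNs
RESTORE HEADERONLY
FROM DISK = 'C:\TestBackup\Additional_FileGroup.bak';

-- Checks that the backup set is complete and readable
RESTORE VERIFYONLY
FROM DISK = 'C:\TestBackup\Additional_FileGroup.bak';
```

RESTORE VERIFYONLY does not guarantee that a restore will succeed, so a periodic test restore is still the safer habit.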

Now comes the really interesting part of this demo. We are going to simulate an error situation:

We are going to stop the SQL Server engine service so that we can delete the additional data file (.ndf file). Once the service is stopped, we will be able to delete the .ndf file.

Note – This is for demo purposes only and should not be tried in a production environment. [A word of caution before the CTO/manager gives you surprises!]

Once the .ndf file is deleted, start the engine, and you will see an error straight away if you try to access our demo database.

This was expected and simply means that deleting the .ndf file caused the database to fail. What are we going to do now to bring this database up and running?
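
Before deciding, it can help to confirm what state the database and its files are actually in. The queries below are a sketch that reads the server-level catalog views from master, which remain available even when the demo database itself cannot be opened.

```sql
-- Sketch: check the database state and per-file state from the master database.
USE [master];

-- State of the database as a whole (a database with a missing file
-- is typically reported as RECOVERY_PENDING)
SELECT name, state_desc, recovery_model_desc
FROM sys.databases
WHERE name = 'TEST_FILEGROUP';

-- State and physical location of each file belonging to the database
SELECT name, physical_name, state_desc
FROM sys.master_files
WHERE database_id = DB_ID('TEST_FILEGROUP');
```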

We can definitely restore the backup to bring it up, but think about this situation: we might be dealing with a huge database, so the backups are huge, and users would have to wait until the whole backup set is restored.

Do we have an RTO of around 45 minutes to 1 hour? Do we really need to wait for the whole restore to complete, just to fix an issue with another file group, before users can connect to the database and access the tables residing in the primary file group?

The short and sweet answer to this question is NO. Starting with SQL Server 2005, a database can be made available to users as soon as its primary file group is up and running.

Now let's go back to our situation, where the database is offline because of a corrupted/missing .ndf file.

The users have agreed (this should actually be part of the DR strategy and not decided at the last minute) that they can work without the HRRECORDS table, the one residing in the additional file group that just failed. They just need the Employee table to continue their work.

Wow!!! Now we can feel some fresh air to breathe.

8. We can achieve this by taking the additional data file offline

Note – If we take this file offline, we can bring it back online only by restoring it from a file/file group backup or a regular full database backup.

--Taking additional file offline
ALTER DATABASE TEST_FILEGROUP
MODIFY FILE (NAME='NEW_DATA_FILE', OFFLINE);

We will need to recycle the service once again for the change to take effect, and we can verify the change by checking the sys.database_files catalog view

SELECT name,state_desc from
sys.database_files
Name                         state_desc
 TEST_FILEGROUP              ONLINE
 TEST_FILEGROUP_log          ONLINE
 NEW_DATA_FILE               OFFLINE

9. Now that the file is offline, the database is accessible.

A query against the Employee table will return its rows as usual

SELECT TOP 100 [ID]
 ,[Name]
 FROM [TEST_FILEGROUP].[dbo].[Employee]
ID    Name
1000 John
1001 Tim
1002 Tracy
1003 Jim
1004 Ancy

If we attempt to query the table on the additional data file, we will get an error like this

SELECT TOP 100 [ID]
 ,[Description]
 FROM [TEST_FILEGROUP].[dbo].[HRRECORDS]

Msg 8653, Level 16, State 1, Line 1
The query processor is unable to produce a plan for the table or view ‘HRRECORDS’ because the table resides in a filegroup which is not online.

This is partial database availability, where the database is available without some of its tables, and we have achieved this using file groups/files.
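
While the database is running in this partial state, it is handy to know exactly which tables are affected so that the application or support team can avoid them. The query below is a sketch built on the standard catalog views; it assumes non-partitioned tables, as in this demo.

```sql
-- Sketch: list tables whose heap or clustered index sits on a file that is not ONLINE.
-- Assumes non-partitioned tables.
USE [TEST_FILEGROUP];

SELECT t.name        AS table_name,
       fg.name       AS filegroup_name,
       df.name       AS logical_file_name,
       df.state_desc AS file_state
FROM sys.tables AS t
JOIN sys.indexes AS i
    ON i.object_id = t.object_id
   AND i.index_id IN (0, 1)          -- 0 = heap, 1 = clustered index
JOIN sys.filegroups AS fg
    ON fg.data_space_id = i.data_space_id
JOIN sys.database_files AS df
    ON df.data_space_id = fg.data_space_id
WHERE df.state_desc <> 'ONLINE'
ORDER BY t.name;
```

In this demo the only table we would expect to see listed is HRRECORDS.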

Now how can we bring this .ndf file back from the backup? Here is the process:

1. We will restore the file group from the backup which we took earlier

RESTORE DATABASE TEST_FILEGROUP
FILEGROUP = 'ADDITIONAL' FROM DISK = 'C:\TestBackup\Additional_FileGroup.bak' WITH RECOVERY

Oops we are missing something here –

/*Msg 3159, Level 16, State 1, Line 1
The tail of the log for the database “TEST_FILEGROUP” has not been backed up. Use BACKUP LOG WITH NORECOVERY to backup the log if it contains work you do not want to lose. Use the WITH REPLACE or WITH STOPAT clause of the RESTORE statement to just overwrite the contents of the log.
Msg 3013, Level 16, State 1, Line 1
RESTORE DATABASE is terminating abnormally. */

What does the message say? It says that the tail of the log has not been backed up, and that this needs to be done before we restore the file group.

2. So let's go ahead and back up the tail of the log

BACKUP LOG TEST_FILEGROUP TO DISK='C:\TestBackup\Tail.trn' WITH NORECOVERY
/*Processed 10 pages for database 'TEST_FILEGROUP', file 'TEST_FILEGROUP_log' on file 1.
BACKUP LOG successfully processed 10 pages in 0.223 seconds (0.345 MB/sec).*/

3. Let's try to restore the file group now.

RESTORE DATABASE TEST_FILEGROUP
FILEGROUP = 'ADDITIONAL' FROM DISK = 'C:\TestBackup\Additional_FileGroup.bak' WITH RECOVERY

I deliberately used RECOVERY in the restore command here to show the resulting message and to show why the tail-log backup is needed.

As soon as we run the above command we will get another message

Processed 16 pages for database 'TEST_FILEGROUP', file 'NEW_DATA_FILE' on file 1.
Processed 5 pages for database 'TEST_FILEGROUP', file 'TEST_FILEGROUP_log' on file 1.
The roll forward start point is now at log sequence number (LSN) 21000000019100001. 
Additional roll forward past LSN 21000000024300001 is required to complete the restore sequence.
This RESTORE statement successfully performed some actions, 
but the database could not be brought online because one or more RESTORE steps are needed. 
Previous messages indicate reasons why recovery cannot occur at this point.
RESTORE DATABASE ... FILE=<name> successfully processed 21 pages in 0.214 seconds (0.736 MB/sec).

4. Finally we will bring the database up by restoring the tail of log backup

--Tail.trn is a log backup, so it is restored with RESTORE LOG
RESTORE LOG TEST_FILEGROUP
FROM DISK = 'C:\TestBackup\Tail.trn' WITH RECOVERY
Processed 0 pages for database 'TEST_FILEGROUP', file 'TEST_FILEGROUP' on file 1.
Processed 0 pages for database 'TEST_FILEGROUP', file 'NEW_DATA_FILE' on file 1.
Processed 7 pages for database 'TEST_FILEGROUP', file 'TEST_FILEGROUP_log' on file 1.
RESTORE LOG successfully processed 7 pages in 0.117 seconds (0.463 MB/sec).
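
Putting steps 2 to 4 together, a more conventional version of the same sequence would specify NORECOVERY on the file group restore and keep RECOVERY for the final log restore. The script below is a sketch of that ordering, reusing the same backup paths as above; it is not a replacement for a tested restore plan.

```sql
-- Sketch: the same restore sequence, with NORECOVERY on the intermediate step.

-- 1. Back up the tail of the log; WITH NORECOVERY leaves the database
--    in the RESTORING state, so it is unavailable until recovery completes
BACKUP LOG TEST_FILEGROUP
TO DISK = 'C:\TestBackup\Tail.trn'
WITH NORECOVERY;

-- 2. Restore the damaged file group and leave the database ready
--    to accept further log restores
RESTORE DATABASE TEST_FILEGROUP
FILEGROUP = 'ADDITIONAL'
FROM DISK = 'C:\TestBackup\Additional_FileGroup.bak'
WITH NORECOVERY;

-- 3. Roll the file group forward with the tail-log backup and
--    bring the whole database online
RESTORE LOG TEST_FILEGROUP
FROM DISK = 'C:\TestBackup\Tail.trn'
WITH RECOVERY;
```

If other log backups had been taken between the file group backup and the failure, they would need to be restored in order, WITH NORECOVERY, before the tail-log backup.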

Now our database is completely available with both tables, and as a test we can query the HRRECORDS table to validate the data

SELECT TOP 100 [ID]
 ,[Description]
 FROM [TEST_FILEGROUP].[dbo].[HRRECORDS]
ID   Description
1000 IT Spec
1001 DBA
1002 Developer
1003 Java Guy
1004 .NetSpec

Conclusion – Partial database availability is very useful for huge databases where the secondary file groups store historical data and the primary file group is critical for the business.

Backup and restore of files/file groups is a very interesting topic. I will try this feature in SQL Server 2012 to see if there are any changes and will come back with more details.

I would love to hear about your experience dealing with file groups. Thanks for reading.