Today I came across an error for the Distribution clean up: distribution job in one of the environments
Executed as user: Domain\Sqlagentaccount. Could not remove directory ‘\\MachineName\ReplData\unc\MachineName_Replica_Replica10_PUB\20120403180393\’. Check the security context of xp_cmdshell and close other processes that may be accessing the directory. [SQLSTATE 42000] (Error 20015). The step failed.
[Note – I have altered the error message contents for security reasons]
This job was running fine and it started giving issues suddenly.
Troubleshooting steps –
1. As a first step I checked if xp_cmdshell is configured or not.It was found that xp_cmdshell was indeed enabled.
2.I started to dig into the job to see what it runs.The job runs a stored procedure
EXEC dbo.sp_MSdistribution_cleanup @min_distretention = 0, @max_distretention = 72
3.When this is triggered from a job the Agent account is used,hence I decided to run this from SSMS query window.
I got the same error message as mentioned above,along with this
Replication-@rowcount_only parameter must be the value 0,1, or 2. 0=7.0 compatible checksum. 1=only check rowcou: agent distribution@rowcount_only parameter must be the value 0,1, or 2. 0=7.0 compatible checksum. 1=only scheduled for retry. Could not clean up the distribution transaction tables.
4. I was pretty sure that these error messages are little misleading type and decided to go ahead with verifying the security permission of the UNC share.
5. Initially I was focusing on the security permissions of the unc share,and assigned both the Agent account and the Database Engine account full_control
6. Ran the job again and it failed yet again.
7. I decided to do some R&D via web and found this blog post from SQL Server Support team.This post was pointing that the SQL Account should also have full_control for the UNC share.
8. I went ahead and granted full_control to SQL Account for the UNC share and the issue was resolved.
This was indeed a strange behavior because the job was running fine before with the SQL Account being part of the UNC share full_control list.
The only change which had happened within the environment is SP4 upgrade and this should not have caused this trouble.
As a test case I removed the permission of the SQL Account once again for the UNC share and tried to run the job.This was successful,which was yet again a strange behaviour.
This particular behavior is not documented anywhere nor this has been noticed by many people within the SQL Family,hence in case you face the same situation,then you might have to double check the permissions for the UNC share to isolate the issue and get a quick solution.
Thanks for reading.