Greetings. We've got a 2012 Availability Group that has been well established for several years. We have 4 different jobs to back up transaction logs, each job backing up 4 databases. These run fine more than 99% of the time, then all of a sudden one of the backups will error out with the following message:
Executed as user: mydomain\mylogin. The connection to the primary replica is not active. The command cannot be processed. [SQLSTATE 42000] (Error 35250) BACKUP LOG is terminating abnormally. [SQLSTATE 42000] (Error 3013). NOTE: The step was retried the requested number of times (2) without succeeding. The step failed.
Then all of a sudden it will start working again. It may be on the next attempt, or it may be an hour later.
Here are several important fun facts:
- This issue just started a couple weeks ago.
- So far it's happened on 2 of the jobs, for a total of 3 DBs.
- There are no useful messages anywhere other than what I posted (Event Viewer, Cluster Manager, SQL log, etc.).
- As mentioned, this will suddenly start working again on its own.
- While this job is failing, a different backup job can be running w/o issue (for another DB in the same AG).
- While this job is failing, I can take the same backup for the same DB on the Primary node w/o issue.
- The AG Dashboard shows all green lights while this is happening.
Any ideas?
Thanks in advance! ChrisRDBA