SQL Cluster unexpected failover

So we had one of our SQL clusters fail over unexpectedly recently. It's the second time in a few months. It's a two-node active/passive SQL 2012 cluster running on Windows 2012 Standard.

Here's what we could cull from the application/system logs:

1. "Cluster resource 'SQLServer' of type 'SQL Server' in clustered role 'SQLServerRole' failed.

Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet."

2. "Cluster resource 'SQLServer' (resource type 'SQL Server', DLL 'sqsrvres.dll') did not respond to a request in a timely fashion. Cluster health detection will attempt to automatically recover by terminating the Resource Hosting Subsystem (RHS) process running this resource. This may affect other resources hosted in the same RHS process. The resources will then be restarted.

The suspect resource 'SQLServer' will be marked to run in an isolated RHS process to avoid impacting multiple resources in the event that this resource failure occurs again. Please ensure services, applications, or underlying infrastructure (such as storage or networking) associated with the suspect resource is functioning properly."

3. "The cluster Resource Hosting Subsystem (RHS) stopped unexpectedly. An attempt will be made to restart it. This is usually associated with recovery of a crashed or deadlocked resource. Please determine which resource and resource DLL is causing the issue and verify it is functioning properly."

4. "A timeout (30000 milliseconds) was reached while waiting for a transaction response from the MSSQLSERVER service."

Cluster.log wasn't much more helpful on the root cause either:

00000f28.00001c78::2014/12/04-21:25:54.662 INFO [RES] Network Name <Cluster Name>: Netbios: Slow Operation, FinishWithReply: 0
00000f28.00001c78::2014/12/04-21:25:54.662 INFO [RES] Network Name: [NN] got sync reply: 0
00000f28.00001c78::2014/12/04-21:25:54.662 INFO [RES] Network Name <Cluster Name>: Netbios: End of Slow Operation, state: Initialized/Idle, prevWorkState: Idle
00000f20.00000e94::2014/12/04-21:25:55.240 INFO [RES] SQL Server Agent <SQL Server Agent>: [sqagtres] IsAlive request.
00000f20.00000e94::2014/12/04-21:25:55.240 INFO [RES] SQL Server Agent <SQL Server Agent>: [sqagtres] CheckServiceAlive: returning TRUE (success)
00001134.000001d8::2014/12/04-21:25:57.287 ERR [RES] SQL Server <SQLServer>: [sqsrvres] Failure detected, diagnostics heartbeat is lost
00001134.000001d8::2014/12/04-21:25:57.287 INFO [RES] SQL Server <SQLServer>: [sqsrvres] IsAlive returns FALSE
00001134.000001d8::2014/12/04-21:25:57.287 WARN [RHS] Resource SQLServer IsAlive has indicated failure.
00000880.0000161c::2014/12/04-21:25:57.303 INFO [NM] Received request from client address HOST-XXX-SQL02.
00000880.0000161c::2014/12/04-21:25:57.303 INFO [RCM] HandleMonitorReply: FAILURENOTIFICATION for 'SQLServer', gen(3) result 1/0.
00000880.000023a4::2014/12/04-21:25:57.303 INFO [GEM] Sending 1 messages as a batched GEM message
00000880.0000161c::2014/12/04-21:25:57.303 INFO [RCM] Res SQLServer: Online -> ProcessingFailure( StateUnknown )
00000880.0000161c::2014/12/04-21:25:57.303 INFO [RCM] TransitionToState(SQLServer) Online-->ProcessingFailure.
00000880.0000161c::2014/12/04-21:25:57.318 INFO [RCM] rcm::RcmGroup::UpdateStateIfChanged: (SQLServerRole, Online --> Pending)
00000880.00001db8::2014/12/04-21:25:57.334 INFO [GEM] Sending 1 messages as a batched GEM message
00000880.0000161c::2014/12/04-21:25:57.334 ERR [RCM] rcm::RcmResource::HandleFailure: (SQLServer)
00000880.00001db8::2014/12/04-21:25:57.334 INFO [GEM] Sending 1 messages as a batched GEM message
00000880.00000bac::2014/12/04-21:25:57.334 INFO [RCM] ignored non-local state Pending for group SQLServerRole
00000880.0000161c::2014/12/04-21:25:57.350 INFO [RCM] resource SQLServer: failure count: 1, restartAction: 2 persistentState: 1.
00000880.0000161c::2014/12/04-21:25:57.350 INFO [RCM] Greater than restartPeriod time has elapsed since first failure of SQLServer, resetting failureTime and failureCount.
00000880.0000161c::2014/12/04-21:25:57.350 INFO [RCM] Will queue immediate restart (500 milliseconds) of SQLServer after terminate is complete.
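
In case it helps anyone sift a larger Cluster.log the same way, here is a minimal Python sketch that pulls out only the ERR and WARN entries. It assumes every line follows the layout of the excerpt above (process.thread IDs, "::", timestamp, level, message). The names parse_line, problems and SAMPLE are made up for illustration, and the sample lines are copied from the excerpt.

```python
import re
from datetime import datetime

# Layout assumed from the excerpt above:
#   <pid>.<tid>::<YYYY/MM/DD-HH:MM:SS.mmm> <LEVEL> <message>
LINE_RE = re.compile(
    r"^(?P<pid>[0-9a-f]{8})\.(?P<tid>[0-9a-f]{8})::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict for one log line, or None if it does not match the layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    entry = m.groupdict()
    entry["ts"] = datetime.strptime(entry["ts"], "%Y/%m/%d-%H:%M:%S.%f")
    return entry


def problems(lines, levels=("ERR", "WARN")):
    """Keep only parsed entries whose level is in `levels`, in original order."""
    parsed = (parse_line(line) for line in lines)
    return [e for e in parsed if e is not None and e["level"] in levels]


# Three lines copied from the excerpt above.
SAMPLE = [
    "00001134.000001d8::2014/12/04-21:25:57.287 ERR [RES] SQL Server <SQLServer>: "
    "[sqsrvres] Failure detected, diagnostics heartbeat is lost",
    "00001134.000001d8::2014/12/04-21:25:57.287 INFO [RES] SQL Server <SQLServer>: "
    "[sqsrvres] IsAlive returns FALSE",
    "00001134.000001d8::2014/12/04-21:25:57.287 WARN [RHS] Resource SQLServer "
    "IsAlive has indicated failure.",
]

for entry in problems(SAMPLE):
    print(entry["ts"].isoformat(timespec="milliseconds"), entry["level"], entry["msg"])
```

To run it against a real file, pass the open file object to problems() instead of SAMPLE. Lines that do not match the assumed layout are skipped rather than raising an error.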

Any ideas? Anywhere we could look for more specific info? Any preventative measures we could take?

Thanks,

Ryan
