I have a 3-node Windows Server 2012 R2 cluster with a number of SQL Server 2012 SP2 instances installed. I've recently deployed some of the cumulative updates to two of the nodes as part of an upgrade process. All of my SQL instances fail over to the upgraded nodes and come online without incident, bar one! This instance is used by a SharePoint 2013 SP1 (July 2014 CU) application and has a number of configuration and content databases deployed to it. If I try to fail this cluster resource over to one of the upgraded nodes, the cluster group moves over and the disk resources, VNN and IP address all come online without a hitch. But when the database engine tries to start, the cluster resource just hangs in its "Online Pending" state until the 3-minute timeout is reached, at which point the resource goes into a failed state and the cluster group fails back to the other node.
The other node in this case has SQL 2012 SP2 installed with no cumulative updates applied. The resource comes online successfully on this node without a hitch.
Looking at the logs on the upgraded cluster node, I can see the database engine service starting, and indeed it does start up fully. Once I get the message that the SQL Server instance is ready to accept connections, I can start a session and connect to the instance successfully. However, the cluster resource still remains in an "Online Pending" state.
Once the timeout limit is reached, a "Failover clustering resource deadlock" message is logged in the event log. I've included the contents of the Report.wer file for reference:
Version=1
EventType=Failover clustering resource deadlock
EventTime=130785879000514759
ReportType=1
Consent=1
ReportIdentifier=a1205c49-1103-11e5-80cc-e83935146bc3
Response.type=4
Sig[0].Name=Resource Name
Sig[0].Value=SQL Server (MYINSTANCE)
Sig[1].Name=Resource Type
Sig[1].Value=SQL Server
Sig[2].Name=Call Type
Sig[2].Value=ONLINERESOURCE
DynamicSig[1].Name=OS Version
DynamicSig[1].Value=6.3.9600.2.0.0.272.7
DynamicSig[2].Name=Locale ID
DynamicSig[2].Value=2057
FriendlyEventName=Failover clustering resource deadlock
ConsentKey=Failover clustering Resource Host Monitor
AppName=Failover Cluster Resource Host Subsystem
AppPath=C:\Windows\Cluster\rhs.exe
ReportDescription=Failover clustering resource deadlock
ApplicationIdentity=00000000000000000000000000000000
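For anyone lining this report up against the cluster log: the EventTime value looks like a Windows FILETIME (100-nanosecond ticks since 1601-01-01 UTC). That is an assumption on my part rather than something the report states. A minimal Python sketch of the conversion, where the function names are my own:

```python
from datetime import datetime, timedelta, timezone

# Assumption: EventTime in Report.wer is a Windows FILETIME, i.e. a count of
# 100-nanosecond intervals since 1601-01-01 00:00:00 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_utc(filetime: int) -> datetime:
    """Convert a FILETIME tick count to a timezone-aware UTC datetime."""
    # Integer-divide by 10 to turn 100 ns ticks into whole microseconds.
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def read_event_time(wer_text: str) -> datetime:
    """Find the EventTime=... line in Report.wer text and convert it."""
    for line in wer_text.splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "EventTime":
            return filetime_to_utc(int(value.strip()))
    raise ValueError("no EventTime line found")


# The EventTime value from the report above.
print(filetime_to_utc(130785879000514759).isoformat())
```

Under that assumption the value works out to 12 June 2015 at roughly 13:05 UTC, which gives a timestamp to search for in the cluster log.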
We do have other instances of the SharePoint application deployed elsewhere in our estate, and we are not experiencing these problems with any of the SQL Server instances that have had 2012 SP2 CU6 applied.
If I run the Get-ClusterLog command on the upgraded cluster node following an attempted failover, the only messages of note in the log are as follows:
ERR [RES] SQL Server <SQL Server (MYINSTANCE)>: [sqsrvres] ODBC Error: [08001] [Microsoft][SQL Server Native Client 11.0]SQL Server Network Interfaces: Error Locating Server/Instance Specified [xFFFFFFFF]. (268435455)
ERR [RES] SQL Server <SQL Server (MYINSTANCE)>: [sqsrvres] ODBC Error: [HYT00] [Microsoft][SQL Server Native Client 11.0]Login timeout expired (0)
ERR [RES] SQL Server <SQL Server (MYINSTANCE)>: [sqsrvres] ODBC Error: [08001] [Microsoft][SQL Server Native Client 11.0]A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online. (268435455)
INFO [RES] SQL Server <SQL Server (MYINSTANCE)>: [sqsrvres] Could not connect to SQL Server (rc -1)
INFO [RES] SQL Server <SQL Server (MYINSTANCE)>: [sqsrvres] SQLDisconnect returns following information
ERR [RES] SQL Server <SQL Server (MYINSTANCE)>: [sqsrvres] ODBC Error: [08003] [Microsoft][ODBC Driver Manager] Connection not open (0)
That would suggest an inability to connect to the SQL instance (although, as I have already mentioned, I was able to connect using SQLCMD once I'd seen that the database engine was up and running on the node). My guess is that there's a cluster service account that's struggling to make its connection to the SQL instance, but I can't tell which account that is.
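In case it helps anyone sift a much larger cluster log for the same pattern, here is a rough Python sketch that pulls the SQLSTATE codes out of the [sqsrvres] ODBC error lines. The regular expression is inferred only from the lines quoted above, so treat the format as an assumption. The function names are my own.

```python
import re

# Matches lines such as:
#   ERR [RES] SQL Server <...>: [sqsrvres] ODBC Error: [08001] ...message...
# Assumption: the SQLSTATE is always five characters (digits or capitals).
ODBC_ERROR = re.compile(
    r"\[sqsrvres\] ODBC Error: \[(?P<state>[0-9A-Z]{5})\]\s*(?P<message>.*)"
)


def odbc_errors(log_lines):
    """Yield (sqlstate, message) for each sqsrvres ODBC error line."""
    for line in log_lines:
        match = ODBC_ERROR.search(line)
        if match:
            yield match.group("state"), match.group("message").strip()


def count_states(log_lines):
    """Count how often each SQLSTATE appears in the given lines."""
    counts = {}
    for state, _ in odbc_errors(log_lines):
        counts[state] = counts.get(state, 0) + 1
    return counts


# The lines quoted in this post, with wrapped lines joined back together.
SAMPLE = [
    "ERR [RES] SQL Server <SQL Server (MYINSTANCE)>: [sqsrvres] ODBC Error: [08001] [Microsoft][SQL Server Native Client 11.0]SQL Server Network Interfaces: Error Locating Server/Instance Specified [xFFFFFFFF]. (268435455)",
    "ERR [RES] SQL Server <SQL Server (MYINSTANCE)>: [sqsrvres] ODBC Error: [HYT00] [Microsoft][SQL Server Native Client 11.0]Login timeout expired (0)",
    "ERR [RES] SQL Server <SQL Server (MYINSTANCE)>: [sqsrvres] ODBC Error: [08001] [Microsoft][SQL Server Native Client 11.0]A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online. (268435455)",
    "INFO [RES] SQL Server <SQL Server (MYINSTANCE)>: [sqsrvres] Could not connect to SQL Server (rc -1)",
    "ERR [RES] SQL Server <SQL Server (MYINSTANCE)>: [sqsrvres] ODBC Error: [08003] [Microsoft][ODBC Driver Manager] Connection not open (0)",
]

print(count_states(SAMPLE))
```

To run it against a real log, pass an open file handle for the text file that Get-ClusterLog writes in place of SAMPLE.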
Any thoughts / pointers / fixes are very much appreciated!
Phil