Hi
I've been searching for an answer to this issue for days and can't find anything. I'll do my best to explain the situation and would appreciate your input.
We have two blade servers, both freshly installed with Windows Server 2012 R2 and patched with the latest updates plus the additional update Windows8.1-KB2962409-x64.
After adding both of them to a failover cluster with an EMC SAN as storage, I can fail over from one node to the other and back, and the storage follows.
Next I installed SQL Server 2014 on node 2, then used the same setup to add node 1 to the SQL Server 2014 cluster. All checkmarks were green at the end of the installation, so everything completed without errors. SQL Server is online on node 2 and can be accessed over the network or locally.
However, when I try to fail over from node 2 to node 1, the IP address and the storage come online, but the SQL Server resource takes a very long time and then fails with an error.
When I then use the PowerShell command to generate the cluster log, I see the following relevant lines:
0000071c.00000bbc::2015/07/10-06:04:55.297 INFO [RCM] Res SQL Server: OnlineCallIssued -> OnlinePending( StateUnknown )
0000071c.00000bbc::2015/07/10-06:04:55.297 INFO [RCM] TransitionToState(SQL Server) OnlineCallIssued-->OnlinePending.
00000dd8.00000ba0::2015/07/10-06:04:55.312 INFO [RES] SQL Server <SQL Server>: [sqsrvres] XEvent session MSSQLSERVER is created with RolloverCount 10, MaxFileSizeInMBytes 100, and LogPath 'L:\MSSQL12.MSSQLSERVER\MSSQL\LOG\'
00000dd8.00000ba0::2015/07/10-06:04:55.312 INFO [RES] SQL Server <SQL Server>: [sqsrvres] Extended Event logging is started
00000dd8.00000ba0::2015/07/10-06:04:55.312 INFO [RES] SQL Server <SQL Server>: [sqsrvres] The private property VerboseLogging is 0
00000dd8.00000ba0::2015/07/10-06:04:55.312 INFO [RES] SQL Server <SQL Server>: [sqsrvres] The private property HealthCheckTimeout is 60000
00000dd8.00000ba0::2015/07/10-06:04:55.312 INFO [RES] SQL Server <SQL Server>: [sqsrvres] The private property FailureConditionLevel is 3
00000dd8.00000ba0::2015/07/10-06:04:55.312 INFO [RES] SQL Server <SQL Server>: [sqsrvres] The private property SqlDumperDumpFlags is 0x0
00000dd8.00000ba0::2015/07/10-06:04:55.312 INFO [RES] SQL Server <SQL Server>: [sqsrvres] The private property SqlDumperDumpTimeOut is 0
00000dd8.00000ba0::2015/07/10-06:04:55.312 INFO [RES] SQL Server <SQL Server>: [sqsrvres] The private property SqlDumperDumpPath is ''
00000dd8.00000ba0::2015/07/10-06:04:55.312 INFO [RES] SQL Server <SQL Server>: [sqsrvres] The property LogIsEnabled is 1
00000dd8.00000ba0::2015/07/10-06:04:55.312 INFO [RES] SQL Server <SQL Server>: [sqsrvres] The property LogFileRolloverCount is 10
00000dd8.00000ba0::2015/07/10-06:04:55.312 INFO [RES] SQL Server <SQL Server>: [sqsrvres] The property LogMaxFileSizeInMBytes is 100
00000dd8.00000ba0::2015/07/10-06:04:55.312 INFO [RES] SQL Server <SQL Server>: [sqsrvres] The property LogPath is ''
00000dd8.00000ba0::2015/07/10-06:04:55.312 INFO [RES] SQL Server <SQL Server>: [sqsrvres] Server name is SQL2014TEST
00000dd8.00000ba0::2015/07/10-06:04:55.312 INFO [RES] SQL Server <SQL Server>: [sqsrvres] Service name is MSSQLSERVER
00000dd8.00000ba0::2015/07/10-06:04:55.312 INFO [RES] SQL Server <SQL Server>: [sqsrvres] Dependency expression for resource 'SQL Network Name (SQL2014test)' is '([1f0618d0-e95e-4e40-b14e-66252b010030])'
00000dd8.00000ba0::2015/07/10-06:04:55.312 INFO [RES] SQL Server <SQL Server>: [sqsrvres] Starting service MSSQLSERVER...
0000071c.00000bbc::2015/07/10-06:04:55.562 INFO [NM] Received request from client address NODE1.
0000071c.00000cd0::2015/07/10-06:04:56.437 INFO [NM] Received request from client address NODE1.
00000dd8.00000ba0::2015/07/10-06:04:56.547 INFO [RES] SQL Server <SQL Server>: [sqsrvres] Service is started. SQL Server pid is 524
00000dd8.00000ba0::2015/07/10-06:04:56.547 INFO [RES] SQL Server <SQL Server>: [sqsrvres] Connect to SQL Server ...
00000dd8.00000ba0::2015/07/10-06:04:56.594 INFO [RES] SQL Server <SQL Server>: [sqsrvres] The connection was established successfully
00000dd8.00000ba0::2015/07/10-06:04:58.609 INFO [RES] SQL Server <SQL Server>: [sqsrvres] Run 'EXEC sp_server_diagnostics 20' returns following information
00000dd8.00000ba0::2015/07/10-06:04:58.609 ERR [RES] SQL Server <SQL Server>: [sqsrvres] ODBC Error: [42000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Database 'mssqlsystemresource' is being recovered. Waiting until recovery is finished. (922)
00000dd8.00000ba0::2015/07/10-06:04:58.609 ERR [RES] SQL Server <SQL Server>: [sqsrvres] Failed to run diagnostics command. See previous log for error message
00000dd8.00000ba0::2015/07/10-06:04:58.609 INFO [RES] SQL Server <SQL Server>: [sqsrvres] Disconnect from SQL Server
00000d70.00001370::2015/07/10-06:04:59.609 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
0000071c.00000988::2015/07/10-06:05:01.141 INFO [GUM] Node 2: executing request locally, gumId:2760, my action: /dm/update, # of updates: 1
00000dd8.00000ba0::2015/07/10-06:05:03.609 INFO [RES] SQL Server <SQL Server>: [sqsrvres] SQL Server was down
00000d70.00001370::2015/07/10-06:05:04.609 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
0000071c.00000988::2015/07/10-06:05:06.141 INFO [GUM] Node 2: executing request locally, gumId:2761, my action: /dm/update, # of updates: 1
00000d70.00001370::2015/07/10-06:05:09.609 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:05:14.609 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:05:19.609 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00000fcc::2015/07/10-06:05:19.656 INFO [RES] Network Name <SQL Network Name (SQL2014test)>: Dns: HealthCheck: SQL2014test
00000d70.00000fcc::2015/07/10-06:05:19.656 INFO [RES] Network Name <SQL Network Name (SQL2014test)>: Dns: End of Slow Operation, state: Initialized/Reading, prevWorkState: Reading
00000d70.00000fcc::2015/07/10-06:05:24.609 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00000fcc::2015/07/10-06:05:29.609 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00000fcc::2015/07/10-06:05:34.609 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00000fcc::2015/07/10-06:05:39.609 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00000fcc::2015/07/10-06:05:44.609 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00000fcc::2015/07/10-06:05:49.609 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d90.00000edc::2015/07/10-06:05:53.453 INFO [RES] Physical Disk <Cluster Disk 1>: VolumeIsNtfs: Volume \\?\GLOBALROOT\Device\Harddisk2\ClusterPartition1\ has FS type NTFS
00000d70.00000fcc::2015/07/10-06:05:54.609 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00000fcc::2015/07/10-06:05:59.609 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00000fcc::2015/07/10-06:06:04.609 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
0000071c.00001294::2015/07/10-06:06:09.234 INFO [DCM] HandleSweeperRecheck
0000071c.00001294::2015/07/10-06:06:09.234 INFO [CLI] LsaCallAuthenticationPackage: 0, 0 size: 4, buffer: HDL( 8191a20000 )
0000071c.00001294::2015/07/10-06:06:09.281 ERR [RCM] [GIM] ResType Virtual Machine has no resources, not collecting local utilization info
0000071c.00001294::2015/07/10-06:06:09.281 INFO [RCM] [GIM] Scheduling Local Node Crawler to run in 300000 millisec.
00000d70.00000fcc::2015/07/10-06:06:09.609 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00000fcc::2015/07/10-06:06:14.609 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
0000071c.00001294::2015/07/10-06:06:19.094 INFO [NM] Received request from client address NODE1.
0000071c.00001294::2015/07/10-06:06:19.110 INFO [NM] Received request from client address NODE1.
00000d70.00000fcc::2015/07/10-06:06:19.610 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.000009a4::2015/07/10-06:06:19.656 INFO [RES] Network Name <SQL Network Name (SQL2014test)>: Dns: HealthCheck: SQL2014test
00000d70.000009a4::2015/07/10-06:06:19.656 INFO [RES] Network Name <SQL Network Name (SQL2014test)>: Dns: End of Slow Operation, state: Initialized/Reading, prevWorkState: Reading
00000d70.00001370::2015/07/10-06:06:24.610 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:06:29.610 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:06:34.610 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:06:39.610 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:06:44.610 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:06:49.610 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d90.000013fc::2015/07/10-06:06:53.453 INFO [RES] Physical Disk <Cluster Disk 1>: VolumeIsNtfs: Volume \\?\GLOBALROOT\Device\Harddisk2\ClusterPartition1\ has FS type NTFS
00000d70.00001370::2015/07/10-06:06:54.610 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:06:59.610 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:07:04.610 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:07:09.610 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:07:14.610 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:07:19.610 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:07:19.657 INFO [RES] Network Name <SQL Network Name (SQL2014test)>: Dns: HealthCheck: SQL2014test
00000d70.00001370::2015/07/10-06:07:19.657 INFO [RES] Network Name <SQL Network Name (SQL2014test)>: Dns: End of Slow Operation, state: Initialized/Reading, prevWorkState: Reading
00000d70.00001370::2015/07/10-06:07:24.610 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:07:29.610 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:07:34.610 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:07:39.610 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:07:44.610 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:07:49.610 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d90.00000484::2015/07/10-06:07:53.453 INFO [RES] Physical Disk <Cluster Disk 1>: VolumeIsNtfs: Volume \\?\GLOBALROOT\Device\Harddisk2\ClusterPartition1\ has FS type NTFS
00000d70.00001370::2015/07/10-06:07:54.610 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000dd8.00000df4::2015/07/10-06:07:55.313 ERR [RHS] RhsCall::DeadlockMonitor: Call ONLINERESOURCE timed out by 16 milliseconds for resource 'SQL Server'.
00000dd8.00000df4::2015/07/10-06:07:55.313 ERR [RHS] Resource SQL Server handling deadlock. Cleaning current operation.
00000dd8.00000df4::2015/07/10-06:07:55.313 ERR [RHS] About to send WER report.
0000071c.00001294::2015/07/10-06:07:55.313 WARN [RCM] HandleMonitorReply: FAILURENOTIFICATION for 'SQL Server', gen(3) result 5018/0.
0000071c.00001294::2015/07/10-06:07:55.313 INFO [RCM] Res SQL Server: OnlinePending -> ProcessingFailure( StateUnknown )
0000071c.00001294::2015/07/10-06:07:55.313 INFO [RCM] TransitionToState(SQL Server) OnlinePending-->ProcessingFailure.
0000071c.00001294::2015/07/10-06:07:55.313 ERR [RCM] rcm::RcmResource::HandleFailure: (SQL Server)
0000071c.00001294::2015/07/10-06:07:55.313 INFO [RCM] resource SQL Server: failure count: 1, restartAction: 2 persistentState: 1.
0000071c.00001294::2015/07/10-06:07:55.313 INFO [RCM] Resource SQL Server is causing group SQL Server CRG to failover.
0000071c.00001294::2015/07/10-06:07:55.313 INFO [RCM] rcm::RcmGroup::Failover: (SQL Server CRG)
0000071c.00001294::2015/07/10-06:07:55.313 INFO [RCM] time since last failure is greater than failover period; resetting failoverCount to 0.
0000071c.00001294::2015/07/10-06:07:55.313 WARN [RCM] Failing over group SQL Server CRG, failoverCount 1, last time 2015/07/09-15:34:51.750.
0000071c.00001294::2015/07/10-06:07:55.313 INFO [RCM-plcmt] This node is not director, node 1 is. Asking others for placement...
0000071c.00001294::2015/07/10-06:07:55.313 INFO [RCM-plcmt] asking node 1 placement decision, attempt 1
00000dd8.00000df4::2015/07/10-06:07:55.329 ERR [RHS] WER report is submitted. Result : WerReportQueued.
0000071c.00001294::2015/07/10-06:07:55.375 INFO [RCM-plcmt] done waiting...
0000071c.00001294::2015/07/10-06:07:55.375 INFO [RCM-plcmt] Node 1 replied to placement request g=SQL Server CRG tgt=1 wait=false
0000071c.00001294::2015/07/10-06:07:55.375 INFO MTimer(GetPlacementFromDirector): [Start to Multitimer_destroyed : 62 ms
0000071c.00001294::2015/07/10-06:07:55.375 INFO MTimer(GetPlacementFromDirector): [Total: 62 ms ( 0 s )]
0000071c.00001294::2015/07/10-06:07:55.375 INFO [RCM] Res SQL Server: ProcessingFailure -> WaitingToTerminate( Failed )
0000071c.00000bbc::2015/07/10-06:07:55.375 INFO [RCM] rcm::RcmGroup::FailoverWorker: (SQL Server CRG)
0000071c.00001294::2015/07/10-06:07:55.375 INFO [RCM] TransitionToState(SQL Server) ProcessingFailure-->[WaitingToTerminate to Failed].
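The timestamps in this log show how long the online call was allowed to run before it was killed. A minimal sketch of that arithmetic, assuming the clock starts at the OnlineCallIssued -> OnlinePending transition (06:04:55.297) and ends at the RhsCall::DeadlockMonitor entry (06:07:55.313); the 16 ms overrun is taken from that entry's own text:

```python
from datetime import datetime

# Timestamp format as it appears in the cluster log entries, e.g. 2015/07/10-06:04:55.297
FMT = "%Y/%m/%d-%H:%M:%S.%f"

def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two cluster-log timestamps."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()

# 'SQL Server' resource: OnlineCallIssued -> OnlinePending
online_pending = "2015/07/10-06:04:55.297"
# RhsCall::DeadlockMonitor: Call ONLINERESOURCE timed out by 16 milliseconds
deadlock = "2015/07/10-06:07:55.313"

waited = elapsed_seconds(online_pending, deadlock)
overrun = 0.016  # "timed out by 16 milliseconds", in seconds

print(f"online call pending for {waited:.3f} s")    # online call pending for 180.016 s
print(f"implied timeout: {waited - overrun:.3f} s")  # implied timeout: 180.000 s
```

Read this way, the resource was not rejected immediately; the online call simply never completed within a 180-second window, during which the resource DLL reported SQL Server as down.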
In particular, note this line:
00000dd8.00000ba0::2015/07/10-06:04:58.609 ERR [RES] SQL Server <SQL Server>: [sqsrvres] ODBC Error: [42000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Database 'mssqlsystemresource' is being recovered. Waiting until recovery is finished. (922)
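To pull lines like this one out of a long cluster log, one option is to keep only the ERR and WARN entries. A minimal sketch, assuming the log has been saved as plain text with one entry per line; the entry layout in the regular expression is inferred from the lines above and is not a documented format:

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Layout inferred from the log above (an assumption, not an official spec):
# <process id>.<thread id>::<yyyy/mm/dd-hh:mm:ss.mmm> <LEVEL> <message>
ENTRY = re.compile(
    r"^(?P<pid>[0-9a-f]{8})\.(?P<tid>[0-9a-f]{8})::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>INFO|WARN|ERR)\s+(?P<msg>.*)$"
)

class Entry(NamedTuple):
    ts: str
    level: str
    msg: str

def important_entries(lines: Iterable[str], levels=("ERR", "WARN")) -> Iterator[Entry]:
    """Yield entries whose severity is in `levels`; lines that don't match are skipped."""
    for line in lines:
        m = ENTRY.match(line.strip())
        if m and m.group("level") in levels:
            yield Entry(m.group("ts"), m.group("level"), m.group("msg"))

sample = [
    "00000dd8.00000ba0::2015/07/10-06:04:58.609 INFO [RES] SQL Server <SQL Server>: [sqsrvres] Run 'EXEC sp_server_diagnostics 20' returns following information",
    "00000dd8.00000ba0::2015/07/10-06:04:58.609 ERR [RES] SQL Server <SQL Server>: [sqsrvres] ODBC Error: [42000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Database 'mssqlsystemresource' is being recovered. Waiting until recovery is finished. (922)",
    "0000071c.00001294::2015/07/10-06:07:55.313 WARN [RCM] HandleMonitorReply: FAILURENOTIFICATION for 'SQL Server', gen(3) result 5018/0.",
]

for entry in important_entries(sample):
    print(entry.ts, entry.level, entry.msg)
```

Against a real log file, the same generator can be fed an open file handle instead of `sample`, e.g. `open("cluster.log", errors="replace")`, where the file name is hypothetical.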
Things I've tried:
- reinstalling node 1 from scratch, meaning both Server 2012 R2 and SQL Server 2014 again
- checking https://msdn.microsoft.com/en-us/library/ms714687.aspx, which says ODBC error 42000 means "Syntax error or access violation", but it isn't clear to me what I should change
- changing the account the SQL Server service runs under to a domain admin, a local admin, and Network Service
...
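On the 42000 point: as I read the sqsrvres entry, the bracketed [42000] is the generic ODBC SQLSTATE class, while the number in parentheses at the end, (922), is the SQL Server native error number that carries the specific condition (database 'mssqlsystemresource' is being recovered). A minimal sketch that splits the two apart, assuming the message layout seen in this log; the pattern is inferred from that single line and other drivers may format errors differently:

```python
import re
from typing import NamedTuple, Optional

class OdbcError(NamedTuple):
    sqlstate: str      # five-character ODBC SQLSTATE, e.g. "42000"
    message: str       # text after the last [component] tag
    native_error: int  # server-specific error number in the trailing parentheses

# Pattern inferred from the sqsrvres log line above (an assumption).
ODBC_ERROR = re.compile(
    r"ODBC Error: \[(?P<state>[0-9A-Z]{5})\]\s*(?:\[[^\]]*\])*"
    r"(?P<msg>.*?)\s*\((?P<native>\d+)\)\s*$"
)

def parse_odbc_error(line: str) -> Optional[OdbcError]:
    """Return the SQLSTATE, message and native error number, or None if no match."""
    m = ODBC_ERROR.search(line)
    if m is None:
        return None
    return OdbcError(m.group("state"), m.group("msg"), int(m.group("native")))

line = ("[sqsrvres] ODBC Error: [42000] [Microsoft][SQL Server Native Client 11.0]"
        "[SQL Server]Database 'mssqlsystemresource' is being recovered. "
        "Waiting until recovery is finished. (922)")
err = parse_odbc_error(line)
print(err.sqlstate, err.native_error)  # prints: 42000 922
```

Searching for the native number (922) rather than the SQLSTATE (42000) is likely to give more targeted results, since 42000 is shared by many unrelated errors.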
Does anyone have an idea how to solve this?