Channel: SQL Server High Availability and Disaster Recovery forum

SQL Server 2014 fails to start on a specific node, works on the other


Hi

I've been searching for an answer to this issue for days and can't find anything, so I'll do my best to explain the situation and ask for your input.

We have two blade servers, both freshly installed with Windows Server 2012 R2 and brought up to the latest updates, plus the special update Windows8.1-KB2962409-x64.
After adding both to a failover cluster with an EMC SAN as storage, I can fail over from one node to the other and back, and the storage follows.
I then installed SQL Server 2014 on Node2 and used the same setup to add Node1 to the SQL Server 2014 cluster. Both installations ended with all green checkmarks, meaning everything went through without errors. SQL Server is online on Node2 and can be reached both locally and over the network.
When I try to fail over from Node2 to Node1, the IP address and storage come online, but the SQL Server resource takes a very long time and then reports that it failed.
When I then use the PowerShell command to get the cluster logs, I see the following lines that are relevant to the error:

0000071c.00000bbc::2015/07/10-06:04:55.297 INFO  [RCM] Res SQL Server: OnlineCallIssued -> OnlinePending( StateUnknown )
0000071c.00000bbc::2015/07/10-06:04:55.297 INFO  [RCM] TransitionToState(SQL Server) OnlineCallIssued-->OnlinePending.
00000dd8.00000ba0::2015/07/10-06:04:55.312 INFO  [RES] SQL Server <SQL Server>: [sqsrvres] XEvent session MSSQLSERVER is created with RolloverCount 10, MaxFileSizeInMBytes 100, and LogPath 'L:\MSSQL12.MSSQLSERVER\MSSQL\LOG\'
00000dd8.00000ba0::2015/07/10-06:04:55.312 INFO  [RES] SQL Server <SQL Server>: [sqsrvres] Extended Event logging is started
00000dd8.00000ba0::2015/07/10-06:04:55.312 INFO  [RES] SQL Server <SQL Server>: [sqsrvres] The private property VerboseLogging is 0
00000dd8.00000ba0::2015/07/10-06:04:55.312 INFO  [RES] SQL Server <SQL Server>: [sqsrvres] The private property HealthCheckTimeout is 60000
00000dd8.00000ba0::2015/07/10-06:04:55.312 INFO  [RES] SQL Server <SQL Server>: [sqsrvres] The private property FailureConditionLevel is 3
00000dd8.00000ba0::2015/07/10-06:04:55.312 INFO  [RES] SQL Server <SQL Server>: [sqsrvres] The private property SqlDumperDumpFlags is 0x0
00000dd8.00000ba0::2015/07/10-06:04:55.312 INFO  [RES] SQL Server <SQL Server>: [sqsrvres] The private property SqlDumperDumpTimeOut is 0
00000dd8.00000ba0::2015/07/10-06:04:55.312 INFO  [RES] SQL Server <SQL Server>: [sqsrvres] The private property SqlDumperDumpPath is ''
00000dd8.00000ba0::2015/07/10-06:04:55.312 INFO  [RES] SQL Server <SQL Server>: [sqsrvres] The property LogIsEnabled is 1
00000dd8.00000ba0::2015/07/10-06:04:55.312 INFO  [RES] SQL Server <SQL Server>: [sqsrvres] The property LogFileRolloverCount is 10
00000dd8.00000ba0::2015/07/10-06:04:55.312 INFO  [RES] SQL Server <SQL Server>: [sqsrvres] The property LogMaxFileSizeInMBytes is 100
00000dd8.00000ba0::2015/07/10-06:04:55.312 INFO  [RES] SQL Server <SQL Server>: [sqsrvres] The property LogPath is ''
00000dd8.00000ba0::2015/07/10-06:04:55.312 INFO  [RES] SQL Server <SQL Server>: [sqsrvres] Server name is SQL2014TEST
00000dd8.00000ba0::2015/07/10-06:04:55.312 INFO  [RES] SQL Server <SQL Server>: [sqsrvres] Service name is MSSQLSERVER
00000dd8.00000ba0::2015/07/10-06:04:55.312 INFO  [RES] SQL Server <SQL Server>: [sqsrvres] Dependency expression for resource 'SQL Network Name (SQL2014test)' is '([1f0618d0-e95e-4e40-b14e-66252b010030])'
00000dd8.00000ba0::2015/07/10-06:04:55.312 INFO  [RES] SQL Server <SQL Server>: [sqsrvres] Starting service MSSQLSERVER...
0000071c.00000bbc::2015/07/10-06:04:55.562 INFO  [NM] Received request from client address NODE1.
0000071c.00000cd0::2015/07/10-06:04:56.437 INFO  [NM] Received request from client address NODE1.
00000dd8.00000ba0::2015/07/10-06:04:56.547 INFO  [RES] SQL Server <SQL Server>: [sqsrvres] Service is started. SQL Server pid is 524
00000dd8.00000ba0::2015/07/10-06:04:56.547 INFO  [RES] SQL Server <SQL Server>: [sqsrvres] Connect to SQL Server ...
00000dd8.00000ba0::2015/07/10-06:04:56.594 INFO  [RES] SQL Server <SQL Server>: [sqsrvres] The connection was established successfully
00000dd8.00000ba0::2015/07/10-06:04:58.609 INFO  [RES] SQL Server <SQL Server>: [sqsrvres] Run 'EXEC sp_server_diagnostics 20' returns following information
00000dd8.00000ba0::2015/07/10-06:04:58.609 ERR   [RES] SQL Server <SQL Server>: [sqsrvres] ODBC Error: [42000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Database 'mssqlsystemresource' is being recovered. Waiting until recovery is finished. (922)
00000dd8.00000ba0::2015/07/10-06:04:58.609 ERR   [RES] SQL Server <SQL Server>: [sqsrvres] Failed to run diagnostics command. See previous log for error message
00000dd8.00000ba0::2015/07/10-06:04:58.609 INFO  [RES] SQL Server <SQL Server>: [sqsrvres] Disconnect from SQL Server
00000d70.00001370::2015/07/10-06:04:59.609 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
0000071c.00000988::2015/07/10-06:05:01.141 INFO  [GUM] Node 2: executing request locally, gumId:2760, my action: /dm/update, # of updates: 1
00000dd8.00000ba0::2015/07/10-06:05:03.609 INFO  [RES] SQL Server <SQL Server>: [sqsrvres] SQL Server was down
00000d70.00001370::2015/07/10-06:05:04.609 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
0000071c.00000988::2015/07/10-06:05:06.141 INFO  [GUM] Node 2: executing request locally, gumId:2761, my action: /dm/update, # of updates: 1
00000d70.00001370::2015/07/10-06:05:09.609 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:05:14.609 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:05:19.609 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00000fcc::2015/07/10-06:05:19.656 INFO  [RES] Network Name <SQL Network Name (SQL2014test)>: Dns: HealthCheck: SQL2014test
00000d70.00000fcc::2015/07/10-06:05:19.656 INFO  [RES] Network Name <SQL Network Name (SQL2014test)>: Dns: End of Slow Operation, state: Initialized/Reading, prevWorkState: Reading
00000d70.00000fcc::2015/07/10-06:05:24.609 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00000fcc::2015/07/10-06:05:29.609 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00000fcc::2015/07/10-06:05:34.609 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00000fcc::2015/07/10-06:05:39.609 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00000fcc::2015/07/10-06:05:44.609 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00000fcc::2015/07/10-06:05:49.609 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d90.00000edc::2015/07/10-06:05:53.453 INFO  [RES] Physical Disk <Cluster Disk 1>: VolumeIsNtfs: Volume \\?\GLOBALROOT\Device\Harddisk2\ClusterPartition1\ has FS type NTFS
00000d70.00000fcc::2015/07/10-06:05:54.609 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00000fcc::2015/07/10-06:05:59.609 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00000fcc::2015/07/10-06:06:04.609 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
0000071c.00001294::2015/07/10-06:06:09.234 INFO  [DCM] HandleSweeperRecheck
0000071c.00001294::2015/07/10-06:06:09.234 INFO  [CLI] LsaCallAuthenticationPackage: 0, 0 size: 4, buffer: HDL( 8191a20000 )
0000071c.00001294::2015/07/10-06:06:09.281 ERR   [RCM] [GIM] ResType Virtual Machine has no resources, not collecting local utilization info
0000071c.00001294::2015/07/10-06:06:09.281 INFO  [RCM] [GIM] Scheduling Local Node Crawler to run in 300000 millisec.
00000d70.00000fcc::2015/07/10-06:06:09.609 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00000fcc::2015/07/10-06:06:14.609 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
0000071c.00001294::2015/07/10-06:06:19.094 INFO  [NM] Received request from client address NODE1.
0000071c.00001294::2015/07/10-06:06:19.110 INFO  [NM] Received request from client address NODE1.
00000d70.00000fcc::2015/07/10-06:06:19.610 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.000009a4::2015/07/10-06:06:19.656 INFO  [RES] Network Name <SQL Network Name (SQL2014test)>: Dns: HealthCheck: SQL2014test
00000d70.000009a4::2015/07/10-06:06:19.656 INFO  [RES] Network Name <SQL Network Name (SQL2014test)>: Dns: End of Slow Operation, state: Initialized/Reading, prevWorkState: Reading
00000d70.00001370::2015/07/10-06:06:24.610 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:06:29.610 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:06:34.610 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:06:39.610 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:06:44.610 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:06:49.610 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d90.000013fc::2015/07/10-06:06:53.453 INFO  [RES] Physical Disk <Cluster Disk 1>: VolumeIsNtfs: Volume \\?\GLOBALROOT\Device\Harddisk2\ClusterPartition1\ has FS type NTFS
00000d70.00001370::2015/07/10-06:06:54.610 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:06:59.610 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:07:04.610 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:07:09.610 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:07:14.610 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:07:19.610 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:07:19.657 INFO  [RES] Network Name <SQL Network Name (SQL2014test)>: Dns: HealthCheck: SQL2014test
00000d70.00001370::2015/07/10-06:07:19.657 INFO  [RES] Network Name <SQL Network Name (SQL2014test)>: Dns: End of Slow Operation, state: Initialized/Reading, prevWorkState: Reading
00000d70.00001370::2015/07/10-06:07:24.610 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:07:29.610 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:07:34.610 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:07:39.610 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:07:44.610 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d70.00001370::2015/07/10-06:07:49.610 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000d90.00000484::2015/07/10-06:07:53.453 INFO  [RES] Physical Disk <Cluster Disk 1>: VolumeIsNtfs: Volume \\?\GLOBALROOT\Device\Harddisk2\ClusterPartition1\ has FS type NTFS
00000d70.00001370::2015/07/10-06:07:54.610 INFO  [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:a3b47601-8ad6-4fbb-815f-31d72569f541:Netbios
00000dd8.00000df4::2015/07/10-06:07:55.313 ERR   [RHS] RhsCall::DeadlockMonitor: Call ONLINERESOURCE timed out by 16 milliseconds for resource 'SQL Server'.
00000dd8.00000df4::2015/07/10-06:07:55.313 ERR   [RHS] Resource SQL Server handling deadlock. Cleaning current operation.
00000dd8.00000df4::2015/07/10-06:07:55.313 ERR   [RHS] About to send WER report.
0000071c.00001294::2015/07/10-06:07:55.313 WARN  [RCM] HandleMonitorReply: FAILURENOTIFICATION for 'SQL Server', gen(3) result 5018/0.
0000071c.00001294::2015/07/10-06:07:55.313 INFO  [RCM] Res SQL Server: OnlinePending -> ProcessingFailure( StateUnknown )
0000071c.00001294::2015/07/10-06:07:55.313 INFO  [RCM] TransitionToState(SQL Server) OnlinePending-->ProcessingFailure.
0000071c.00001294::2015/07/10-06:07:55.313 ERR   [RCM] rcm::RcmResource::HandleFailure: (SQL Server)
0000071c.00001294::2015/07/10-06:07:55.313 INFO  [RCM] resource SQL Server: failure count: 1, restartAction: 2 persistentState: 1.
0000071c.00001294::2015/07/10-06:07:55.313 INFO  [RCM] Resource SQL Server is causing group SQL Server CRG to failover.
0000071c.00001294::2015/07/10-06:07:55.313 INFO  [RCM] rcm::RcmGroup::Failover: (SQL Server CRG)
0000071c.00001294::2015/07/10-06:07:55.313 INFO  [RCM] time since last failure is greater than failover period; resetting failoverCount to 0.
0000071c.00001294::2015/07/10-06:07:55.313 WARN  [RCM] Failing over group SQL Server CRG, failoverCount 1, last time 2015/07/09-15:34:51.750.
0000071c.00001294::2015/07/10-06:07:55.313 INFO  [RCM-plcmt] This node is not director, node 1 is.  Asking others for placement...
0000071c.00001294::2015/07/10-06:07:55.313 INFO  [RCM-plcmt] asking node 1 placement decision, attempt 1
00000dd8.00000df4::2015/07/10-06:07:55.329 ERR   [RHS] WER report is submitted. Result : WerReportQueued.
0000071c.00001294::2015/07/10-06:07:55.375 INFO  [RCM-plcmt] done waiting...
0000071c.00001294::2015/07/10-06:07:55.375 INFO  [RCM-plcmt] Node 1 replied to placement request g=SQL Server CRG tgt=1 wait=false
0000071c.00001294::2015/07/10-06:07:55.375 INFO  MTimer(GetPlacementFromDirector): [Start to Multitimer_destroyed : 62 ms
0000071c.00001294::2015/07/10-06:07:55.375 INFO  MTimer(GetPlacementFromDirector): [Total: 62 ms ( 0 s )]
0000071c.00001294::2015/07/10-06:07:55.375 INFO  [RCM] Res SQL Server: ProcessingFailure -> WaitingToTerminate( Failed )
0000071c.00000bbc::2015/07/10-06:07:55.375 INFO  [RCM] rcm::RcmGroup::FailoverWorker: (SQL Server CRG)
0000071c.00001294::2015/07/10-06:07:55.375 INFO  [RCM] TransitionToState(SQL Server) ProcessingFailure-->[WaitingToTerminate to Failed].
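
The timing in this log can be checked with a little arithmetic. The sketch below is a minimal Python example, not part of any cluster tooling; the helper name `parse_ts` is made up for illustration. It takes the OnlineCallIssued line and the DeadlockMonitor line from the log above and measures the gap between them.

```python
from datetime import datetime

# Timestamp layout used in the cluster log lines, e.g. 2015/07/10-06:04:55.297
TS_FORMAT = "%Y/%m/%d-%H:%M:%S.%f"


def parse_ts(line):
    """Return the timestamp found between '::' and the first space (hypothetical helper)."""
    stamp = line.split("::", 1)[1].split(" ", 1)[0]
    return datetime.strptime(stamp, TS_FORMAT)


# Two lines copied from the log above.
online_issued = (
    "0000071c.00000bbc::2015/07/10-06:04:55.297 INFO  [RCM] "
    "Res SQL Server: OnlineCallIssued -> OnlinePending( StateUnknown )"
)
timed_out = (
    "00000dd8.00000df4::2015/07/10-06:07:55.313 ERR   [RHS] "
    "RhsCall::DeadlockMonitor: Call ONLINERESOURCE timed out by 16 milliseconds "
    "for resource 'SQL Server'."
)

# Time the resource spent in OnlinePending before the deadlock monitor fired.
elapsed_seconds = (parse_ts(timed_out) - parse_ts(online_issued)).total_seconds()
print(elapsed_seconds)  # about 180.016 seconds
```

The gap comes to about 180.016 seconds, which fits the "timed out by 16 milliseconds" message if the resource's pending timeout is 180 seconds. That timeout value is an assumption; it does not appear in the log itself.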

See especially this line:
00000dd8.00000ba0::2015/07/10-06:04:58.609 ERR   [RES] SQL Server <SQL Server>: [sqsrvres] ODBC Error: [42000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Database 'mssqlsystemresource' is being recovered. Waiting until recovery is finished. (922)
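
Lines like this are easy to miss among the repeated INFO entries. A rough way to pull them out is sketched below in Python. It assumes the log has been saved as text and that every entry follows the `pid.tid::timestamp LEVEL message` layout seen above; the names `LINE_RE` and `problem_lines` are hypothetical, chosen only for this example.

```python
import re

# Layout assumed from the log above: pid.tid::timestamp LEVEL message
LINE_RE = re.compile(
    r"^(?P<pid>[0-9a-f]+)\.(?P<tid>[0-9a-f]+)::"
    r"(?P<stamp>\S+)\s+(?P<level>INFO|WARN|ERR)\s+(?P<message>.*)$"
)


def problem_lines(lines, levels=("ERR", "WARN")):
    """Return (timestamp, level, message) for entries at the given levels."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("level") in levels:
            hits.append(
                (match.group("stamp"), match.group("level"), match.group("message"))
            )
    return hits


# Three entries copied from the log above: one INFO, one ERR, one WARN.
sample = [
    "00000dd8.00000ba0::2015/07/10-06:04:58.609 INFO  [RES] SQL Server <SQL Server>: "
    "[sqsrvres] Run 'EXEC sp_server_diagnostics 20' returns following information",
    "00000dd8.00000ba0::2015/07/10-06:04:58.609 ERR   [RES] SQL Server <SQL Server>: "
    "[sqsrvres] ODBC Error: [42000] [Microsoft][SQL Server Native Client 11.0][SQL Server]"
    "Database 'mssqlsystemresource' is being recovered. "
    "Waiting until recovery is finished. (922)",
    "0000071c.00001294::2015/07/10-06:07:55.313 WARN  [RCM] HandleMonitorReply: "
    "FAILURENOTIFICATION for 'SQL Server', gen(3) result 5018/0.",
]

hits = problem_lines(sample)
for stamp, level, message in hits:
    print(stamp, level, message)
```

To run it against a whole saved log, pass the open file object in place of `sample`, since the function only needs an iterable of lines.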

Things I've tried:

- Reinstalling Node1 from scratch, meaning Server 2012 R2 again and SQL Server 2014 again
- Reading https://msdn.microsoft.com/en-us/library/ms714687.aspx: ODBC error 42000 means "Syntax error or access violation", but it isn't clear to me what I should change
- Changing the account that the SQL Server service runs under to a domain admin / local admin / Network Service
...

Does anyone have an idea how to solve this?

