Hi experts,
I am running our system on SQL Server 2012 with AlwaysOn Availability Groups. After a failure, I can no longer connect to the standby SQL Server 2012 replica (its database shows Not Synchronizing / Recovery Pending). Here is what I found in the logs. What happened? Please help.
---
1. SQL Server errorlog
2014-02-24 18:15:54.58 spid48s A connection timeout has occurred on a previously established connection to availability replica 'DL980-4' with id [AA216224-A495-4821-B121-F01FEF5132B8]. Either a networking or a firewall issue exists or the availability replica has transitioned to the resolving role.
2014-02-24 18:15:54.60 spid38s AlwaysOn Availability Groups connection with secondary database terminated for primary database 'TCP' on the availability replica with Replica ID: {aa216224-a495-4821-b121-f01fef5132b8}. This is an informational message only. No user action is required.
2014-02-24 18:16:04.60 spid38s A connection for availability group 'AGTCP' from availability replica 'DL980-3' with id [8BA51030-C95F-4944-A8EE-43C44241EC08] to 'DL980-4' with id [AA216224-A495-4821-B121-F01FEF5132B8] has been successfully established. This is an informational message only. No user action is required.
2014-02-24 18:16:04.60 spid41s AlwaysOn Availability Groups connection with secondary database established for primary database 'TCP' on the availability replica with Replica ID: {aa216224-a495-4821-b121-f01fef5132b8}. This is an informational message only. No user action is required.
2014-02-24 18:16:29.93 spid38s A connection timeout has occurred on a previously established connection to availability replica 'DL980-4' with id [AA216224-A495-4821-B121-F01FEF5132B8]. Either a networking or a firewall issue exists or the availability replica has transitioned to the resolving role.
2014-02-24 18:16:29.93 spid38s AlwaysOn Availability Groups connection with secondary database terminated for primary database 'TCP' on the availability replica with Replica ID: {aa216224-a495-4821-b121-f01fef5132b8}. This is an informational message only. No user action is required.
2014-02-24 18:16:35.82 spid17s SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [L:\tcpdata4\TCPDATA39.ndf] in database [TCP] (5). The OS file handle is 0x000000000000A300. The offset of the latest long I/O is: 0x0000173cfd0000
2014-02-24 18:16:35.82 spid17s SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [l:\tempdb4\tempdb8.ndf] in database [tempdb] (2). The OS file handle is 0x0000000000001930. The offset of the latest long I/O is: 0x000004994e0000
2014-02-24 18:16:35.82 spid17s SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [l:\tempdb4\tempdb7.ndf] in database [tempdb] (2). The OS file handle is 0x0000000000001964. The offset of the latest long I/O is: 0x000004993e0000
2014-02-24 18:16:35.82 spid17s SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [L:\tcpdata4\TCPDATA43.ndf] in database [TCP] (5). The OS file handle is 0x000000000000AEAC. The offset of the latest long I/O is: 0x000019efbf4000
2014-02-24 18:16:35.82 spid17s SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [L:\tcpdata4\TCPDATA41.ndf] in database [TCP] (5). The OS file handle is 0x0000000000008914. The offset of the latest long I/O is: 0x0000270953c000
2014-02-24 18:16:35.82 spid17s SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [L:\tcpdata4\TCPDATA48.ndf] in database [TCP] (5). The OS file handle is 0x0000000000002888. The offset of the latest long I/O is: 0x00000ec75a0000
2014-02-24 18:16:35.83 spid17s SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [L:\tcpdata4\TCPDATA44.ndf] in database [TCP] (5). The OS file handle is 0x00000000000022A4. The offset of the latest long I/O is: 0x000002c5e60000
2014-02-24 18:16:35.83 spid17s SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [L:\tcpdata4\TCPDATA40.ndf] in database [TCP] (5). The OS file handle is 0x0000000000000880. The offset of the latest long I/O is: 0x000022e1642000
2014-02-24 18:16:35.83 spid17s SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [L:\tcpdata4\TCPDATA47.ndf] in database [TCP] (5). The OS file handle is 0x00000000000028F4. The offset of the latest long I/O is: 0x000027d8946000
2014-02-24 18:16:35.83 spid17s SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [L:\tcpdata4\TCPDATA45.ndf] in database [TCP] (5). The OS file handle is 0x000000000000AC80. The offset of the latest long I/O is: 0x000022ea45c000
2014-02-24 18:16:35.83 spid17s SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [L:\tcpdata4\TCPDATA42.ndf] in database [TCP] (5). The OS file handle is 0x0000000000001FA8. The offset of the latest long I/O is: 0x00002829b14000
2014-02-24 18:16:35.83 spid17s SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [L:\tcpdata4\TCPDATA38.ndf] in database [TCP] (5). The OS file handle is 0x0000000000002874. The offset of the latest long I/O is: 0x0000207d4ba000
2014-02-24 18:16:35.83 spid17s SQL Server has encountered 1 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [L:\tcpdata4\TCPDATA37.ndf] in database [TCP] (5). The OS file handle is 0x0000000000001D5C. The offset of the latest long I/O is: 0x000025b1f1a000
---
2. AlwaysOn Extended Event Log - error_report @ 2014-02-24 18:15:54
A connection timeout has occurred on a previously established connection to availability replica 'DL980-4' with id [AA216224-A495-4821-B121-F01FEF5132B8]. Either a networking or a firewall issue exists or the availability replica has transitioned to the resolving role.
---
3. Cluster log
00002424.00002bb4::2014/02/24-10:16:58.565 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:6e26fe17-09c2-4f54-8aae-52678b351486:Netbios
00002424.00003284::2014/02/24-10:16:58.565 INFO [RES] Network Name <AGTCP_tccdb4>: Netbios: Slow Operation, FinishWithReply: 0
00002424.00003284::2014/02/24-10:16:58.565 INFO [RES] Network Name: [NN] got sync reply: 0
00002424.00003284::2014/02/24-10:16:58.565 INFO [RES] Network Name <AGTCP_tccdb4>: Netbios: End of Slow Operation, state: Initialized/Idle, prevWorkState: Idle
00002424.00003284::2014/02/24-10:17:03.566 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:6e26fe17-09c2-4f54-8aae-52678b351486:Netbios
00002424.00002bb4::2014/02/24-10:17:03.566 INFO [RES] Network Name <AGTCP_tccdb4>: Netbios: Slow Operation, FinishWithReply: 0
00002424.00002bb4::2014/02/24-10:17:03.566 INFO [RES] Network Name: [NN] got sync reply: 0
00002424.00002bb4::2014/02/24-10:17:03.566 INFO [RES] Network Name <AGTCP_tccdb4>: Netbios: End of Slow Operation, state: Initialized/Idle, prevWorkState: Idle
00002428.00000fc0::2014/02/24-10:17:06.046 ERR [RES] SQL Server Availability Group: [hadrag] Failure detected, diagnostics heartbeat is lost
00002428.00000fc0::2014/02/24-10:17:06.046 ERR [RES] SQL Server Availability Group <AGTCP>: [hadrag] Availability Group is not healthy with given HealthCheckTimeout and FailureConditionLevel
00002428.00000fc0::2014/02/24-10:17:06.046 ERR [RES] SQL Server Availability Group <AGTCP>: [hadrag] Resource Alive result 0.
00002428.00000fc0::2014/02/24-10:17:06.046 ERR [RES] SQL Server Availability Group: [hadrag] Failure detected, diagnostics heartbeat is lost
00002428.00000fc0::2014/02/24-10:17:06.046 ERR [RES] SQL Server Availability Group <AGTCP>: [hadrag] Availability Group is not healthy with given HealthCheckTimeout and FailureConditionLevel
00002428.00000fc0::2014/02/24-10:17:06.046 ERR [RES] SQL Server Availability Group <AGTCP>: [hadrag] Resource Alive result 0.
00002428.00000fc0::2014/02/24-10:17:06.046 WARN [RHS] Resource AGTCP IsAlive has indicated failure.
000016d4.00004e48::2014/02/24-10:17:06.046 INFO [RCM] HandleMonitorReply: FAILURENOTIFICATION for 'AGTCP', gen(2) result 1/0.
000016d4.00004e48::2014/02/24-10:17:06.046 INFO [RCM] Res AGTCP: Online -> ProcessingFailure( StateUnknown )
000016d4.00004e48::2014/02/24-10:17:06.046 INFO [RCM] TransitionToState(AGTCP) Online-->ProcessingFailure.
000016d4.00003640::2014/02/24-10:17:06.046 INFO [GEM] Sending 1 messages as a batched GEM message
000016d4.00004e48::2014/02/24-10:17:06.046 INFO [RCM] rcm::RcmGroup::UpdateStateIfChanged: (AGTCP, Online --> Pending)
000016d4.00004e48::2014/02/24-10:17:06.046 ERR [RCM] rcm::RcmResource::HandleFailure: (AGTCP)
000016d4.00004e48::2014/02/24-10:17:06.046 INFO [RCM] resource AGTCP: failure count: 2, restartAction: 2 persistentState: 1.
000016d4.00004e48::2014/02/24-10:17:06.046 INFO [RCM] numDependents is zero, auto-returning true
000016d4.00004e48::2014/02/24-10:17:06.046 INFO [RCM] Greater than restartPeriod time has elapsed since first failure of AGTCP, resetting failureTime and failureCount.
000016d4.00004e48::2014/02/24-10:17:06.046 INFO [RCM] Will queue immediate restart (500 milliseconds) of AGTCP after terminate is complete.
000016d4.00004e48::2014/02/24-10:17:06.046 INFO [RCM] Res AGTCP: ProcessingFailure -> WaitingToTerminate( DelayRestartingResource )
000016d4.00004e48::2014/02/24-10:17:06.046 INFO [RCM] TransitionToState(AGTCP) ProcessingFailure-->[WaitingToTerminate to DelayRestartingResource].
000016d4.00004e48::2014/02/24-10:17:06.047 INFO [RCM] Res AGTCP: [WaitingToTerminate to DelayRestartingResource] -> Terminating( DelayRestartingResource )
000016d4.00004e48::2014/02/24-10:17:06.047 INFO [RCM] TransitionToState(AGTCP) [WaitingToTerminate to DelayRestartingResource]-->[Terminating to DelayRestartingResource].
00002428.0000111c::2014/02/24-10:17:06.047 INFO [RES] SQL Server Availability Group: [hadrag] Stopping Health Worker Thread
00002428.000027e4::2014/02/24-10:17:06.047 INFO [RES] SQL Server Availability Group: [hadrag] Health worker was asked to terminate
000016d4.00002318::2014/02/24-10:17:06.047 INFO [GEM] Sending 1 messages as a batched GEM message
00002424.00003284::2014/02/24-10:17:06.050 INFO [RES] Network Name <AGTCP_tccdb4>: Getting Read/Write private properties
00002424.00002bb4::2014/02/24-10:17:06.052 INFO [RES] Network Name <AGTCP_tccdb4>: Getting Read/Write private properties
000016d4.00000dc0::2014/02/24-10:17:06.063 INFO [NM] Received request from client address DL980-3.
000016d4.00002ae8::2014/02/24-10:17:06.064 INFO [NM] Received request from client address DL980-3.
000016d4.0000212c::2014/02/24-10:17:06.067 INFO [RCM] ignored non-local state Pending for group AGTCP
00002424.00002bb4::2014/02/24-10:17:06.079 INFO [RES] Network Name <AGTCP_tccdb4>: Getting Read/Write private properties
00002424.00003284::2014/02/24-10:17:06.081 INFO [RES] Network Name <AGTCP_tccdb4>: Getting Read/Write private properties
00002424.00002bb4::2014/02/24-10:17:08.567 INFO [RES] Network Name: Agent: Sending request Netname/RecheckConfig to NN:6e26fe17-09c2-4f54-8aae-52678b351486:Netbios
00002424.00003284::2014/02/24-10:17:08.567 INFO [RES] Network Name <AGTCP_tccdb4>: Netbios: Slow Operation, FinishWithReply: 0
00002424.00003284::2014/02/24-10:17:08.567 INFO [RES] Network Name: [NN] got sync reply: 0
00002424.00003284::2014/02/24-10:17:08.567 INFO [RES] Network Name <AGTCP_tccdb4>: Netbios: End of Slow Operation, state: Initialized/Idle, prevWorkState: Idle
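---
A note on correlating the timestamps: the cluster log times look about 8 hours behind the SQL Server errorlog. I assume this is because the cluster log is written in UTC while the errorlog uses local time (UTC+8); that offset is my assumption from comparing the two logs, not something either log states. Below is a minimal Python sketch for shifting cluster log lines onto the errorlog clock. The names `parse_cluster_line` and `ASSUMED_UTC_OFFSET` are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Assumption: cluster log is in UTC, SQL Server errorlog is local time (UTC+8).
ASSUMED_UTC_OFFSET = timedelta(hours=8)

# Matches lines such as:
# 00002428.00000fc0::2014/02/24-10:17:06.046 ERR [RES] ...
CLUSTER_LINE = re.compile(
    r"^(?P<pid>[0-9a-fA-F]{8})\.(?P<tid>[0-9a-fA-F]{8})::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>INFO|WARN|ERR)\s+(?P<msg>.*)$"
)


def parse_cluster_line(line):
    """Return (local_time, level, message), or None if the line does not match."""
    m = CLUSTER_LINE.match(line.strip())
    if m is None:
        return None
    utc_time = datetime.strptime(m.group("ts"), "%Y/%m/%d-%H:%M:%S.%f")
    return utc_time + ASSUMED_UTC_OFFSET, m.group("level"), m.group("msg")


sample = (
    "00002428.00000fc0::2014/02/24-10:17:06.046 ERR [RES] "
    "SQL Server Availability Group: [hadrag] Failure detected, "
    "diagnostics heartbeat is lost"
)
local_time, level, msg = parse_cluster_line(sample)
# Print in the same timestamp layout the errorlog uses (millisecond precision).
print(local_time.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3], level, msg)
```

If the offset assumption holds, the "diagnostics heartbeat is lost" error at 10:17:06 in the cluster log corresponds to 18:17:06 on the errorlog clock, which is about 30 seconds after the long I/O warnings at 18:16:35.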