Channel: SQL Server High Availability and Disaster Recovery forum

AlwaysOn - cluster lease timeouts and PREEMPTIVE_HADR_LEASE_MECHANISM

We recently installed some WSUS updates plus SQL Server 2012 SP3 (yes, all tested without a problem in UAT :) and since then AlwaysOn and the cluster have been having a few issues: the cluster lease appears to be timing out and I am unable to figure out why. This results in a short blip and lost connectivity.

Any help would be appreciated!

AlwaysOn Extended Events:

availability_group_lease_expired; state: LeaseExpired; Timestamp: 2016-06-12 04:58:40.34
availability_replica_state_change: current_state: Resolving_Normal; previous_state: Primary_Normal; Timestamp: 2016-06-12 04:58:40.34
..
availability_replica_state_change: current_state: Primary_Normal; previous_state: Primary_Pending; Timestamp: 2016-06-12 04:58:52.96
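
To put a number on the "short blip", here is a minimal sketch that measures the time between the replica leaving Primary_Normal and returning to it, using only the two timestamps above. The function name `outage_seconds` is made up for illustration.

```python
from datetime import datetime

# Timestamp layout used in the extended events lines above.
FMT = "%Y-%m-%d %H:%M:%S.%f"

def outage_seconds(went_resolving: str, back_primary: str) -> float:
    """Seconds between leaving and re-entering the primary role."""
    start = datetime.strptime(went_resolving, FMT)
    end = datetime.strptime(back_primary, FMT)
    return (end - start).total_seconds()

# Values copied from the two state-change events in this post.
blip = outage_seconds("2016-06-12 04:58:40.34", "2016-06-12 04:58:52.96")
print(f"replica was not primary for about {blip:.2f} s")
```

So the replica was out of the primary role for roughly 12.6 seconds on this occasion.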

SQL Log:

Date: 12/06/2016 04:58:40; Error: 19421, Severity: 16, State: 1.
SQL Server hosting availability group did not receive a process event signal from the Windows Server Failover Cluster within the lease timeout period.

Date: 12/06/2016 04:58:40; Error: 19407, Severity: 16, State: 1.
The lease between availability group and the Windows Server Failover Cluster has expired. A connectivity issue occurred between the instance of SQL Server and the Windows Server Failover Cluster. To determine whether the availability group is failing over correctly, check the corresponding availability group resource in the Windows Server Failover Cluster.

Date: 12/06/2016 04:58:40
AlwaysOn: The local replica of availability group is going offline because either the lease expired or lease renewal failed. This is an informational message only. No user action is required.

Cluster log (do not ask me why it's -1h, the date on all nodes is OK):

2016/06/12-03:58:40.587 INFO  [RCM] rcm::RcmApi::FailResource: (AlwaysOn)
2016/06/12-03:58:40.588 INFO  [RCM] HandleMonitorReply: FAILURENOTIFICATION for 'AlwaysOn', gen(3) result 0/0.
2016/06/12-03:58:40.588 INFO  [RCM] Res AlwaysOn: Online -> ProcessingFailure( StateUnknown )
2016/06/12-03:58:40.588 INFO  [RCM] TransitionToState(AlwaysOn) Online-->ProcessingFailure.
2016/06/12-03:58:40.588 INFO  [RCM] rcm::RcmGroup::UpdateStateIfChanged: (AlwaysOn, Online --> Pending)
2016/06/12-03:58:40.588 ERR   [RCM] rcm::RcmResource::HandleFailure: (AlwaysOn)
2016/06/12-03:58:40.588 INFO  [RCM] resource AlwaysOn: failure count: 1, restartAction: 2 persistentState: 1.
2016/06/12-03:58:40.588 INFO  [RCM] numDependents is zero, auto-returning true
2016/06/12-03:58:40.588 INFO  [RCM] Greater than restartPeriod time has elapsed since first failure of AlwaysOn, resetting failureTime and failureCount.
2016/06/12-03:58:40.588 INFO  [RCM] Will queue immediate restart (500 milliseconds) of AlwaysOn after terminate is complete.
2016/06/12-03:58:40.588 INFO  [RCM] Res AlwaysOn: ProcessingFailure -> WaitingToTerminate( DelayRestartingResource )
2016/06/12-03:58:40.588 INFO  [RCM] TransitionToState(AlwaysOn) ProcessingFailure-->[WaitingToTerminate to DelayRestartingResource].
2016/06/12-03:58:40.588 INFO  [RCM] Res AlwaysOn: [WaitingToTerminate to DelayRestartingResource] -> Terminating( DelayRestartingResource )
2016/06/12-03:58:40.588 INFO  [RCM] TransitionToState(AlwaysOn) [WaitingToTerminate to DelayRestartingResource]-->[Terminating to DelayRestartingResource].
2016/06/12-03:58:40.588 ERR   [RES] SQL Server Availability Group <AlwaysOn>: [hadrag] Lease Thread terminated
2016/06/12-03:58:40.588 ERR   [RES] SQL Server Availability Group <AlwaysOn>: [hadrag] The lease is expired. The lease should have been renewed by 2016/06/12-03:58:30.348
2016/06/12-03:58:40.588 INFO  [RES] SQL Server Availability Group: [hadrag] Stopping Health Worker Thread
2016/06/12-03:58:40.588 INFO  [RES] SQL Server Availability Group: [hadrag] Health worker was asked to terminate
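
The [hadrag] line gives the time by which the lease should have been renewed, so the size of the miss can be worked out directly. A rough sketch, using only the two timestamps from the cluster log above (the variable names are illustrative):

```python
from datetime import datetime

# Timestamp layout used by the cluster log lines above.
FMT = "%Y/%m/%d-%H:%M:%S.%f"

# Copied from the "[hadrag] The lease is expired" line.
renew_deadline = datetime.strptime("2016/06/12-03:58:30.348", FMT)
# Time at which that line was logged and the resource was failed.
expiry_logged = datetime.strptime("2016/06/12-03:58:40.588", FMT)

overdue = (expiry_logged - renew_deadline).total_seconds()
print(f"lease renewal was overdue by {overdue:.3f} s when the resource failed")
```

That works out to a little over 10 seconds past the renewal deadline. As for the one-hour difference against the SQL log, it would be consistent with the cluster log being written in UTC (its default) while SQL Server logs local time; that is an assumption about this system, not something the logs themselves show.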

Something odd - SQL wait times from the last 12h:

wait type                        Wait Time      % of Total Wait
PREEMPTIVE_HADR_LEASE_MECHANISM  80,183,360 ms  39.09%
PREEMPTIVE_SP_SERVER_DIAGNOSTICS 80,183,265 ms  39.09%
HADR_CLUSAPI_CALL                40,534,655 ms  19.76%
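
A quick sanity check on these numbers, converting the totals to hours and comparing them with the 12h sampling window. This is only arithmetic on the figures above; the 12h window is taken from the post and may not be exact.

```python
WINDOW_HOURS = 12  # sampling window as stated in the post (approximate)

# Wait totals in milliseconds, copied from the table above.
waits_ms = {
    "PREEMPTIVE_HADR_LEASE_MECHANISM": 80_183_360,
    "PREEMPTIVE_SP_SERVER_DIAGNOSTICS": 80_183_265,
    "HADR_CLUSAPI_CALL": 40_534_655,
}

def hours(ms: int) -> float:
    """Convert milliseconds to hours."""
    return ms / 3_600_000

for name, ms in waits_ms.items():
    h = hours(ms)
    print(f"{name:<34}{h:6.2f} h  ({h / WINDOW_HOURS:.2f}x the window)")
```

The first two totals come to about 22.3 hours each and the third to about 11.3 hours, so they are close to or larger than the window itself. That pattern would fit background threads that sit in these waits almost continuously rather than user sessions stalling, but that is an interpretation and not something these figures prove.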

Dodgy update somewhere? Let me know if you have any hints.

Thanks in advance, Tomasz

