Hi,
I've been having issues with my Always On cluster configuration that baffle me.
The cluster consists of 4 SQL Server 2017 nodes:
Primary
Secondary - synchronous
Two more secondaries - asynchronous
From time to time, the cluster itself fails due to health issues on one of the asynchronous secondary nodes (I suspect network stress). When this happens, the entire cluster goes down and no instance is promoted to primary until I intervene and manually fail over to one of the nodes.
The asynchronous node that fails has no quorum vote (it's in a remote location), so I would expect the cluster to simply disregard it when it has issues. Instead, for some reason, it causes the health of the entire cluster to fail.
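To make my expectation concrete, here is a rough sketch of how I understand majority quorum. It is a simplified model, not how Windows Server Failover Clustering actually implements it (it ignores dynamic quorum and witnesses). The node names and the vote assignment for the three other nodes are hypothetical. Only the zero vote on the remote node reflects my real setup.

```python
def has_quorum(votes, online):
    """Return True if the online nodes hold a strict majority of all votes."""
    total = sum(votes.values())
    held = sum(v for node, v in votes.items() if node in online)
    return held * 2 > total


# Hypothetical vote assignment: the remote async node has no vote.
votes = {
    "primary": 1,
    "sync_secondary": 1,
    "async_local": 1,
    "async_remote": 0,
}
all_nodes = set(votes)

# Losing only the non-voting remote node leaves every vote online.
print(has_quorum(votes, all_nodes - {"async_remote"}))
```

Under this model, losing the non-voting node does not change the vote count at all, which is why the cluster-wide failure surprises me.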
The cluster is based on Windows Server 2012.
Has anyone encountered this behavior? Would there be any benefit in upgrading to Windows Server 2016? Has anyone done a zero-downtime migration from 2012 to 2016 (at the cluster level, since my SQL Server is already 2017)?
I contacted MS support a few months ago, but no solution has been provided yet.
Thanks! :)