Channel: SQL Server High Availability and Disaster Recovery forum
Viewing all 4689 articles

Secondary falls behind primary


It's a 4-node cluster: 2 nodes in one location and 2 nodes in a second location. Availability groups (AGs) are configured on multiple instances across this cluster.

Once or twice a week, the secondary instances (sometimes even the replica in the same location) fall behind the primary.

  1. There is no blocking on the primary.
  2. No long-running SPIDs were found on the primary.
  3. Nothing specific in the error logs on the primary or the delayed replica.
  4. There are no open transactions.

Sometimes the log is not sent to the secondary and piles up on the primary (log send queue). Other times the redo queue is populated, but the secondary is not processing it.

When I reboot the secondary, the issue is resolved. How do I find the cause of this issue? Suspending and resuming data movement didn't help. I also tried switching from ASYNC to SYNC.


    Best Regards, Arun http://whynotsql.blogspot.com/
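To separate the two symptoms described above (log piling up on the primary versus a stalled redo on the secondary), one option is to sample sys.dm_hadr_database_replica_states on the primary and compare each queue with its rate. The Python sketch below only does that arithmetic: it assumes you have already read log_send_queue_size, log_send_rate, redo_queue_size and redo_rate (KB and KB/s) for one secondary database, and the class and helper names are hypothetical. The rates are averages, so treat the result as a rough indicator rather than a diagnosis.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplicaSample:
    """One row sampled from sys.dm_hadr_database_replica_states (values supplied by you)."""
    log_send_queue_kb: int   # log_send_queue_size: log not yet sent to the secondary
    log_send_rate_kbps: int  # log_send_rate
    redo_queue_kb: int       # redo_queue_size: log received but not yet redone
    redo_rate_kbps: int      # redo_rate


def drain_seconds(queue_kb: int, rate_kbps: int) -> Optional[float]:
    """Seconds to empty a queue at the current rate; None if the queue is not moving."""
    if queue_kb == 0:
        return 0.0
    if rate_kbps <= 0:
        return None  # there is a backlog but nothing is being processed
    return queue_kb / rate_kbps


def diagnose(sample: ReplicaSample) -> str:
    """Classify where the log is piling up (hypothetical helper)."""
    send = drain_seconds(sample.log_send_queue_kb, sample.log_send_rate_kbps)
    redo = drain_seconds(sample.redo_queue_kb, sample.redo_rate_kbps)
    if send is None:
        return "send stalled: log is queued on the primary but not being sent"
    if redo is None:
        return "redo stalled: log reached the secondary but is not being redone"
    return f"catching up: send ~{send:.0f}s, redo ~{redo:.0f}s"
```

For example, `diagnose(ReplicaSample(0, 0, 512000, 0))` reports a redo stall, which would point the investigation at the secondary rather than at the primary or the network.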



    Domain-independent SQL availability groups on Azure VMs (multi-region)


    Hello Group,

    As part of a POC, I am creating a domain-independent (AD-detached) cluster and then a SQL availability group on Azure VMs (Windows Server 2016 and SQL Server 2016). I performed the following steps:

    1) Created a Windows failover cluster with 2 VMs located in different regions. Created a DNS server VM in one of the regions and configured both VMs to point to that DNS server (using VNet-to-VNet integration).

    2) Created a SQL availability group and listener from SSMS on the primary SQL VM.

    3) Created an internal load balancer (ILB) for each VM and configured the ILB IP addresses as dependencies of the listener.

    4) When I fail over from SSMS on the SQL primary, the listener section in Failover Cluster Manager shows the primary's ILB endpoint going offline and the secondary's ILB endpoint coming online, as expected. I can also connect to the endpoint by IP address, so we are good so far.

    My understanding is that after a failover I should be able to connect to SQL Server using the listener name itself, and the listener will direct me to whichever ILB endpoint is online.

    My problem is that I cannot get the listener name (a DNS name) to resolve to the IP address of the ILB that is online. In other words, I am not sure how or where to map the listener's DNS name to the ILB IP address, which changes when a failover is performed.

    Currently, when I run "ping <<listener>>" from the primary SQL VM, I get the response "Ping request could not find host <<listener>>. Please check the name and try again."

    Please advise on how to resolve this issue; any input would be a great help.

    Thanks & Regards

    Kiran...
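Before changing anything, it can help to check what the listener name actually resolves to on each VM. The sketch below uses Python's standard socket.getaddrinfo to list the IPv4 addresses that name resolution (hosts file or DNS) returns for a name, and compares them with the ILB IP that is currently online; the listener name and IP in the commented usage line are placeholders. It does not create the mapping itself: the name can only resolve once a DNS record for it exists on the DNS server both VMs point to.

```python
import socket
from typing import Set


def resolve_ipv4(name: str) -> Set[str]:
    """Return the IPv4 addresses that name resolution yields for `name`.

    An empty set corresponds to ping's "could not find host" error.
    """
    try:
        infos = socket.getaddrinfo(name, None, socket.AF_INET)
    except socket.gaierror:
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def listener_points_to(name: str, online_ilb_ip: str) -> bool:
    """True if `name` currently resolves to the ILB IP that is online."""
    return online_ilb_ip in resolve_ipv4(name)


# Placeholder values; substitute the real listener name and ILB IP.
# print(listener_points_to("mylistener", "10.0.0.100"))
```

Running the check from both VMs before and after a failover shows whether the problem is a missing record or a record that does not follow the online IP.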




    SQL Server service pack upgrade on a failover cluster with log shipping

    Hello,
    We have 3 SQL Servers in a failover cluster, and two of them are configured with log shipping.
    Current version: SQL Server 2012 Service Pack 3
    Upgrading to: SQL Server 2012 Service Pack 4

    Server A - Part of the failover cluster, with log shipping (primary server)
    Server B - Part of the failover cluster
    Server C - Part of the failover cluster, with log shipping (secondary server)

    This is the first time I am upgrading a failover cluster with log shipping, and as this is a very critical server, I have to be extra careful.
    Can anyone please describe the correct way to upgrade all three SQL Servers?

    connection syntax


    Hi Team,

    What is the :Connect parameter for ApplicationIntent=ReadOnly?

    :Connect server_name[\instance_name] [-l timeout] [-U user_name [-P password]]

    I tried the syntax below, but it didn't work:

    :connect server_name -K applicationintent=readonly


    Best Regards, ACDBA http://whynotsql.blogspot.com/
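For reference, the sqlcmd command line exposes application intent through the -K option (-K ReadOnly), while the :Connect syntax quoted above lists no equivalent. From application code, the intent is a connection-string keyword instead. A minimal Python sketch that builds an ODBC connection string with ApplicationIntent=ReadOnly; the driver name, server, and database are assumptions, and read-only routing only takes effect when you connect through the AG listener and name a database that belongs to the AG.

```python
from typing import Dict


def build_odbc_conn_str(server: str, database: str, read_only: bool = False) -> str:
    """Build an ODBC connection string; add ApplicationIntent only for read-only work."""
    parts: Dict[str, str] = {
        "Driver": "{ODBC Driver 17 for SQL Server}",  # assumed installed driver
        "Server": server,                              # AG listener name (placeholder)
        "Database": database,                          # must be a database in the AG
        "Trusted_Connection": "yes",
    }
    if read_only:
        parts["ApplicationIntent"] = "ReadOnly"
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"
```

`build_odbc_conn_str("aglistener", "SalesDB", read_only=True)` ends with `ApplicationIntent=ReadOnly;`; without `read_only` the keyword is simply omitted, which the driver treats as read-write intent.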


    Cluster failover question


    I have a 4-node cluster: servers A1 and A2 in one location, and B1 and B2 in a second location. It is a Windows failover cluster with a cloud witness configured. There are two instances in the first location: the first instance on A1 and the second instance on A2. Their corresponding AG replica instances are on B1 and B2. For the first instance, I have set the preferred owners to A1 and A2 (B1 and B2 in the second location are unchecked).

    SQL Server version: SQL Server 2016 Enterprise

    Windows version: Windows Server 2016 Standard

    When the network between the two locations flapped, the events below were logged:

    1. Microsoft Failover Cluster Virtual Adapter (NetFT) has missed more than 40 percent of consecutive heartbeats.
    2. Cluster has lost the UDP connection from local endpoint 
    3. Group 'Cluster Group' has transitioned from state 'Online' to state 'Orphaned'.
    4. Clustered role 'Cluster Group' is moving from cluster node 'A1' to cluster node 'A2'.
    5. Cluster has established a UDP connection from local endpoint 
    6. Clustered role 'First Instance' is moving from cluster node 'B1' to cluster node 'A1'. (Since the preferred owners for the first instance are A1 and A2, why does this event show the first instance moving from B1 to A1?)

    Once the network reconnected, I saw both the first and second instances on A2.

    Can you point me to any links that explain the sequence of events after a network disconnection in a cluster, and why the event shows a move from B1 to A1?

    Manual failover is configured.




    Best Regards, ACDBA http://whynotsql.blogspot.com/
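A simplified way to reason about which side keeps running after the link between sites drops is vote arithmetic: with four nodes and a cloud witness there are five votes, and only a partition holding a strict majority (three) keeps quorum. The Python sketch below models that static majority rule only; it ignores dynamic quorum and dynamic witness, which are enabled by default on Windows Server 2016, and the site names are hypothetical. `witness_holder` is whichever partition managed to reach the cloud witness.

```python
from typing import Dict, Optional


def has_quorum(partition_votes: int, total_votes: int) -> bool:
    """A partition keeps quorum only with a strict majority of all configured votes."""
    return partition_votes > total_votes // 2


def surviving_partition(node_votes: Dict[str, int], witness_holder: Optional[str]) -> Optional[str]:
    """Return the partition that keeps quorum after a split, or None if none does.

    node_votes maps each partition (e.g. a site) to its node votes; the witness adds
    one vote to the partition that reached it (static model, no dynamic quorum).
    """
    total = sum(node_votes.values()) + 1  # +1 for the configured witness vote
    for partition, votes in node_votes.items():
        held = votes + (1 if partition == witness_holder else 0)
        if has_quorum(held, total):
            return partition
    return None
```

With `{"SiteA": 2, "SiteB": 2}`, the site that reaches the witness survives with 3 of 5 votes; if neither reaches it, neither side has quorum, which is one way the 'Cluster Group' can end up orphaned and then moved once connectivity returns.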


    Replication on Always On


    I have configured replication in an Always On environment. After a failover, the primary became the secondary and the secondary became the primary.

    I notice that replication is still flowing only from the secondary replica. I will be adding another replica and will shut down the secondary replica soon. If I shut down the secondary replica (the former primary), will replication still work?

    I have configured all replicas as publishers and followed all the steps in this document:

    https://blogs.msdn.microsoft.com/alwaysonpro/2014/01/30/setting-up-replication-on-a-database-that-is-part-of-an-alwayson-availability-group/

    Application connection string using MultiSubnetFailover


    The connection string keyword MultiSubnetFailover is designed for Always On.

    What if I have a 4-node cluster with multiple subnets? Does it still make reconnection faster during failover?
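MultiSubnetFailover matters because a multi-subnet listener name (with the default RegisterAllProvidersIP setting) resolves to one IP address per subnet, and only the address in the subnet of the current primary is online. With the keyword set, the client tries all of those addresses in parallel instead of one after another. The Python simulation below contrasts the two strategies without touching the network: the IP addresses are hypothetical, and an offline address is modelled as a probe that sleeps until its timeout. In this model the number of nodes does not matter; the number of listener IPs (subnets) does.

```python
import concurrent.futures
import time
from typing import Dict, List, Optional


def probe(ip: str, online: Dict[str, bool], timeout: float) -> Optional[str]:
    """Simulated TCP connect: an online IP answers quickly, an offline one hangs until timeout."""
    if online[ip]:
        time.sleep(0.01)
        return ip
    time.sleep(timeout)
    return None


def connect_sequential(ips: List[str], online: Dict[str, bool], timeout: float) -> Optional[str]:
    """Try each listener IP in turn (behaviour without MultiSubnetFailover)."""
    for ip in ips:
        if probe(ip, online, timeout):
            return ip
    return None


def connect_parallel(ips: List[str], online: Dict[str, bool], timeout: float) -> Optional[str]:
    """Try all listener IPs at once and keep the first that answers (MultiSubnetFailover)."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(ips))
    try:
        futures = [pool.submit(probe, ip, online, timeout) for ip in ips]
        for future in concurrent.futures.as_completed(futures):
            result = future.result()
            if result:
                return result
        return None
    finally:
        pool.shutdown(wait=False)  # do not wait for probes that are still hanging
```

With the online address listed second, the sequential version pays the full timeout for the first address before succeeding, while the parallel version returns as soon as the online address answers.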

    EntityFramework Connections to SQL Server Distributed Availability Group Cluster


    We recently built a Distributed Availability Group cluster. The cluster has three availability groups in three different domains/regions, with one primary read/write listener and two other listeners with read-only replicas.

    We plan to have an application built with EntityFramework configured in each domain/region. How would we configure EntityFramework connections to properly send read/write connections to the Primary Listener and read-only connections to the Local Listener?

    Is there documentation on how to configure connections to communicate with Distributed Availability Group clusters?

    Thx
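One common approach with Entity Framework is to keep two connection strings (or two context configurations) and pick one per unit of work. The Python sketch below shows only that selection logic, in SqlClient keyword syntax: writes go to the primary listener, reads go to the local regional listener with ApplicationIntent=ReadOnly. The listener names, region keys, and port are hypothetical, and it assumes, as described, that each regional listener fronts readable replicas. If the distributed AG fails over, the writable listener changes, so the primary listener name is better kept in configuration than in code.

```python
from typing import Dict

# Hypothetical names; replace with your own listeners.
PRIMARY_LISTENER = "dag-rw-listener"
REGIONAL_LISTENERS: Dict[str, str] = {
    "region1": "ag1-listener",
    "region2": "ag2-listener",
    "region3": "ag3-listener",
}


def connection_string(region: str, database: str, read_only: bool) -> str:
    """Pick the listener by intent: writes to the primary, reads to the local region."""
    server = REGIONAL_LISTENERS[region] if read_only else PRIMARY_LISTENER
    parts = [
        f"Server=tcp:{server},1433",
        f"Database={database}",
        "Integrated Security=True",
        "MultiSubnetFailover=True",
    ]
    if read_only:
        parts.append("ApplicationIntent=ReadOnly")
    return ";".join(parts) + ";"
```

An application instance in region2 would then build its write context from `connection_string("region2", "Sales", False)` and its read context from `connection_string("region2", "Sales", True)`.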



    Disabling jobs on an Always On secondary


    Hi,

    An Always On availability group is configured on some databases. When a failover happens, the primary becomes the secondary and the secondary becomes the primary. We configured the same migration jobs on both replicas. After a failover, these jobs should be enabled on the primary and disabled on the secondary.

    I have a question here. When a failover happens, the secondary becomes the primary, which is read-write, and the primary becomes the secondary, whose databases are read-only. So why is it necessary to disable the jobs? At this stage, what happens to the jobs on both replicas? Also, what about SQL Server Agent: will Agent keep working on the secondary after a failover? Do I need to enable and disable the jobs automatically on both replicas?

    Please clarify; I am confused about this.

    If I need to enable the jobs (execute them on the primary) and disable them (not execute them on the secondary) on each replica, please help me with how to do that. Is there a script available that I can add as a step in each job?

    Thanks.

    Jo


    pols
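SQL Server Agent keeps running on every replica, and jobs live at the instance level, so after a failover the same jobs are still scheduled on both sides; a job that touches an AG database on the secondary will typically fail because that database is read-only or not readable there. A common pattern is to leave the jobs enabled everywhere and make the first step check whether the local replica is primary for the database, for example with the T-SQL function sys.fn_hadr_is_primary_replica (SQL Server 2014 and later). The Python sketch below models only that gating logic; `is_primary` and `run_if_primary` are hypothetical stand-ins, not an Agent API.

```python
from typing import Callable, List


def run_if_primary(
    is_primary: Callable[[str], bool],
    database: str,
    steps: List[Callable[[], None]],
) -> bool:
    """Run the job's steps only on the replica that is currently primary for `database`.

    `is_primary` stands in for SELECT sys.fn_hadr_is_primary_replica(N'<database>').
    Returns True if the steps ran, False if they were skipped on a secondary.
    """
    if not is_primary(database):
        return False  # on a secondary: exit quietly instead of failing the job
    for step in steps:
        step()
    return True
```

Because the check runs on every execution, nothing has to be enabled or disabled after a failover: the same job simply does its work on whichever replica is primary at that moment.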


    Resolving status instead of making automatic failover


    I have an Always On availability group with three nodes, and this AG supports DTC.

    When the SQL Server service on the primary was stopped, all databases in this AG changed to the RESOLVING state on all the other nodes.

    Why didn't this AG fail over to the synchronous secondary replica?
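Automatic failover only happens when both the primary and the target secondary are configured for synchronous commit with automatic failover mode, the secondary database is SYNCHRONIZED, and the WSFC still has quorum. The Python sketch below checks those preconditions against values you would read from sys.availability_replicas and sys.dm_hadr_database_replica_states; the class and function names are hypothetical, and it does not model DTC-specific behaviour or the cluster failover policy (for example, the maximum number of failures in the specified period), which can also stop a failover.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReplicaConfig:
    name: str
    availability_mode_desc: str      # sys.availability_replicas
    failover_mode_desc: str          # sys.availability_replicas
    synchronization_state_desc: str  # per database; only checked for the target


def auto_failover_blockers(
    primary: ReplicaConfig, target: ReplicaConfig, cluster_has_quorum: bool
) -> List[str]:
    """List the reasons automatic failover to `target` cannot happen (empty if none)."""
    blockers = []
    for r in (primary, target):
        if r.availability_mode_desc != "SYNCHRONOUS_COMMIT":
            blockers.append(f"{r.name}: availability mode is {r.availability_mode_desc}, not SYNCHRONOUS_COMMIT")
        if r.failover_mode_desc != "AUTOMATIC":
            blockers.append(f"{r.name}: failover mode is {r.failover_mode_desc}, not AUTOMATIC")
    if target.synchronization_state_desc != "SYNCHRONIZED":
        blockers.append(f"{target.name}: synchronization state is {target.synchronization_state_desc}, not SYNCHRONIZED")
    if not cluster_has_quorum:
        blockers.append("WSFC does not have quorum")
    return blockers
```

A secondary left at MANUAL failover mode, for instance, produces a single blocker, which is a common reason an AG stays in RESOLVING instead of failing over.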

    Always On role keeps going offline


    Hi,

    The Always On role keeps going offline.

    Here are the cluster events:

    Event ID: 1069

    Cluster resource 'AlwaysON' of type 'SQL Server Availability Group' in clustered role 'AlwaysON' failed.

    Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.  Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

    Event ID: 1205

    The Cluster service failed to bring clustered role 'AlwaysON' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.


    Harsha

    SQL Server 2008 - failback not working as expected


    Each month we run Windows Update on a Windows 2008 cluster running SQL Server 2008 R2 and SQL Server 2012. It patches and then reboots the servers, and by default all the instances end up on one node, whichever came up first. We have configured automatic failback for each group so that the 6 instances come up on their preferred nodes to optimize performance. We enable automatic failback only during the window when Windows Update runs, then disable it again because we don't want it to kick in during normal operation.

    Last month this process failed and all the instances came up on one node. The only obvious difference was that a (backup) service failed to start. The same thing happened on 4 clusters.

    Did automatic failback fail because it doesn't kick in when a service in the group is in a failed (or possibly failing) state?
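Failback is governed by cluster group properties: AutoFailbackType allows (1) or prevents (0) it, and FailbackWindowStart/FailbackWindowEnd can restrict it to a range of hours. A quick sanity check is whether the reboot actually landed inside the configured window. The Python sketch below evaluates only that hour arithmetic, including windows that wrap past midnight; treating a negative value as "no window" and the end hour as inclusive are assumptions, and it says nothing about whether a failed resource in the group blocks failback.

```python
def in_failback_window(hour: int, start: int, end: int) -> bool:
    """Return True if failback may happen during `hour` (0-23).

    start/end mirror FailbackWindowStart/FailbackWindowEnd; a negative value
    is treated as "no window" (failback at any hour). The end hour is treated
    as inclusive, which is an assumption.
    """
    if start < 0 or end < 0:
        return True
    if start <= end:
        return start <= hour <= end
    # The window wraps past midnight, e.g. 22 -> 4.
    return hour >= start or hour <= end


def failback_allowed(auto_failback_type: int, hour: int, start: int, end: int) -> bool:
    """AutoFailbackType 1 allows failback, 0 prevents it."""
    return auto_failback_type == 1 and in_failback_window(hour, start, end)
```

For a 22:00 to 04:00 maintenance window, a node that comes back at 05:00 falls outside the window, so the groups would stay where they are even with failback enabled.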

    Upgrade to SQL Server 2016 Enterprise using Side by Side


    Availability Group 1 --- Node-A synchronous Always On to Node-B --- Transaction DB, Windows cluster (WSFC) --- VIP-1

    Availability Group 2 --- Node-C asynchronous Always On to Node-D --- Reporting DB, Windows cluster (WSFC) --- VIP-2

    Availability Group --- Node-A asynchronous Always On to Node-C

    DR --- Node-C asynchronous Always On to DR-Node --- VIP-3

    Is it possible for two different availability groups to synchronize data with each other? Please review my design and share your opinion.

    =========================================

    CDC on Asynchronous Secondary Replica


    For the purposes of processing changed data, I would like to set up an asynchronous replica which has CDC enabled.  The primary replica will not have CDC enabled to avoid any issues with CDC resource usage.

    Change Data Capture and Other SQL Server Features states "[w]hen you use Always On, change enumeration should be done on the Secondary replication to reduce the disk load on the primary."  When you look at Replication, change tracking, & change data capture - Always On availability groups it states "... CDC configuration is always performed on the current or intended primary replica."

    From the Microsoft documents, it seems that the only way to enable CDC ONLY on the secondary database is to avoid Availability Groups and go with one of the more traditional replication options (i.e. Snapshot, Merge, or Transactional). Is that correct? Is there really no way of enabling CDC just on the secondary replica using Availability Groups, even when the secondary replica is asynchronous?

    HADR in SQL 2016


    I would like to create an HADR solution using a SQL FCI + AG on 3 nodes (a 2-node FCI at the primary site and 1 standalone node). Can I use the secondary replica at the DR site as read-only for backups and reporting?

    What challenges with this configuration should I be aware of?

     

    AlwaysOn AG - 2 nodes - automatic failover


    Hello:

    I am trying to set up an AOAG for a SharePoint 2019 farm using 2 SQL Servers/instances. I have always set up SQL Server failover clustering with shared disk, but I want to leverage AOAG. I have a couple of questions that came up while reading the guides:

    1. Can I achieve AOAG with automatic failover using only 2 nodes?
    2. Does my listener have to be on the same subnet as my SQL Servers?

    If there is a step-by-step guide for setting up AOAG with automatic failover that you can share, I would appreciate it.

    Thank you,


    Rumi

    Restore Alert


    Hi Team,

    How can I get an alert on the last restored T-log every hour (e.g., via a SQL Agent job)?

    I am doing some testing, restoring T-logs every 10 minutes from one environment to another.
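msdb.dbo.restorehistory records each restore with its restore_date, destination_database_name and restore_type ('L' for a log restore), so an hourly SQL Agent job can read the latest log restore time and raise an alert when it is too old. The Python sketch below shows the check; the query text is an assumption about how you would fetch the value, the function names are hypothetical, and actually running the query and sending the alert are left out.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed query for the most recent log restore of one database.
LAST_LOG_RESTORE_SQL = (
    "SELECT MAX(restore_date) FROM msdb.dbo.restorehistory "
    "WHERE destination_database_name = ? AND restore_type = 'L'"
)


def restore_is_stale(
    last_restore: Optional[datetime],
    now: datetime,
    max_age: timedelta = timedelta(hours=1),
) -> bool:
    """True if no log restore happened within `max_age` (or none was ever recorded)."""
    if last_restore is None:
        return True
    return now - last_restore > max_age


def alert_message(database: str, last_restore: Optional[datetime]) -> str:
    """Text for the alert; `database` is whatever name you pass to the query."""
    when = last_restore.isoformat(sep=" ") if last_restore else "never"
    return f"Log restore for {database} is overdue (last restore: {when})"
```

With restores every 10 minutes, the one-hour threshold leaves room for several missed restores before the alert fires; lowering `max_age` makes it stricter.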


    Failover of AlwaysOn Groups Without Alerts


    I have experienced a failover of my AlwaysOn groups. Although I have failover alerts set up, I didn't receive any when this failover happened.

    I am also unable to see an indication of the failover in the logs.

    The only logs for that period, on the secondary, show the following:
    22/01/2019 07:35:59 1641 Information Clustered role 'Group' is moving from cluster node 'SQL2' to cluster node 'SQL2'.

    I'm looking for advice on why this might happen and how I can monitor for it.
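Event 1641 in the log quoted above records role moves, including a move from a node to itself. One way to monitor for this is to export those events periodically and parse them. The Python sketch below extracts the role, source node and destination node from lines in the format shown in the question and flags same-node moves separately, since they do not change the owner node; the function names are hypothetical and how the lines are exported is left open.

```python
import re
from typing import Iterable, List, Tuple

# Matches the text of event 1641 as quoted in the question.
MOVE_RE = re.compile(
    r"Clustered role '(?P<role>[^']+)' is moving from cluster node "
    r"'(?P<src>[^']+)' to cluster node '(?P<dst>[^']+)'"
)


def find_role_moves(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (role, source node, destination node) for every move found."""
    moves = []
    for line in lines:
        match = MOVE_RE.search(line)
        if match:
            moves.append((match["role"], match["src"], match["dst"]))
    return moves


def same_node_moves(moves: List[Tuple[str, str, str]]) -> List[Tuple[str, str, str]]:
    """Moves whose source and destination are the same node."""
    return [move for move in moves if move[1] == move[2]]
```

Feeding it the line from the question yields `('Group', 'SQL2', 'SQL2')`, which `same_node_moves` flags; an alert keyed on an owner-node change would never fire for such an event.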

    Always On cluster issues


    Hi,

    I've been having issues with my Always On cluster configuration that baffle me.

    The cluster consists of 4 SQL Server 2017 nodes:

    Primary

    Secondary - synchronous

    Two more secondary instances, which are async

    From time to time, the cluster itself fails due to health issues on one of the async secondary nodes (I suspect network stress). When such a failure occurs, the entire cluster fails, and no instance can be promoted to primary unless I intervene manually and fail over to one of the nodes.

    The async node that fails has no quorum vote (it's in a remote location), and I would expect the cluster to simply disregard it if it has issues, but for some reason it causes the entire cluster's health to fail.

    The cluster is based on Windows Server 2012.

    Has anyone ever encountered such behavior? Would there be any benefit in upgrading to Windows Server 2016? Has anyone done a zero-downtime migration from 2012 to 2016 (at the cluster level; my SQL Server is already 2017)?

    I've contacted MS support, but after a few months no solution has been provided.

    Thanks ! :)

     

    SQL Server Service Account trying to connect to Dedicated Admin Connection(DAC), why?


    We started getting Alerts:

    Could not connect because the maximum number of '1' dedicated administrator connections already exists. Before a new connection can be made, the existing dedicated administrator connection must be dropped, either by logging off or ending the process. [CLIENT: 127.0.0.1]

    It seemed like a classic alert for port scanning, but our infrastructure team confirmed that no jobs or port scans are configured to run at those times, nor any other tasks.

    With the help of a SQL job and a holding table, we found that it's the SQL Server service account that tries to connect, approximately every 4 hours (e.g., 12:00, 16:00, and so on), with the program name Net SqlClient Data Provider. I added the host_process_id column and identified the PID, but there was no process with that PID in Windows Task Manager. Oddly enough, there are no "Dedicated admin connection support was established" entries in the log, just the following:

    16:00:05 Could not connect because the maximum number of '1' dedicated administrator connections already exists. Before a new connection can be made, the existing dedicated administrator connection must be dropped, either by logging off or ending the process. [CLIENT: 127.0.0.1]

    16:00:11 Dedicated administrator connection has been disconnected. This is an informational message only. No user action is required.

    No clues in Windows Event Viewer logs or Windows Cluster logs. This is only happening on an active node of AlwaysOn Availability Group.


    Ladies and gentlemen, how can we troubleshoot the issue further?


