Alwayson secondary replica database in reverting/in recovery state
Transaction rollback hanging for the past two months (MSSQL 2016)
I executed a query using SQLCMD on 2019-01-30, and the query caused the .ldf file to balloon to 2 TB (2,216,539,357 rows). SQL Server then started rolling back the transaction on 2019-02-01.
I used KILL SPID WITH STATUSONLY to monitor the rollback: progress was about 2% a day for the first two weeks, but it has been stuck at 22% since 2019-02-14 and is still there today.
I'd appreciate an expert opinion on whether there is any way to fix this issue: how can I stop or speed up the rollback?
Please find below code details about this issue:
------------------------------------------------------------------------------------------------
SQL
BEGIN TRAN
DECLARE @m int
SELECT @m = @@ERROR
DECLARE @tbname_old varchar(50) = 'OTS_ARCHIVE'
DECLARE @tbname_new varchar(50) = 'OTS_ARCHIVE2'
DECLARE @column_old varchar(30) = 'GuID_ID'
DECLARE @column_new varchar(30) = 'GuID_ID_old'
DECLARE @sql varchar(50) = '[' + @tbname_new + '].[' + @column_old + ']'
DECLARE @sqlid varchar(100) = 'CAST(CAST(NEWID() AS BINARY(10)) + CAST(GETDATE() AS BINARY(6)) AS UNIQUEIDENTIFIER)'
DECLARE @date as datetime
DECLARE @i int
DECLARE @f int
set @date = '2017-01-01'
set @i = 0
set @f = 27
WHILE @i < @f
BEGIN
    EXEC ('INSERT INTO ' + @tbname_new + ' select GuID_ID ,Box_ID ,Start_Time ,End_Time ,Duration_Time ,ots_count ,Group_ID ,' + @sqlid + ' from ' + @tbname_old )
END
IF @m = 0
    COMMIT TRAN
ELSE
    ROLLBACK TRAN
SELECT ERROR_NUMBER() AS ErrorNumber, ERROR_SEVERITY() AS ErrorSeverity, ERROR_STATE() AS ErrorState, ERROR_PROCEDURE() AS ErrorProcedure, ERROR_LINE() AS ErrorLine, ERROR_MESSAGE() AS ErrorMessage
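One observation on the script as posted: @i is never incremented inside the WHILE loop, so the loop has no exit condition and each pass re-inserts the whole of OTS_ARCHIVE inside a single open transaction. That would be consistent with the log growth described. As an alternative to re-issuing KILL ... WITH STATUSONLY, the rollback can also be watched through sys.dm_exec_requests. The following is a minimal sketch, not a fix; the session id 75 is a placeholder for the SPID that is rolling back.

```sql
-- Hypothetical session id: replace 75 with the SPID that is rolling back.
DECLARE @spid int = 75;

SELECT r.session_id,
       r.command,            -- usually KILLED/ROLLBACK for a killed session
       r.status,
       r.percent_complete,   -- progress estimate reported by the engine
       r.estimated_completion_time / 1000.0 / 3600.0 AS est_hours_remaining,
       r.wait_type,          -- what the rollback is currently waiting on, if anything
       r.wait_time,
       r.last_wait_type
FROM sys.dm_exec_requests AS r
WHERE r.session_id = @spid;
```

If wait_type and wait_time keep changing between runs, the rollback is probably still doing work even though the percentage looks frozen.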
Recovery Pending State in SQL Server Database
Ms-SQL differential backups using VSS writer
We have some questions regarding differential backup and restore of MSSQL using its VSS writer.
During backup, my application first takes a full backup, which backs up both the database and transaction log files, i.e. the .mdf and .ldf files. During a differential backup, it backs up only the changed blocks of the database (.mdf) file as reported by the MSSQL VSS writer. There are no issues with the backup.
Our application does a VSS-based restore. First we restore the full backup data, i.e. both the database (.mdf) and transaction log (.ldf) files. Then it writes the differential (changed) chunks to the corresponding database (.mdf) file. The VSS writer does not report any error during the restore itself, but the databases are not accessible afterwards; they are in a corrupted state. The SQL Server service stops after the restore and cannot be started. The application logs contain errors about a transaction log number mismatch, from which we conclude that the database restore has actually failed.
Also, after restoring the VSS partial chunks, I tried restoring the SQL data using the SQL restore command, i.e. "RESTORE DATABASE [Database name]". That restore failed as well.
So I have a few questions:
- Can we restore the database using only the VSS provided partial/differential data chunks?
- If so, why does the restore fail while restoring the transaction logs? Am I missing a step?
- Is some kind of recovery command needed after restoring the full backup plus the partial chunk backup to get the databases into a consistent state?
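On the last question, a hedged sketch of what a "recovery command" looks like in T-SQL. It only applies if the instance starts and the database was left in the RESTORING state; it will not repair files whose data and log contents do not match. The database name YourDatabase is a placeholder.

```sql
-- Placeholder database name; check which state the restore left it in.
SELECT name, state_desc
FROM sys.databases
WHERE name = N'YourDatabase';

-- If state_desc is RESTORING and there are no more backups to apply,
-- this runs recovery and brings the database online.
RESTORE DATABASE [YourDatabase] WITH RECOVERY;
```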
VSS Differential backup returns complete range instead of changed ranges for Log (.ldf) files
Hi All,
I am using the SQL VSS writer APIs to take SQL database backups in my organization. Full backups through the VSS service succeed, but when I take a differential backup the VSS service returns the complete block range {for example: (0, 80900900)} instead of only the changed blocks. Can someone please help me understand whether this is a known limitation, or whether there is a function/method available to get only the changed blocks for .ldf files?
question on adding/removing Node from SQL 2016 Alwayson cluster
Hello, we have a production SQL 2016 AlwaysOn cluster with 2 nodes in datacenter1 and 1 node at a DR site. The secondary passive node in datacenter1 is having issues, so we are planning to remove that passive node from the cluster and add a new server as the secondary passive node. Since it's a passive node, I don't think there will be any downtime in either removing or adding it, and we are planning to perform the work during the daytime. Here are the high-level steps I am planning to follow:
1. Remove the faulty passive node from the cluster
2. Add the new passive node to the cluster
3. Take a full DB & tran log backup from the active node
4. Stop all the DB backup jobs in all the nodes of the cluster
5. Restore the full DB backup with NORECOVERY onto the new secondary passive node
6. Restore the tran log backup with NORECOVERY onto the new secondary passive node
7. Sync up the logins between Primary & secondary Passive node
8. Configure SQL Alwayson Availability group to add this new secondary node
9. Start the sync process between the active & passive node
Please let me know if anything is missing from the above steps.
Thanks.
sqldev
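For reference, steps 5, 6, 8 and 9 above can be sketched in T-SQL roughly as follows. All names are assumptions for illustration (AG1, AppDb, NEWNODE, the backup share and the endpoint URL), and the availability and failover modes should match whatever the replaced replica used. Each part runs on the server named in its comment.

```sql
-- On the NEW secondary: restore the full and log backups without recovery.
RESTORE DATABASE [AppDb]
    FROM DISK = N'\\backupshare\AppDb_full.bak'
    WITH NORECOVERY;
RESTORE LOG [AppDb]
    FROM DISK = N'\\backupshare\AppDb_log.trn'
    WITH NORECOVERY;

-- On the PRIMARY: add the new replica to the availability group.
ALTER AVAILABILITY GROUP [AG1]
ADD REPLICA ON N'NEWNODE'
WITH (
    ENDPOINT_URL = N'TCP://newnode.example.com:5022',
    AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
    FAILOVER_MODE = MANUAL
);

-- On the NEW secondary: join the group, then join the database.
ALTER AVAILABILITY GROUP [AG1] JOIN;
ALTER DATABASE [AppDb] SET HADR AVAILABILITY GROUP = [AG1];
```

Pausing the log backup jobs before taking the backups used for seeding (rather than after, as in steps 3 and 4) avoids a newer log backup breaking the restore chain on the new secondary.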
Disaster Recovery Steps for complete loss of servers and data.
What are the steps, or where are the Microsoft documents that provide the steps, to recover an "AlwaysOn" SQL database when the 2 underlying servers are no longer available? Let's say the scenario is 1 data center with 2 SQL servers running Always On availability groups, and by chance the data center blows up. What type of backups would I need, and how would I get the servers rebuilt?
I currently have snapshot backups of both SQL servers (crash consistent), but when I bring them online the databases do not show as online and are out of sync. I need steps to clean this up and then restore SQL data from a SQL maintenance plan. So I am basically asking 2 questions: what does Microsoft recommend for my scenario, and/or, if my snapshot method is okay, what are the steps to get the databases back online so I can perform further restores of more current backups?
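As a starting point for the "out of sync" part, a diagnostic sketch (read-only, it changes nothing) that lists what each restored instance reports about its availability databases. It assumes the instance starts and the Always On DMVs are reachable.

```sql
-- Run on each restored instance to see how it views the availability databases.
SELECT ar.replica_server_name,
       DB_NAME(drs.database_id) AS database_name,
       drs.is_local,
       drs.synchronization_state_desc,
       drs.synchronization_health_desc,
       drs.database_state_desc,
       drs.is_suspended
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
    ON ar.replica_id = drs.replica_id
ORDER BY ar.replica_server_name, database_name;
```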
Restoring SQL Cluster as standalone server
Hi,
We recently moved to a new DR system which allows us to spin up our SQL servers in a private cloud. As such when the server is restored it doesn't have access to cluster disks etc., the disks are restored as local disks and cluster networks don't exist.
We use a SQL cluster in active/passive mode. When the SQL server is restored the cluster won't start, I'm not at all surprised at that. But I'd like to start the SQL services separately, as a stand alone server without having to get the cluster working. Is this even possible? Or is it possible to destroy the cluster and continue using SQL afterwards?
Thanks,
Dave
Always on ReadOnly routing from client without AD DNS
Hi there,
I've created a SQL 2017 AlwaysOn multi-subnet AG. I've been really impressed by how easy it's been to set up and how well it works.
From a server on the same Active Directory I can test the listener and watch the primary and readable secondary move around as expected. But most of our clients (web servers) aren't on an Active Directory like our SQL servers, so I wanted to test the client connectivity from them but am seeing some very strange behaviour.
To test the non-AD servers, I added two A records on the DNS server that the non-AD servers use, pointing to the same IPs as the AD listener (to mimic what's in AD DNS). When testing a connection/query, if my connection string doesn't contain the ApplicationIntent=readonly option then everything works fine.
But if I include the ApplicationIntent=readonly option then the connection fails with the following error:-
Exception calling "Open" with "0" argument(s): "A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 0 - No such host is known.)"
At line:17 char:5
+ $connection.Open()
+ ~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [], MethodInvocationException
+ FullyQualifiedErrorId : SqlException
The exception is when read-only routing points to the same server as the primary: then it connects fine!?
I can ping, nslookup and telnet on 1433 to all the servers in the AG. And just for clarity, this all works fine from a server on the same domain as SQL, it's just servers that aren't on the domain.
Can anyone please help me understand where I've gone wrong or what's different about connecting to the listener from a server that's not on the same domain?
many thanks!
:D
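One thing that may be worth checking, offered as a guess rather than a diagnosis: with ApplicationIntent=ReadOnly the listener hands the client the read-only routing URL of a secondary, and the client then has to resolve that host name itself. If the URLs use AD host names that the non-AD DNS server does not know, "No such host is known" would be the expected result. The sketch below simply lists the URLs each replica advertises.

```sql
-- Run on the primary: which host names are clients redirected to?
SELECT replica_server_name,
       secondary_role_allow_connections_desc,
       read_only_routing_url
FROM sys.availability_replicas;
```

Running nslookup from a non-AD web server against the exact host name shown in read_only_routing_url (not just the listener name) would confirm or rule this out.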
calculating the size required for backup of database
How do I calculate the space required to take a backup of a database in MSSQL 2000? The results returned by sp_spaceused differ too much from the actual size taken on disk.
Thanks in advance
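A rough sketch of two ways to estimate this, with YourDatabase as a placeholder name. Refreshing the usage counters first often narrows the gap in sp_spaceused on older versions, and the backup history in msdb shows what earlier full backups of the same database actually took.

```sql
USE YourDatabase;   -- placeholder database name

-- Correct the page counts before reporting; "reserved" is a rough upper
-- bound for the data portion of a full backup.
EXEC sp_spaceused @updateusage = 'TRUE';

-- Sizes of the most recent full backups, as recorded in msdb.
SELECT TOP 5
       database_name,
       backup_finish_date,
       backup_size / 1048576.0 AS backup_size_mb
FROM msdb.dbo.backupset
WHERE database_name = 'YourDatabase'
  AND type = 'D'     -- D = full database backup
ORDER BY backup_finish_date DESC;
```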
Multi Instance with different port AlwaysOn 2016
Hi all,
I'm configuring a multi-instance Always On availability group with SQL 2016 Enterprise.
I have two servers, DB-01 and DB-02.
I successfully configured AlwaysOn with a DAG on the default instance. Everything works fine, including WSFC.
I have a big problem installing a second instance, called for example "INSTANCEONE", on both servers.
The second instance must run on a different TCP port, for example 1435. When configuring an availability group with the wizard I can specify the TCP port on the classic login screen (see image), but when I perform a failover the AG cannot connect the two instances because it tries to connect on the default port.
My questions are:
1) Is there a way to configure a multi-instance AlwaysOn cluster?
2) How can I edit the replica instance configuration to use non-default TCP ports?
Thank you !!
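A sketch of where the replica-to-replica port is set. The client TCP port (1435 in the example) and the database mirroring endpoint port that the replicas use to talk to each other are separate settings, and two instances on the same server need different endpoint ports. The AG name AG2, the host name and port 5023 below are assumptions for illustration.

```sql
-- On each instance: which port does its mirroring endpoint listen on?
SELECT name, port, type_desc, state_desc
FROM sys.tcp_endpoints
WHERE type_desc = 'DATABASE_MIRRORING';

-- On the primary: point the replica definition at that endpoint port.
ALTER AVAILABILITY GROUP [AG2]
MODIFY REPLICA ON N'DB-02\INSTANCEONE'
WITH (ENDPOINT_URL = N'TCP://DB-02.example.com:5023');
```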
read-only routing with single application connection
Good day colleagues,
My application supports a single database connection, and in the app console I can produce reports. If I include the app database in an AlwaysOn availability group with a read-intent replica, will SQL automatically route the “selects” to that second instance, thus offloading my application’s reporting activities, or do I need a separate db connection (maybe from a reporting app or CLI) that specifies read-only intent?
Many thanks,
Archie
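For context, read-only routing is decided per connection, not per statement: a connection is only redirected when it goes through the listener and its connection string carries ApplicationIntent=ReadOnly, so SELECTs on an ordinary read-write connection stay on the primary. The server-side part can be sketched as below; AG1, SQL1, SQL2 and the routing URL are placeholder names.

```sql
-- Allow read-intent connections on the secondary and advertise its routing URL.
ALTER AVAILABILITY GROUP [AG1]
MODIFY REPLICA ON N'SQL2'
WITH (SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));

ALTER AVAILABILITY GROUP [AG1]
MODIFY REPLICA ON N'SQL2'
WITH (SECONDARY_ROLE (READ_ONLY_ROUTING_URL = N'TCP://sql2.example.com:1433'));

-- While SQL1 is primary, send read-intent connections to SQL2 first.
ALTER AVAILABILITY GROUP [AG1]
MODIFY REPLICA ON N'SQL1'
WITH (PRIMARY_ROLE (READ_ONLY_ROUTING_LIST = (N'SQL2', N'SQL1')));
```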
SQL Server replication distribution databases - Supported versions
For SQL Server replication distribution databases in an Always On availability group (AG), we're trying to confirm whether the limitation "All SQL Server instances hosting distribution database replicas must be SQL Server 2017 CU 6 or later" is correct, or whether SQL Server 2016 SP2-CU3 also supports this.
In this article it says SQL Server 2016 SP2-CU3 introduced support for replication distribution database. Surely if that version supports it, then the replicas can run on that version? The limitation I quoted above from the same article contradicts that. I wonder if it's because support for 2016 was added later (as per the update at the bottom of this doc) and they missed updating the versions in that limitation?
Thanks,
Justin
SQL 2017 std maintenance plan and Availability group question
HA available options when combining Always on with and without SQL Failover cluster instance (FCI) or Always On Failover Cluster Instances
I have two queries on achieving HA using Always on with and without SQL FCI.
1. When combining a SQL failover cluster instance (FCI), or Always On Failover Cluster Instances, with Always On (SYNCHRONOUS COMMIT, AUTOMATIC FAILOVER), do we get automatic failover to another node when the following happens:
a. Instance goes down.
b. Server goes down
c. Motherboard, memory, network issues
d. Quorum with file share witness is offline
e. Is Always On failover automatic or manual
f. How does Listener points to another node when instance level failover occurs
2. When Always On (SYNCHRONOUS COMMIT, AUTOMATIC FAILOVER) is installed without a SQL failover cluster instance (FCI) or Always On Failover Cluster Instances, do we get automatic failover to another node when the following happens:
a. Instance goes down.
b. Server goes down
c. Motherboard, memory issues
d. Quorum with file share witness is offline
e. Is Always On failover automatic or manual
SQL Server Service Account trying to connect to Dedicated Admin Connection(DAC), why?
We started getting Alerts:
Could not connect because the maximum number of '1' dedicated administrator connections already exists. Before a new connection can be made, the existing dedicated administrator connection must be dropped, either by logging off or ending the process. [CLIENT: 127.0.0.1]
It looked like the classic alert seen when port scanning is taking place, but our infrastructure team confirmed that no jobs, port scans or other tasks were configured to run at those times.
With the help of a SQL job and a holding table, I found out that it is the SQL Server service account that tries to connect, approximately every 4 hours (e.g. 12:00, 16:00 and so on), with the program name Net SqlClient Data Provider. I added the host_process_id column and identified the PID, but there was no process with that PID in Windows Task Manager. Oddly enough, there are no "Dedicated admin connection support was established" entries in the log, just the following ones:
16:00:05 Could not connect because the maximum number of '1' dedicated administrator connections already exists. Before a new connection can be made, the existing dedicated administrator connection must be dropped, either by logging off or ending the process. [CLIENT: 127.0.0.1]
16:00:11 Dedicated administrator connection has been disconnected. This is an informational message only. No user action is required.
No clues in Windows Event Viewer logs or Windows Cluster logs. This is only happening on an active node of AlwaysOn Availability Group.
Ladies and gentlemen, how can we troubleshoot the issue further?
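For anyone chasing a similar alert, a minimal sketch of the kind of query a capture job could run to record who is holding the DAC at a given moment. It assumes the default endpoint name, and it only catches the session if it is still connected when the query runs.

```sql
-- Sessions currently connected through the Dedicated Admin Connection endpoint.
SELECT s.session_id,
       s.login_name,
       s.host_name,
       s.program_name,
       s.host_process_id,
       s.login_time
FROM sys.dm_exec_sessions AS s
JOIN sys.endpoints AS e
    ON e.endpoint_id = s.endpoint_id
WHERE e.name = N'Dedicated Admin Connection';
```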
***EDIT: I've found the cause of the problem: it was a PowerShell script that synchronises server objects for Availability Groups. It was set to run on port 1433 and I have no idea why it tried to connect to the DAC; however, a simple reboot has solved the issue! Thanks ever so much to everyone who replied!***
Automatic Seeding Failures
Hi
We are running a two-server Always On HADR system with automatic seeding enabled. Both SQL Servers are Microsoft SQL Server Enterprise (64-bit), build 14.0.3029.16, and both run on Windows Server 2016 Standard (10.0). Assigned memory is 4096 MB to 432128 MB. There is enough disk space on both servers, and performance tuning options such as "Lock pages in memory" are set.
From time to time the automatic seeding process will not start. In the DMV sys.dm_hadr_automatic_seeding the current_state is CHECK_IF_SEEDING_NEEDED, and after a couple of seconds it changes to FAILED. The failure_state_desc then shows "Seeding Check Message Timeout".
The last database that had this problem was just 8 GB. The problem is easily worked around by removing the DB from the availability group and adding it again; it then works without problems.
Where can I increase this timeout?
What could delay the check if seeding needed?
Do you need more information?
Thanks for the help.
Kind Regards
Dominic
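A small sketch for collecting the seeding history around such a failure. It reads only the DMV already mentioned above plus the AG catalog view, so it shows what failed and how often, not why the check message timed out.

```sql
-- Most recent automatic seeding attempts, newest first.
SELECT ag.name AS ag_name,
       s.start_time,
       s.completion_time,
       s.current_state,
       s.performed_seeding,
       s.failure_state_desc,
       s.error_code,
       s.number_of_attempts
FROM sys.dm_hadr_automatic_seeding AS s
JOIN sys.availability_groups AS ag
    ON ag.group_id = s.ag_id
ORDER BY s.start_time DESC;
```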
Production database log file is getting increased rapidly.
Hi Guys,
I am facing an issue in a production environment hosted on an Azure VM. The database server (SQL Server 2016 EE on Windows Server 2012 R2 DC) is in an Always On HA configuration, and in that environment one database's log file is growing very rapidly, at roughly 10+ GB per day. Previously the LDF file was 44 GB; now it is approximately 400 GB. The MDF file is 56 GB.
I checked the latency between the two nodes and it is <= 1 ms.
I need your help to identify the reason behind this unexpected behavior of SQL Server.
Thanks
Dhanush
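A first diagnostic step, sketched with YourDatabase as a placeholder name: ask SQL Server why it cannot reuse the log, and check when the log was last backed up. In an availability group the answer is often LOG_BACKUP (no recent log backups) or AVAILABILITY_REPLICA (a secondary has not yet hardened or redone the log), but the query reports whatever the actual reason is.

```sql
-- Why can the log not be truncated right now?
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE name = N'YourDatabase';

-- How full is each log file?
DBCC SQLPERF(LOGSPACE);

-- When did the last transaction log backup finish?
SELECT MAX(backup_finish_date) AS last_log_backup
FROM msdb.dbo.backupset
WHERE database_name = N'YourDatabase'
  AND type = 'L';    -- L = transaction log backup
```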
Replace a NIC on secondary replica in AlwaysOn
Hi all
Hoping I can get some assistance from the AlwaysOn/clustering gurus out there! We have a SQL 2016 availability group (asynchronous) which consists of a primary and a secondary replica. The secondary replica has some hardware problems which we are trying to resolve, and the hardware support people have recommended replacing one of the NICs. My first question is: will this cause a problem with clustering/AlwaysOn? My understanding (from what the infrastructure team have told me) is that the two NICs form a team and both will be replaced, so the teaming will be broken. Do we need to take the node out of AlwaysOn and Windows clustering, replace the NICs and then put it back in? If so, what are the exact steps, and is there some good documentation out there that can guide me through this? What are the potential risks? Obviously we don't want to end up taking down the primary or the listener inadvertently.
Thanks!