Channel: SQL Server High Availability and Disaster Recovery forum
Viewing all 4689 articles
Browse latest View live

Questions about Availability Groups and the "order" data moves to the Secondary.

Greetings. I've currently got an AG used for a large Data Warehouse environment. Of course I know an AG isn't ideal for this setting, but it's the hand I've been dealt. Anyway, I've recently discovered from this thread that there's really no good way to measure AG latency, and I'm wondering if I could somehow roll my own. We have about 15 DBs in our AG, but knowing the latency for only one of them is really critical. That said, my hokey idea is as follows:

On the Primary:

Create a table named agInsertTime in this DB that simply has an identity field and a dateTime field. Once a minute, a job on the Primary will insert a getDate() value into the dateTime column of this table, which of course also generates the next identity value.

Also create another new table, named agRetrievalTime, in a DB that’s NOT in the AG. It will have an INT column and two dateTime columns. Once a minute, a job will query the max identity value from the agInsertTime table on the Secondary, along with the dateTime field from that row, as well as the current getDate() value, and insert these values into agRetrievalTime.

I’ll then query the agRetrievalTime table for the difference between the two dateTime values, grouped by the identity field, to find the highest difference.
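A minimal Python sketch of that calculation, assuming the agRetrievalTime rows have already been fetched as (identity, inserted_at, retrieved_at) tuples; the function names are hypothetical. One detail worth handling: if nothing newer has reached the Secondary, the same identity value is read again on the next poll, and that later reading only reflects the polling interval, so the sketch keeps the earliest sighting of each row before taking the maximum. Each figure is still an upper bound (it includes up to one polling interval), and it assumes the clocks agree, since the insert timestamp comes from the Primary and the retrieval timestamp from whichever server runs the second job.

```python
from datetime import datetime, timedelta

# One agRetrievalTime row: (identity read on the Secondary,
#                           dateTime written by the Primary job,
#                           getDate() captured when the Secondary was polled)
Row = tuple[int, datetime, datetime]


def latency_per_id(rows: list[Row]) -> dict[int, timedelta]:
    """Earliest observed lag for each identity value.

    A row re-read on later polls (because nothing newer had arrived) would
    overstate its latency, so only the smallest lag per identity is kept.
    """
    lags: dict[int, timedelta] = {}
    for ident, inserted_at, retrieved_at in rows:
        lag = retrieved_at - inserted_at
        if ident not in lags or lag < lags[ident]:
            lags[ident] = lag
    return lags


def worst_latency(rows: list[Row]) -> timedelta:
    """Largest per-row lag in the sample window."""
    return max(latency_per_id(rows).values())


if __name__ == "__main__":
    t0 = datetime(2016, 12, 1, 9, 0)
    rows = [
        (1, t0, t0 + timedelta(seconds=20)),
        (2, t0 + timedelta(minutes=1), t0 + timedelta(minutes=4)),
        (2, t0 + timedelta(minutes=1), t0 + timedelta(minutes=5)),  # re-read on next poll
    ]
    print(worst_latency(rows))  # 0:03:00
```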

Pretty sure this would work and wouldn’t be all that difficult. What I don’t know about this scenario is:

  1. A massive DML statement goes into a real table.
  2. Before the data from number 1 makes it to the Secondary, the job/INSERT statement I’ve described above runs.

 

Will number 2 have to wait for number 1 to commit on the Secondary before it commits, or could this new record from number 2 get there before number 1? If number 2 can arrive and be committed on the Secondary before number 1, this is doomed.

Thoughts?


Thanks in advance! ChrisRDBA





Failover Cluster Instance (FCI) with Cluster Shared Volumes (CSV)

Can you use AlwaysOn Availability Groups alongside Failover Cluster Instance (FCI) with Cluster Shared Volumes (CSV)?

Thanks,

Lijun

Know when a Log Shipping restore is complete

Hi

Does anybody know a better way of figuring out when a .trn log restore is complete, other than something like this:

-- Poll msdb and run an action once for each new log restore
DECLARE @last_seen datetime = '19000101';  -- locally stored restore_date
DECLARE @latest datetime, @db sysname, @file nvarchar(260);

WHILE 1 = 1
BEGIN
    -- TOP (1) ... ORDER BY returns one consistent row; MAX() on each column
    -- separately could mix values from different restores
    SELECT TOP (1)
           @latest = rs.restore_date,
           @db     = rs.destination_database_name,
           @file   = bmf.physical_device_name  -- backup_file_used_for_restore
    FROM msdb.dbo.restorehistory AS rs
    INNER JOIN msdb.dbo.backupset AS bs ON rs.backup_set_id = bs.backup_set_id
    INNER JOIN msdb.dbo.backupmediafamily AS bmf ON bs.media_set_id = bmf.media_set_id
    WHERE rs.restore_type = 'L'
    ORDER BY rs.restore_date DESC;

    -- A restore newer than the stored one means a new .trn has been applied
    IF @latest > @last_seen
    BEGIN
        BEGIN TRY
            -- kick off the required action here (using @db and @file if needed)
            SET @last_seen = @latest;  -- advance only if the action succeeded
        END TRY
        BEGIN CATCH
            PRINT ERROR_MESSAGE();     -- @last_seen unchanged, so it retries next pass
        END CATCH
    END

    WAITFOR DELAY '00:00:30';  -- polling interval (assumed)
END

Any ideas?

A Merry Christmas and a Happy New Year to all our readers! :-)


Donna Kelly

Infinite Log Shipping Chain? What about storing .trn files?

Hi,

Any thoughts on the length of the log shipping chain . . . or, how many .trn files do you store?

So, on Sunday I do the full backup and restore it to the DR box. Then I kick off Log Shipping every 15 minutes . . . or even every minute (and why not?). I accumulate (let's say) 10080 .trn files over the course of the week (one per minute).

What if I never re-seeded my DR box?

Should I ever bother storing any .trn files after a successful restore to DR? What would be the point? I cannot store an infinite number, so the chain is going to get broken at some point . . . at which time no .trn files are of any use.

So, this question breaks down into two halves:

1. Periodic re-seed (say, once a week) or not? Obviously in this case I'd store all the .trns from the initial full restore onwards. If I choose to do a periodic re-seed . . . then exactly why would I do that? If not, why would I do that?

2. If I choose not, then what would be the value of storing any .trn files at all?
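To put numbers on the question, here is a hedged Python sketch under a simplified model in which each .trn file covers the interval since the previous log backup; the function names are hypothetical. It counts how many files a fixed schedule produces per week, and picks the files a restore to a given point in time would need: the newest full backup at or before that point, plus the log backups taken after it up to the first one that covers the target. Under this model, .trn files older than the oldest full backup you still keep are not needed for any restore that starts from those full backups.

```python
from datetime import datetime, timedelta

MINUTES_PER_WEEK = 7 * 24 * 60  # 10080


def log_backups_per_week(interval_minutes: int) -> int:
    """How many .trn files a fixed-interval log backup job writes in one week."""
    return MINUTES_PER_WEEK // interval_minutes


def files_needed(full_backups: list[datetime],
                 log_backups: list[datetime],
                 target: datetime) -> tuple[datetime, list[datetime]]:
    """Return the full backup to start from and the log backups to apply.

    Raises ValueError if no full backup was taken at or before `target`.
    """
    candidates = [f for f in full_backups if f <= target]
    if not candidates:
        raise ValueError("no full backup at or before the target time")
    base = max(candidates)
    chain: list[datetime] = []
    for taken_at in sorted(log_backups):
        if taken_at <= base:
            continue  # log backups before the full are not needed from this base
        chain.append(taken_at)
        if taken_at >= target:
            break  # this file already contains the target point in time
    return base, chain


if __name__ == "__main__":
    print(log_backups_per_week(15), log_backups_per_week(1))  # 672 10080
    sunday = datetime(2016, 12, 25)
    logs = [sunday + timedelta(minutes=15 * i) for i in range(1, 97)]
    base, chain = files_needed([sunday - timedelta(days=7), sunday], logs,
                               sunday + timedelta(minutes=40))
    print(base, len(chain))  # 2016-12-25 00:00:00 3
```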

Your thoughts would be most appreciated.

A Merry Christmas and a Happy New Year to all our readers!


Donna Kelly

DB went into inaccessible mode and is in the restoring state after removal from the AG.

Hi All,

I am facing an issue with a DB after removing it from the AG.

1) I removed the DB from the AG through the GUI.

2) I ran RESTORE DATABASE db_name WITH RECOVERY on the secondary replica to bring the DB online.

3) But I got the error below. (Msg 3104, Level 16, State 1, Line 1
RESTORE cannot operate on database 'XXX' because it is configured for database mirroring or has joined an availability group. If you intend to restore the database, use ALTER DATABASE to remove mirroring or to remove the database from its availability group.
Msg 3013, Level 16, State 1, Line 1
RESTORE DATABASE is terminating abnormally.)

4) Then I ran ALTER DATABASE xxx SET PARTNER OFF and got the error below.

Msg 945, Level 14, State 2, Line 2
Database 'XXX' cannot be opened due to inaccessible files or insufficient memory or disk space.  See the SQL Server errorlog for details.

5) The DB does not allow the ALTER command:

Msg 5052, Level 16, State 1, Line 2
ALTER DATABASE is not permitted while a database is in the Restoring state.

Please let me know how to troubleshoot the issue.

Thanks in advance,


rup

Database SSISDB - Log Shipping

Hi All,

We have configured the SSISDB database (the Integration Services catalog DB) for log shipping from the PR server to the DR server for DR availability. After applying SQL 2016 SP1 on the DR server, we faced an issue related to SSISDB (read-only mode) and the SQL services would not start, so we enabled trace flag 902 (T902) to start them. Please let us know whether it is recommended to use log shipping for SSISDB. As per the MS KB, Always On Availability Groups are supported as an alternative to mirroring (https://msdn.microsoft.com/en-us/library/hh479588.aspx?f=255&MSPPError=-2147217396), but log shipping is not mentioned as a DR solution.


SQL cluster and AlwaysOn availability group

Hi all,

I have an interesting scenario. I run a two-node cluster (Windows 2012) with SQL 2012 SP1, which uses a SAN. I need to create an availability group to have a set of these databases online on a standalone SQL Server. I have actually done this exact task in the past, but am struggling with an error message. According to the Microsoft Technical group, this is one scenario that AlwaysOn can be used for (http://msdn.microsoft.com/en-us/library/jj215886.aspx).

So, I add a node to my 2 node cluster.  I then go into SSMS to configure the availability replica.  I get through the initial validation.  On the last step, I get an error.

-------------------

TITLE: Microsoft SQL Server Management Studio
------------------------------

Attempting to add availability replicas to the availability group resulted in an error. (Microsoft.SqlServer.Management.HadrTasks)

------------------------------
ADDITIONAL INFORMATION:

Create failed for Availability Replica 'USTAWVSHAGEMAN1\SQLCAD'.  (Microsoft.SqlServer.Smo)

For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft+SQL+Server&ProdVer=11.0.3000.0+((SQL11_PCU_Main).121019-1325+)&EvtSrc=Microsoft.SqlServer.Management.Smo.ExceptionTemplates.FailedOperationExceptionText&EvtID=Create+AvailabilityReplica&LinkId=20476

------------------------------

An exception occurred while executing a Transact-SQL statement or batch. (Microsoft.SqlServer.ConnectionInfo)

------------------------------

Failed to create, join or add replica to availability group 'ag8', because node 'standalone' is a possible owner for both replica 'cluster' and 'standalone\SQL'. If one replica is failover cluster instance, remove the overlapped node from its possible owners and try again. (Microsoft SQL Server, Error: 19405)

For help, click: http://go.microsoft.com/fwlink?ProdName=Microsoft%20SQL%20Server&ProdVer=11.00.3000&EvtSrc=MSSQLServer&EvtID=19405&LinkId=20476

------------------------------
BUTTONS:

OK
------------------------------

I can't seem to get past this error. My disk configuration matches, and the user accounts that run SQL Server are identical on all machines involved. I read somewhere that the standalone machine needed to be a named instance and not the default, so I added a named instance as well. Everything brings me back to this error.

Failed to create, join or add replica to availability group 'ag8', because node 'standalone' is a possible owner for both replica 'cluster' and 'standalone\SQL'. If one replica is failover cluster instance, remove the overlapped node from its possible owners and try again. (Microsoft SQL Server, Error: 19405)

If anybody could please help, I would greatly appreciate it.
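Error 19405 describes an overlap between two sets of nodes: the possible owners of the FCI and the node hosting the standalone replica; the message says node 'standalone' is currently on both lists. A hypothetical Python check of that condition, with node and replica names taken from the error text (in practice the possible-owner list comes from Failover Cluster Manager or the cluster resource properties, not from code like this):

```python
def overlapping_owners(fci_possible_owners: set[str],
                       replica_nodes: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each non-FCI replica, return the nodes it shares with the FCI's
    possible owners. A non-empty result is the situation error 19405 reports."""
    overlaps: dict[str, set[str]] = {}
    for replica, nodes in replica_nodes.items():
        shared = nodes & fci_possible_owners
        if shared:
            overlaps[replica] = shared
    return overlaps


if __name__ == "__main__":
    fci_owners = {"node1", "node2", "standalone"}  # hypothetical node names
    print(overlapping_owners(fci_owners, {r"standalone\SQL": {"standalone"}}))
    # {'standalone\\SQL': {'standalone'}}
```

The error text itself names the remedy: remove the overlapped node from the FCI's possible owners, after which this check would return an empty dict.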

Can't move Primary Role to my secondary server

Hi,

I'm testing SQL 2014 AlwaysOn with 3 replicas (the database name is POCDB):

- POC-SQL2014-L1 as primary role with automatic failover mode, sync commit, readable secondary

- POC-SQL2014-L2 as secondary role with automatic failover mode, sync commit, readable secondary

- POC-SQL2014-L3 as secondary role with manual failover mode, sync commit, readable secondary

- Availability group listener set as POC-POCDB on port 1433 with a static IP

All setup went okay, and the cluster/availability group is working and healthy.

The problem occurs when I shut down L1/the primary (power off the machine, stop the MSSQL service, or cut off the network).

The automatic failover didn't happen. I can still connect to POC-POCDB, but the primary role didn't switch to L2 (and the database POCDB still registers as read-only).

The same thing happened when I failed over manually. The process succeeds, but there's a warning in "Validating WSFC quorum vote configuration": The current WSFC cluster quorum vote configuration is not recommended for this availability group.

How can I solve this?

Sorry, I'm new to this clustering thing, and would appreciate a few pointers to solve this problem.
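The wizard's warning concerns how the WSFC quorum votes are distributed. As a simplified illustration only (it ignores witnesses and the dynamic quorum feature of Windows Server 2012 and later), the basic node-majority rule says the cluster stays up while the online nodes hold more than half of the configured votes. The node names below are from this post; the vote values and the `has_quorum` helper are hypothetical.

```python
def has_quorum(votes: dict[str, int], online: set[str]) -> bool:
    """Node-majority rule: quorum holds while online nodes carry more than half
    of all configured votes (witness and dynamic quorum are ignored here)."""
    total = sum(votes.values())
    present = sum(weight for node, weight in votes.items() if node in online)
    return 2 * present > total


if __name__ == "__main__":
    print(has_quorum({"L1": 1, "L2": 1, "L3": 1}, {"L2", "L3"}))  # True: 2 of 3 votes
    print(has_quorum({"L1": 1, "L2": 1, "L3": 0}, {"L2", "L3"}))  # False: 1 of 2 votes
```

If the cluster loses quorum, the availability group cannot fail over automatically, so the actual vote layout (for example the NodeWeight property reported by Get-ClusterNode) is one place to start looking.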



Always On Async

Will the redo queue size get lower when I take frequent TLOG backups?
Is there any relation between the redo queue and TLOG backups or checkpoints?

Is there an alternative method to force synchronization between the primary and the replica in async mode in Always On?
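For reasoning about the redo queue, `sys.dm_hadr_database_replica_states` exposes `log_send_queue_size` and `redo_queue_size` (in KB) along with `log_send_rate` and `redo_rate` (in KB/sec). Because the redo queue is log that has already arrived on the secondary, it drains at the secondary's redo rate. A rough Python sketch of the usual queue-divided-by-rate estimate; the helper name and sample values are made up for illustration.

```python
def seconds_to_drain(queue_kb: float, rate_kb_per_sec: float) -> float:
    """Rough time for a queue to empty at its current rate; inf if it is not moving."""
    if rate_kb_per_sec <= 0:
        return float("inf")
    return queue_kb / rate_kb_per_sec


if __name__ == "__main__":
    # e.g. redo_queue_size = 512000 KB, redo_rate = 2048 KB/sec (sample values)
    print(seconds_to_drain(512_000, 2_048))  # 250.0
```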

Thanks

Revathi

HA with two SQL servers and a witness - SQL servers on separate subnets

Good afternoon, all!

Working on validating an SQL HA setup. We have two SQL 2016 servers riding on Server 2012 R2 instances in VMware. The master server and the witness share are on one subnet, and the second server (with no witness) is on a second subnet in a remote datacenter.

I did some updates to the master server and rebooted; the cluster seems to have responded correctly and failed the primary replica database over to the second server. However, this will require a manual failback once the desired primary server comes back online. Is there a setting or a method to get an automatic failback? We would like this to happen to avoid problems with the application. Failing this, can AlwaysOn be configured with both DB servers as peers, so that whichever server receives a transaction replicates it to the other one and any concurrency problems are resolved automatically?

Thanks for looking!

G

Arithmetic overflow error converting expression to data type int. [SQLSTATE 22003] (Error 8115)

Any comments on what could cause this SQL Agent job failure?

Date                      29/12/2016 11:41:44 a.m.

Log                         Job History (DBA)

Step ID                  1

Server                   SQL00

Job Name                            DBA

Step Name                          DBA

Duration                              00:00:02

Sql Severity        16

Sql Message ID  3621

Operator Emailed            

Operator Net sent          

Operator Paged

Retries Attempted          0

Message

Executed as user: GRP\SVC-SQLSERVER. Arithmetic overflow error converting expression to data type int. [SQLSTATE 22003] (Error 8115) 

The statement has been terminated. [SQLSTATE 01000] (Error 3621).  The step failed.
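Error 8115 means an expression produced a value outside the range of `int` (-2,147,483,648 to 2,147,483,647). The job step's code isn't shown, so this is only one possibility: a common trigger in DBA scripts is arithmetic on page counts, for example converting `sys.master_files.size` (8 KB pages, stored as int) to bytes with `size * 8192`, which overflows for files larger than 2 GB. A small Python sketch of the range check (the function name is hypothetical):

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1  # range of the SQL Server int type


def fits_in_int(value: int) -> bool:
    """True if `value` can be stored in a SQL Server int without error 8115."""
    return INT_MIN <= value <= INT_MAX


if __name__ == "__main__":
    pages = 3 * 1024 * 1024 // 8      # a 3 GB file expressed in 8 KB pages
    print(fits_in_int(pages))         # True: the page count itself is fine
    print(fits_in_int(pages * 8192))  # False: the byte count overflows int
```

In T-SQL the usual remedy is to promote one operand before the arithmetic, for example `CAST(size AS bigint) * 8192`.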


Muhammad Mehdi

Should SQL Server 2016 AlwaysOn automatic failover be completely transparent to the end user?

We are paying a hosting company to host our production environments. They have set up the entire SQL Server 2016 Enterprise Edition AlwaysOn architecture for us. I apologize if I misuse any of the AlwaysOn terms, as I am new to AlwaysOn and have only used clusters previously. We have a .NET website running on 2 web servers behind a load balancer, with a SQL Server 2016 back-end database. The SQL Server 2016 instance is configured for AlwaysOn in an availability group, with 2 synchronous replicas in one data center and an asynchronous replica in another data center, basically on the other side of the country. All things web related are VMs; all things SQL are dedicated physical boxes (at the recommendation of our hosting provider).

My question/possible issue is: when SQL fails over automatically to the other synchronous server, I sometimes (about 1/3 of the time) experience SQL errors in the web application. Mostly they are errors like "could not complete the request" or "database x is participating in an availability group and is not accessible for queries". After a few seconds I can resubmit my request and the website behaves normally. Is this the expected behavior during a failover? I was expecting slowness/lag, but no error messages in the web application to the end user. Again, this is just failover between the synchronous replicas in my group, triggered by simply rebooting the primary replica from the command line (shutdown -r), then alt+tabbing over to a web browser immediately and clicking around in the website (a single user, which is me, and not a high volume of transactions being thrown at the DB). I have not attempted failing over to the asynchronous replica yet.

Is this expected behavior? I was under the impression I should see slowness during failover but no failed transactions/errors to the end user. I know we can make the secondary replica readable, but the hosting provider tells me we are not currently licensed for that. At the price we are paying for the hardware and licensing, I am having a hard time believing this is what we should be getting, especially during my single-user test. I just want to make sure we have the best possible system set up before we go live. Thanks!
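During an automatic failover, connections to the old primary are dropped and the databases are briefly unavailable while the new primary comes online, so a short window of errors like the ones above is generally expected; applications usually hide it with retry logic. A minimal Python sketch of that pattern (the site is .NET, but the idea carries over); `TransientDbError` and `with_retry` are hypothetical names standing in for whatever the data access layer raises and uses.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientDbError(Exception):
    """Placeholder for the error a driver raises while the AG is failing over."""


def with_retry(op: Callable[[], T], attempts: int = 5, base_delay: float = 1.0,
               sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `op`; on a transient error wait 1x, 2x, 4x ... base_delay and try again.
    The last attempt runs outside the try block so its error reaches the caller."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts - 1):
        try:
            return op()
        except TransientDbError:
            sleep(base_delay * 2 ** attempt)
    return op()


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_query() -> str:
        # Fails twice, as if the listener pointed at a replica still coming online.
        calls["count"] += 1
        if calls["count"] < 3:
            raise TransientDbError("database is not accessible for queries")
        return "ok"

    waits: list[float] = []
    print(with_retry(flaky_query, sleep=waits.append), waits)  # ok [1.0, 2.0]
```

Injecting `sleep` keeps the backoff testable; only errors classified as transient are retried, so genuine query errors still surface immediately.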

Database revert from snapshot failed when I use Azure Blob storage

Please see the details below:
 
 
SELECT @@VERSION
 
Microsoft SQL Server 2014 (SP2-GDR) (KB3194714) - 12.0.5203.0 (X64)
                Sep 23 2016 18:13:56
                Copyright (c) Microsoft Corporation
                Developer Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1) (Hypervisor)
 
User databases are placed in Azure Blob storage.
 
Step 1
USE [master]
GO
 
CREATE DATABASE [MyDB]
ON  PRIMARY
( NAME = N'MyDB', FILENAME = N'https://***/MyDB.mdf')
LOG ON
( NAME = N'MyDB_log', FILENAME = N'https://***/MyDB_log.ldf')
GO
 
All is OK
 
Step 2
CREATE DATABASE [Snapshot_MyDB] ON (NAME = [MyDB], FILENAME = 'https://***/MyDB_Snapshot.ss') AS SNAPSHOT OF [MyDB]
GO
 
All is OK
 
Step 3
ALTER DATABASE [MyDB] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
GO
 
All is OK
 
Step 4
RESTORE DATABASE [MyDB] FROM DATABASE_SNAPSHOT = 'Snapshot_MyDB';
GO
 
Msg 5120, Level 16, State 145, Line 17
Unable to open the physical file "https://***/MyDB_log.ldf". Operating system error 183: "183(Cannot create a file when that file already exists.)".
Msg 3013, Level 16, State 1, Line 17
RESTORE DATABASE is terminating abnormally.
 
 
Is there any way to restore the database from the snapshot?
Is it a bug?

SQL Server services restarting with error SQL Server component 'query_processing' health state has been changed from 'clean' to 'warning'

Hi Experts,

One of our cluster boxes is on Windows Server 2008 R2 Enterprise, where we have Microsoft SQL Server 2012 (SP2) services on one node and SQL Server 2012 Analysis Services on the second node. It's an active-active setup. Recently we have been facing frequent restarts of the SQL services: during the last 2 months they have restarted 5-6 times.

We are not able to find any error in the Windows event logs or SQL Server logs that might explain why the SQL services restarted.

In the cluster log we found the messages below, where “SQL Server component 'query_processing' health state has been changed from 'clean' to 'warning'” appears just before the SQL services restarted.

Server CPU and memory utilization were normal at that time.

If anyone has faced a similar problem and has a resolution, that would be really helpful.

Below is the Cluster log at the time of Failure

2016/12/22-13:12:15.029 INFO  [RES] SQL Server <SQL Server>: [sqsrvres]SQL Server component 'query_processing' health state has been changed from 'clean' to 'warning' at 2016-12-22 13:12:15.027
2016/12/22-13:12:35.044 INFO  [RES] SQL Server <SQL Server>: [sqsrvres] SQL Server component 'query_processing' health state has been changed from 'warning' to 'clean' at 2016-12-22 13:12:35.043
2016/12/22-13:14:02.033 INFO  [NM] Received request from client address SERVERNAME.
2016/12/22-13:16:58.242 ERR   [RES] SQL Server <SQL Server>: [sqsrvres] Failure detected, diagnostics heartbeat is lost
2016/12/22-13:16:58.242 INFO  [RES] SQL Server <SQL Server>: [sqsrvres] IsAlive returns FALSE
2016/12/22-13:16:58.242 WARN  [RHS] Resource SQL Server IsAlive has indicated failure.
2016/12/22-13:16:58.242 INFO  [RCM] HandleMonitorReply: FAILURENOTIFICATION for 'SQL Server', gen(10) result 1.
2016/12/22-13:16:58.242 INFO  [RCM] TransitionToState(SQL Server) Online-->ProcessingFailure.
2016/12/22-13:16:58.242 ERR   [RCM] rcm::RcmResource::HandleFailure: (SQL Server)
2016/12/22-13:16:58.242 INFO  [RCM] resource SQL Server: failure count: 2, restartAction: 2.
2016/12/22-13:16:58.242 INFO  [RCM] Greater than restartPeriod time has elapsed since first failure, resetting failureTime and failureCount.
2016/12/22-13:16:58.242 INFO  [RCM] Will restart resource in 500 milliseconds.
2016/12/22-13:16:58.242 INFO  [RCM] TransitionToState(SQL Server) ProcessingFailure-->[WaitingToTerminate to DelayRestartingResource].
2016/12/22-13:16:58.242 INFO  [RCM] rcm::RcmGroup::UpdateStateIfChanged: (SQL Server (MSSQLSERVER), Online --> Pending)
2016/12/22-13:16:58.242 INFO  [RCM] TransitionToState(SQL Server Agent) Online-->[WaitingToTerminate to OnlineCallIssued].
2016/12/22-13:16:58.242 INFO  [RCM] TransitionToState(SQL Server Agent) [WaitingToTerminate to OnlineCallIssued]-->[Terminating to OnlineCallIssued].
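To line up the health-state changes with the failure, the SQL Server resource entries can be pulled out of the cluster log by timestamp. A small Python sketch, assuming the `YYYY/MM/DD-HH:MM:SS.mmm LEVEL` line format shown above; `sqsrvres_events` is a hypothetical helper. On these lines it shows the component going to 'warning' and back to 'clean' within 20 seconds, and the diagnostics heartbeat being lost about 4 minutes 43 seconds after the warning.

```python
import re
from datetime import datetime

# "2016/12/22-13:12:15.029 INFO  [RES] ..." -> timestamp, level, rest of line
LINE_RE = re.compile(r"^(\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+(\w+)\s+(.*)$")


def sqsrvres_events(lines):
    """Yield (timestamp, level, message) for entries written by the SQL Server
    resource DLL (tagged [sqsrvres])."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and "[sqsrvres]" in match.group(3):
            stamp = datetime.strptime(match.group(1), "%Y/%m/%d-%H:%M:%S.%f")
            yield stamp, match.group(2), match.group(3)


if __name__ == "__main__":
    log = [
        "2016/12/22-13:12:15.029 INFO  [RES] SQL Server <SQL Server>: [sqsrvres]SQL Server component 'query_processing' health state has been changed from 'clean' to 'warning'",
        "2016/12/22-13:12:35.044 INFO  [RES] SQL Server <SQL Server>: [sqsrvres] SQL Server component 'query_processing' health state has been changed from 'warning' to 'clean'",
        "2016/12/22-13:16:58.242 ERR   [RES] SQL Server <SQL Server>: [sqsrvres] Failure detected, diagnostics heartbeat is lost",
        "2016/12/22-13:16:58.242 WARN  [RHS] Resource SQL Server IsAlive has indicated failure.",
    ]
    events = list(sqsrvres_events(log))
    print(events[-1][0] - events[0][0])  # 0:04:43.213000
```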


SQL 2012 clustered analysis services issue

Hi,

We have two SQL 2012 Enterprise servers configured in an active/passive cluster on top of a Windows 2012 R2 cluster. We have 4 SQL clustered instances installed and an instance for Microsoft DTC. All instances have the components below installed:

SQL DB engine, SQL Agent, SQL Reporting and SQL Integration.

One of them additionally has Analysis Services installed, and the service is started. When we run the SQL Best Practice Analyzer on this instance, we receive the error below:

Analysis services: The instance being scanned exist for sql server version which is not supported

Category: Prerequisites

Source : localhost

Issue: incorrect sql server version in use

impact: analysis cannot be performed

resolution: install sql server 2012

I noticed that if the SQL instance with the issue (the one with Analysis Services) is active on node 1, I receive the error mentioned above in the SQL Best Practice Analyzer, and I cannot log in to Analysis Services using SQL Server Management Studio; it fails with the error below:

No connection could be made because the target machine actively refused it xxx.xxx.xxx.xxx:51650 (system) 

But if I move the SQL instance to node 2, I do not receive the error in the SQL Best Practice Analyzer, and I can log in to Analysis Services using SQL Server Management Studio.

Can you please advise me on this issue...


Sql server 2016 in clustered Hyper-V with local storage

Hi,

My plan is to install a new Hyper-V 2016 cluster and run Exchange Server 2016, SQL 2014/2016 (Navision C5), Domain Controller 2016 and Terminal Server 2016 on it. I will be using two identical HP ProLiant ML350 Gen9 tower servers, each with two physical drives: a 300 GB C drive (fast SSD disks, mirrored) for the host system and Hyper-V, and a 1.5 TB E drive (SSD, RAID 10) for the VMs.

What confuses me is: if I create a two-node Hyper-V cluster, do I need to create a DAG for the Exchange server or a SQL cluster as well? Does the Hyper-V cluster provide high availability for all the servers on it?

Thanks


Erro

SQL Failover cluster instance with multiple instances - Storage

$
0
0

Hi Team,

We have a SQL FCI (multiple instances) running on a Windows Server 2012 R2 WSFC with shared storage between node 1 and node 2 in the main datacenter. We have a separate node 3 in DR (standalone SQL with its own storage and multiple instances). Node 3 is for the AG secondary replica.

I need some pointers on using the same drive with multiple mount points for multiple SQL instances.

Is it possible to use the same drive with mount points for multiple SQL instances in a failover cluster?

So if I have 2 workloads, SCCM and SCOM, I'll have a drive J: and then have LUNs for DB and logs mapped as mount points in the J drive for both workloads?

SCCM: J:\SCCM\Database   (J: is a LUN; Database is a LUN mounted under it)

          J:\SCCM\Logs

SCOM: J:\SCOM\Database

          J:\SCOM\Logs

So all LUNs are mounted inside J:\ as mount points and used by multiple SQL instances.

While installing the SQL failover instance for SCCM, I'll give J:\SCCM\Database (for DB) and J:\SCCM\Logs (for logs).

While installing the SQL failover instance for SCOM, I'll again give J:\SCOM\Database (for SCOM DB) and J:\SCOM\Logs (for SCOM logs).

Is it possible to achieve it? Appreciate any pointers

Regards

Shared storage with multiple SQL Instances - DB and Log placement recommendations

Hi Team,

I know we can have DB and logs on the same LUN for a SQL instance on a failover cluster, but as I understand it, this is not recommended. I am looking for any Microsoft link that explains the issues SQL might have if DB and logs are kept on the same storage LUN in a failover cluster. Thanks

Regards

SQL 2014 FCI and Availability Groups guide

Always On

Hi All

Let me explain our environment, and then I will tell you the issue we are facing.

We have Always On configured with an FCI + a standalone instance.

The FCI has three nodes and one shared drive for quorum, and the standalone instance acts as DR.

Node1 - A; Node2 - B; Node3 - C - (Primary Replica)

Node4 - D - (Secondary)

We have configured an AG for a set of databases in this environment, and we are facing an issue when we test it. When we fail over from Node1 (A) to Node2 (B), it works fine without any issue: after the failover, the AG automatically fails over to Node2 (B) as well, and SQL is up and online.

But when we fail over from Node2 (B) to Node1 (A) or Node3 (C), the SQL services come online without any issue, but the Always On AG does not fail over to the respective node: the AG is in the resolving state and the databases are in recovery pending. When I stop the AG role and start it back up, it comes online without any issue on the respective node, and I can see the databases online and access them without any issue.

Can anyone suggest something on this?

Regards

Revathi J

