Channel: SQL Server High Availability and Disaster Recovery forum
Viewing all 4689 articles

SQL 2012 installation for Failover Cluster failed

During installation of SQL Server 2012 on a failover cluster (FOC), validation fails on the "Database Engine Configuration" page with the following error:

------------------------------
The volume that contains SQL Server data directory g:\MSSQL11.MSSQLSERVER\MSSQL\DATA does not belong to the cluster group.
------------------------------

I want to know how the SQL Server installation wizard queries the volumes configured in the failover cluster:

- Does it enumerate "Physical Disk" resources in the FOC?

- Does it enumerate all Storage Class resources in the FOC to get the volume list?

- Or does it depend on WMI (Win32_Volume) to get the volumes?

The wizard correctly discovers volume g:\ in its FOC group on the "Cluster Resource Group" and "Cluster Disk Selection" pages, but gives the error on the "Database Engine Configuration" page.

Any help with this would be appreciated.

Thanks in advance

Rakesh


Rakesh Agrawal


SQL 2008 R2 Cluster | change Cluster Disks

Hello,

We have a SQL cluster (for SharePoint 2010) that consists of two nodes running Windows Server 2008 R2 and SQL Server 2008 R2.

The SQL cluster has clustered MSDTC and SQL Server services, with four disks in total:

Quorum Disk

MSDTC Disk

Databases disk

Logs disk

The old SAN is being decommissioned, and new LUNs have been added to replace the disks above. I managed to move the quorum and MSDTC disks. I used the robocopy command below to copy the databases and logs with the same folder structure and permissions:

robocopy t:\ l:\ /E /MIR /SEC /COPYALL /V

I stopped the SQL services and then swapped the drive letters. When I start the SQL services, they start without problems (using the new disks).

The issue is that when I connect with SQL Server Management Studio, all databases are in suspect mode. I know there are some SQL queries that can be run against each database, but this is a production environment and I don't want to mess with it.

Is there another way to change the cluster disks of a SQL cluster, or a way to use the above method without the databases going into suspect mode?


Thanks, Shehatovich

SQL Cluster unexpected failover

One of our SQL clusters failed over unexpectedly recently, the second time in a few months. It is a two-node active/passive SQL Server 2012 cluster running on Windows Server 2012 Standard.

Here's what we could cull from the application and system logs:

1. "Cluster resource 'SQLServer' of type 'SQL Server' in clustered role 'SQLServerRole' failed.

Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.  Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet."

2. "Cluster resource 'SQLServer' (resource type 'SQL Server', DLL 'sqsrvres.dll') did not respond to a request in a timely fashion. Cluster health detection will attempt to automatically recover by terminating the Resource Hosting Subsystem (RHS) process running this resource. This may affect other resources hosted in the same RHS process. The resources will then be restarted.

The suspect resource 'SQLServer' will be marked to run in an isolated RHS process to avoid impacting multiple resources in the event that this resource failure occurs again. Please ensure services, applications, or underlying infrastructure (such as storage or networking) associated with the suspect resource is functioning properly."

3. "The cluster Resource Hosting Subsystem (RHS) stopped unexpectedly. An attempt will be made to restart it. This is usually associated with recovery of a crashed or deadlocked resource.  Please determine which resource and resource DLL is causing the issue and verify it is functioning properly."

4. "A timeout (30000 milliseconds) was reached while waiting for a transaction response from the MSSQLSERVER service."

Cluster.log wasn't much more helpful on the root cause either:

"

00000f28.00001c78::2014/12/04-21:25:54.662 INFO  [RES] Network Name <Cluster Name>: Netbios: Slow Operation, FinishWithReply: 0
00000f28.00001c78::2014/12/04-21:25:54.662 INFO  [RES] Network Name:  [NN] got sync reply: 0
00000f28.00001c78::2014/12/04-21:25:54.662 INFO  [RES] Network Name <Cluster Name>: Netbios: End of Slow Operation, state: Initialized/Idle, prevWorkState: Idle
00000f20.00000e94::2014/12/04-21:25:55.240 INFO  [RES] SQL Server Agent <SQL Server Agent>: [sqagtres] IsAlive request.
00000f20.00000e94::2014/12/04-21:25:55.240 INFO  [RES] SQL Server Agent <SQL Server Agent>: [sqagtres] CheckServiceAlive: returning TRUE (success)
00001134.000001d8::2014/12/04-21:25:57.287 ERR   [RES] SQL Server <SQLServer>: [sqsrvres] Failure detected, diagnostics heartbeat is lost
00001134.000001d8::2014/12/04-21:25:57.287 INFO  [RES] SQL Server <SQLServer>: [sqsrvres] IsAlive returns FALSE
00001134.000001d8::2014/12/04-21:25:57.287 WARN  [RHS] Resource SQLServer IsAlive has indicated failure.
00000880.0000161c::2014/12/04-21:25:57.303 INFO  [NM] Received request from client address HOST-XXX-SQL02.
00000880.0000161c::2014/12/04-21:25:57.303 INFO  [RCM] HandleMonitorReply: FAILURENOTIFICATION for 'SQLServer', gen(3) result 1/0.
00000880.000023a4::2014/12/04-21:25:57.303 INFO  [GEM] Sending 1 messages as a batched GEM message
00000880.0000161c::2014/12/04-21:25:57.303 INFO  [RCM] Res SQLServer: Online -> ProcessingFailure( StateUnknown )
00000880.0000161c::2014/12/04-21:25:57.303 INFO  [RCM] TransitionToState(SQLServer) Online-->ProcessingFailure.
00000880.0000161c::2014/12/04-21:25:57.318 INFO  [RCM] rcm::RcmGroup::UpdateStateIfChanged: (SQLServerRole, Online --> Pending)
00000880.00001db8::2014/12/04-21:25:57.334 INFO  [GEM] Sending 1 messages as a batched GEM message
00000880.0000161c::2014/12/04-21:25:57.334 ERR   [RCM] rcm::RcmResource::HandleFailure: (SQLServer)
00000880.00001db8::2014/12/04-21:25:57.334 INFO  [GEM] Sending 1 messages as a batched GEM message
00000880.00000bac::2014/12/04-21:25:57.334 INFO  [RCM] ignored non-local state Pending for group SQLServerRole
00000880.0000161c::2014/12/04-21:25:57.350 INFO  [RCM] resource SQLServer: failure count: 1, restartAction: 2 persistentState: 1.
00000880.0000161c::2014/12/04-21:25:57.350 INFO  [RCM] Greater than restartPeriod time has elapsed since first failure of SQLServer, resetting failureTime and failureCount.
00000880.0000161c::2014/12/04-21:25:57.350 INFO  [RCM] Will queue immediate restart (500 milliseconds) of SQLServer after terminate is complete."
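A cluster.log like the one above can be narrowed down by keeping only the ERR and WARN entries, which in this excerpt are the lines around the sqsrvres "diagnostics heartbeat is lost" failure. The Python sketch below does that. It assumes the line layout seen in this excerpt (`process.thread::YYYY/MM/DD-HH:MM:SS.mmm LEVEL [COMPONENT] message`); the regular expression is inferred from these lines rather than from any published format, and `parse_cluster_log` and `ClusterLogEntry` are hypothetical names.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Layout inferred from the cluster.log excerpt; not an official format specification.
LINE_RE = re.compile(
    r"^(?P<pid>[0-9a-fA-F]+)\.(?P<tid>[0-9a-fA-F]+)::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<component>[^\]]+)\]\s*(?P<message>.*)$"
)


@dataclass
class ClusterLogEntry:
    pid: int            # process id (hex in the log)
    tid: int            # thread id (hex in the log)
    timestamp: datetime
    level: str          # e.g. INFO, WARN, ERR
    component: str      # e.g. RES, RHS, RCM
    message: str


def parse_cluster_log(lines, levels=("ERR", "WARN")):
    """Yield entries whose level is in `levels`; lines that don't match the layout are skipped."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m["level"] not in levels:
            continue
        yield ClusterLogEntry(
            pid=int(m["pid"], 16),
            tid=int(m["tid"], 16),
            timestamp=datetime.strptime(m["ts"], "%Y/%m/%d-%H:%M:%S.%f"),
            level=m["level"],
            component=m["component"],
            message=m["message"],
        )
```

To use it, pass an open file object for a log generated with the Get-ClusterLog PowerShell cmdlet. Lines that do not match the assumed layout are skipped silently, so an empty result may just mean the layout differs from this excerpt.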

Any ideas? Anywhere we could look for more specific info? Any preventative measures we could take?

Thanks,

Ryan

BACKUP LOG suddenly failed with Msg 35250, Level 16, State 11 The connection to the primary replica is not active. The command cannot be processed.

I have an AlwaysOn SQL Server 2012 Enterprise setup using Windows Server Failover Clustering (not an FCI), with one primary node (P), one synchronous-commit automatic-failover node (SC), and one asynchronous-commit manual-failover node (AC). The backup preference is set to prefer a secondary, with the highest priority given to the AC node.

I am using Ola Hallengren's scripts for the database maintenance jobs, including a native BACKUP LOG job that backs up the transaction logs of all user databases on a one-minute schedule. His scripts already account for AlwaysOn, and although the job is set up on all three nodes, it only ever runs on the AC node.

The job had been running successfully since the initial setup almost a year ago, but yesterday morning it suddenly started to fail with the following error, on only 1 of the 13 databases in my availability group:

Date and time: 2014-06-08 09:36:11
Command: BACKUP LOG [my_db] TO DISK = N'E:\MSSQL\\Transaction Dumps\my_db\MySQLCL$MySQLAG_my_db_20140608_093610_U_LOG.trn' WITH CHECKSUM, COMPRESSION
Msg 35250, Level 16, State 11, Server AC, Line 1
The connection to the primary replica is not active.  The command cannot be processed.
Msg 3013, Level 16, State 1, Server AC, Line 1
BACKUP LOG is terminating abnormally.
Outcome: Failed
Duration: 00:01:00

The other 12 databases continued to backup successfully.

Checking the Availability Group dashboard, the Windows event logs, and the SQL Server error logs, including Failover Cluster events, showed no issues.

However, monitoring software (Idera SQLdm) showed blocked sessions on the P node. When I ran sp_who2, it showed that a background process was being blocked by another background process with an HADR BACKUP LOCK.

Since both processes were background processes, I was unable to kill either process.  I temporarily disabled the transaction log backup job, but the blocked process was still active.

I ran DBCC CHECKDB (my_db) WITH all_errormsgs, no_infomsgs, data_purity on both the P and AC nodes, with no errors. However, on the AC node, it also showed 1 transaction rolled forward and 0 transactions rolled back. This also had the effect of releasing the blocked background process, but another background process was now blocking with the same HADR BACKUP LOCK.

I tried restarting SQL Server Agent on the AC node, which did not seem to work immediately. However, after a few minutes, I noticed that the block had disappeared. I re-enabled the transaction log backup job on AC and it started working normally again. The error has not recurred, but I am at a loss as to what happened and how to prevent it from happening again.

Any help would be greatly appreciated.


Diane

Log Shipping setup

Hi guys,

Sorry if this is a very basic, and probably duplicate, question, but I have a doubt about log shipping. When we set up log shipping, we need to define a shared storage area where the log backups are kept for the secondary server to pick up. I have a database that is 600 GB+ in allocated space. When I first set up log shipping, the wizard offers an option to perform the initial database backup and restore. My questions are about this option:

1. Is it recommended to use this option to create the secondary DR instance?

2. What is the performance impact of the above?

3. If I enable and set up log shipping on an empty instance and then run the data import into the primary database, will it affect the import speed?

4. If we use this option, what should be the size of the shared disk where the log backups are kept? Should this shared disk be 600 GB+ the first time, and then just large enough to keep the log backups required by the customer's retention policy?
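Question 4 comes down to simple addition: the share needs room for the initial full backup (assuming the wizard writes it to the same share) plus the log backups kept under the retention policy. A full backup copies only the extents in use, not the whole allocated file, so it can be noticeably smaller than the 600 GB allocated size. The Python sketch below is a back-of-the-envelope estimate only; `estimate_share_size_gb`, the log-generation rate, the retention period, and the compression ratio are all hypothetical inputs, not values from this setup.

```python
def estimate_share_size_gb(full_backup_gb, log_gb_per_hour, retention_hours,
                           compression_ratio=1.0, include_initial_full=True):
    """Rough upper estimate of a log shipping backup share, in GB.

    All inputs are caller-supplied assumptions:
    - full_backup_gb: size of the initial full backup (used space, not allocated space)
    - log_gb_per_hour: average transaction log generated per hour
    - retention_hours: how long log backups are kept on the share
    - compression_ratio: compressed size / uncompressed size (1.0 = no compression)
    - include_initial_full: False once the initial full backup has been restored and removed
    """
    if not 0 < compression_ratio <= 1:
        raise ValueError("compression_ratio must be in (0, 1]")
    logs = log_gb_per_hour * retention_hours * compression_ratio
    full = full_backup_gb * compression_ratio if include_initial_full else 0.0
    return full + logs


# Hypothetical example: 600 GB full, 2 GB of log per hour, 72 hours of retention.
print(estimate_share_size_gb(600, 2, 72))                              # initial setup
print(estimate_share_size_gb(600, 2, 72, include_initial_full=False))  # steady state
```

The split between the two calls mirrors the question: a larger footprint once, during initial setup, and a steady-state size driven only by log volume and retention.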

Thanks & regards

Ravinder


Ravinder

Disaster Recovery

Windows Server 2012 / SQL Server 2014

We're in the process of bringing up a 2nd colo location for disaster recovery purposes, and we want our SQL Server databases shipped to the 2nd location as well. I've used replication quite a bit in the past. It's easy to set up, but it's a pain in the butt to maintain and far from what I'd call robust, especially over a long wire: we frequently need to reinitialize publications because row counts don't match, and Replication Monitor sometimes reports an error but most often doesn't. We need a robust solution. Our main database is also 3 TB, so reinitializing over the wire isn't an option.

I've just started reading about AGs, and they seem like the solution we want, but they also seem like a lot of work to set up, to monitor, and to troubleshoot when issues happen. I watched Brent Ozar's 30-minute video on real-life lessons, and he also mentioned that AGs are a lot of work. I put some stock in it if Brent is saying it.

So my question is: which is the best solution for disaster recovery? Is log shipping still used these days, or is that old school? What about using LiteSpeed to apply the transaction log backups at the 2nd colo? In an ideal world I would like something that is fairly easy to implement, doesn't require a lot of babysitting (so I can do my other development work), and is reliable. Is that AG?

I've read multiple articles about AGs, including the ones in BOL and several others on the web, so unless you have an amazingly super fantastic article, I don't need links. What I'm really after are real-world scenarios like mine: what you have implemented for DR and how it is working.

Thanks in advance.


André

Am I backing up the Service Master Key correctly?

I am trying to follow the System Center Orchestrator DR guide on backups, which emphasizes backing up the Service Master Key as one of the requirements:

http://technet.microsoft.com/en-us/library/hh852622.aspx

Using SQL Server Management Studio, I connected to the Orchestrator instance, selected the Orchestrator database, and ran the T-SQL example given:

BACKUP SERVICE MASTER KEY TO FILE = 'c:\temp_backups\keys\service_master_key' ENCRYPTION BY PASSWORD = '3dH85Hhk003GHk2597gheij4';

When I examine the resulting file as a sanity check, it is, of course, encrypted.

I'm really just curious whether my method is sound. I gather that there is a separate service master key for each instance, but does it matter which database is selected when I execute the statement in SSMS? Is there anything else I should know? The documentation is pretty basic:

http://msdn.microsoft.com/en-us/library/ms190337(v=sql.110).aspx

Thank you for your help.

SQL Server FILESTREAM setup error in a cluster

We have a two-node SQL Server cluster (Windows Server 2012 R2 and SQL Server 2014 RTM CU3). We have set up FILESTREAM in the cluster, and remote client access is enabled on both nodes:

Enabled FILESTREAM for Transact-SQL access

Enabled FILESTREAM for file I/O access

I can successfully query the FileTables, but when I try ‘Explore FileTable Directory’ in Management Studio, I get the error below.

We don’t get this error on a standalone SQL Server instance.

When we open Failover Cluster Manager and go to Roles > SQL instance > Share, the shared folder properties show the error below.

Help is appreciated.


Recover data from .mdf and .ldf file?

Hello!

Hopefully someone can help me with the following (potentially huge) problem:

We have a simple database application running on Microsoft SQL Server Desktop Engine. The database contains two tables. Until now everything worked fine, but, probably because a program crashed, part of the database seems to be corrupted: today only one table returns data, and the other table does not return any rows at all!

When I open the .ldf and .mdf files in Notepad, I can see data belonging to both tables. But how do I recover it?

The database is nowhere near its maximum size (it's only 9 MB), and the SQL queries are correct.

Is the .ldf file a log file? I can't find detailed information on the internet about how these .ldf and .mdf files work. Is it possible to trace back what happened and recover the database that way?

Thanks in advance.

Greetings,

Rens Voogd

 

SQL 2014 Clustered with CSV

I just built a two-node test cluster with Windows Server 2012 R2. Everything seems to be functioning; however, the default file paths seem to be inaccessible from within the SQL Server tools. For example, if I try to specify the backup location for a restore, I get an error that I cannot access the specified path on the server (C:\ClusterStorage\Instance\SQL\...), and the file structure is empty after I select OK. If I create a new database, the files are placed in the default locations without problems, and when I'm on the server itself, I can access the files via the CSV path.

My thought is that the SQL cluster is not communicating properly with the CSV. In my production 2008 R2 cluster, the disks are a dependency of the SQL Server service, while in Windows Server 2012 R2 they are not (since it uses CSV).

I followed http://blogs.msdn.com/b/clustering/archive/2014/05/08/10523860.aspx, which doesn't differ much from older versions of Failover Clustering.

Any thoughts?

Use SMO (C#) to Drop a DB Joined to an Availability Group

I am having a heck of a time using SMO in C# to drop a database that is joined to an Availability Group. This is currently done with dynamic SQL, as follows:

On each secondary:

  ALTER DATABASE db SET HADR OFF;

  DROP DATABASE db;

On the primary:

  ALTER AVAILABILITY GROUP ag REMOVE DATABASE db;

  DROP DATABASE db;
I am trying to replicate this behavior in SMO, but nothing seems to work. I've tried every combination of SuspendDataMovement, LeaveAvailabilityGroup and Drop that I can think of, and each time I just get back a FailedOperationException that says that the operation failed, but does not provide any more information. Does anyone have any experience using SMO to drop a database that is joined to Availability Group, and, if so, can they please share it with me?


SQL daily maintenance and remote backup

Hi,

I have recently inherited a backup environment (Commvault) at my new job, and I have a question about backing up transaction logs.

I am responsible for guaranteeing complete disaster recovery for a couple of our critical SQL servers, which I have been accomplishing with Commvault: a daily full backup of each database and a transaction log backup every 30 minutes. The problem is that the transaction log backups need to be copied to our DR location over the WAN every 30 minutes, and when the daily maintenance job runs at 7 pm, it reindexes the database, creating a huge transaction log almost the size of the database, which takes 5-10 hours or longer to copy to our DR location. This is unacceptable.
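The 5-10 hour copy times follow from simple arithmetic: transfer time is roughly data size divided by the throughput actually achieved on the WAN link. The Python sketch below makes that explicit. `transfer_hours` is a hypothetical helper, and the link speed and efficiency factor in the example are placeholders, not measurements from this environment.

```python
def transfer_hours(size_gb, link_mbps, efficiency=0.8):
    """Estimate hours to copy `size_gb` gigabytes over a `link_mbps` megabit/s link.

    `efficiency` is an assumed fraction of nominal bandwidth actually achieved
    (protocol overhead, competing traffic); it is a guess, not a measured value.
    Uses decimal units: 1 GB = 8,000 megabits.
    """
    if size_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid input")
    megabits = size_gb * 8_000
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


# Hypothetical example: a 100 GB log backup over a 50 Mbit/s link at 80% efficiency.
print(round(transfer_hours(100, 50), 2))
```

Plugging in the real log backup size and measured link throughput shows how much of the 30-minute shipping interval the post-reindex log backup consumes.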

Is there any way to keep the reindex from creating such a huge transaction log, which Commvault sees as all changed data?

I am new to SQL Server, but I assume there has to be a solution to this, as I would think anyone with a short SLA on SQL Server would run into it.

High Availability

Hello:

I currently have a single SQL Server 2012 Enterprise Edition server that the whole State Agency relies on. We have redundant, load-balanced web servers, and I can shut down one web server and update its code without our end users noticing (they are automatically routed to the other web server because our WAF detects that the server is down). I am looking for a similar solution for our databases, so that when I shut down the database for maintenance (service packs, differential backups which lock tables, etc.), the end users would not notice.

It seems like I need duplicate and redundant datastores.

There may be several approaches to this. What are your thoughts?

Thank you


Support

What is an index? Explain the types of index.

Hi, can anyone please explain what an index is and what types of index there are?

Backup Suspended

I am facing an issue on one of my production servers. The backup of a database appears suspended: it stops progressing after reaching 99% completion. I restarted the SQL Server service and ran the backup again, but it still gets suspended when it reaches 99%. The database is 1 GB, and the backup has been in the suspended state for the last 6 days. This is on SQL Server 2008 Enterprise Edition, and the wait type is CMEMTHREAD. I need help with this.

SQL Server multiple data centres - Synchronization

Hello,

I am new to MSSQL. We have a web application that uses MSSQL 2012 as its back end. We now plan to host the same application in another data centre as well. We are able to provide geo-redundancy/HA for the web application.

But we also need to make sure that our SQL databases are always in sync. The web application in the primary data centre will always serve all requests. The backup application will only be online if the primary data centre server is down or in maintenance mode (once every two weeks).

How can we achieve SQL replication (in both directions) over the WAN, making sure that all data is up to date? We have users constantly updating information, so data is written to the DB frequently (every 5-10 minutes on average).

Thanks


SinghP80

SQL Server high availability failover trigger

Hello,

We are implementing SQL Server 2012 availability groups (AGs). Our secondary databases are not accessible, in order to save on licenses.
We have a lot of issues with monitoring, backup, and SSIS. They all come down to the fact that these tools need basic information from the secondary, which is not accessible. We are implementing SSIS, which is supported with AGs, but the SSISDB is encrypted.

Backup problem:

The secondary instance does not know anything about the backups made on the primary instance. After a failover, differential backups fail.

SSIS problem:

There is a blog post (http://blogs.msdn.com/b/mattm/archive/2012/09/19/ssis-with-alwayson.aspx) that suggests creating a job that checks whether the replica's status has changed from secondary to primary. If so, you can decrypt and re-encrypt. This job has to run every minute, which is far too much effort for an event that happens only once in a while. There are a few other problems with this solution: the statement "use ssisdb" has to be included in a job step, and the job step fails because the secondary is not accessible.

Monitoring problems:

We use Microsoft tooling for monitoring: SCOM. SCOM does not recognize a non-readable secondary and tries to log in continuously.

There are a few solutions that I can think of:

- a built-in SQL Server failover trigger

- a special status for the secondary database

Failover trigger:

We would like a built-in failover trigger, instead of a time-based job, that starts a few standard maintenance actions only at the time of (or directly after) a failover. Right now our HA cluster is not really highly available until:

- SSISDB works and is accessible after failover
- Backup information is synchronised
- SCOM monitoring skips the secondary database (SCOM produces loads of login failures)

Does anyone have any suggestion how to fix this?


AlwaysOn 2014 - Suspect DB

Hi,

I have a SQL Server 2014 AlwaysOn deployment with several databases.

Today I realized that the log file of one of the databases on the primary replica had filled the disk drive, and this caused the corresponding database on the secondary replica to become suspect.

Even after solving the problem on the primary, the database on the secondary is still in the suspect state.

We know we cannot always avoid this issue (a full disk drive), so please let me know how to get out of this situation.

Regards

Change the IP Address of Servers that are members of a Windows Failover Cluster and SQL cluster

Hi all,

I have a three-node Windows Server 2012 R2 cluster with two clustered named SQL Server 2012 instances installed on top of it. We need to change the IPs and subnet of the three physical servers in the Windows cluster. What will be the impact on the Windows and SQL clusters? What extra configuration is needed in both the Windows and SQL clusters to make sure the services will not fail after changing the IPs?

Adding Failover clustering or availablility groups later?

My apologies if this has been answered elsewhere.

We are looking at building a new server environment, but don't want to implement HA right now. We will want to implement HA (either AlwaysOn failover clustering or availability groups) down the line. Is there anything we need to be aware of or do when we build the new servers and OS? We will be working with Windows Server 2012 SP1 and SQL Server 2012 SP1. I just want to make sure we can add the AlwaysOn features later.

Thank you!

-Peter
