Channel: SQL Server High Availability and Disaster Recovery forum

SQL 2008 R2 cluster disks offline after SAN port config

Hi all,

We have a 2-disk, 2-node cluster (SQL Server 2008 R2) on Windows Server 2008 R2.

Last week while I was out, one of our techs plugged an additional network cable into one of the empty #2 ports on our HP P2000 G3 SAN in an attempt to add network connectivity.

He didn't see any lights when he plugged it into B2, so he unplugged it and plugged it into A2. If I understand correctly, still no lights came on.

However, after he did that, one of our SQL clusters failed and we can't figure out how to bring the 2 Cluster Disks back online. He unplugged the new cable, so it should physically be back the way it was. They rebooted the nodes, but the cluster is still offline.

Interestingly, the Quorum/witness disk is online and the SQLCLUSTER server's IP address is up, and the two network cards for the nodes are up. We can ping all the IP addresses.

But, Cluster Disk 2, Cluster Disk 3, SQLCLUSTER, FileServer-SQLCLUSTER, SQL Server (Other Resources), and SQL Server Agent are down. When clicking on "Nodes", it says both nodes are up. But when clicking on Node01, it says SQLCLUSTER Failed. Clicking on Node02 says "there are no services and applications hosted on this node."

We can see the 2 drives the cluster uses in Disk Management, but both disks are in a Reserved state and there's no GUI option to bring them online. Note that Disk Management lists them as Disk 3 and Disk 4, whereas the cluster calls them Cluster Disk 2 and Cluster Disk 3. We can also see Disk 2, which, judging by its size, is the one the quorum is using.

I tried DISKPART's ATTRIBUTES DISK CLEAR READONLY on the two disks, but both attempts failed.

I tried Clear-ClusterDiskReservation -Disk 3, which did nothing even after rebooting. It didn't even report an error.

We've chatted with cluster and SQL techs from Microsoft, but they've forwarded our ticket on to their disk management team. We've been waiting a few hours for them to call back. The initial failure happened on Feb. 4, so users and management are pretty upset this production server has been offline for so long.

Does anyone have a suggestion for getting the cluster to recognize these drives?

I've seen a suggestion that the GUID on the drives could be different from the GUID on the object in AD. To fix it, the article says to remove the object from AD, then use ADRestore. Here's a link to the article. I'm hesitant to try this, because we don't have a CN=DeletedObjects container in AD.

I've thought about going into Failover Cluster Manager, clicking Storage, right-clicking the disks, choosing Remove from Cluster, and then adding them back. However, I'm not sure whether they'll be available to add back, or what that would do to the data.

Suggestions?

Thanks in advance,

Sonya

