Recently Microsoft published an Exchange Preferred Architecture. This post on the ehlo blog explains what Microsoft thinks is the best way to deploy Exchange 2013. This is a great post packed with awesome information that I think everyone who would bother with my silly little blog should go out and read right away. It’s OK, I’ll wait…

So how was it? Pretty awesome, right? I almost completely agree. All of it except that one little bit about the 3rd site witness. Everything else in that post is absolutely the correct way to deploy Exchange 2013. I’m not even saying a 3rd site witness is a universally bad idea. I’m just saying there is more to the story, and some other factors need to be considered before deploying a DAG in this configuration. Let’s get into it.

For the purpose of this blog post, I am going to assume you have read the Exchange Preferred Architecture post on the ehlo blog. My issues with that post all come down to one sentence:

If your organization has a third location with a network infrastructure that is isolated from network failures that affect the site resilient datacenter pair in which the DAG is deployed, then the recommendation is to deploy the DAG’s witness server in that third location.

In my opinion, it is a pretty big ask to expect an Exchange admin to understand his/her company’s network infrastructure well enough to properly make that determination. I like to think of myself as a pretty accomplished Exchange architect, and in most cases I don’t understand customers’ networks well enough to determine whether a 3rd site witness is going to be beneficial or detrimental in the event of a datacenter failure. Furthermore, testing a network outage between datacenters is almost always going to be off the table.

Clearly customers want automatic datacenter failover. I completely understand that being able to say that we can lose an entire datacenter in the middle of the night and no one will notice is a huge selling point. The Exchange product group has done a great job of ensuring that Exchange 2013 can support automatic datacenter failover, and I commend them for that. If, however, the customer is not sure what will happen (or how long whatever does happen will take) in the event of a network failure, then a 3rd site witness can make a bad situation worse. Personally, I don’t see the benefit gained from an automatic site failover as universally more valuable than the predictability of a FSW + node majority at a primary datacenter. The process of manually activating a secondary datacenter is not all that onerous if you are prepared for it.

I’m sure many will disagree with me on that point, and as long as they go into this configuration with their eyes open that is fine. I just want to avoid situations where a network failure causes unexpected cluster behavior. Let me demonstrate what I mean…

I’ll use the example of an Exchange 2013 deployment with 4 multi-role servers, all members of a single DAG. Two Exchange servers will be deployed to each of two datacenters. I’ll walk through the relevant failure scenarios for this deployment with and without a 3rd site witness, and how to recover from those failures.

The simple explanation of this setup is that with a four node cluster you need 2 nodes and the FSW to have quorum. In truth, it’s a little more complex than that. Each node in the cluster gets a quorum vote, but the FSW itself does not really get a vote. What happens is that the cluster node that becomes the PAM locks the FSW and grants itself two votes. This behavior is the same whether the FSW is located at the primary site or at a 3rd site. All the scenarios below assume that the Exchange PA is followed in all regards except the 3rd site FSW, and that DAC mode is active.
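To make the vote counting concrete, here is a minimal Python sketch of the simple explanation. It is a simplification, not the cluster service's actual logic: it assumes the witness counts as one extra vote for whichever group of nodes can reach it, and it ignores which node holds the lock. The function name `has_quorum` is made up for this illustration.

```python
def has_quorum(nodes_up: int, witness_reachable: bool, total_nodes: int = 4) -> bool:
    """Simplified majority check for an even-node DAG with a file share witness.

    Assumption: the witness is worth one extra vote to the nodes that can
    reach it. Which node actually locks the witness is not modelled here.
    """
    total_votes = total_nodes + 1                    # four nodes plus the witness
    votes = nodes_up + (1 if witness_reachable else 0)
    return votes > total_votes // 2                  # strict majority: 3 of 5


if __name__ == "__main__":
    for nodes_up, witness in [(4, True), (2, True), (2, False), (3, False)]:
        print(nodes_up, witness, has_quorum(nodes_up, witness))
```

Under this model, two nodes plus the witness hold quorum, two nodes without the witness do not, and three nodes hold quorum on their own.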

No 3rd site witness (witness server at Site A)

In the “traditional” model, the failure behavior is pretty easy to understand. If Site A (with 2 nodes and the FSW) goes down, you lose quorum and all the databases go offline. If Site B (with two nodes and no FSW) goes down, you do not lose quorum and all databases come online at Site A. If the network connecting Site A and Site B goes down, then Site B loses quorum and Site A retains quorum. All the databases come online in Site A.
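The three outcomes above can be checked with a small Python sketch. It assumes the same simplified counting (one vote per node, one extra vote for the side that can reach the witness), and the names `NODES`, `WITNESS_SITE` and `quorum_sites` are hypothetical, chosen only for this example.

```python
NODES = {"A": 2, "B": 2}   # DAG members per site (from the example deployment)
WITNESS_SITE = "A"         # witness server sits in the primary datacenter
MAJORITY = 3               # 3 of 5 votes: four nodes plus the witness


def quorum_sites(site_a_up=True, site_b_up=True, link_up=True):
    """Return the set of sites whose nodes still hold quorum (simplified model)."""
    up = {s for s, ok in (("A", site_a_up), ("B", site_b_up)) if ok}
    # If both sites are up but the link is down, each site is its own partition.
    partitions = [up] if link_up or len(up) < 2 else [{s} for s in up]
    winners = set()
    for part in partitions:
        votes = sum(NODES[s] for s in part)
        if WITNESS_SITE in part:
            votes += 1          # only the side that can reach the witness gets it
        if votes >= MAJORITY:
            winners |= part
    return winners


if __name__ == "__main__":
    print("Site A down:", sorted(quorum_sites(site_a_up=False)))
    print("Site B down:", sorted(quorum_sites(site_b_up=False)))
    print("A-B link down:", sorted(quorum_sites(link_up=False)))
```

In this model, only the loss of Site A leaves no site with quorum. The other two failures leave Site A running, which is the predictability I am arguing for.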

In the failure cases of Site B or the network connecting Site A and Site B, recovery is automatic. All databases will come online at Site A, and all clients with connectivity to those servers will remain online. In the case of a Site A failure, manual intervention will be required. The recovery process would be to run Stop-DatabaseAvailabilityGroup at both sites (this may not be possible at the failed site; if that is the case, don’t worry about it), then run Restore-DatabaseAvailabilityGroup at Site B. (Start-DatabaseAvailabilityGroup can also be used for DAG recovery. I’ll explain the differences in a later post.)

3rd site witness

In the Exchange PA model, things are more complex. We still have Site A and Site B, each with 2 DAG nodes. We still have the network connecting Site A and Site B. Now we add Site C (FSW site), and we add network connections between Site A and Site C, and Site B and Site C.

In the case of failure of Site A or Site B, everything comes up on the other site automatically. In case of failure of Site C, the DAG will retain enough votes to keep quorum and everything will stay up where it is. These three site failure scenarios are why Microsoft recommends this deployment. Failure of any two of the three sites will cause loss of quorum, and require manual recovery as detailed above for the failure of Site A without a 3rd site FSW.

The problem is with network failures between these three sites. For instance, what happens if I lose the network between Site A and Site B? If the site that has the PAM (hopefully that will be the site with the DAG node with the lowest cluster ID, but it may not be) still has connectivity to Site C, then that site will retain quorum and all databases will come online at that site. If the site with the PAM no longer has connectivity to the 3rd site witness, then all DAG nodes will lose quorum and manual recovery will be required. Microsoft’s Exchange PA does not consider this possibility because it assumes redundant links between all sites on different ISPs. I do not think it is reasonable to assume this level of network redundancy. More importantly, I do not think it is safe to assume that Exchange admins will understand all these possibilities when deploying Exchange 2013. Furthermore, there is almost no chance an Exchange admin is going to be able to test the failure of networks connecting 3 datacenters.
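The 3rd site scenarios are easier to reason about when they are written down as code. The Python sketch below encodes the behavior exactly as I described it above; it is not a verified model of the cluster service. It assumes that when Site A and Site B cannot see each other, only the PAM's side can claim the witness, and that the PAM role moves to the surviving site if its original site is down. All names (`quorum_sites`, `pam_site`, `down`, `cut`) are hypothetical.

```python
DAG_SITES = ("A", "B")   # two DAG members in each site
WITNESS_SITE = "C"       # 3rd site holding the file share witness
NODES_PER_SITE = 2
MAJORITY = 3             # 3 of 5 votes: four nodes plus the witness


def link_ok(x, y, down, cut):
    """A link works only if both ends are up and the link itself is not cut."""
    return x not in down and y not in down and frozenset((x, y)) not in cut


def quorum_sites(pam_site, down=(), cut=()):
    """Return the DAG sites that keep quorum, per the behavior described in this post."""
    down = set(down)
    cut = {frozenset(pair) for pair in cut}
    up = [s for s in DAG_SITES if s not in down]
    if not up:
        return set()
    if pam_site in down:
        pam_site = up[0]           # assumption: PAM moves to the surviving site
    if len(up) == 2 and link_ok("A", "B", down, cut):
        partition = set(up)        # all four nodes can see each other
    else:
        partition = {pam_site}     # assumption: only the PAM's side can claim the witness
    votes = NODES_PER_SITE * len(partition)
    if any(link_ok(s, WITNESS_SITE, down, cut) for s in partition):
        votes += 1
    return partition if votes >= MAJORITY else set()


if __name__ == "__main__":
    print("A-B cut, PAM in A:", sorted(quorum_sites("A", cut=[("A", "B")])))
    print("A-B and A-C cut, PAM in A:",
          sorted(quorum_sites("A", cut=[("A", "B"), ("A", "C")])))
```

The second call is the case that worries me: both datacenters are up and Site B can still reach the witness, yet under these assumptions nothing keeps quorum.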