Master/Slave vs. Peer-to-Peer

I've got an interesting problem facing me now. I have multiple servers on different continents, and it would be nice to have them act as one unit. This isn't a requirement of the project - the requirement was simply that we have a BCP (a.k.a. disaster recovery) site for the server, and that's done. But there's a step we can take, if we want to, and that's to make the system better because of the existence of the other server - improve response times, share the load, etc. The problem is how.

The first obvious solution is to do nothing. Not very creative, but it's the lowest-cost solution and has to be considered given that we have limited people and time available to us.

The next might be some kind of master/slave set-up. Something where all the data still comes into the main server, and then, after it's done verifying and processing it, something is sent to the other one to keep it up to date. This has the benefit of being the only solution where we know what the second (and third) server's state is - because we set it. No inputs other than price are allowed to this read-only slave server. This keeps things under control nicely. However, the downsides are significant:

  • Plenty of development needed - we're going to have to come up with multiple protocols that will be used to transfer the data from one to another with handshaking and acknowledgments so that we know what we sent was received successfully, etc.
  • Significantly more bandwidth needed - if we're going to be moving the complete state (as it evolves) from one machine to the other, we're going to need a lot more bandwidth between sites to keep the latency on updates low. This might be a big issue and then again, it might not, but if we do this, we certainly need to keep this in mind.
  • Still going to need verification - no matter what, we'll have to have something that will verify that the slave is indeed a copy of the master. This might take the form of a daily script, or it might be an hourly check, but something or someone has to check them against one another.
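
To make the handshaking and verification points a little more concrete, here's a minimal Python sketch of what I have in mind. It is not a design: the class names (Master, Replica), the sequence-number acknowledgment, and the SHA-256 digest comparison are all assumptions made for illustration, and both "servers" live in one process, so there is no real network, latency, or loss involved.

```python
import hashlib
import json


def state_digest(state):
    """Return a stable SHA-256 digest of a dict of JSON-serializable values."""
    payload = json.dumps(state, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class Replica:
    """Hypothetical read-only copy that applies updates strictly in order."""

    def __init__(self):
        self.state = {}
        self.last_seq = 0

    def apply(self, seq, key, value):
        # Apply only the next expected update; ignore gaps and repeats.
        if seq == self.last_seq + 1:
            self.state[key] = value
            self.last_seq = seq
        # The return value is a cumulative acknowledgment.
        return self.last_seq


class Master:
    """Hypothetical main server: owns the state and pushes each change."""

    def __init__(self, replica):
        self.state = {}
        self.seq = 0
        self.replica = replica
        self.unacked = []  # updates the replica has not acknowledged yet

    def update(self, key, value):
        self.state[key] = value
        self.seq += 1
        self.unacked.append((self.seq, key, value))
        self.flush()

    def flush(self):
        # Send in order; stop at the first update that is not acknowledged.
        while self.unacked:
            seq, key, value = self.unacked[0]
            acked = self.replica.apply(seq, key, value)
            if acked < seq:
                break  # replica is missing earlier updates; retry later
            self.unacked.pop(0)

    def in_sync(self):
        # The periodic check: compare digests instead of shipping full state.
        return state_digest(self.state) == state_digest(self.replica.state)


replica = Replica()
master = Master(replica)
master.update("IBM", 101.25)
master.update("MSFT", 27.5)
print(master.in_sync())
```

Comparing digests rather than the full state is one way to keep the hourly or daily check cheap on bandwidth, though it only says that the two differ, not where.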

The other solution I can think of is some kind of peer-to-peer system where a change entering one server gets sent to the other before it's actually checked and acted upon. The most obvious extension of this is to have all the changes come into one server, Chicago, and then have it send the data to the other(s) so that we're pretty sure that we don't have to worry about cross-site changes, which could be a drag. Still, the problems with this solution are significant too:

  • Loosely coupled == More differences - no two ways about it: if the systems are loosely coupled, then it's very possible that the data in them will be different. This might also happen in the master/slave set-up, but it's going to happen a lot more in the peer-to-peer one.
  • Support Costs - because of the first point, this solution almost guarantees that we'll need someone in the right timezone to make sure the differences are kept small. It's a person at this point - not a script.
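
Here's a small Python sketch of why cross-site changes worry me, and why routing everything through Chicago helps. Everything in it is hypothetical: the Peer class, the in-memory inbox standing in for the link between sites, the "remote" site name, and the placeholder validation rule. It only models the ordering problem, not real networking.

```python
from collections import deque


class Peer:
    """One site in a hypothetical peer-to-peer set-up."""

    def __init__(self, name):
        self.name = name
        self.state = {}
        self.others = []
        self.inbox = deque()  # changes forwarded by other peers, not yet applied

    def submit(self, key, value):
        # Forward the change first, then check and apply it locally.
        for other in self.others:
            other.inbox.append((key, value))
        self._check_and_apply(key, value)

    def drain(self):
        # Process the changes that have arrived from the other site(s).
        while self.inbox:
            key, value = self.inbox.popleft()
            self._check_and_apply(key, value)

    def _check_and_apply(self, key, value):
        # Placeholder validation (an assumption): accept non-negative numbers.
        if value >= 0:
            self.state[key] = value


def differences(a, b):
    """Return the sorted keys whose values differ between two peers."""
    keys = set(a.state) | set(b.state)
    return sorted(k for k in keys if a.state.get(k) != b.state.get(k))


def make_pair():
    chicago, remote = Peer("Chicago"), Peer("remote")
    chicago.others.append(remote)
    remote.others.append(chicago)
    return chicago, remote


# Cross-site changes to the same key: each site ends up with the other's value.
chicago, remote = make_pair()
chicago.submit("limit", 100)
remote.submit("limit", 200)
chicago.drain()
remote.drain()
print(differences(chicago, remote))  # ['limit']

# All changes entering through Chicago: both sites agree.
chicago, remote = make_pair()
chicago.submit("limit", 100)
chicago.submit("limit", 200)
remote.drain()
print(differences(chicago, remote))  # []
```

In the first case neither site is wrong by its own rules, which is exactly the kind of difference a person ends up having to sort out.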

I've sent out these ideas and trade-offs to a few of the folks here who would be most affected by the decision. I want to hear what they have to say. My initial impression is that I should have the second server running every day, taking ticks, ready to cut over at a moment's notice, but that linking these guys is a lot of work that isn't really needed, and won't necessarily pay off in the long run.