VMware's ESX 3.0 was released a bit more than three years ago. While ESX
2.5 was a solid virtualization platform, ESX 3.0 seemed to push server
virtualization into the realm where a lot of small and large businesses alike
could really sink their teeth into it. The new high-availability features in ESX
3.0 were a huge draw to many businesses seeking better uptime, and the refined
centralized management offered by VirtualCenter 2.0 was compelling. Support for
a wider set of hardware such as iSCSI SANs also allowed high-end functionality
at a lower price.
Now that we're three years down the road, many of these initial adopters of ESX
3.0 are starting to replace their hosts with new ones and preparing to upgrade
to vSphere 4.0. That seems to be leaving a lot of server admins staring at a
stack of three-year-old virtualization hosts that aren't yet finished doing
their jobs. Sure, they might not be quite fast enough to go the distance with
increased production loads, and you might like to have some more performance
headroom, but it's always a painful decision to turn off a bunch of expensive
servers and not do anything with them.
Instead of tossing their old hosts in a Dumpster, many enterprises are opting to
reuse them. Some turn them into development clusters to separate dev loads from
production loads. Some make them available for testing and training. My favorite
use is as the seed hardware for a warm site. Even if the old hardware can't run
all of your production workloads at full capacity, having some production
capacity immediately available when your primary site fails is better than none
-- and it bridges the gap between the disaster and the arrival of replacement
hardware on site.
Assuming that business continuity is important to your organization and you have
multiple offices or a sufficiently large campus, building a warm site is a great
use of your hardware. It certainly isn't free, and there are a number of common
pitfalls that you'll want to steer clear of, but it's definitely a worthy
endeavor if downtime costs you money.
Step 1: Define the service level
First, you need to define the level of service you want your warm site to
provide. Do you want to protect all of your machines or just a subset? How
quickly do you need to be able to recover (your recovery time objective, or
RTO)? How old can your data be when you do recover (your recovery point
objective, or RPO)? Your answers to these questions may change as you work through
the design process and start attaching price tags to varying levels of service,
but you should never let what you can afford directly drive what you provide.
It may be that, to be useful, a warm site would cost more than you can currently
afford to spend on it. In that case it's better to save your pennies and do it
correctly than to implement something that won't accomplish your organization's
goals.
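It can help to write those targets down per workload tier before you start
pricing anything. The following is a minimal worksheet-style sketch in Python;
the tier names, VM names, and objective values are hypothetical placeholders,
not recommendations.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Tier:
    name: str
    rto_hours: float      # how long the business can wait for recovery
    rpo_hours: float      # how much data loss, measured in time, is acceptable
    vms: List[str] = field(default_factory=list)

# Hypothetical tiers -- substitute your own machines and objectives.
tiers = [
    Tier("critical", rto_hours=4,  rpo_hours=1,  vms=["erp-db", "mail-01"]),
    Tier("standard", rto_hours=24, rpo_hours=12, vms=["intranet", "file-01"]),
    Tier("deferred", rto_hours=72, rpo_hours=24, vms=["reporting", "dev-build"]),
]

# If the warm site's replication can only run every N hours, check which
# tiers that schedule actually satisfies before attaching price tags.
replication_interval_hours = 6

for tier in tiers:
    verdict = ("meets RPO" if replication_interval_hours <= tier.rpo_hours
               else "misses RPO")
    print(f"{tier.name:10s} RTO {tier.rto_hours:>3.0f}h  "
          f"RPO {tier.rpo_hours:>3.0f}h  {verdict} ({len(tier.vms)} VMs)")

Walking through an exercise like this makes it obvious which workloads are
driving the cost of the warm site, and which ones you can afford to protect
less aggressively.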
Step 2: Assess your SAN situation for replication options
The SAN is the first piece of hardware to look at, as it tends to be the most
expensive. If possible, asynchronous SAN-to-SAN replication is the best way to
implement a warm site, but depending on the SAN platform in use, such
replication may be impossible or uneconomical.
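One quick sanity check before pricing replication licenses is whether your WAN
link can move your daily change rate fast enough to hold the RPO you settled on
in step 1. The sketch below is back-of-the-envelope arithmetic with made-up
numbers; measure your actual change rate before drawing any conclusions.

# Rough check: can asynchronous SAN-to-SAN replication over a given WAN
# link keep the remote copy within the target RPO? All figures below are
# hypothetical -- measure your own environment before trusting the answer.

daily_change_gb = 60.0       # data changed per day across replicated LUNs
rpo_hours = 4.0              # target recovery point objective
link_mbps = 100.0            # raw WAN bandwidth, megabits per second
link_efficiency = 0.7        # allowance for protocol overhead and contention

# Assume changes are spread evenly over the day; bursts need extra headroom.
changes_per_window_gb = daily_change_gb * (rpo_hours / 24.0)
window_seconds = rpo_hours * 3600.0

required_mbps = (changes_per_window_gb * 8 * 1024) / window_seconds
available_mbps = link_mbps * link_efficiency

print(f"Required : {required_mbps:6.1f} Mbps")
print(f"Available: {available_mbps:6.1f} Mbps")
print("The link can hold this RPO" if available_mbps >= required_mbps
      else "The link is too slow for this RPO")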