
Summary:

Dynamically moving VMs within a single data center, or between two data centers, will at some point be a seamless process, but it is not one today. In the meantime, there are numerous opportunities for startups to offer solutions that will help make such seamlessness a reality.

Among the many innovations that virtualization has brought to the data center is server mobility, or the ability to live-migrate virtual machines (VMs) across physical servers. With it comes a marketing story that dynamically moving VMs inside a single data center or between two data centers is a seamless process. While at some point that will undoubtedly be true, it’s far from an operational reality today. In the meantime, there are numerous opportunities for startups to offer solutions that will help make such seamlessness a reality.

Currently, moving a VM from one physical machine to another comes with two important constraints. First, both machines must share the same storage back end, typically a Fibre Channel/iSCSI SAN or network-attached storage. Second, the physical machines must reside in the same VLAN or subnet. This means that inside a single data center, one can only move a VM across a relatively small number of physical machines. Not exactly what the marketing guys would have you believe.
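
To make those constraints concrete, here is a minimal pre-flight check of the sort any live-migration tool has to satisfy; the host records, field names and helper function are hypothetical illustrations for this post, not a real hypervisor API.

    # Hypothetical pre-flight check for live migration. The host records and
    # field names are illustrative only, not an actual vSphere/hypervisor API.
    def can_live_migrate(src, dst):
        """Return (ok, reasons) for moving a running VM from src to dst."""
        reasons = []
        # Constraint 1: both hosts must mount a common shared datastore
        # (Fibre Channel/iSCSI SAN or NAS) holding the VM's disks.
        if not (src["datastores"] & dst["datastores"]):
            reasons.append("no shared storage back end between hosts")
        # Constraint 2: the destination must offer the VM's VLAN/subnet, so the
        # guest keeps its IP and MAC addresses without re-addressing mid-flight.
        if not (src["vlans"] & dst["vlans"]):
            reasons.append("destination is not in the VM's VLAN/subnet")
        return (not reasons, reasons)

    src_host = {"datastores": {"san-lun-7"}, "vlans": {101, 102}}
    dst_host = {"datastores": {"san-lun-9"}, "vlans": {205}}
    print(can_live_migrate(src_host, dst_host))
    # -> (False, [...]): both constraints violated, so no live migration here

Checks like these are what quietly shrink the pool of eligible destination hosts inside a data center.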

While this might be a small inconvenience for an enterprise data center, think about it from the perspective of a cloud provider, whose entire reason for existence is to maximize the revenue earned from every dollar spent on servers. If the network limits server utilization by constraining VM mobility, then it's costing the provider revenue. Maybe providers should try asking for that money back from their current networking vendors?

Many networking vendors talk about the flattening of the data center network as a cure-all, and would advise simply building a very large VLAN. But such an approach has a host of problems that have been around for years and are the reason routers are deployed in the first place: containing multicast traffic, the Spanning Tree Protocol's inability to take advantage of multiple paths, and of course the fact that network operations staff use VLANs for very real purposes, like segmenting traffic for security and compliance (PCI, etc.). These and a number of other limitations of current data center networks are on the radar of vendors and standards bodies; meanwhile, the research community has responded with a raft of papers with funny names like VL2 (PDF) and PortLand (PDF), among others. What we need now are startups to come up with some creative, and practical, solutions to these problems. For if networking continues to play second fiddle to compute and storage, the cloud vision will never be fully realized.
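
The common thread running through proposals like VL2 and PortLand is decoupling a VM's identity (the address applications see) from its current location in the network, and resolving one to the other through a directory or fabric, so that a migration updates a mapping instead of stretching a VLAN. Below is a toy sketch of that idea only; it is purely illustrative and does not show either paper's actual encapsulation or protocol.

    # Toy identity-to-location directory in the spirit of VL2/PortLand-style
    # designs; illustrative only, not the real protocol from either paper.
    directory = {
        "10.0.5.12": "tor-rack-3",   # VM's stable identity address -> current rack
    }

    def deliver(vm_ip, payload):
        # The edge switch looks up the VM's current location and tunnels the packet.
        location = directory[vm_ip]
        return f"encap({payload}) -> {location}"

    print(deliver("10.0.5.12", "pkt"))      # forwarded to tor-rack-3

    directory["10.0.5.12"] = "tor-rack-17"  # VM migrated: one directory update,
    print(deliver("10.0.5.12", "pkt"))      # no VLAN stretched, no re-addressing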

And while server mobility inside a single data center is tough, inter-datacenter and data center-to-the-cloud server mobility is even tougher. Cisco and VMware have published (PDF) papers about it a few times, but the solutions they’re proposing seem more like hero experiments than practical solutions at this point.

For long-distance applications, storage becomes the big problem, which EMC highlighted just last month with the release of VPLEX (GigaOM Pro, sub req'd). It turns out that replicating VM data across the WAN is doable, but really expensive and very bandwidth-hungry. Also, if you're using Fibre Channel, there are distance limitations due to the FC protocol. Oh, and with the various flavors of storage over IP, you'd better not have any packet loss. On the other hand, you could choose to move just the VM state while keeping storage in the original data center, but that will impact application performance. In other words, when all is said and done, getting this to work is extremely complicated, and probably only feasible if money is no object.
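
Some back-of-the-envelope arithmetic shows why "doable, but really expensive" is the right characterization. The disk size, link speed, RTT and loss figures below are made up for illustration, and the throughput bound is the standard rough single-flow TCP estimate (Mathis et al.), not a measurement of any particular product.

    from math import sqrt

    # Illustrative figures only: a 500 GB VM disk over a 1 Gbps WAN link.
    disk_gb, link_mbps = 500, 1000
    hours_line_rate = (disk_gb * 8 * 1000) / link_mbps / 3600
    print(f"full copy at line rate: ~{hours_line_rate:.1f} hours")   # ~1.1 hours

    # Why packet loss hurts storage over IP: a single TCP flow is roughly
    # bounded by MSS / (RTT * sqrt(loss)).
    mss_bytes, rtt_s, loss = 1460, 0.05, 0.001        # 50 ms RTT, 0.1% loss
    ceiling_mbps = (mss_bytes * 8 / (rtt_s * sqrt(loss))) / 1e6
    print(f"TCP ceiling: ~{ceiling_mbps:.0f} Mbps")   # ~7 Mbps on a 1 Gbps pipe

    hours_at_ceiling = (disk_gb * 8 * 1000) / ceiling_mbps / 3600
    print(f"same copy at that ceiling: ~{hours_at_ceiling:.0f} hours")  # ~150 hours

That gap between the raw pipe and what a lossy, high-latency path actually delivers is why WAN replication tends to require dedicated links, WAN optimization, or both.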

The typical use cases cited for VMware's long-distance VMotion are disaster recovery and avoidance, workload balancing, and the like. The shift from disaster recovery to disaster avoidance is a compelling one, and it can add additional layers of reliability on top of existing VMware features such as High Availability and Fault Tolerance. There seems to be a clear need for inter-data center and private-to-public cloud migration, and those use cases, which I believe are set to take off, should be the real call to arms to make such a capability operationally efficient and much less expensive.

Server mobility is a powerful tool in the modern data center, but it currently has numerous limitations. In these limitations I see opportunity for startups, especially when it comes to fixing the networking issues and helping to decrease the storage costs. Let me know if you do, too.

To hear more about server mobility and similar topics, attend Structure on June 23 & 24 in San Francisco.

Alex Benik is a principal at Battery Ventures.

  1. Just to add a vendor perspective. We offer full FTP over SSL access for all clients for free. You can FTP in and out drive images using regular FTP clients. Migrating a physical server to the cloud is simply a matter of imaging the drive and FTPing it into our cloud. Likewise, migrating out means FTPing your drive images to another provider. The idea of physical to virtual is a reality and technologically feasible today.

    We thought this was the easiest and most straightforward approach for most users. We don't charge for incoming bandwidth, making it easy to transfer in data, trial our servers, and simply delete your drives if you aren't happy.

  2. I think you are overestimating the limitations of VMotion within the same data center. All you need to do is make sure the VLAN you need for the virtual machine is trunked to all ESX hosts in your cluster and configure your virtual switch/distributed virtual switch/Nexus 1000V accordingly. This is really not a big deal and trivial to set up and maintain. I do this every day. As a matter of fact, while I was writing this response a few of my production virtual machines moved around seamlessly in my DRS cluster.

    The bigger issue is across data centers. The problem is not so much a VMware issue as a general networking problem/best-practice question, and also a really hard storage problem, as you have stated. The network problem has recently been sort of solved by Cisco with OTV. The problem is truly a spanning tree issue: with L2/L3 stretched over distance you can cause yourself some real problems, read: loops or broadcast storms.

    The storage problem still has some kinks to be worked out, although EMC's VPLEX looks very interesting and quite promising. I hope to get a demo from EMC sometime this year to see how well this piece of technology actually works.

  3. “Second, the physical machines must reside in the same VLAN or subnet. This means that inside a single data center, one can only move a VM across a relatively small number of physical machines. Not exactly what the marketing guys would have you believe.”

    Not true at all. Please check your facts. I could not have said it better than Mr. Scott Lowe. Please see here:

    http://blog.scottlowe.org/2010/06/13/the-vmotion-reality/

  4. VMUtil is in the business of solving some of the issues mentioned in this great article. Take a look at the instant provisioning product VMProv in particular.

  5. vMotion has a goal of keeping the VM powered on and serving requests while it is transitioned between physical hosts. I am not sure what route you want to take if the same IP network doesn't exist for that VM on the other side of the move.

    Are you looking to have the IP stack of the VM dynamically adjust and re-register (thinking dynamic DNS with extremely low TTLs) so that the requests can be processed after the move? If that is the case, why not just spawn more instances of a stateless VM in the cloud and use network load balancers to deal with new TCP or UDP requests for the services that the particular VM cluster processes.

    So for clarification, do you assume that the vMotion myth applies to VMs being moved in a powered ON or OFF state?

    As an aside, I am all for SRV records being used more with low TTLs... I still find it hard to believe they aren't more widely adopted with standard internet protocols.

  6. Andrew Miller Sunday, June 13, 2010

    Hmm... I agree that there are boundaries around vMotion design, but practically speaking they're high enough to leave a lot of flexibility (especially in enterprise or even small business data centers).

    Speaking as an architect/engineer who just implemented a full-on "VMware datacenter in a box" last week (storage + servers + vSphere), vMotion is definitely a reality even in shops with four ESX hosts; we had it up and going on day 2, actually (a good bit of day 1 was whiteboard time around VLANs and IP range layout... data center design work that really should be done in any virtualized environment).

    The wonderful thing about vMotion (and DRS really, i.e. dynamic vMotion) is that you can stop thinking as much about individual physical hardware and just consider it to be a "pool of resources"; if you need more resources, add another physical box into the pool.

    I agree there are limitations (show me tech without them), but I have many customers who are using vMotion/DRS with fantastic results...

  7. Scott makes some additional comments about this post; hope this is useful: http://blog.scottlowe.org/2010/06/13/the-vmotion-reality/

  8. Since 2006, Parallels Virtuozzo Containers technology has been able to move an entire container from one machine to another very efficiently, without the need for shared storage.

  9. “First, both machines must share the same storage back end, typically a Fibre Channel/iSCSI SAN or network-attached storage.” <– This is a half-truth. Yes, both physical machines must see the same storage back end, but that storage back end can be multiple, different physical SANs. VMware has Storage vMotion, which allows a live copy of a VM from one physical array to a completely different physical array with no downtime.

    I’m rather curious where you’ve received all this information. VMotion is not a myth.
    “Second, the physical machines must reside in the same VLAN or subnet.” <– Not true; the physical machines can all reside on different VLANs, but best practice says to isolate those VLANs so that no other traffic supersedes moving VMs around. It’s also a good security measure.

    As for the matter of disaster recovery, there are solutions for that in VMware's portfolio, as well as from several other vendors, to mirror the current environment across to anywhere, but yes, cost could be a huge factor unless done correctly. Technologies such as deduplication from NetApp or EMC's Celerra can reduce the amount of SAN space needed and the overall amount of data that has to be transferred over the network.
