
The virtualization of systems allows for efficient use of server resources and is clearly a trend that many enterprises are embracing. Systems engineers see virtualization as the next generation of tools that can help scale their servers, while network engineers see the virtualization trend headed in their direction as well. Unfortunately, it seems that server virtualization also helps foster trench warfare between the two.

I found myself witness to one small skirmish in this battle today, when I met with a startup looking for funding. The startup is building enterprise services, and for its next generation plans to make heavy use of XenSource’s XenMotion functionality to manage virtual machines on about 50 physical servers. This functionality, which is similar to that of VMware’s VMotion, promises to seamlessly move a virtual machine from one physical server to another. The startup’s service could be running in one virtual machine on a server, and if that server comes under too much load or suffers a failure, XenMotion could move the virtual machine to another server without any downtime. For an enterprise services startup, avoiding downtime is a good idea.
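
For readers who haven’t seen live migration up close, here is roughly what the operation looks like from an operator’s point of view. This is a minimal sketch using the libvirt Python bindings, which can drive Xen live migration; the host names and guest name are hypothetical and are not the startup’s actual setup.

```python
# A minimal, illustrative sketch of live migration via the libvirt Python
# bindings. The host names and the guest name below are hypothetical.
import libvirt

SRC_URI = "xen+ssh://server-a.example.com/"   # hypothetical source host
DST_URI = "xen+ssh://server-b.example.com/"   # hypothetical destination host
GUEST = "service-vm-01"                       # hypothetical guest name

src = libvirt.open(SRC_URI)
dst = libvirt.open(DST_URI)

dom = src.lookupByName(GUEST)

# VIR_MIGRATE_LIVE copies the guest's memory while it keeps running, then
# switches execution to the destination host with no visible downtime.
dom.migrate(dst, libvirt.VIR_MIGRATE_LIVE, None, None, 0)

src.close()
dst.close()
```

The point to notice is the live flag: the guest keeps serving traffic while its memory is copied over, which is what makes the move invisible to the service’s users.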


I asked some questions about the network and systems architecture and found that the systems engineers had made the assumption that in the new service, any virtual machine could be allocated to any physical server. The network engineers, unfortunately, had not taken this into account. Based on the physical network topology — a classic three-tier architecture — the network engineers had set up firewall rules and access-control lists to appropriately protect the infrastructure. For example, not every server could be accessed from the Internet and only certain physical servers had permission to mount storage area network resources. If using XenMotion meant every server was expected to house any virtual machine at a moment’s notice, these were clearly issues that needed to be resolved.

The systems engineers’ expectation of being able to move any virtual machine to any physical server in the infrastructure meant a complete redesign of the network topology was required. And that is when the skirmish ensued. The systems engineers insisted that the network topology be set up to allow XenMotion to work seamlessly. The network engineers argued that their network topology was necessary for scalability and security. As far as I was concerned, they were both right, so before continuing my due diligence on their business, I sent them off to settle their skirmish amongst themselves.

But it got me thinking: Has server virtualization added an abstraction layer that further separates systems engineers and network engineers from the physical reality of their environments? Do we need a new engineer — a virtualization engineer — who understands how virtual machines are allocated on physical servers and networks, to act as a liaison between the two factions?


  1. Allan – interesting commentary – please let us know when the engineers get back to you – worth another post.

  2. Allan Leinwand Thursday, April 17, 2008

    Thanks G – I will follow up.

  3. I think the issues/concerns that you raise have started to get addressed as server, network and storage virtualization start working together. For a data center solution, you need to look at solutions for all of these instead of a single server-only or network-only solution. As your applications move, your network and storage have to be configured accordingly, and what you need is management of all of this, which is addressed by products like Cisco VFrame Data Center.
    http://vmblog.com/archive/2007/07/28/cisco-vframe-data-center.aspx

    Check out this talk on the Cisco Nexus 5000 (which is a network entity) and how it is designed (theoretically) to enable VMware (which is a server-virtualization entity).

    Disclaimer: I am not a systems/network/virtualization engineer, but in a past life did work at Cisco.

  4. Allan Leinwand Thursday, April 17, 2008

    @stockandawe – Agreed on your points that these items are starting to be addressed. I’m just not sure that many folks want to put all of their virtualization eggs into Cisco’s basket. I wrote about this topic on this site and on cio.com:

    http://gigaom.com/2008/03/21/coming-soon-the-cisco-blade-server/
    http://advice.cio.com/allan_leinwand/who_will_control_the_next_enterprise_data_center

  5. …and following up on StockAndAwe’s comments, looks like Scalent Systems has stolen a march on Vframe.

    From the website:

    “Scalent…lets data centers change server software, network connectivity, and storage access in real-time”

    and

    “Scalent code ships in VMware ESX Server”.

    Seems like a network-boot solution already exists to do the first-stage work you’re describing, Allan. (E.g., power on a machine, set up the right network and storage connectivity, and boot the right hypervisor…)

    Scalent also appears to be resold by HP, Unisys, and EMC. I thought I saw Scalent listed in the Nexus press release too (they are in the Google cached version), but then they’re missing on the Cisco site version…?

  6. as with any environment (virtualized or otherwise), those working on architecture (systems, app, network and security engineers alike) have to all be involved together from the get-go. Dropping in any new technology without making sure everyone understands how it interoperates with the existing infrastructure (as well as the risks and business case) is a recipe for failure.

    In this particular case, the correct approach (well, one correct approach anyway) is to use VLANs, tag all the server VLANs that will be in use for any VMs to all ESX hosts, and firewall based on VLAN, rather than based on physical host. This approach tends to be more flexible and scalable for neteng in addition to the benefits for the systems and application folks (speaking from experience in building out just such an architecture).
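
    To make that concrete, here’s a toy sketch of the idea (the VLAN IDs and policies are made up, and this isn’t any particular vendor’s syntax): security policy hangs off the VLAN a VM’s traffic is tagged with, so the policy follows the VM to whichever ESX host it lands on.

    ```python
    # Toy illustration of "firewall by VLAN, not by physical host".
    # The VLAN IDs and policies below are hypothetical.
    VLAN_POLICY = {
        10: {"tier": "web", "allow_from_internet": True,  "san_access": False},
        20: {"tier": "app", "allow_from_internet": False, "san_access": False},
        30: {"tier": "db",  "allow_from_internet": False, "san_access": True},
    }

    def policy_for(vlan_id):
        """Look up policy by the VM's 802.1q VLAN tag.

        Because every ESX host trunks all of these VLANs, the answer is the
        same no matter which physical server the VM migrates to.
        """
        return VLAN_POLICY[vlan_id]

    # A VM tagged into VLAN 30 keeps its SAN access on host 3 or host 47.
    print(policy_for(30))
    ```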

  7. Increased simplicity in the front means increased complexity on the back

  8. Don Nalezyty Friday, April 18, 2008

    I have first-hand experience attempting to negotiate this exact battle for about 4 years.

    We’ve had a stateless Grid for over 7 years, which allowed us not to migrate a live instance, but to very rapidly move a shut-down instance to any server in the Grid. We use PXE to boot the servers from NFS volumes on NAS. When we only had 240 servers in the Grid, it wasn’t difficult to have them all in a single network space.

    Then the Grid continued growing and we had added requirements for more than simple http access from the internet. Security and Networks began to grow more and more concerned about the environment. About 4 years ago we realized we needed to go virtual to remain cost effective and be able to scale as the business continued growing, which finally fired the first shots in the war.

    There are a number of vendors with solutions attempting to resolve it. As mentioned by stockandawe, Cisco VFrame is one. We were lucky enough to be part of a very early beta for this product. While Cisco has done an excellent job of seeing the nature of the problem and attempting to create a solution, it has one major drawback: VFrame needs complete control of the environment to work, and both Security and Networks were unable to overcome cultural and very real concerns about doing this.

    Scalent (as noted by virtualman), Xsigo and a few others have taken a similar approach, but all face the same challenges. The cultural issues can’t be ignored. It’s ironic that as technologists we can fall into the same trap we so often accuse non-technologists of: fear of technology because it’s new and untried.

    Many companies are going to be unwilling to be the first to adopt these new technologies because they are not proven by wide acceptance. It’s often up to the smaller companies that aren’t afraid to take a risk and are capable of being more agile to adopt these technologies and prove they are reliable.

    I think darkuncle hit the nail on the head when he said everyone needs to be involved from the start. By involving network, security, apps and systems architects and engineers from the start, you have the opportunity to support one another through large leaps of faith.

    We haven’t really achieved the holy grail of a fully virtualized environment through all components and infrastructure yet, but we’re getting there. We’ve moved the entire Grid behind SSL-VPN and, much like darkuncle, we’ve used VLANs to gain some flexibility, but that has limits as well. Our Grid has grown significantly and can host up to 6,000 or so VMs, which is like a small datacenter unto itself. Having a single network space encompass so many systems adds complexity and risk.

    Until more of the bigger vendors become more engaged in this space, it’s going to be a challenge finding solutions that make everyone happy.

  9. Allan Leinwand Friday, April 18, 2008

    @VirtualMan – thanks, I’ll check out Scalent.

    @darkuncle – the folks I met with were using VLANs, but probably not as you describe. I’ll dig into that more with 802.1q trunks on the servers.

    @Don – thanks for the comments and sharing your experiences.

  10. @Don – that (6,000 VMs) is about 5x the size of the environment we built out (at least, the size when I moved on), and definitely to the point that it makes sense to take a good look at distributing those services across multiple locations or networks. One of my favorite things about a fully-virtualized environment is that everything becomes a commodity akin to power – we have big logical containers of CPU cycles and RAM and storage, and when we combine those with some high-end load balancers we get a very flexible and easily scalable environment. If you can then define what pieces of equipment you need to build out such an environment, you can turn the datacenter itself into a single logical container, and plunk down one of these containers anywhere that’s got good connectivity to your customers and good pricing. It’s a little like building hierarchically with Legos – when all the pieces are interchangeable and well-defined, the infrastructure becomes consistent, simple to manage, easy to learn and massively scalable.

    (of course, if all your customers are in one location – say, academia or a compute cluster somewhere – distributing load geographically may not make sense, but the concept still applies on the local level, I think.)
