
Summary:

Plexxi has made a new networking box that it calls a switch, but it is radically different from the switches on the market today. The box pairs software with optical transceivers that link to other Plexxi boxes, forming fast connections between thousands of servers.

Plexxi CEO David Husak (photo: Plexxi)

The enterprise infrastructure of the late 90s and early aughts is no match for the demands of webscale companies like Google or Facebook, or even cloud providers like Amazon. Thus, the giants in the web and cloud worlds are demanding new infrastructure and remaking the world of computing for their own needs.

These giants are deconstructing the server, rethinking the data center and building new databases. They are also rethinking how they build out networks. To that end, several companies have latched onto the concept of software-defined networking to build an abstracted, programmable layer on top of the physical networking gear. That gives systems administrators more flexibility, but it doesn’t help with another problem many data center providers face: building faster equipment that can carry their escalating traffic.

This is where Plexxi, a Cambridge, Mass.-based startup, sees its opportunity. It has built what it calls a switch but what is really an entirely new type of networking gear aimed at moving a lot of traffic between racks of servers as fast as possible. Plexxi also offers Plexxi Control, management software that runs on commodity hardware and routes traffic around the network. Inside the Plexxi “switch” are optical components normally found inside telecom gear; Plexxi bills it as the first data center networking gear to use optics rather than electronics to attack the networking bottleneck.

Each switch has 32 10-gigabit-Ethernet-capable ports and can be connected to other Plexxi switches. So far the company has tested deployments of its gear supporting 10,000 servers, but Plexxi CEO and CTO Dave Husak assures me it can scale bigger. It will have to: Facebook, a potential customer, is estimated to have more than 150,000 servers. The Plexxi gear lets the network become essentially flat. Each rack of servers talks to a box, and the boxes have the speed and capacity to talk directly to each other. Instead of hierarchies or trees, software running on the switch tracks where virtual instances have moved and routes traffic to them over this single layer, which speeds up the network and simplifies the gear supporting it.
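To make the idea concrete, here is a minimal sketch of that kind of flat, single-layer forwarding logic. It is hypothetical (the class and method names are mine, not Plexxi's software): a controller records which switch each virtual instance sits behind and forwards straight to that switch, with no aggregation or core tiers in between.

```python
# Hypothetical sketch, not Plexxi's actual software: a controller for a flat,
# single-layer fabric that tracks which switch each virtual instance lives
# behind and answers "which switch do I forward to?" with a direct lookup.

class FlatFabricController:
    def __init__(self):
        self.location = {}  # virtual instance ID -> switch it is attached to

    def instance_moved(self, vm_id, new_switch):
        """Record that a virtual instance now sits behind another switch."""
        self.location[vm_id] = new_switch

    def next_hop(self, src_switch, dst_vm):
        """Return the switch traffic should be sent to directly (None if local)."""
        dst_switch = self.location[dst_vm]
        # In a flat network every box can reach every other box directly,
        # so the "path" is simply the destination switch itself.
        return None if dst_switch == src_switch else dst_switch

ctrl = FlatFabricController()
ctrl.instance_moved("vm-42", "switch-7")
print(ctrl.next_hop("switch-1", "vm-42"))   # -> switch-7
ctrl.instance_moved("vm-42", "switch-3")    # the instance migrates
print(ctrl.next_hop("switch-1", "vm-42"))   # -> switch-3
```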

Plexxi has done something unique in its box that is worth a closer look. It has replaced traditional electronic networking components with optics, which boosts speed while reducing cost and power draw. To keep costs in line, it buys off-the-shelf components and strips them down to the essentials. While the transceivers and equipment inside telecom networks cost thousands of dollars, Plexxi’s entire box sells for $64,000, in part because it did away with things like the line cards and amplifiers that telcos use to extend the distance an optical signal can travel.

Using optics inside the boxes and connecting the boxes directly together in a flat network also eliminates a lot of cabling and additional electronics that add cost and increase power consumption. Husak told me the Plexxi switch draws milliwatts per port; the latest switch from Arista draws 5 watts per port.

Husak is a veteran of the networking world who remembers how the switch replaced token ring and other networking architectures in the ’90s. Now, some twenty-odd years after the first switch was unveiled, Plexxi is poised to remake networks again with an entirely new system architecture and design. It has almost $50 million in funding and has been building this switch since 2010. This is a company unafraid of taking on Arista, Cisco and the other giants of networking in a quest to re-imagine the data center of the future.

  1. If Facebook is a potential customer, will the IP behind this be contributed to Open Compute?

    1. Why would it be? Facebook doesn’t make its vendors, or potential vendors, contribute their IP to Open Compute.

    2. Douglas Gourlay Wednesday, December 5, 2012

      Robert, I am not sure there is a whole lot of IP here. There is some pretty good marketing, mind you.

      1) These switches deploy in a ring topology, just like most every stackable switch has in the past. For the stack interconnect they use MPO connectors with 12x10Gb signaling on each path: 12x10Gb East and 12x10Gb West on the ring.

      2) Like any hard-wired ring topology you have to break the ring to add new nodes. In a high capacity scale-out model this is scary for anyone who has to deal with high rates of change and node additions.

      3) Also like any ring topology if a single node fails you break the ring and the traffic capacity is halved. Two node outages and you have a split connectivity model where some portion of your data center cannot see the rest – not usually ideal in any scenario.

      4) The bandwidth constraints of this model aren’t too bad for one or two boxes – but at the scale advertised, let’s say 15,000 nodes, that would take about 469 switches in one big ring. If we assume equal traffic distribution, that means that at any point in time about 7,500 10Gb conversations are on the eastbound ring and 7,500 10Gb conversations are on the westbound ring. I am sure there is some software magic here where ‘SDN’ will magically fix this, but there is a high likelihood of experiencing congestion choke points at that scale where you try to cram 7,500 10Gb ports of traffic down 12 10Gb ports of capacity. That’s 625:1 oversubscription… not ideal for today’s data center – or yesterday’s, or last century’s for that matter.
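      The arithmetic behind those figures, spelled out (this assumes the worst case stated above: one big ring and perfectly uniform traffic, assumptions Plexxi disputes later in the thread):

```python
# Back-of-the-envelope check of the numbers in this comment, under the
# commenter's own assumptions (one big ring, uniform traffic distribution).
import math

servers          = 15_000
ports_per_switch = 32          # 10GbE server-facing ports per Plexxi switch
ring_lanes       = 12          # 12 x 10Gb lanes per direction on the ring

switches = math.ceil(servers / ports_per_switch)      # about 469 switches
flows_per_direction = servers // 2                     # 7,500 east, 7,500 west
oversubscription = flows_per_direction / ring_lanes    # 625.0

print(switches, flows_per_direction, f"{oversubscription:.0f}:1")
# 469 7500 625:1
```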

      I thought Token Ring died for good reasons… why is someone trying to bring it back?

      1. I’m interested to see Plexxi’s response to this.

  2. Douglas Gourlay Wednesday, December 5, 2012

    Some pretty serious misinformation on power consumption in here:

    The Plexxi product, per their datasheet, is 120W of average power draw for 32 ports. The Arista product is 120W of average power draw for 64 ports.

    Plexxi would therefore be 3.75W per port, not ‘milliwatts’. The Arista switch (7050S-64) would be 1.87W per port – half the power draw of the Plexxi product.
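    The per-port arithmetic behind those figures (this uses the 32-port divisor quoted above for the Plexxi box; the next comment argues the right divisor is 64):

```python
# Per-port power from the datasheet numbers quoted in this comment.
plexxi_watts_per_port = 120 / 32   # 3.75 W, using the 32-port count above
arista_watts_per_port = 120 / 64   # 1.875 W for the Arista 7050S-64
print(plexxi_watts_per_port, arista_watts_per_port)
```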

    1. Thanks, Doug, and that’s why I cited Plexxi as the source of that info and linked back to the Arista release. I’ll follow up with Plexxi to figure out why they told me one thing and put another in their data sheet.

  3. Hi Stacey – regarding power, the comment Dave made was with respect to the optical interfaces; I don’t believe he was attempting to quote the switch power specs, just pointing out the big advantage of leveraging optics in the architecture. Apologies if we were unclear in our discussion. Our actual full switch power specs are as stated in the data sheet, 120W average, but that is across 64 ports, not 32 (we have 32 10 GbE + 2 40 GbE, or 40 equivalent 10 GbE, plus 24 that head to the optical domain). That yields roughly 2W per port. And what Doug fails to mention is that to aggregate their switches together you need a 10W-per-port spine layer, which is where the bulk of our network power savings comes from, as that entire layer is not needed.
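    The port accounting described above, worked out:

```python
# Port accounting from the comment above: 64 equivalent 10GbE ports at 120W.
ten_gbe_ports   = 32
forty_gbe_ports = 2     # each counted as four 10GbE equivalents
optical_lanes   = 24    # lanes heading into the optical domain

equivalent_ports = ten_gbe_ports + forty_gbe_ports * 4 + optical_lanes   # 64
print(equivalent_ports, round(120 / equivalent_ports, 2))                # 64 1.88
```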

    With respect to Doug’s other analysis, he clearly has a lot of misinformation on our architecture and makes a ton of incorrect assumptions, but we’ve obviously struck a nerve. In our view, and in our dealings with customers, there are some serious issues with traditional leaf/spine and multi-tier architectures that incumbent vendors seem to want to ignore, so customers are looking for alternatives. We’ve been deep through our architecture with many very well-informed and educated customers who seem to come to different conclusions, not surprisingly.

  4. Doug,

    I’m not sure where you get your information, or why you feel compelled to spread FUD, but for the benefit of potential end users, I’d like to respond to your comments. Certainly your frame of reference is just “what has been done before,” but we like to look at things in the context of what is possible with innovation and creativity, unbounded by historical boat anchors like “Token Ring” and stackable switches. But we do agree that every technology dies eventually. Maybe switched hierarchies will be the next Token Ring?

    Some more actual information on our architecture below.

    claim 1) These switches deploy in a ring topology, just like most every stackable switch has in the past. For the stack interconnect they use MPO connectors with 12x10Gb signaling on each path: 12x10Gb East and 12x10Gb West on the ring.

    (mat) Yes, our LightRail uses standard MPO connectors and standard single-mode fiber cabling (so customers don’t have to worry about proprietary cabling), and it consolidates 120 Gbps (240 FD) in each direction (East/West) with a single cable. In a traditional Leaf/Spine, that would require 24 separate cables plus 48 transceivers, and all traffic would be oriented North-South.
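    The rough cabling counts behind that comparison (assuming, for the leaf/spine case, one discrete cable per 10Gb link with a transceiver at each end):

```python
# Cabling comparison from the comment above, under the stated assumptions.
per_direction_gbps = 120          # what one LightRail trunk carries each way
link_speed_gbps    = 10

lanes_total       = 2 * per_direction_gbps // link_speed_gbps   # 24 lanes total
leaf_spine_cables = lanes_total                                   # one cable per 10Gb link
leaf_spine_optics = lanes_total * 2                               # a transceiver at each end
print(lanes_total, leaf_spine_cables, leaf_spine_optics)          # 24 24 48
```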

    Also, stackable switches and previous ring topologies have lacked critical innovations that make the Plexxi system compelling:
    – Multiplexing multiple physical links into a single cable to reduce the capital and operating costs of building and supporting networks at large scale
    – Ability to change the point-to-point connection topology of the physical network in software
    – Scaleout model where adding a switch to the ring adds bisection bandwidth to the interconnect

    claim #2) Like any hard-wired ring topology you have to break the ring to add new nodes. In a high capacity scale-out model this is scary for anyone who has to deal with high rates of change and node additions.

    (mat) If you architect a system correctly, nothing has to be scary. If I insert a new node in the ring, everything else is fully accessible during the insertion because we have many paths to get from anywhere to anywhere. When the new node comes online, its local attachments get communicated to everyone else. It’s actually very simple and very scalable.

    Also, let’s compare that to the amount of disruption involved in scaling the alternative (Clos) tree-based solution, which requires you to break huge numbers of links and disrupt the operating network to a much greater extent. Taking non-blocking tree networks built with conventional switches and joining them into one big one, changing the oversubscription ratio in a given fabric, or adding a few ports to a fully provisioned fat tree requires orders of magnitude more changes to cables, and in the latter case the introduction of an entire new layer in the switching hierarchy, with a resulting higher cost per port and reduced system reliability (larger failure domain).

    Growing a network one switch at a time actually has huge operational benefits and doesn’t force the user to contemplate the fixed limitations of their network years in advance and pour them in concrete.
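    As a rough illustration of the cabling churn involved in the two approaches (the fabric sizes below are made-up example numbers, not either vendor’s design):

```python
# Illustrative only: cables touched when growing a non-blocking leaf/spine
# versus inserting one more switch into a ring.
leaves, spines = 32, 16
fabric_cables  = leaves * spines   # 512: every leaf wired to every spine

cables_touched_adding_a_spine      = leaves   # one new uplink pulled to every leaf
cables_touched_inserting_ring_node = 2        # break the ring at one point, reconnect

print(fabric_cables, cables_touched_adding_a_spine, cables_touched_inserting_ring_node)
```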

    claim #3) Also like any ring topology if a single node fails you break the ring and the traffic capacity is halved. Two node outages and you have a split connectivity model where some portion of your data center cannot see the rest – not usually ideal in any scenario.

    (mat) In fact, if a single node fails (or any number, for that matter), all traffic traversing that node passes through unaffected. Only traffic destined to devices directly connected to that switch would be affected, so the failure domain is only 32-40 servers. If there is a cable break, you lose exactly 120 Gbps of capacity (240 FD), not half; that is similar to what you would lose if you lost a line card in a spine chassis. Compare this to losing half (at a minimum) in a hierarchical network if the top node fails.

    claim #4) The bandwidth constraints of this model aren’t too bad for one or two boxes – but at the scale advertised, let’s say 15,000 nodes, that would take about 469 switches in one big ring. If we assume equal traffic distribution, that means that at any point in time about 7,500 10Gb conversations are on the eastbound ring and 7,500 10Gb conversations are on the westbound ring. I am sure there is some software magic here where ‘SDN’ will magically fix this, but there is a high likelihood of experiencing congestion choke points at that scale where you try to cram 7,500 10Gb ports of traffic down 12 10Gb ports of capacity. That’s 625:1 oversubscription… not ideal for today’s data center – or yesterday’s, or last century’s for that matter.

    (mat) There are at least a few things wrong with the base assumptions, but let’s start with “one big ring”… Certainly that is one way to do it, but not the only way. Why not many rings? Secondly, why start with the assumption of equal traffic distribution? Certainly that is convenient when dealing with limited embedded protocols like ECMP that only know how to distribute traffic evenly in a very primitive way, but are there many data centers that have equal traffic distribution among all devices? Are there any? This is exactly why we came up with the concept of leveraging Affinities: so we could build networks that actually match real data center usage models, i.e. non-equal traffic distribution. Some may consider this “SDN magic”; we prefer to call it algorithms and math, but I guess we’ll take that as a compliment! But yes, we can get away from simple embedded protocol limitations that don’t match the real world when we think outside of the box. We can also leverage very powerful new concepts of spatial re-use in the optical domain, and we never need to traverse the entire ring hop by hop (that would be kinda stupid), so the numbers you quote are actually completely wrong. But to understand this would require moving past arcane notions of Token Rings; fortunately our customers are not so limited in their thinking.
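    A minimal sketch of the general idea behind affinity-aware placement (hypothetical logic, not Plexxi’s actual algorithms): place the heaviest flows first on whichever candidate path is currently least loaded, instead of splitting everything evenly the way ECMP would.

```python
# Hypothetical sketch of affinity-aware flow placement; names and numbers
# are illustrative, not Plexxi's implementation.

def place_flows(flows, paths):
    """flows: list of (src, dst, gbps); paths: dict of path name -> load so far."""
    placement = {}
    for src, dst, gbps in sorted(flows, key=lambda f: f[2], reverse=True):
        best = min(paths, key=paths.get)        # least-loaded path right now
        paths[best] += gbps
        placement[(src, dst)] = best
    return placement, paths

flows = [("rack1", "rack9", 40), ("rack2", "rack9", 25), ("rack3", "rack4", 3)]
paths = {"east": 0, "west": 0, "optical-shortcut": 0}
placement, load = place_flows(flows, paths)
print(placement)
print(load)   # heavy flows end up spread across paths rather than split uniformly
```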

