
Summary:

Mobile-app-analytics startup Flurry is upgrading its data center network with Arista Networks 10 GigE switches, a move designed to improve network performance as Flurry continues to add both terabytes and nodes to its big data system. Is the network the hardware superstar in big data environments?


Big data requires big networks, as mobile-app-analytics startup Flurry has learned. The company said today that it is upgrading its data center network with 10 Gigabit Ethernet switches, a move designed to improve network performance as Flurry continues to add both terabytes and nodes to a big data environment that already holds hundreds of terabytes. Flurry, which provides monetization and analytics for some 70,000 applications on behalf of 37,000 customers, stores and analyzes all that data with HBase and Hadoop, but the volume is growing fast. Its decision to publicize its choice of switch vendor (it chose Arista Networks) raises an interesting question: Is the network the hardware superstar in big data environments?
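
Flurry hasn’t published its schema, but a toy sketch helps make the HBase-plus-Hadoop angle concrete. Everything here (the table name, the column family, the row-key layout and the happybase client library) is an assumption for illustration, not Flurry’s actual design:

```python
# Hypothetical sketch only: persisting mobile-app session reports in HBase.
# Table name, column family and row-key design are invented for illustration.
import time

import happybase  # Python client for HBase's Thrift gateway

connection = happybase.Connection('hbase-thrift-host')  # placeholder host
sessions = connection.table('app_sessions')

def store_session_report(app_id: str, device_id: str, payload: bytes):
    # Lead the row key with app_id so one app's sessions sort together;
    # a reversed timestamp puts the newest sessions first in a scan.
    reversed_ts = 2 ** 63 - int(time.time() * 1000)
    row_key = f'{app_id}:{reversed_ts}:{device_id}'.encode()
    sessions.put(row_key, {b'report:payload': payload})

def recent_sessions(app_id: str, limit: int = 100):
    # Prefix scan returns the latest session reports for one application.
    return list(sessions.scan(row_prefix=f'{app_id}:'.encode(), limit=limit))
```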

This certainly appears to be the case for Flurry. According to a press release, since 2009, “Flurry’s network has grown by 20 percent month-over-month in application sessions tracked” and “now processes over 300,000 session reports per day with a peak of more than 7,000 incoming application session reports per second. In total, Flurry’s network handles more than 2.5 billion data transactions per day.”
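
Those figures invite a quick sanity check, and a reader raises the same point in the comments below: a 7,000-per-second peak can’t coexist with a 300,000-per-day total if both measure the same thing, so the daily figure presumably uses a different window or unit. The arithmetic, in a few lines of Python:

```python
# Back-of-envelope check on the quoted numbers.
SECONDS_PER_DAY = 86_400

peak_rate = 7_000           # session reports/sec at peak (quoted)
reported_daily = 300_000    # session reports "per day" (quoted)
transactions_daily = 2.5e9  # data transactions/day (quoted)

# A 7,000/sec peak sustained all day would yield:
print(f'{peak_rate * SECONDS_PER_DAY:,} reports')  # 604,800,000
# That is roughly 2,000x the quoted daily total, so the two figures
# cannot share a unit. If "per day" were actually "per minute":
print(f'{reported_daily * 1_440:,} reports/day')   # 432,000,000
# which squares with the peak rate, and implies a plausible
print(f'{transactions_daily / (reported_daily * 1_440):.1f} transactions '
      'per session report')                        # ~5.8
```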

Flurry CTO Sean Byrnes is quoted in the announcement explaining that the company’s transaction volume routinely exceeds Twitter’s and that Flurry plans to quintuple its 100-plus-node Hadoop cluster over the next six months. Computing and storage are relatively cheap and easy, especially with massively parallel tools like Hadoop, but organizations can’t skimp on the network gear, which is tasked with moving those terabytes between machines without becoming the bottleneck.
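
To see why, consider HDFS’s default three-way replication: every block written gets pipelined across the network to two more nodes, multiplying ingest traffic before any analysis starts. A rough sketch, with a made-up daily ingest volume (Flurry hasn’t published one):

```python
# Rough sketch of HDFS write-path network load under 3x replication.
# The daily ingest volume is invented purely for illustration.
TB = 10 ** 12
SECONDS_PER_DAY = 86_400

daily_ingest_bytes = 2 * TB  # hypothetical raw ingest per day
replication_factor = 3       # HDFS default

# With the writer local to the first replica, each block crosses the
# network (replication_factor - 1) more times in the write pipeline.
network_bytes = daily_ingest_bytes * (replication_factor - 1)
avg_gbps = network_bytes * 8 / SECONDS_PER_DAY / 1e9

print(f'replication traffic: {network_bytes / TB:.0f} TB/day, '
      f'{avg_gbps:.2f} Gbps sustained average')
# -> 4 TB/day, ~0.37 Gbps average; bursts several times higher would
# already crowd shared 1 GigE uplinks before any shuffle or read traffic.
```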

This shouldn’t be surprising: processing data with Hadoop is, essentially, high-performance computing, and high-speed interconnects are a big deal in HPC. Hadoop clusters don’t need tightly coupled InfiniBand setups, but 10 GigE is a must; it’s what sets Amazon’s Cluster Compute Instances for HPC apart from the mainstream EC2 architecture, which offers huge virtual machines but a subpar network for supercomputing. Organizations running production Hadoop applications, as Flurry is, probably don’t want to starve their compute clusters while data trickles through slow pipes.
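
As a concrete illustration of the slow-pipes problem, here is the wire-speed arithmetic for moving terabytes between nodes at 1 GigE versus 10 GigE; real clusters contend for disks and switch fabric, so treat these as lower bounds:

```python
# Lower-bound transfer times, assuming the link is the only constraint.
BITS_PER_TB = 10 ** 12 * 8

def transfer_hours(data_tb: float, link_gbps: float) -> float:
    return data_tb * BITS_PER_TB / (link_gbps * 1e9) / 3_600

for link_gbps in (1, 10):
    print(f'{link_gbps:>2} GigE: 1 TB in {transfer_hours(1, link_gbps):.2f} h, '
          f'10 TB in {transfer_hours(10, link_gbps):.1f} h')
# ->  1 GigE: 1 TB in 2.22 h, 10 TB in 22.2 h
# -> 10 GigE: 1 TB in 0.22 h, 10 TB in 2.2 h
```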

For more on how the data deluge is affecting the enterprise and on the innovative ways startups are tackling their own data problems, come to our Big Data conference, held March 23 in NYC.

Image courtesy of Flickr user Claus Rebler.

Comments

  1. Do those numbers even add up: 300,000 session reports per day vs. 7,000 session reports per second vs. 2.5 billion data transactions per day? I know, one is an average, one is a peak and the third counts transactions, not session reports. But still…
