Got Big Data? You’re Gonna Need a Faster Network.


Big data requires big networks, as mobile-app-analytics startup Flurry has learned. The company said today that it is upgrading its data center network to 10 Gigabit Ethernet switches, a move designed to keep network performance up as Flurry adds both terabytes and nodes to a big data environment that already holds hundreds of terabytes. Flurry, which provides monetization and analytics for some 70,000 applications from 37,000 customers, stores and analyzes all that data with HBase and Hadoop, and the volume is growing fast. Its decision to publicize its choice of switch vendor (it chose Arista Networks) raises an interesting question: is the network the hardware superstar in big data environments?

This certainly appears to be the case for Flurry. According to a press release, since 2009 “Flurry’s network has grown by 20 percent month-over-month in application sessions tracked” and “now processes over 300 million session reports per day with a peak of more than 7,000 incoming application session reports per second. In total, Flurry’s network handles more than 2.5 billion data transactions per day.”
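Those figures hang together under a bit of back-of-the-envelope arithmetic. Here is a minimal sketch; the rates are the ones quoted above, and everything else is simple division:

```python
# Back-of-the-envelope check on the figures quoted from Flurry's press release.
SESSION_REPORTS_PER_DAY = 300_000_000   # "over 300 million session reports per day"
PEAK_REPORTS_PER_SEC = 7_000            # "peak of more than 7,000 ... per second"
TRANSACTIONS_PER_DAY = 2_500_000_000    # "more than 2.5 billion data transactions per day"

SECONDS_PER_DAY = 24 * 60 * 60

avg_reports_per_sec = SESSION_REPORTS_PER_DAY / SECONDS_PER_DAY
avg_transactions_per_sec = TRANSACTIONS_PER_DAY / SECONDS_PER_DAY

print(f"Average session reports/sec: {avg_reports_per_sec:,.0f}")        # ~3,472
print(f"Peak-to-average ratio:       {PEAK_REPORTS_PER_SEC / avg_reports_per_sec:.1f}x")
print(f"Average transactions/sec:    {avg_transactions_per_sec:,.0f}")   # ~28,935
```

A peak rate roughly double the daily average is exactly the kind of load profile that leaves little slack in the network.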

Flurry CTO Sean Byrnes is quoted in the announcement explaining that the company's transaction volume routinely exceeds Twitter's, and that Flurry plans to quintuple its 100-plus-node Hadoop cluster over the next six months. Computing and storage are relatively cheap and easy to scale, especially with massively parallel tools like Hadoop, but organizations can't skimp on network gear, which has to keep those terabytes moving between machines without becoming the bottleneck.
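To put rough numbers on that bottleneck, consider this sketch. The 100-node cluster size comes from the article; the dataset size, link speeds, and efficiency factor are illustrative assumptions:

```python
# Rough, illustrative estimate of how long it takes to redistribute a dataset
# across a Hadoop cluster at different link speeds. The 100-node figure comes
# from the article; the 100 TB dataset and link rates are assumptions.
def shuffle_hours(dataset_tb: float, nodes: int, link_gbps: float,
                  efficiency: float = 0.7) -> float:
    """Hours to move dataset_tb across the cluster, assuming every node
    sends and receives its share in parallel at `efficiency` of line rate."""
    bytes_per_node = dataset_tb * 1e12 / nodes           # each node's share
    bytes_per_sec = link_gbps * 1e9 / 8 * efficiency     # usable bytes/sec per link
    return bytes_per_node / bytes_per_sec / 3600

for gbps in (1, 10):
    print(f"{gbps:>2} GbE: {shuffle_hours(100, 100, gbps):.1f} hours "
          f"to shuffle 100 TB across 100 nodes")
# 1 GbE: ~3.2 hours; 10 GbE: ~0.3 hours
```

Under those assumptions, the same redistribution that ties up a gigabit network for an entire afternoon finishes in about 20 minutes over 10 GigE.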

This shouldn’t be surprising: processing data with Hadoop is, essentially, high-performance computing, and high-speed interconnects are a big deal in HPC. Hadoop clusters don't need tightly coupled InfiniBand fabrics, but 10 GigE is a must; it's what sets Amazon's Cluster Compute Instances for HPC apart from the mainstream EC2 architecture, which offers large virtual machines but a network too slow for supercomputing. Organizations running production Hadoop applications, as Flurry is, don't want to starve their compute clusters while data trickles through slow pipes.
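Another way to see why gigabit Ethernet starves a Hadoop node: a commodity data node can often read from local disk faster than a 1 GbE link can ship the data out. The hardware figures below are assumed, typical-for-the-era values, not Flurry's actual specs:

```python
# Why a 1 GbE link starves a Hadoop node: local disk throughput vs. network.
# All hardware figures below are illustrative assumptions, not Flurry's specs.
DISKS_PER_NODE = 4
MB_PER_SEC_PER_DISK = 100          # sequential read for a typical SATA drive

disk_gbps = DISKS_PER_NODE * MB_PER_SEC_PER_DISK * 8 / 1000  # -> gigabits/sec

for link_gbps in (1, 10):
    verdict = "bottleneck" if link_gbps < disk_gbps else "headroom"
    print(f"disks deliver {disk_gbps:.1f} Gbps vs {link_gbps} GbE link: {verdict}")
# disks deliver 3.2 Gbps vs  1 GbE link: bottleneck
# disks deliver 3.2 Gbps vs 10 GbE link: headroom
```

With even a modest disk array outrunning a gigabit link by 3x, the network, not the node, sets the pace of the job.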

For more on how the data deluge is affecting the enterprise, and on innovative ways startups are tackling their own data problems, come to our Big Data conference, being held March 23 in New York City.

Image courtesy of Flickr user Claus Rebler.
