[Image: Cray's XK6 supercomputer]

It looks like Oracle has some competition when it comes to selling big iron for big data. On Wednesday, Cray, the Seattle-based company best known for building some of the world’s fastest supercomputers, said it’s getting into the big data game. A new division within Cray, called YarcData, will leverage Cray’s experience working in data-intensive environments for customers such as Boeing to woo large enterprises with big data needs.

Cray was short on details in the press release announcing the new division, but new YarcData SVP and GM Arvind Parthasarathi, formerly of Informatica, is quoted as saying, “YarcData is the nexus of the world’s most advanced technologies from Cray being applied to solve the world’s most challenging Big Data problems.” The natural leap is that Cray will design parallel-processing systems capable of incredible data throughput, something already required in supercomputing, where enormous processing capacity would sit idle without a steady stream of data, but that also support today’s popular big data tools (e.g., Hadoop, analytic databases and predictive analytics software).
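To make the throughput argument concrete, here is a minimal back-of-envelope sketch in Python. Every number in it is a hypothetical placeholder rather than a published specification for Cray or any other vendor; the point is simply how much compute capacity goes unused when storage can’t keep the processors fed.

    # Back-of-envelope illustration: a cluster's effective utilization is
    # capped by how fast storage can feed it. Every figure below is a
    # hypothetical placeholder, not a published spec for any real system.

    def effective_utilization(compute_gbps, io_gbps):
        """Fraction of compute capacity actually used when I/O is the bottleneck."""
        return min(io_gbps / compute_gbps, 1.0)

    compute_gbps = 500.0  # hypothetical cluster able to process 500 GB/s

    # Commodity storage vs. a high-end parallel file system (both made-up rates)
    for label, io_gbps in [("commodity storage", 50.0),
                           ("high-end parallel storage", 400.0)]:
        print(f"{label}: {effective_utilization(compute_gbps, io_gbps):.0%} of compute used")

Under those made-up numbers, the commodity setup leaves 90 percent of the cluster idle while the high-end storage keeps it 80 percent busy, which is exactly the gap a vendor like Cray would be selling against.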

This type of system could be very valuable for organizations such as banks and intelligence agencies that want to run big data workloads as fast as possible, even processing streaming data in real time, and that have the deep pockets to pay for Cray’s presumably pricey systems. Although the big-data framework Hadoop gained popularity in part because it’s designed to run on commodity hardware, there’s always a place for high-end hardware when milliseconds really do matter, and there’s something to be said for pre-configured systems that take the guesswork out of building a big data environment, as I explained recently in a piece for GigaOM Pro (sub req’d).

Cray isn’t alone in pushing this high-performance, enterprise-focused big data vision, though. Oracle made a splash in October when it announced a Big Data Appliance that marries Hadoop, R, NoSQL and other technologies to the high-end hardware Oracle obtained when it bought Sun Microsystems. IBM also has an extensive big data software portfolio complemented by a systems business that includes supercomputers. And although it doesn’t have an HPC pedigree like the others, Teradata has years of experience building systems optimized for analytics.

Cray likely won’t become a household name in the big data world, and its notoriously secretive customers might never divulge what they’re using its analytics products for, but there certainly is a market, however small, for super-big, super-fast and super-expensive data.


  1. Interesting. Cloudscale’s patented HadoopBI big data engine (hadoopbi.com) has been designed from the ground up to analyze streaming data in real time at rates of millions of events per second. Like Cray, we use MPI (HadoopMPI) to achieve performance that is over 100x faster than Hadoop MapReduce. HadoopBI can be run on Apache HDFS or on MapR, but for blazingly fast performance it can also be run on Lustre, which again is way faster at scale than any of the current Hadoop file systems. Lustre is also what Cray uses in its supercomputers. HadoopBI runs on standard commodity hardware.
