Where most businesses have focused on relational databases to track transactions and inventory, they increasingly need to take in and parse “extreme” unstructured data — video from YouTube or surveillance cameras, audio from customer support calls, text feeds from Twitter and Facebook, and sensor data from factory equipment. It’s a big problem in terms of volume; the last I heard, users uploaded 48 hours of video to YouTube every minute. And it’s a big problem to analyze all that data.
HP is certainly not alone in attacking it. IBM and Oracle, as well as virtually every big software vendor, tout big data plans — most incorporating the popular Hadoop framework for handling distributed data. Ditto the younger pure-play Hadoop companies like Cloudera.
But Nicole Egan, chief marketing officer for HP’s new information business management group, said the problem is bigger than Hadoop. She said people try to piece Hadoop and MapReduce together to solve these problems but find limited success. The problem is, “at its heart, that technology can’t understand what’s in the content. It tries to count the number of times a word is used as a proxy for meaning. You need more than that,” she said.
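To make Egan's word-counting point concrete, here is a minimal sketch of a MapReduce-style word count in plain Python. It is an illustration only, not Hadoop's API and not anything HP or Autonomy ships. The function names (`map_words`, `reduce_counts`, `word_count`) and the two sample sentences are assumptions invented for this example.

```python
from collections import Counter
from functools import reduce


def map_words(document):
    """Map step (hypothetical helper): count the words in one document."""
    return Counter(document.lower().split())


def reduce_counts(left, right):
    """Reduce step (hypothetical helper): merge two partial counts."""
    return left + right


def word_count(documents):
    """Run the map step over every document, then fold the results together."""
    return reduce(reduce_counts, map(map_words, documents), Counter())


# Two made-up sentences that use the same words but mean opposite things.
sentence_a = "the dog bit the man"
sentence_b = "the man bit the dog"

counts_a = word_count([sentence_a])
counts_b = word_count([sentence_b])

# Word frequencies alone cannot tell the two sentences apart.
same_counts = counts_a == counts_b
print(same_counts)
```

Both sentences produce identical counts even though they describe opposite events, which is the kind of gap between frequency and meaning that Egan is describing.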
She argues that Autonomy — which HP just bought for $10.2 billion — has thousands of patented algorithms that make its software smarter. It “gets” the context of the data. That, plus Vertica’s data warehousing and analytics, gives HP a leg up in big data, Egan maintained.
There are many potential applications. Consumer product companies want to track sales and consumer behavior both online and in stores. For that, they need to monitor social networks for comments, in-store video cameras for shoppers’ reactions, and customer support or service calls.
A convenience store chain wants to scan Twitter and Facebook to see when a flash mob might erupt. Then it wants to use security and surveillance cameras to watch the flash mob in progress. The goal is to analyze what impact the flash mob has on store sales.
Idol 10 will be available Dec. 1 to run on all standard hardware, although the plan over time is to optimize it for HP hardware, Egan said.
All these big data offerings are hugely ambitious in scope, and their vendors talk a big game. What’s not clear yet is how well the promised offerings actually work in the field.