With Hadoop, HDFS and related big data technologies, we’ve pretty much licked the scale problem of handling petabyte upon petabyte of data. Next up: solving the speed problem.
Right now, running interactive queries across data sets spread among a thousand nodes is no mean feat. As a rule of thumb, you can run fast queries on old data in a data warehouse, but running fast, interactive queries on massive distributed data sets remains the hard part, according to speakers at a Structure:Data 2013 session today. The session homed in on the problems that real-time data analytics, where it is feasible, could attack.
“Interactive analytics is a complex problem. You have on one end a business user asking ‘what if we did things a little different?’” said Silvius Rus, director of big data platforms for Quantcast. Such a user may have 10 ideas for changing something; nine are bad, but one is good. Users need to be able to iterate on their queries and get answers back in minutes, not a day, he added.
On day one of the show, Paul Maritz, head of the new EMC-VMware Pivotal Initiative, talked about how companies need faster, more nimble feedback loops from their massive data stores. Telephone companies know they have dropped calls, but they don’t know whose calls they dropped, and finding out can take days or even weeks.
That’s the sort of problem that new, fast big data analytics can solve. In that world, the phone company would know it dropped your call and “at the very least, could text you an apology,” Maritz noted. (Or better yet, issue a refund or a make-good of some kind.)
Panel moderator Michael Driscoll, CEO of Metamarkets, wanted to hear what this “big data utopia,” in which the system could ingest and transform data and spit out answers to questions in real time, could mean in practice.
According to Ashok Srivastava, chief data scientist for Verizon, the applications that could start emerging within months may be impressive. Beyond the obvious ones, such as real-time or near-real-time responses to customer problems (see the dropped-call issue above) and requests, he also foresees breakthroughs in cybersecurity. He cited earlier talks at Structure:Data about how systems can increasingly understand the movement of people around the globe and the spread of concepts through society.
He also thinks real-time big data capabilities will play a role in citizen science research. “Imagine taking your cell phone pictures and combining them with multiple millions of other cell phone pictures. That’s something that can be used by scientists,” he noted.
In health and medicine, the ability to query up-to-date personal health data alongside historical data could enable real-time predictions about a person’s health, or about the health of machines, he added.
A transcript of the session video follows.