UPDATED. Leading a data science panel at GigaOM’s Structure Big Data conference, Joyent Founder and Chief Scientist Jason Hoffman redefined the concept of big data. Speaking with bit.ly Chief Scientist Hilary Mason, Cloudscale Founder and CEO Bill McColl, Fluidinfo Founder and CEO Terry Jones, and nPario President and CEO Bassel Ojjeh, he discussed what it means to be a practitioner dealing with big data and the unique systems needed to do that.
Hoffman broke big data down into its components, with the most generic economic good being the bit, the service being the delivery of bits, and the service delivery time being latency.
The three components of big data are:
- space, which is often why the word big is put in front
Mason described bit.ly’s data as being as small as a single link, yet also at terabyte-scale as the company crawls every link people share and click on through bit.ly. However, she doesn’t consider the volume to necessarily be “big data.”
Hoffman noted that “big” doesn’t have to mean size, and Ojjeh added that the size of the data is unimportant if you can’t access it. The future of big data lies in response time: Is the analysis going to take three days or three minutes? “Big” is really about accessibility: Do you need a scientist to access the data, or can a business user access it? At the largest scale, most companies are dealing with petabyte-size data stores, but the “big” part is really the access time, and the desire is to move to real-time access to that data.
McColl feels we’re at the point where there’s going to be a big shift in data science. Until now, he said, it’s been about running offline using tools like MapReduce, but 90 percent of data warehouse users want a world in which they use real-time analytics of the data. Within seconds, they want to see the opportunities and trends so they can take action right at that moment. The next-gen platforms handle big data and fast data: in-memory, always-on, continuous, incremental access.
McColl asked where all this data is coming from, to which Jones replied, “[P]ublishing is trying to find a way to publish all these different types of information and make APIs out of it, along with biological/medical data … it’s diverse.”
With data science, the moment of change has arrived, and the companies that succeed will be the ones that develop tools to enable real-time access to that data.
Ed. The original version of this article omitted Terry Jones.