Why Big Data Startups Should Take a Narrow View


Looking back on last week’s Structure Big Data conference, one of the statements that struck me most was CA CTO Donald Ferguson’s notion that big data represents a “very promising” opportunity for startups, particularly those targeting specific use cases. I think he’s right, particularly with regard to the latter part: the market for horizontally focused products is filling up fast with both startups and large vendors, so innovative companies might look at how best to tune big data tools for specific industries.

As I explained in detail last week, Hadoop has become popular among companies of all sizes, but most products at this point target broad use cases across industries. Yes, there’s still room for startups to get in here, but the door looks to be closing fast. It’s not just Hadoop, either; other techniques, from traditional data warehouses to, arguably, predictive analytics, are all nearing the saturation point in terms of vendors selling the core technologies. Even a step up the stack from the core Hadoop layer, vendors like Datameer are selling familiar-looking interfaces that abstract away the complexities of processing and analyzing data with Hadoop.

But Ferguson made a particularly poignant, if not novel, observation: analyzing social media data is not the same, either in technique or in purpose, as analyzing user data to feed a recommendation engine for a site like Netflix. And herein lies the opportunity. Organizations keep hearing about big data and about how big an opportunity it is, but even though the technology to capitalize on this opportunity is getting democratized, organizations still face a big challenge in hiring personnel who understand not only the technology, but also how to ask the right questions. Sure, analyzing social media data to find out what consumers like or how they might act sounds great, but actually being able to do it accurately is another issue. It’s a situation just begging for startups to fill the void between big data tools and actually using them for a particular task.

Whether the focus is by industry (e.g., tools for financial services, retail, etc.) or by use case (e.g., sentiment analysis, recommendation engines, etc.), one can easily envision an emerging class of companies tuning technologies like Hadoop or predictive analytics software to directly address these discrete classes of users. Organizations won’t necessarily need data scientists to “turn information into gold” if the data scientists employed by their software vendors have already done most of the work. Think about it like functions within spreadsheet applications tuned to specific industries, or like how PaaS startups took cloud computing a step further by configuring infrastructure with the push of a button. Just feed the application some data, push a button, and get results. No Ph.D. required.

To a degree, this is already starting to happen, but primarily through large vendors using their existing software (e.g., SAS for social media) and in the form of fairly limited-scope analytic technologies (e.g., graph databases). I think these are just baby steps toward what could be a huge opportunity. Companies of all types want to be the next Yahoo or Facebook in terms of big data, and there are plenty of companies willing to help them do that in terms of infrastructure. The real opportunity now is in helping companies figure out how to use it.

Image courtesy of Pam Brophy.


T.R. Fitz-Gibbon

Thanks for the great post, Derrick.

You are right, analyzing social media data is drastically different than, say, analyzing user data for a recommendation engine. With Netflix data, the main features are things like the movies and shows a user has watched. This data is very structured and well defined and almost no meaning is lost when it is translated into a form that machines can process.

However, the main features of social media data are the words, phrases, slang, punctuation, emoticons, etc., that authors use to express their ideas. From the machine’s perspective, these features often do more to obscure the authors’ ideas than to reveal them. Teaching the machine to look past these features and see the meaning behind the text is challenging, to say the least.

Just as you describe, companies are starting to use big-data technology, like Hadoop, to analyze social media data. In fact, the company I work for, Networked Insights, has viewed social media analysis as a big-data problem for many years now. We have been using Hadoop and similar technologies to develop and tune Natural Language Processing and Machine Learning techniques to the social media space. We use the proprietary software we develop to inform targeted, highly effective media planning for our customers.

Analyzing social media data is challenging due not only to the complexity of the data but also to the sheer quantity of it. Both aspects make it impossible to calculate a perfect solution in the traditional problem-solving sense. But that is the type of problem I love: using Machine Learning and big-data techniques to find great solutions to problems that are too large and complex to have perfect solutions.

T.R. Fitz-Gibbon
Chief Scientist
Networked Insights

Manoj Kumar

I was not a hundred percent sure about the comment “No Ph.D. required.” I agree that mass-scale adoption of Big Data will depend on eliminating the need for computer science Ph.D.s to integrate its piece parts. The question is whether the users in possession of Big Data will deploy data scientists (statistical model builders) or prefer to use canned (black-box) models developed by some analytics application provider.
