Summary:

I spent two days last week watching experts on big data and data science discuss how their companies are building businesses around data, or at least rethinking how they do business. Although most of the speakers came from the web world, these five ideas should matter across industries.

Big data is going mainstream, but there are still plenty of lessons to be learned from Silicon Valley data scientists whose businesses depend on data to survive. Although their use cases don’t always align with what more-traditional businesses are doing, they know enough about the science and technology to save big-data newcomers a lot of frustration.

I spent two days last week watching talks at the IE Group’s Big Data Innovation event, and here are five messages that really resonated with me. Hopefully, they’ll help your business, too.

1. Hadoop isn’t for everything. This should be common knowledge by now, but it bears repeating. Usama Fayyad, CTO at ChoozOn, pounded this point home when discussing how even Yahoo — Hadoop’s biggest champion and Fayyad’s former employer (he was chief data officer) — learned this lesson the hard way. Yahoo was trying to do some advanced customer segmentation with Hadoop, he said, but found out it would be 50 times less expensive to do that particular workload with a more-traditional database architecture. The realization ultimately killed that project, which was resurrected as analytics startup nPario. Yahoo is now a paying nPario customer. (At Structure:Europe in October, we’ll debate the merits of Hadoop versus traditional relational databases onstage.)

nPario’s Hadoop-free architecture.

2. Big data makes data science easier. I found this one of the more enlightening realizations, thanks in large part to how its messenger — Daniel Wiesenthal, chief data scientist at Sparked.com — was able to so clearly delineate between the sometimes-overlapping concepts of big data and data science. Essentially, he explained, techniques such as support vector machines and neural networks are time-tested, proven methods for “sucking every last ounce of information from your data set,” even when those data sets are small, but they are complicated, difficult to interpret and prone to breaking at scale.

A sample decision tree.

However, big data lets data scientists use simpler modeling techniques such as decision trees and regression, letting the volume of data, rather than a super-complex algorithm, account for accuracy (and statistical significance). And, Wiesenthal noted, using general-purpose big data technologies such as Hadoop means data scientists can develop and test models faster because their infrastructure isn’t tuned to a specific algorithm or problem type, and it’s designed to perform well against large data sets.
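Wiesenthal’s tradeoff can be sketched with a toy example (synthetic data and numbers of my own, not from his talk): plain least-squares regression, about the simplest model there is, recovers the true underlying slope ever more tightly as the sample grows.

```python
import random
import statistics

random.seed(42)

def fit_slope(n):
    """Fit the slope of y = 2x + noise from n samples via ordinary least squares."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + random.gauss(0, 5) for x in xs]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    # OLS slope estimate: cov(x, y) / var(x)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

# The estimate of the true slope (2.0) tightens as the sample grows,
# even though the model itself stays trivially simple.
for n in (100, 10_000, 100_000):
    print(n, round(fit_slope(n), 3))
```

The same logic is what makes simple models attractive at scale: the statistical heavy lifting shifts from the algorithm to the sheer volume of observations.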

3. “Sometimes it’s more important to know what to kill.” Software-as-a-service pioneer Salesforce.com uses its big data platform to monitor the uptake and usage of various product features, said director of product management Narayan Bharadwaj, but the goal isn’t only to predict what new features to add next. Rather, he explained, using data to determine which features aren’t performing helps a company like Salesforce.com decide to shift those resources into more-valuable features. “Sometimes it’s more important to know what to kill,” he said.

Bharadwaj didn’t address this point, but it seems a logical next step would be to analyze the characteristics of features that perform well/poorly to get a sense of what works and what doesn’t from a design perspective.

4. Context adds value. To put it another way, if users know why they’re being shown a particular piece of content or offer or recommendation, they’re more likely to check it out. As a senior data scientist at StumbleUpon explained, his company invests heavily in big data technologies and data science techniques in order to put the most-relevant web content in front of each user, but it knows it’s not enough to expect those users to just trust the service’s judgment. Sparked.com’s Wiesenthal made a similar point in his talk, noting that services such as Pandora and Netflix are popular in part because they actually tell users something about themselves when recommending similar content.

If I like Metallica, I like …
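The Pandora-style pattern Wiesenthal described, surfacing the reason alongside the recommendation, can be sketched in a few lines. The catalog, tags and function below are invented for illustration; they are not any service’s actual data or algorithm.

```python
# Toy catalog mapping artist -> descriptive tags (made-up example data).
CATALOG = {
    "Metallica": {"thrash metal", "heavy riffs", "aggressive vocals"},
    "Megadeth":  {"thrash metal", "heavy riffs", "technical solos"},
    "Enya":      {"new age", "ambient", "soft vocals"},
}

def recommend_with_reason(liked):
    """Return (best_match, shared_tags) so the user sees why it was picked."""
    liked_tags = CATALOG[liked]
    best, shared = None, set()
    for artist, tags in CATALOG.items():
        if artist == liked:
            continue
        overlap = liked_tags & tags
        if len(overlap) > len(shared):
            best, shared = artist, overlap
    return best, shared

artist, reasons = recommend_with_reason("Metallica")
print(f"Because you like Metallica: try {artist} ({', '.join(sorted(reasons))})")
```

The design point is the return value: a recommendation paired with the evidence behind it, so the user is told why rather than asked to trust a black box.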

5. Transaction data trumps search data. Mok Oh, chief scientist at PayPal, discussed the chain of events that begins with product searches and ends with purchases, and how signals become increasingly difficult to determine when you start at one end of the chain and work your way toward the other. PayPal is trying to traverse this gap, however, beginning with the transactions it processes and using the other data at its disposal (both internal and from external sources such as Facebook and Gnip) to figure out who its customers really are and what they really want. He argued this is easier than, say, Google trying to track users from search through purchase — unless, of course, they actually purchase something using a tool like Google Wallet.

Mok Oh discussing PayPal’s Customer Genome analysis.

I think the greater lesson, though, is to make lemonade from the lemons that are your data. Assuming a company’s greatest data resource is the data it has gathered specific to its own business, a path toward big data success is to use that data as a starting point and then get creative figuring out ways to glean more insights from it.

Feature image courtesy of Shutterstock user Bruce Rolff.

  1. Fascinating subject. I am going to think about the phrase “sometimes it’s more important to know what to kill.” Realization of how to ask a question is critical to finding an answer.

  2. I love the concept of “making lemonade from the lemons that are your data”.

  3. Seems like a common theme is connecting two events, with one of them being something you want to track. This could be connecting feature usage with internal resourcing decisions, or connecting a completed transaction with a chargeback, indicating fraud. The closer these events are, the easier it is to get some insight; so connecting a search all the way through to a purchase is a lot more difficult than linking certain transaction characteristics with fraudulent activity.

  4. Developing matrices as you sift through this data, trying to connect the dots, will be the real benchmark of success.

  5. Sounds more to me like: sometimes, or more often than not, we need to let the data interpret itself to us and yield to its significance, rather than forcing our significance onto the data.

  6. Well, one comforting thought is that everyone is new to this game. Imagine this: 90 percent of the world’s big data was created in the last two years.

    http://statspotting.com/2012/09/big-data-stats-90-of-the-worlds-data-created-in-the-last-two-years/

  7. “Context adds value” has been leveraged since the early days of websites: savvy merchants provide great content on their sites to give more context and background to the products they’re selling, and to build credibility around their domain expertise.

  8. These are all really great points by Derrick. I would also add that in order to be successful with big data, domain expertise is necessary. Domain knowledge is the human intelligence that accumulates within a certain practice or process, and it is necessary to genuinely know which data, from all the possible sources, is valuable and which is not. This aspect of big data is the primary reason why opportunities around big data require business unit personnel to lead rather than follow, more than ever before.


Comments have been disabled for this post