
Summary:

Big data technologies are like manufacturing robots: they let people do what they’re already trying to do, only faster than before and at a much greater scale. But as with any other product, that analyzed data is nothing without humans to do something with it.

Despite all the talk about companies using big data to uncover insights, maybe automation is the real reason the world is so excited about big data. What makes the big data era so significant isn’t that people are using data to inform their decisions, but that there’s just too much data of too many different types to handle manually. In many cases, keeping up isn’t so much a matter of changing mindsets as it is about getting better tools.

Last week, New York Times reporter Steve Lohr wrote about the possibility of a big data bubble forming because people rely too much on data at the expense of experience and intuition. It got me thinking about all the technologies and algorithms I’ve covered, about all the discussions I’ve had about why a data scientist is more than just a statistician who can write MapReduce jobs. Nearly everywhere, it seems to me (save for, as Lohr cites, unique uses such as algorithmic trading), big data really is less about replacing human intuition than it is about augmenting the human experience by making it easier, faster and more efficient.

Like the purpose-built robots that have revolutionized manufacturing, today’s methods for processing and analyzing data are fast, scalable and precise, but they don’t yet (in most cases) make our decisions. Big data can make life and business a lot more efficient, but for the time being, human judgment and willpower are still very much in control.

Offloading grunt work to the machines

We’ve recently covered some obvious examples of this. Take, for example, recent university research demonstrating how media researchers could use machine learning and natural-language processing to save themselves the work of manually reading and coding every piece of text they wish to analyze as part of a study. Algorithms — like robots in manufacturing — are doing the mindless, repetitive tasks of discerning subject matter, keywords and sentiment, but researchers are still the ones poring over those results and telling us what it all means.
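As a toy illustration of that division of labor, the automated "coding" step might look like the sketch below. The topic and sentiment lexicons here are invented for the example, not taken from the research; real pipelines would use trained classifiers rather than hand-built word lists.

```python
# Toy illustration of automated text "coding": tagging a document with
# topic keywords and a crude sentiment score. The lexicons are made-up
# examples, not a real research pipeline.

TOPIC_KEYWORDS = {
    "economy": {"jobs", "market", "inflation", "budget"},
    "health": {"hospital", "vaccine", "doctor", "disease"},
}
POSITIVE = {"gain", "improve", "success", "strong"}
NEGATIVE = {"loss", "crisis", "fail", "weak"}

def code_document(text):
    """Return (topics, sentiment) for one piece of text."""
    words = set(text.lower().split())
    topics = [t for t, kw in TOPIC_KEYWORDS.items() if words & kw]
    sentiment = len(words & POSITIVE) - len(words & NEGATIVE)
    return topics, sentiment

topics, sentiment = code_document("Strong market gains improve the jobs budget")
```

The algorithm churns through every document the same way; the researcher still decides what the resulting topic counts and sentiment scores actually mean.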

A couple of months ago, I spoke with Recommind CEO Bob Tennant about how attorneys are using software to pore through terabytes’ worth of electronic documents during the discovery process. Predictive coding, as it’s called, frees them up to focus more on case strategy than on the tedium of analyzing every single PDF and email message to figure out if it’s relevant to a case. However, he noted, although the software typically does a better job than a person alone would do, most law firms still use a hybrid man-machine approach to leverage the strengths of both and ensure nothing gets missed. And the software certainly doesn’t assess a document’s relative legal relevance in light of a case’s facts and craft an argument around it.

A screenshot of the Analyst Overview

Even software products such as BeyondCore, which aim to minimize human involvement in the data analysis process as much as possible, are actually just about making business people more efficient. In this case, people are only integral to the first and final steps — selecting the metric with which they’re concerned and then interpreting the statistical correlations, respectively. The messy middle step of asking the right questions is (in theory) eliminated by software that analyzes all the possible correlations and scores and presents them accordingly.
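A minimal sketch of that automated middle step, assuming a simple table of metrics: the software correlates every column against the human-chosen metric and ranks the results for a person to interpret. The column names and figures below are invented for illustration, not drawn from BeyondCore.

```python
# Sketch of the "analyze all the correlations" step: score every column
# against a chosen metric and rank by correlation strength.
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    varx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vary = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (varx * vary)

def rank_correlations(table, metric):
    # Step 1 is human: pick the metric. The scan itself is automated.
    target = table[metric]
    scores = {
        col: pearson(vals, target)
        for col, vals in table.items() if col != metric
    }
    # Step 3 is human again: interpret the ranked correlations.
    return sorted(scores.items(), key=lambda kv: -abs(kv[1]))

data = {
    "revenue":     [10, 12, 15, 11, 18],
    "ad_spend":    [1, 2, 4, 1.5, 6],      # strongly tracks revenue
    "office_temp": [21, 20, 22, 21, 20],   # mostly noise
}
ranked = rank_correlations(data, "revenue")
```

The point of the sketch is the shape of the workflow: a person supplies the question at the top and the judgment at the bottom, and the exhaustive scan in between is what gets automated.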

In this sense, one of the better descriptions I’ve heard about actually using data in the corporate world came from ClickFox CEO Marco Pacelli, who compared it to figuring out which few of dozens of cockroaches to kill when the light comes on. Big data, like the flick of the light switch, can show people what’s really going on under the surface. But a smart executive still must figure out how to best solve the problem, capitalize on the opportunity or just put the situation into perspective.

Algorithms can only be so human

Of course, those examples are easy and largely ignore the world of really big data that exists on the web and presents its own challenges. Lohr, for example, citing Eli Pariser’s “The Filter Bubble: What the Internet Is Hiding From You,” noted a particular fear “that the algorithms that are shaping my digital world are too simple-minded, rather than too smart.” That’s an astute observation in a world of hyper-personalization, where one could easily find himself snowblind by the content, products, etc., he’s supposedly interested in, making it all the more difficult to gain visibility into the broader world.

But perhaps we’re just expecting the web to be smarter than it is and, really, smarter than any service built on the idea of scale probably should be. For example, web and mobile apps, ranging from Amazon Web Services to Instagram, are only able to automate processes for potentially billions of users because they offer fairly generic services (subscription req’d). Broadly applicable features and non-negotiable terms of service (however problematic) mean companies can focus on building great products rather than wasting time negotiating features and terms with every user.

You want data security or site reliability? Figure it out yourself or wait for your service provider to do it on its own time.

A sample interest graph from Gravity.

Why should personalization algorithms be any different? They can do a heck of a job automating the discovery of stuff we’re interested in, but creating a model intelligent enough to know when any given individual wants to — or needs to — view content outside their typical interests could prove incredibly challenging for services that deliver personalization in part by identifying broad patterns in user behavior. It’s just not what they’re designed to do.

The web is an expansive place: If we as web users really don’t want to be slaves to algorithms and our usernames, maybe it’s up to us to log out, clear our caches and go do some anonymous digging.

Melding man and machine

That being said, the people tasked with creating the algorithms that power so many web services do seem to understand the need for human input in the model-building process, at least. Even machine learning — a term that conjures up images of artificial intelligence and self-aware computer networks — is often just a tool to make data scientists’ lives easier through automation.

Smart data scientists know they can’t trust the machines alone, which is why companies doing everything from predicting the content you’ll like to predicting your credit risk have figured out how to make machines work for humans instead of replacing them. Yes, machine learning algorithms and big data technologies analyze a volume of data points no human ever could, uncovering complex relationships the naked eye could never spot. But once the heavy lifting is done, humans come in and use their subject-matter expertise and logic to prune off bad connections, add context and maybe even inject a little serendipity into the final algorithms.
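That hand-off can be sketched as follows. The product pairs, scores and editorial blocklist below are all hypothetical, but the shape is the one described above: the machine proposes associations, and humans prune and season the final set.

```python
# Sketch of the human-in-the-loop step: the machine proposes associations
# with confidence scores, then editors prune spurious ones and inject a
# few hand-picked "serendipity" items. All data here is invented.
import random

machine_learned = {
    ("diapers", "beer"): 0.91,        # famously odd-looking but real pair
    ("camera", "tripod"): 0.88,
    ("umbrella", "sunscreen"): 0.85,  # correlated via season, not intent
}

# Human subject-matter expertise, expressed as a blocklist.
pruned_by_editors = {("umbrella", "sunscreen")}

def final_associations(learned, blocklist, serendipity, k=1, seed=7):
    """Keep machine pairs not vetoed by humans, then mix in k wildcards."""
    kept = {pair: s for pair, s in learned.items() if pair not in blocklist}
    rng = random.Random(seed)
    for pair in rng.sample(sorted(serendipity), k):
        kept[pair] = None  # no machine score: a human editorial pick
    return kept

result = final_associations(
    machine_learned, pruned_by_editors,
    serendipity={("camera", "poetry-book"), ("beer", "cookbook")},
)
```

The fixed seed just keeps the example deterministic; in practice the serendipitous picks would rotate.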

Whether it’s corporate business intelligence or the consumer web, though, all of this is about automation. Data-minded people have always used data to aid in decision-making without ignoring their instincts. Big data just lets them learn a lot more, a lot faster.

We’ll be talking a lot more about these issues and more at Structure: Data, from March 20-21 in New York, so feel free to mark your calendars. In the meantime, here’s a clip from last year’s event with lots of discussion about machine learning, including how humans will continue to play a role.

Watch live streaming video from gigaombigdata at livestream.com

Feature image courtesy of Shutterstock user Nataliya Hora.

  1. Depends on how you define big data, and if you include M2M and IoT, then yes indeed it’s more about automation.

    Big Data: Rise of the Machines http://nyti.ms/Vsnj3R

    Machine-generated sensor data will become a far larger portion of the big data world, according to a recent report by IDC. The IDC forecast also suggests that there is a lot of substance to the vision of machine-to-machine communication and intelligence that W. Brian Arthur terms “the second economy.”

    > Whether it’s corporate business intelligence or the consumer web, though, all of this is about automation. Data-minded people have always used data to aid in decision-making without ignoring their instincts. Big data just lets them learn a lot more, a lot faster.

    Your point is discussed here, with lots of comments:

    We don’t need more data scientists — just make big data easier to use http://gigaom.com/2012/12/22/we-dont-need-more-data-scientists-just-simpler-ways-to-use-big-data/

    1. Good points. I actually wasn’t considering IoT or M2M, except of course that people might want to work device data into their analyses or algorithms. I was thinking about humans as the intended consumers of analysis, and automation in the manufacturing sense.

  2. Great post, and I fully agree. Machine learning (both supervised and unsupervised) is now practical for big data applications, thanks to the decline in storage, compute and bandwidth costs. Like factory automation, ‘machines’ can be taught the skills of humans for repetitive tasks (pattern recognition, for example). Business analysts and data scientists can focus on higher-order questions as a result.

  3. Derrick, this is an excellent piece and overdue. There’s a certain amount of hazy enthusiasm about big data that hides the fact that it still comes down to humans having to make great decisions. We’re automating the feeding of those decisions but not necessarily changing the outcome (unless the outcome is a faster decision based on more information).

    More or less in response to the hype, we wrote up what we’ve nicknamed our Big Data Manifesto: http://successfulworkplace.com/2012/10/28/big-data-must-not-be-an-elephant-riding-a-bicycle/

    This article focuses on the wide variety of human decisions and complementary automation that makes big data work. Comments welcome.

  4. Great article Derrick – Our system helps advertisers analyse their data and see which channels have performed. A lot of human interaction is needed to take findings from the results but we automate the hard and frankly boring part of the job of collating and analysing the data and visualising the results!!

  5. I do not agree with most of you. I don’t find it such a great article. It is pretty straightforward that the incentive to analyze data in the first place is for someone to make a decision, or a better-informed decision, at the end. This being said, the article is feeding itself with this reasoning, saying the analysis methods aren’t making any decisions by themselves because they are not making the final decision.

    Well, yes, there is a lot of marketing bullshit out there, but this was always the case with any product since humankind became clever enough to communicate. And what is the whole point of this article? Selling us a conference in March sponsored by some of these bullshitters.

  6. Great article Derrick. I think the term “big data” creates a lot of confusion because of the ambiguity. Humans are still the critical element in the decision making process and it’s technologies that enable us to perform better that really make the difference. I’m excited to see big data and machine learning continue to evolve from an overflow of raw data to actionable intelligence.
    - @MrRyanConnors

