
That needle in the haystack of useful big data may be smaller than we thought

New research analyzing the log data churned out by applications built on the Heroku platform as a service shows just how little of that data is actually useful to the developers or devops personnel running those applications.

Out of 22 billion log events across 6,000 Heroku applications, just 0.18 percent held information that a developer (or a devops pro) would actually need to know to prevent a failure or boost performance, according to Logentries. Put another way, 99.82 percent of that data is the haystack of non-useful stuff.
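
For a sense of scale, those percentages can be turned back into event counts. The snippet below is only back-of-the-envelope arithmetic on the figures quoted above. The function name is made up for illustration, and the per-application number is a plain average that assumes nothing about how events were actually spread across apps.

```python
TOTAL_EVENTS = 22_000_000_000   # log events in the study
APPLICATIONS = 6_000            # Heroku applications in the study
USEFUL_PER_10K = 18             # 0.18 percent, written as 18 per 10,000


def split_events(total, useful_per_10k):
    """Return (useful, noise) counts, using integer math to avoid rounding."""
    useful = total * useful_per_10k // 10_000
    return useful, total - useful


useful, noise = split_events(TOTAL_EVENTS, USEFUL_PER_10K)

print(f"useful events: {useful:,}")                           # 39,600,000
print(f"noise events:  {noise:,}")                            # 21,960,400,000
print(f"average useful per app: {useful // APPLICATIONS:,}")  # 6,600
```

So the "needle" is still tens of millions of events in absolute terms; it is small only relative to the roughly 22 billion events around it.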

Of course, Logentries is publicizing this because it’s in the business of helping people winnow out that 0.18 percent without having to sift through the rest, but still, it’s interesting.

Boston-based Logentries last month netted $10 million in venture funding to take on Splunk and Sumo Logic and make big data understandable to civilians (i.e., non-data scientists).

Logentries claims that its SaaS offering “pre-processes” data as it’s generated so that sussing out insights can happen faster.
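
Logentries' actual pre-processing isn't described here, so the following is only a sketch of the general idea of filtering a log stream as it arrives rather than searching it afterward. The pattern, the function name, and the sample lines are all hypothetical and are not real Heroku or Logentries output.

```python
import re
from typing import Iterable, Iterator

# Hypothetical rule for what counts as "interesting"; a real service
# would use far richer rules than a single keyword pattern.
INTERESTING = re.compile(r"\b(ERROR|FATAL|Exception)\b")


def keep_interesting(lines: Iterable[str]) -> Iterator[str]:
    """Yield only matching lines, one at a time, as the stream is read."""
    for line in lines:
        if INTERESTING.search(line):
            yield line


# Made-up sample lines, for illustration only.
sample = [
    "GET /health 200 3ms",
    "ERROR database connection refused",
    "worker heartbeat ok",
    "Unhandled Exception in checkout handler",
]

for line in keep_interesting(sample):
    print(line)
```

Because the function is a generator, it never holds the whole stream in memory, which is the point of processing data as it is generated.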

6 Responses to “That needle in the haystack of useful big data may be smaller than we thought”

  1. William Louth

    Logging is the most inefficient and useless of approaches to observation (monitoring) & control (management) of applications and services…it is ill-equipped to address the challenges facing IT management.

    “For this to happen we need for IT to change starting with how it (or its systems) observe. Moving from logging to signaling. Moving from monitoring to metering. Moving from correlation to causation. Moving from process to code then context. Moving from state to behavior then traits. Moving from delayed to immediate. Moving from past to present. Moving from central to local. Moving from collecting to sensing. When that has occurred we can then begin to control via built in controllers and supervisors.”

  2. andrew58

    Our research team was looking at one use case, that of a DevOps professional running an application on the Heroku platform. Many of our Heroku users have the DevOps role where they’re responsible for not only building the application but also running it. Usually, someone in this role is pretty interested in the Heroku error codes and application exceptions to help them understand the performance and reliability of their application. As the research highlights, these error codes and exceptions are buried by the other signal noise in the log stream.

    However, that’s not to say the remaining 99% of the stream isn’t valuable; rather, it depends, as you’d probably expect, on who you are and what information you’re looking for. DevOps is probably most interested in application health and performance, whereas marketing may be interested in user metrics and business performance, and InfoSec would be most interested in audit logs and security events.

    The great part about log data is that it’s a universal data source. Logs contain tons of information, known and unknown, for many different use cases, and for most folks they’ve traditionally been very difficult to access and use. This is where I’d normally put in a plug for Logentries, but I think the bigger point is that most organizations aren’t aware of what’s available in log data, lack the resources to discover it themselves, and would benefit from the expertise of a third-party service provider to help them dig through the noise and find the bits that matter most.

    Regardless, as the research team highlighted, the proverbial needle in the haystack is actually much smaller than most people realize. And as mentioned in previous comments, there are likely more than needles in that haystack.

    You can check out the full research by visiting the following URL –

  3. Brian Gilmore

    Isn’t the fatal event just the outcome? How many of the exceptions, warnings, and critical events labeled here as “noise” could be used to predict and prevent fatal events from ever happening?

  4. Paul Calento

    But in that 0.18% the data is potentially incredibly valuable. That’s what makes Big Data such a challenge. There is value, but getting to it can be incredibly arduous. We are now in the era of Data Science, especially given the movement outside of traditional data scientists.

  5. This is a great topic…it isn’t like needles in haystacks suddenly became easy to find using Big Data technology and techniques. If we were looking for needles, that would be one thing, but visualized analytics are just as much about finding the unknown unknowns…sort of like looking in a haystack for something that isn’t hay.

    Companies asking how to get started (and that’s most companies that haven’t started) are looking for guidance, but the challenge in that is that every game plan for every company is a little different. Anyone thinking, “Hey, gang, let’s go find some needles” is going to be surprised that it isn’t quite like that.

    We wrote up the need to start gathering data and finding the patterns that matter, rather than starting with someone else’s plan: