
Summary:

The open-source, data-processing tool Hadoop is already popular for a variety of use cases that can benefit from clusters of machines churning through unstructured data — such as search engines and social-media analysis — and now it’s turning its attention to security data.


It looks like we can add security to the growing list of killer apps for Hadoop. The open-source, data-processing tool is already popular for search engines, social-media analysis, targeted marketing and other applications that can benefit from clusters of machines churning through unstructured data — now it’s turning its attention to security data.

At a high level, using Hadoop (or any big data tool) to sniff out security problems makes sense for the same reason it makes sense to use Hadoop for anything: Organizations have lots of data, and they’re trying to glean as many insights as possible from it.

I think targeted applications for specific industries and uses will help drive Hadoop and other big data tools into mainstream businesses, and security is a great starting point. Security concerns are universal, and anything that can bring a proven and white-hot technology like Hadoop to bear on them should garner serious attention.

Chris Hoff, senior director and security architect at Juniper Networks, recently wrote on his Rational Survivability blog about the problems and promise of big data as it relates to security. Essentially, he argues, even though it has been possible for quite a while to capture and analyze security data, the effectiveness has always been limited because it was difficult to draw insights outside the realm of the security applications from which the data was coming.

“Even when we do start to be able to integrate and correlate event, configuration, vulnerability or logging data, it’s very IT-centric,” Hoff explains. “It’s very INFRASTRUCTURE-centric. It doesn’t really include much value about the actual information in use/transit or the implication of how it’s being consumed or related to.”

Hadoop addresses this problem because it serves as a central repository for all of an organization’s unstructured data. Security-specific data can be analyzed against other data sets to create a clearer picture of what’s really going on, and how it might be affecting other parts of the business. However, leveraging new technologies such as Hadoop isn’t necessarily an easy prospect if organizations don’t already have in-house data analytics expertise.

Unless, of course, someone were to create an application that’s tuned to address a specific problem using a new technology, but that buries its complexities within the business logic.

That’s exactly what’s starting to happen with security. Tuesday, for example, big data startup Zettaset announced its Security Data Warehouse, which is designed specifically to mine security data using Hadoop as the storage and processing engine. As part of Zettaset’s greater Data Analytics Platform, the new product ties into an organization’s entire data warehouse and carries on the legacy of making that data easier to process, visualize and consume. Hadoop is at the core of Zettaset’s products, but users interact with a specialized API and management interface designed to be more intuitive than using Apache Hadoop alone.

A few months ago, I covered another startup, called ipTrust, that uses Hadoop to process web-traffic data and assign reputation scores to IP addresses. When deployed as a service or within a security vendor’s firewall product, ipTrust makes it easier to determine how risky it will be to allow network access to traffic from certain points. But security administrators using the product never touch the Hadoop cluster that powers it.
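To make the idea of scoring IP addresses from traffic data a little more concrete, here is a minimal sketch of the map-and-reduce pattern that Hadoop jobs follow, written in plain Python so it runs on a single machine. It is not ipTrust’s algorithm, which isn’t described here. The sample records, the function names and the scoring rule (the share of an address’s events that were flagged as suspicious) are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical traffic records: (source IP, was the event flagged as suspicious?)
SAMPLE_EVENTS = [
    ("203.0.113.5", True),
    ("203.0.113.5", True),
    ("203.0.113.5", False),
    ("198.51.100.7", False),
    ("198.51.100.7", False),
]


def map_event(event):
    """Map step: emit (ip, (suspicious_count, total_count)) for one record."""
    ip, suspicious = event
    return ip, (1 if suspicious else 0, 1)


def reduce_counts(pairs):
    """Reduce step: sum the per-IP counters emitted by the map step."""
    totals = defaultdict(lambda: [0, 0])
    for ip, (bad, seen) in pairs:
        totals[ip][0] += bad
        totals[ip][1] += seen
    return totals


def reputation_scores(events):
    """Assumed scoring rule: fraction of an IP's events that were suspicious."""
    totals = reduce_counts(map_event(e) for e in events)
    return {ip: bad / seen for ip, (bad, seen) in totals.items()}


scores = reputation_scores(SAMPLE_EVENTS)
for ip, score in sorted(scores.items()):
    print(f"{ip}: {score:.2f}")
```

On a real cluster the map and reduce steps would run in parallel across many machines and far larger logs, which is exactly the plumbing these products hide from the security administrator.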

And that’s the trick. Hadoop is helpful by itself, but it’s a lot more helpful if someone else takes out the guesswork of managing the cluster and figuring out how to put it to use for a specific task. Whether pointed at security or some other workload, applications that bury Hadoop under specialized interfaces and algorithms, or that deliver its results as a service, are the future of big data for most businesses.

Feature image courtesy of Flickr user Paul Keller.


  1. “Hadoop addresses this problem because it serves as a central repository for all of an organization’s unstructured data. Security-specific data can be analyzed against other data sets to create a clearer picture of what’s really going on, and how it might be affecting other parts of the business.”

    What does unstructured security data look like?

    I would argue that aside from machine data (log files, et al) there really isn’t a concept of “unstructured” data as the term is used in the analytics community.

    What companies are dealing with BigData (petabytes) of security data?

    I would argue that very few organizations, save maybe the NSA or another clandestine 3-letter agency, have a BigData security issue, as BigData is defined.

    I think Hadoop (or BigData) may have some interesting value benefits to security, but it is quite a leap from BigData to Hadoop to security analytics.

    You can read my thoughts here:

    http://techbuddha.wordpress.com/2011/07/25/bigdata-hadoop-and-the-impending-informationpocalypse/

    1. You have no idea how much security data a large corporation can create in just a few hours. More than you can ever imagine.
