Summary:

Causata is really good at helping companies identify consumers and, thanks to new machine learning features, helping them predict behavior many steps down the line. Does all this personal data create a privacy concern? Depends who you ask.

I’ll be frank: Causata’s marketing software is a little creepy in the level of personal data it collects and analyzes, but it also seems very good at what it does. Good enough for the company to close a $7.5 million Series C round from Accel Partners in December, bringing its total funding to $23 million (all from Accel) since launching in 2009. Its latest wrinkle: machine-learning algorithms that automatically figure out which campaigns are most likely to work on which customers.

If you’re not familiar with Causata, it’s a true big-data application dedicated solely to stitching together customer identities so marketers know what those customers want. It collects first-party data (cookies, email addresses, usernames, site activity, customer service phone calls and, really, everything else it can get) and stuffs it into an event store, where users can run predictive algorithms against it. Because it takes in such a wide variety of data, Causata stores everything in HBase, the NoSQL database that sits atop the Hadoop Distributed File System and is designed with unstructured or semi-structured data in mind.
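
The article doesn’t describe how Causata actually links a cookie to an email address to a phone call. As a rough, hypothetical illustration of what “stitching together customer identities” can mean, the sketch below uses a standard union-find structure: any identifiers seen together in one event get merged into a single profile. The class name, the identifier strings and the event sequence are all invented for the example; this is a generic technique, not Causata’s implementation.

```python
from collections import defaultdict


class IdentityStitcher:
    """Hypothetical sketch: merge identifiers that co-occur in events (union-find)."""

    def __init__(self):
        # Maps each identifier to its parent; a root maps to itself.
        self.parent = {}

    def _find(self, ident):
        # Register unseen identifiers as their own single-member profile.
        self.parent.setdefault(ident, ident)
        # Walk to the root, halving the path as we go to keep lookups fast.
        while self.parent[ident] != ident:
            self.parent[ident] = self.parent[self.parent[ident]]
            ident = self.parent[ident]
        return ident

    def link(self, *identifiers):
        """Record that these identifiers showed up in the same event."""
        if not identifiers:
            return
        root = self._find(identifiers[0])
        for ident in identifiers[1:]:
            other = self._find(ident)
            if other != root:
                self.parent[other] = root

    def profiles(self):
        """Group every known identifier under its stitched profile."""
        groups = defaultdict(set)
        for ident in list(self.parent):
            groups[self._find(ident)].add(ident)
        return list(groups.values())


# Hypothetical identifiers observed in three separate events.
stitcher = IdentityStitcher()
stitcher.link("cookie:abc123", "email:jane@example.com")  # web login
stitcher.link("email:jane@example.com", "phone:555-0100")  # support call
stitcher.link("cookie:zzz999")                             # anonymous visit
print(sorted(len(p) for p in stitcher.profiles()))  # [1, 3]
```

The login event ties the cookie to the email address, and the support call ties the email address to the phone number, so all three collapse into one profile even though the cookie and phone number never appeared together.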

Previously, though, as VP of Marketing Brian Stone explained to me, analytics and predictive modeling within Causata were solely an offline function. Analysts used R, Tableau, QlikTech, plain SQL or whatever data-analysis tool they preferred to work through the data, learn who’s who among customers and ultimately build their models. With the new machine-learning capabilities, the system continuously watches how companies target consumers and how those consumers behave, then generates models that predict how particular actions might influence behavior one or even many steps down the line.

Once the data analysts figure out who’s who and how particular microsegments are likely to respond to particular actions, the marketing team can put these models to work in their existing platforms for placing advertising, surfacing offers or whatever other methods they use to try to reach consumers.

About that personal data …

Any time we’re talking about personal data, though, a certain subset of consumers is likely to get creeped out, and rightfully so. It comes down to the now well-known tradeoff between how much we value personalization and how much we value privacy. Not surprisingly, Stone says he’s open to advertising when it’s “personalized, timely, relevant and intelligent.” If his bank didn’t “continually misfire” with loan offers that don’t match his situation (something it should know from his account information, online banking and site activity), he might actually be willing to take it up on an offer.

Besides, he noted, the only time a human being (at least one using Causata’s software) would really have reason to look at personal-level data is during troubleshooting or when trying to find better ways to segment customers. Ideally, those segments are built from activity-based data such as price-consciousness or loyalty rather than classic demographic data such as age, sex or race. When it comes to the ads or offers actually served, though, the system records your activity, runs a predictive analysis against your identity profile and returns a result in well under a second.
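
The article doesn’t say what kind of model Causata scores with. As a hedged illustration of the real-time step Stone describes (take a visitor’s activity-based profile, score each candidate offer, return the best one), here is a minimal sketch using a plain logistic model. The feature names, weights, biases and offer names are all hypothetical; in practice the weights would come from a trained model rather than being typed in by hand.

```python
import math
from dataclasses import dataclass


@dataclass
class Offer:
    name: str
    weights: dict  # hypothetical feature -> weight map; values below are made up
    bias: float


def response_probability(profile: dict, offer: Offer) -> float:
    """Logistic score: estimated chance that this profile responds to the offer."""
    z = offer.bias + sum(w * profile.get(feature, 0.0)
                         for feature, w in offer.weights.items())
    return 1.0 / (1.0 + math.exp(-z))


def choose_offer(profile: dict, offers: list) -> tuple:
    """Score every candidate offer and return (name, probability) of the best."""
    scored = [(response_probability(profile, o), o.name) for o in offers]
    prob, name = max(scored)
    return name, prob


# Hypothetical activity-based features for one visitor, scaled to 0..1.
profile = {"price_consciousness": 0.9, "loyalty": 0.2}
offers = [
    Offer("discount_coupon", {"price_consciousness": 2.0, "loyalty": -0.5}, -1.0),
    Offer("premium_upgrade", {"price_consciousness": -1.5, "loyalty": 2.0}, -0.5),
]
name, prob = choose_offer(profile, offers)
print(name, round(prob, 3))  # discount_coupon 0.668
```

Scoring itself is cheap arithmetic over a small feature map; the expensive work in a system like this is assembling the profile and training the weights beforehand.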

This is the same method, or at least a similar one, used every time we see personalized ads online: No human being is sitting around looking at our data and deciding we need hemorrhoid cream.

Given the amount of digital data we contractually give away every time we surf the web or use our smartphones, combined with the number of companies out there trying to help marketers make sense of it, the personalization genie probably isn’t going back into its bottle. Not that that’s necessarily a bad thing.

I’m reminded of a conversation I had with IBM Fellow and all-around identity-data genius Jeff Jonas nearly three years ago. He explained his theory that extensive data tracking will ultimately lead to a surveillance society, but that we’ll love it because we love optimization. “It’s seemingly irresistible to us,” he said.

When someone actually gets targeted advertising right, maybe it will be.

To learn more about machine learning, privacy, Hadoop and everything else driving the discussion around big data, come to our Structure: Data event March 20-21 in New York.
