Why the collision of big data and privacy will require a new realpolitik

When it comes to protecting privacy in the digital age, anonymization is a terrifically important concept. In the context of the location data collected by so many mobile apps these days, it generally refers to the decoupling of the location data from identifiers such as the user’s name or phone number. Used in this way, anonymization is supposed to allow the collection of huge amounts of information for business purposes while minimizing the risks if, for example, someone were to hack the developer’s database.
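The decoupling described above is often done by swapping the direct identifier for an opaque token before the records are stored. A minimal sketch of that idea (the function name, record fields, and keyed-hash approach are illustrative choices, not a description of any particular app's implementation):

```python
import hmac
import hashlib
import secrets

# A per-dataset secret key: without it, the pseudonyms cannot be reversed
# by simply hashing a list of known phone numbers and comparing.
SECRET_KEY = secrets.token_bytes(32)

def pseudonymize(identifier: str) -> str:
    """Replace a direct identifier (name, phone number) with a keyed hash.

    The stored location records keep only this opaque token, so a leaked
    database no longer contains the identifier itself.
    """
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()

record = {"user": "+1-555-0100", "time": "2013-03-25T09:00", "cell": "A12"}
anonymized = {**record, "user": pseudonymize(record["user"])}
```

The point of the research discussed below is that even after this step, the remaining (time, cell) pairs can themselves act as a fingerprint.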

Except, according to research published in Scientific Reports on Monday, people’s day-to-day movement is usually so predictable that even anonymized location data can be linked to individuals with relative ease if correlated with a piece of outside information. Why? Because our movement patterns give us away.

The paper, entitled “Unique in the Crowd: The privacy bounds of human mobility”, took an anonymized dataset from an unidentified mobile operator containing call information for around 1.5 million users over 14 months. The purpose of the study was to figure out how many data points — based on time and location — were needed to identify individual users. The answer, for 95 percent of the “anonymous” users logged in that database, was just four.

From the paper:

“We showed that the uniqueness of human mobility traces is high, thereby emphasizing the importance of the idiosyncrasy of human movements for individual privacy. Indeed, this uniqueness means that little outside information is needed to re-identify the trace of a targeted individual even in a sparse, large-scale, and coarse mobility dataset. Given the amount of information that can be inferred from mobility data, as well as the potentially large number of simply anonymized mobility datasets available, this is a growing concern.”
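The mechanics of that re-identification can be sketched on synthetic data. This is a toy model only — the dataset sizes, the uniform random traces, and the helper names are all my own assumptions, and real human mobility is far more regular (which is exactly why the paper's real-world numbers are so striking):

```python
import random
from collections import defaultdict

random.seed(0)
HOURS, ANTENNAS, USERS, POINTS_PER_USER = 24 * 7, 200, 1000, 40

# Toy "anonymized" dataset: each user is just a set of (hour, antenna) points,
# with no name or phone number attached.
traces = {
    u: {(random.randrange(HOURS), random.randrange(ANTENNAS))
        for _ in range(POINTS_PER_USER)}
    for u in range(USERS)
}

def matching_users(points):
    """Users whose trace contains every observed (hour, antenna) point."""
    return [u for u, trace in traces.items() if points <= trace]

def unique_fraction(p, trials=200):
    """Fraction of sampled targets pinned down by p known outside points."""
    hits = 0
    for _ in range(trials):
        target = random.randrange(USERS)
        points = set(random.sample(sorted(traces[target]), p))
        if matching_users(points) == [target]:
            hits += 1
    return hits / trials
```

In this synthetic setting, knowing just four of a target's points — say, from a tweet, a credit-card receipt, or a home address — almost always narrows the whole database down to one trace.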

Just because you’re paranoid…

For those already worrying about the privacy-busting implications of mobile device use, this should come as no surprise. As CIA CTO Ira “Gus” Hunt stressed last week at GigaOM’s Structure:Data conference, mobility and security do not go hand-in-hand. You can be constantly tracked through your mobile device, even when it is switched off. What’s more, those sensors you’re pairing with your device make it ridiculously easy to identify you.

From Hunt’s speech:

“You guys know the Fitbit, right? It’s just a simple three-axis accelerometer. We like these things because they don’t have any – well, I won’t go into that [laughter]. What happens is, they discovered that just simply by looking at the data what they can find out is with pretty good accuracy what your gender is, whether you’re tall or you’re short, whether you’re heavy or light, but what’s really most intriguing is that you can be 100 percent guaranteed to be identified by simply your gait – how you walk.”

One of the explicit purposes of Unique in the Crowd was to raise awareness. As the authors put it: “these findings represent fundamental constraints to an individual’s privacy and have important implications for the design of frameworks and institutions dedicated to protect the privacy of individuals.”

But this isn’t just about mobility; it’s also about the implications of our big data society. These are effectively two sides of the same coin – mobile devices make it easy to collect data, while big data capabilities make it increasingly trivial to take the resulting mass of supposedly anonymized data and tease out the kind of specificity that the anonymizers were trying to erase.

This was precisely the sort of problem foreseen by Europe’s cybersecurity agency, ENISA, a few months back when evaluating the continent’s proposed “right to be forgotten”. If a citizen really wants all traces of their personal data removed from the web, ENISA pointed out, that would have to mean removing their data from anonymized datasets as well as from more obvious repositories such as social networks and search indices.

As ENISA said at the time:

“Removing forgotten information from all aggregated or derived forms may present a significant technical challenge. On the other hand, not removing such information from aggregated forms is risky, because it may be possible to infer the forgotten raw information by correlating different aggregated forms.”
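A classic differencing attack shows how correlating aggregates can resurrect a "forgotten" raw value. The names and numbers here are hypothetical, purely to illustrate ENISA's warning:

```python
# Hypothetical scenario: an organization publishes an aggregate total,
# then later republishes it after one person's data has been "forgotten".
salaries = {"alice": 52000, "bob": 61000, "carol": 58000}

total_all = sum(salaries.values())                       # aggregate published before removal
total_without_carol = sum(v for k, v in salaries.items()
                          if k != "carol")               # aggregate published after removal

# Correlating the two aggregates recovers the deleted raw value exactly.
recovered_carol_salary = total_all - total_without_carol
```

This is why ENISA argues that leaving derived forms untouched is itself risky: the raw datum was deleted, yet it remains inferable.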

Shall we just give up now?

The Unique in the Crowd authors stressed in a BBC interview that “we really don’t think that we should stop collecting or using this data — there’s way too much to gain for all of us — companies, scientists, and users.” So what can be done?

Personally speaking, I have been writing about issues around data privacy for many years, and I still cannot see any easy solution to this problem. If it were simply a case of which side of the argument carries more weight, I would have no hesitation in siding with the privacy brigade: selling data to advertisers in order to fund that “free” app does not justify the creation of a surveillance society.

But it’s just not that simple. That Fitbit is also trying to help you keep fit — that it can incidentally identify you doesn’t change that. Mobile operators’ datasets help keep their networks running. Location-based services don’t work without location. We even hope big data capabilities will help us fight diseases and socio-economic problems. And, most importantly, despite the fact that most people in the U.S. and European Union insist they want better data privacy, we see time and again that this desire doesn’t translate into action — people still give up their data without much consideration.

What we need is a new realpolitik for data privacy. We are not going to stop all this data collection, so we need to develop workable guidelines for protecting people. Those developing data-centric products also have to start thinking responsibly – and so do the privacy brigade. Neither camp will entirely get its way: there will be greater regulation of data privacy, one way or another, but the masses will also not be rising up against the data barons anytime soon.

There needs to be better regulation that works in practice – unlike Europe’s messy cookie law or the “right to be forgotten”. It may be that the restrictions will need to be on the use of data rather than its collection, as proposed in a recent World Economic Forum report. However, regulators tend not to be very proactive, particularly when the risks, while inevitable, remain mostly theoretical.

I suspect the really useful regulation will come some way down the line, as a reactive measure. I just shudder to think what event will necessitate it.

5 Responses to “Why the collision of big data and privacy will require a new realpolitik”

  1. Reading your article, I revisited my 2007 comment on an I, Cringely blog post about whether or not Google was going to own the Internet. Now, five years down the road, I am even more convinced that dealing with the immense challenges of our Brave New World indeed requires revisiting the thought processes of Montesquieu to decide who should own the power to control Big Data:

    “…. Challenge number three involves power. Is being the middleman sufficient, or do you want to achieve absolute power? We are entering an era with the promise of an “Ambient Intelligence”. Communicating via a “personal communicator”, commanding non-resident applications via voice control and many other Starship Enterprise-like features will become part of our not-too-distant future. But more importantly, an era where your personal information will become an integral part of a Grid. Owning this Grid, or merely controlling it and its abundance of personal data – for some information, and for others a wealth of Intelligence – involves a major issue. In earlier days we (I believe Montesquieu) developed the concept of the separation of powers; we are again going to be confronted with a major question: …

    Are we going to separate User – Application – Data Storage?”

    To conclude: the consequences of taking the wrong Big Data decisions are even more enormous than already expected. Let me therefore ask the same question I asked five years ago: can we trust governments and corporations in this? Or do you want to take the chance that they ultimately turn out to be Snowball and Napoleon, wrecking our Animal Farm dreams?

    My vote goes to that old French guy.

  2. To add to what Andre said, I heard a good analogy on Twitter: would you allow me to put a video camera in your bedroom if I agreed to abide by a use policy for the recordings? We can’t rely on use policies alone to protect privacy.

  3. André Rebentisch

    Interesting post, but I don’t see collection and use restrictions as alternatives. Collection safeguards help to clear data for uses. The RTBF just clarifies the existing right to have your personal information rectified, and the principle that data should be deleted when it is no longer needed. The message about “big data” — a kind of remix of what was called data mining a decade ago, plus open data — has not even reached political spheres, and appears naive in terms of data protection. So you cannot base a regulatory argument on it.

  4. Forget about anonymized location data being used against you. The government is already building the infrastructure to track everything American citizens do online, extending into everywhere people go with their smartphones, linked to each individual person — not even anonymously. The same surveillance-state acts our corrupt government officials criticize countries like China for, our own government is doing behind the scenes. Look up the huge data center the NSA is building in Utah right now for one example, and then research how warrantless wiretapping is permitted domestically. Look around and wake up, people!

  5. Mark van Rijmenam

    Interesting post, David, and I agree that for sure we will need to develop ethical guidelines for big data to ensure that we do not lose all our privacy. I have tried to start a discussion here, to create four ethical principles for organisations to adhere to: simplicity, transparency, security and privacy.