Report

How cloud computing plus Facebook might mean the end of personal privacy

Summary

In 1890, attorney Samuel Warren and future Supreme Court Justice Louis Brandeis wrote an influential Harvard Law Review article called “The Right to Privacy.” They were concerned about the advent of photographic cameras and newspapers’ practice of sharing the details of subjects’ personal lives.

As they put it:

Recent inventions and business methods call attention to the next step which must be taken for the protection of the person, and for securing . . . the right “to be let alone.” Instantaneous photographs and newspaper enterprise have invaded the sacred precincts of private and domestic life; and numerous mechanical devices threaten to make good the prediction that “what is whispered in the closet shall be proclaimed from the house-tops.”

If Warren and Brandeis were upset then, one can only imagine how today’s far faster pace of technological change has privacy scholars’ heads exploding. Warren and Brandeis were pretty certain that a right to be let alone existed. If there ever was one, it’s fading fast.

Now, thanks to Facebook, if we can accurately put a name to a face, there’s a lot of personal information available online to tell us about that person. The woman sitting next to me at the bar, for example, could know my name, address, interests and, possibly, Social Security number within a few seconds of using an iPhone app. Once social media came into existence, this was bound to happen. Facebook just happens to be one centralized location containing a lot of faces attached to names. What this culture of sharing, and the technology to exploit it, means, though, is that we might have to reconsider notions of privacy in a Web 3.0 world and beyond.

On Facebook, one can be identified

In early August at the annual Black Hat security conference in Las Vegas, Carnegie Mellon University researcher Alessandro Acquisti showed that soon, all it will take is a poorly snapped photo of someone to learn sensitive information about that person in real time, including his Social Security number.

Acquisti’s experiments began simply enough: He and his colleagues created a database in the cloud containing 330,000 images from Facebook profiles in a particular city. Next, they grabbed supposedly anonymous online dating profiles from the same city. Using a tool called PittPatt (developed at Carnegie Mellon, since bought by Google) and some cloud servers, Acquisti’s team was able to definitively identify 10.5 percent of names behind those dating profiles by analyzing their photos against the Facebook database that had linked photos with Facebook pages.
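
The matching step described above can be pictured with a minimal sketch. This is not how PittPatt works, which the report does not describe; it simply assumes that each photo has already been reduced to a numeric feature vector and that the closest labeled vector above a similarity cutoff counts as a match. The vectors, the profile labels, the `best_match` helper and the 0.9 threshold are all made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query, labeled_faces, threshold=0.9):
    """Return (label, score) for the closest labeled face.

    The label is None when even the best score falls below the
    threshold, meaning the photo is treated as unidentified.
    """
    best_label, best_score = None, -1.0
    for label, vector in labeled_faces.items():
        score = cosine_similarity(query, vector)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


# Hypothetical data: vectors standing in for named profile photos.
labeled_faces = {
    "profile_a": [0.9, 0.1, 0.3],
    "profile_b": [0.2, 0.8, 0.5],
}

# Hypothetical vectors standing in for "anonymous" dating-site photos.
anonymous_photo = [0.88, 0.12, 0.31]
unrelated_photo = [0.1, 0.1, 0.9]

matched_label, matched_score = best_match(anonymous_photo, labeled_faces)
unmatched_label, unmatched_score = best_match(unrelated_photo, labeled_faces)
```

In this toy data the first photo resolves to `profile_a` and the second to no one. The threshold is the design choice that matters: set it lower and more photos get a name, but more of those names are wrong.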

In a later experiment, Acquisti’s team created a database of Facebook photos featuring students at a particular university, then conducted a faux survey about Facebook activity using participants they had found strolling the quad. Before answering any questions, the students let the researchers take headshots of them, believing it was necessary for the survey. The researchers sent those images to the cloud, analyzed them against the Facebook database and presented participants with a Facebook photo featuring them when they reached the survey’s final question.

[Figure] Source: Carnegie Mellon University/Alessandro Acquisti

PittPatt accurately generated photos for 31.18 percent of the participants. That percentage is impressive but not necessarily shocking, considering that almost all the students were on Facebook. But here’s the scary part: One of the students didn’t have an account. He was identified using a photo of him publicly available on a friend’s Facebook account.

A number of brave souls from that experiment returned to participate in one that took identification to another level. Already having presumptive names to go with participants’ faces, Acquisti’s team could easily link each name to a profile, and it was able to accurately identify participants’ interests about 75 percent of the time using information publicly available on their Facebook pages. Given two attempts, the team accurately guessed the first five digits of participants’ Social Security numbers 16.67 percent of the time. That number increased to 27.78 percent with four attempts.

According to Acquisti, the probability of accurately guessing the first five numbers of a Social Security number randomly is 0.00014 percent, but Acquisti and his team had help. For one, they studied the Social Security Administration’s publicly accessible Death Master File and were able to identify patterns in SSN assignment based on factors such as citizens’ names and dates and places of birth.

Acquisti calls the process data accretion: You go from an anonymous face to a presumptive name to personal information. To be fair to Facebook, photos could come from any number of services (e.g., LinkedIn, Flickr or Twitter) that include names and photos, and the information could come from just about any site that includes personal information (from Facebook to Spokeo to WhitePages).
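
Data accretion amounts to a chain of lookups, which a small sketch can make concrete. Everything here is hypothetical: the `accrete` helper, the source names and the records are invented, and nothing is drawn from Acquisti’s actual tooling. The sketch assumes a face has already been matched to a presumptive name and that each data source can be queried by that name.

```python
def accrete(face_id, face_to_name, sources):
    """Follow the chain: anonymous face -> presumptive name -> details.

    Returns None when the face has no presumptive name; otherwise
    returns the name plus whatever each source holds under that name.
    """
    name = face_to_name.get(face_id)
    if name is None:
        return None
    details = {}
    for source_name, records in sources.items():
        if name in records:
            details[source_name] = records[name]
    return {"name": name, "details": details}


# Hypothetical output of a face-matching step.
face_to_name = {"photo_001": "Pat Example"}

# Hypothetical public sources, each keyed by name.
sources = {
    "social_profile": {"Pat Example": {"interests": ["hiking", "jazz"]}},
    "people_search": {"Pat Example": {"city": "Springfield"}},
    "other_site": {"Someone Else": {"city": "Shelbyville"}},
}

profile = accrete("photo_001", face_to_name, sources)
unknown = accrete("photo_999", face_to_name, sources)
```

Each source adds only a little on its own. The privacy problem comes from the join: once a name connects the records, the pieces accumulate into a single profile.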

Acquisti and his team even created a mobile application called Wingman, which demonstrates the possibilities of taking their work on the road. The clever name speaks to its potential use in the dating world: Snap a quick photo of someone and immediately see it overlaid with interests pulled from his social media profile or profiles.

Of course, one could imagine someone using it for other purposes. Imagine spotting a tourist or someone out on the town, pulling his address from a site like Spokeo and robbing him while he’s out. The possibilities are endless: All you need is a snapshot and some cloud servers.

Caught on video

Consider this hypothetical: A decade from now, many of us will be carrying audio- and/or video-recording devices that document our every move and utterance.

Some believe that will be the case. As far back as 1998, privacy scholar David Brin predicted such a situation in his book The Transparent Society. By Brin’s logic, such devices will have profound effects on the ways we interact with everyone around us, from our friends to the authorities to, ostensibly, prospective victims.

Brin suggests a silver lining to this world of constant documentation: a clearer picture of reality. He discusses this possibility with regard to cameras, but the theory easily extends to video cameras and audio recorders. In a world where photographs can be forged almost to the point of being undetectable, he theorizes, more pictures of any given event will help us easily weed out the frauds. Think about paparazzi photographs, for example. It might be tempting to doctor an image or video to make it appear more scandalous, but 30 other photographs or videos taken by different people at the same time will expose the hoax.

[Figure] Source: Carnegie Mellon University/Alessandro Acquisti

Ron Bekkerman, a researcher at LinkedIn, predicts that this world of constant audio-video documentation will be upon us by 2020, and he sees an opportunity for new analytics avenues because of it. Bekkerman recently published a paper describing his efforts to automatically classify and categorize textual phrases. As the paper explains:

We deployed the [phrase-based classification] system on the task of job title classification, as a part of LinkedIn’s data standardization effort. The system significantly outperforms its predecessor both in terms of precision and coverage. It is currently being used in LinkedIn’s ad targeting product, and more applications are being developed.

Using crowdsourcing and some advanced algorithms, Bekkerman’s system classified approximately 100 million job titles with 80 percent accuracy. It did so at a relatively low cost. Presenting the paper at the recent International Conference on Knowledge Discovery and Data Mining, Bekkerman appeared confident that with some more tooling, his system could handle the classification of audio phrases, too.

In Bekkerman’s vision, we’ll be recording ourselves almost every waking second as a means of digitally documenting our lives, uploading our recordings to cloud-based storage lockers. In my vision, Facebook or some similar platform could very easily become that repository, as much for the purpose of sharing among our friends as for safekeeping. Imagine the possibilities if systems like Acquisti’s facial-recognition project were tuned to analyzing our voices or our utterances.

Matching strangers’ voices to their online profiles would be one possibility, as would be crawling through publicly available videos and classifying people or profiles based on the phrases detected in their videos. Whatever the case, it’s just one more avenue through which to track our personal information, and it’s far more intimate. Continuously running video is less forgiving than selecting photos or text entries that likely have undergone at least some degree of mental editing.

Privacy concerns know no bounds

It’s the things we willingly decide to share that are so tricky: They’re often either publicly available or easily obtainable with a single click on the Follow or Friend button. And we’re sharing more data and more types of data with each passing year. Legally, we don’t have an expectation of privacy in public places — including our own backyards — and we shouldn’t expect a right to privacy for public data.

However, as anyone following the news knows, it’s not just Facebook, Google+, Twitter and other social media sites that are responsible for pushing the boundaries of privacy. They’re just the most fun to talk about, because so many people use them to willingly share huge amounts of information.

It’s this fact that makes the discussion around online privacy so different from other privacy discussions. Take cell phone data: In many instances authorities can access call records and location data from providers without first obtaining search orders. Police and federal agents have access to the same (or similar) cloud resources and big data tools that Acquisti’s team uses for its research. They can learn a lot about whom we know, where we travel and how it’s all connected, and in many jurisdictions, they can do so without us ever knowing.

But this type of government surveillance is governed by existing cybercrime laws or addressed by courts on constitutional grounds. That’s either because it’s clearly illegal (e.g., hacking into or exposing supposedly private data) or because it raises issues about the scope of the Fourth Amendment.

Of course, this level of insight into our personal lives isn’t reality just yet. There are still numerous technological hurdles that must be overcome first. Acquisti acknowledges, for example, that his experiment worked so well because he had clear frontal photos of subjects’ faces and because he was working with image databases confined to the known geographical locations of the subjects.

But with time, he said, these obstacles will be overcome. New technologies such as cameras in eyeglasses, or even in contact lenses, will make it easier to get good photos without having willing participants. Facial recognition software will get better. Thanks to advances in cloud computing and big data processing, it will become easier — and less expensive — to build large databases of photographs, videos or anything else and run analyses against them.

One has to think this will be appealing to authoritarian governments, law enforcement officials, marketers and others who have vested interests in finding out who’s who. Imagine taking a picture of a stadium full of people and identifying attendees one by one. The possibilities are endless for how that knowledge could be used. If something like Acquisti’s Wingman app ever becomes commercially available, any of us could be a download away from fulfilling our voyeuristic urges.

Whatever happened to the right to be let alone?

This fate might be Warren and Brandeis’ worst fear. Forget about the archaic type of intrusion that comes from merely taking a photograph. Now a discreet photograph will be all it takes for perfect strangers to know personal information about us. It’s disturbing to think of a world where anonymity is not necessarily possible.

There’s really no avoiding this brave new world. New policies such as Facebook’s tagging controls are a positive step toward letting individuals control their own stories, but they’re not perfect. For one, they only work if someone is on Facebook, and then only if they actively alter their privacy settings to limit who can tag them without permission.

It’s people who are not online at all, or at least not participating on popular social media platforms, who might be most vulnerable. If you happen to be in a prolific social media user’s flesh-and-blood social graph, you’re going to get tagged on Facebook or mentioned in a Flickr caption. Unless you know that it has happened, you don’t know to ask to have the tag removed.

This situation is nothing new, of course; it’s been happening since the advent of blogs. I tell my story, you’re part of it, and I publish. A good number of teachers, public officials and others have lost their jobs for speaking ill of, or just too intimately about, their students or colleagues. The big difference is that identifying someone from a blog post means either willingly searching him out or knowing him in the first place. It also requires the author to actively write about someone else.

With photo and video tagging, sharing personal information about someone else’s life is a lot easier. This is a trickier line to draw when it comes to privacy, which makes it harder than ever to keep your personal information personal.

It’s not worth fighting it

At the end of the day, though, all we really can do about new, potentially privacy-intruding technologies is wring our hands and prepare for a different future. It might be a little shocking at first, but we’ll adapt. Even now, younger generations that grew up with Facebook as the norm might consider possibilities like those Acquisti predicts perfectly normal.

Future inventors will thank us for not killing potentially world-changing innovation. Warren and Brandeis had a justifiable beef with the practices that accompanied the camera, but the camera itself was revolutionary. The same goes for cloud computing, big data tools and facial-recognition software. Like all new technologies, they will enable many new capabilities, some beneficial and some less so. We’ll figure out a way to live with them and get the most from them.
