Netflix’s newly announced Netflix Prize 2 would challenge competitors to recommend movies based on demographic and behavioral data. But one privacy expert says the company should pull the contest before it even launches, because the data Netflix is offering up about its users cannot truly be anonymized.

Netflix said yesterday it would be offering up a dataset of more than 100 million items including age, gender, ZIP code, genre ratings and previously chosen movies for many users. Paul Ohm, an associate professor of law at the University of Colorado Law School who blogs at Freedom to Tinker, called it a “multimillion-dollar privacy blunder” in the making.

Ohm quotes a study by computer science professor Latanya Sweeney that showed 87 percent of Americans can be identified by the combination of their gender, ZIP code and birth date. He says:

True, Netflix plans to release age not birth date, but simple arithmetic shows that for many people in the country, gender plus ZIP code plus age will narrow their private movie preferences down to at most a few hundred people. Netflix needs to understand the concept of “information entropy”: Even if it is not revealing information tied to a single person, it is revealing information tied to so few that we should consider this a privacy breach.
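The "simple arithmetic" Ohm refers to can be sketched roughly as follows. This is an illustrative back-of-the-envelope estimate, not Ohm's own calculation. The function names, the ZIP code population of 30,000, the 80 distinct ages and the total population of 300 million are all assumptions chosen for the example, and the sketch assumes residents are spread evenly across genders and ages.

```python
import math


def anonymity_set_size(zip_population, genders=2, distinct_ages=80):
    """Expected number of people who share one (ZIP, gender, age) combination.

    Assumes (unrealistically) that residents are spread evenly across
    every gender/age bucket within the ZIP code.
    """
    return zip_population / (genders * distinct_ages)


def bits_revealed(total_population, group_size):
    """Bits of identifying information needed to narrow the total
    population down to a group of the given size."""
    return math.log2(total_population / group_size)


# Hypothetical inputs, chosen only for illustration.
group = anonymity_set_size(zip_population=30_000)
bits = bits_revealed(total_population=300_000_000, group_size=group)

print(f"People sharing one ZIP/gender/age combination: about {group:.0f}")
print(f"Identifying information revealed: about {bits:.1f} bits")
```

Under these assumed numbers each record is shared by fewer than 200 people, in line with the "at most a few hundred" figure in the quote. Smaller ZIP codes or uncommon ages would shrink the group further.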

The anonymization of Netflix data had actually come into question with the previous contest as well, which only gave out user ratings. Researchers were able to “break the anonymity of the dataset,” though Ohm said that he ultimately believes Netflix acted responsibly — the first time around.

The specific law Netflix might be breaking now is the Video Privacy Protection Act, which, as Ohm says, “prohibits a ‘video tape service provider’ (a broadly defined term) from revealing ‘personally identifiable information’ about its customers,” and entails “not less than $2,500” in damages for each violation. It’s possible that Netflix’s terms of service might protect the company, but Ohm thinks that given the current research around so-called anonymous data sets, Netflix would likely face millions of dollars in damages.

  1. I’d happily opt in to this experiment if it creates more value for me as a consumer. There is no such thing as “privacy”; people need to start sharing their data for a quantifiable return. Most internet users today still have no idea how much a cookie says about them. They should be worried less about movie recommendations and more about what marketers are doing with the same information.

  2. I have mixed feelings on this. For one, I think that anyone willing to sue over their movie rental privacy is a little money hungry. And I also agree, if it could help my movie renting experience… then why not allow it?
    But on the other hand, if I just think of it in terms of the qualifiers they’re using to share data (taking out the fact that it’s something so minor as movie rentals and substituting it with something major, like medical records), I’m not as comfortable.

  3. I’ll give Professor Linkbait $100 if he can identify me by zip, age, and movie ratings. PS I didn’t care for Air Bud.

  4. I for one really enjoy Netflix, but their movie suggestions are way off! So their collected data can’t be all that good either.

  5. [...] of the prize contest was supposed to take advantage of demographic and behavioral data, but it became quickly clear that the use of this data would have huge privacy [...]

