Netflix’s newly announced Netflix Prize 2 would challenge competitors to recommend movies based on demographic and behavioral data. But one privacy expert says the company should pull the contest before it even launches, because the data Netflix is offering up about its users cannot truly be anonymized.
Netflix said yesterday it would release a dataset of more than 100 million items, including age, gender, ZIP code, genre ratings and previously chosen movies for many of its users. Paul Ohm, an associate professor of law at the University of Colorado Law School who blogs at Freedom to Tinker, called it a “multimillion-dollar privacy blunder” in the making.
Ohm cites a study by computer science professor Latanya Sweeney showing that 87 percent of Americans can be uniquely identified by the combination of their gender, ZIP code and birth date. He says:
True, Netflix plans to release age, not birth date, but simple arithmetic shows that for many people in the country, gender plus ZIP code plus age will narrow their private movie preferences down to at most a few hundred people. Netflix needs to understand the concept of “information entropy”: Even if it is not revealing information tied to a single person, it is revealing information tied to so few that we should consider this a privacy breach.
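Ohm's arithmetic can be sketched in a few lines of Python. The ZIP code population of 20,000 and the 80-year age spread below are hypothetical figures chosen for illustration, not numbers from Netflix or from Sweeney's study, and the sketch assumes residents are spread evenly across genders and ages, which real populations are not. "Bits revealed" here is the entropy of a uniform distribution over the possible combinations.

```python
import math

def expected_group_size(zip_population: int, num_genders: int, num_values: int) -> float:
    """Average number of people sharing one (ZIP, gender, value) combination,
    assuming residents are spread evenly across genders and values."""
    return zip_population / (num_genders * num_values)

def bits_revealed(num_genders: int, num_values: int) -> float:
    """Entropy, in bits, of gender plus one other attribute within a ZIP code,
    under the same uniform-distribution assumption."""
    return math.log2(num_genders * num_values)

# Hypothetical ZIP code: 20,000 residents, 2 genders, ages spread over 80 years.
print(f"~{expected_group_size(20_000, 2, 80):.0f} people per (ZIP, gender, age) bucket")  # ~125
print(f"{bits_revealed(2, 80):.2f} bits revealed by gender + age")  # 7.32

# Same hypothetical ZIP code, but with a full birth date (80 years x 365 days).
print(f"{expected_group_size(20_000, 2, 80 * 365):.2f} people per (ZIP, gender, birth date) bucket")  # 0.34
```

Under these assumptions, gender plus age leaves roughly 125 people per bucket, consistent with Ohm's "at most a few hundred," while swapping age for a full birth date drops the average below one person per bucket, which is why birth date is so much more identifying.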
The anonymity of Netflix’s data was also questioned during the previous contest, which released only user ratings. Researchers were able to “break the anonymity of the dataset,” though Ohm said he ultimately believes Netflix acted responsibly the first time around.
The law Netflix might now be breaking, Ohm said, is the Video Privacy Protection Act, which he says “prohibits a ‘video tape service provider’ (a broadly defined term) from revealing ‘personally identifiable information’ about its customers” and provides for “not less than $2,500” in damages for each violation. Netflix’s terms of service might protect the company, but given current research on so-called anonymous datasets, Ohm thinks Netflix would likely face millions of dollars in damages.