The news that British 4G carrier EE is trying to sell anonymized user data, in league with market research firm Ipsos Mori, has been greeted with wrinkle-nosed outrage — particularly the part about the Metropolitan Police being a potential customer. After all, the UK has just (mostly) dodged proposed legislation that would have led to monolithic registers of citizens’ online communications. This is just a privatized version of the same thing, right?
The short answer is no. The Sunday Times (paywall alert) may have billed its story as being about the potential sale of 27 million people’s details to the cops, but the reality is somewhat less alarming. As Ipsos Mori has been forced to explain in response to the exposé:
“In conducting this research we only receive anonymized data without any personally identifiable information… We do not have access to any names, personal address information, nor postcodes or phone numbers. We can see the volume of people who have visited a website domain, but we cannot see the detail of individual visits, nor what information is entered on that domain. We only ever report on aggregated groups of 50 or more customers. We will never release any data that in any way allows an individual to be identified.”
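To make that "groups of 50 or more" rule concrete, here is a minimal sketch of how small-group suppression might work. Only the threshold of 50 comes from the statement above. The record layout, the `aggregate_visits` function and the idea of grouping by domain and age band are assumptions made for illustration, not a description of Ipsos Mori's or EE's actual pipeline.

```python
from collections import defaultdict

# The 50-customer floor is the figure quoted by Ipsos Mori; the rest is assumed.
MIN_GROUP_SIZE = 50


def aggregate_visits(records, min_group_size=MIN_GROUP_SIZE):
    """Count distinct customers per (domain, age_band), dropping small groups.

    `records` is an iterable of (customer_token, domain, age_band) tuples,
    a hypothetical layout chosen for this sketch.
    """
    groups = defaultdict(set)
    for token, domain, age_band in records:
        # A set means repeat visits by one customer are counted only once.
        groups[(domain, age_band)].add(token)

    # Report only groups large enough to hide any single person in the crowd.
    return {
        key: len(tokens)
        for key, tokens in groups.items()
        if len(tokens) >= min_group_size
    }


# Hypothetical sample: 60 customers on one domain, 3 on another.
sample = [(f"t{i}", "gmail.com", "25-34") for i in range(60)]
sample += [("t0", "gmail.com", "25-34")]  # a repeat visit
sample += [(f"s{i}", "example.org", "25-34") for i in range(3)]
report = aggregate_visits(sample)
```

In this sample the three-person group never appears in `report` at all, which is the point of the rule: the analyst sees a volume for the busy domain and nothing for the sparse one.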
So what does this data tell us? According to the original article, it provides insights based on “gender, age, postcode, websites visited, time of day text is sent [and] location of customer when call is made”.
Now, as we discussed recently, it is easier than you might think to de-anonymize data due to the uniqueness of our personal movement patterns — as long as you have the will, the datasets and the pieces of identifying information that can be correlated with the anonymized individuals effectively described in those datasets. So those horrified reactions to the weekend’s revelations are not entirely groundless. They are over-the-top, though.
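For a sense of why movement patterns are so identifying, consider this toy sketch. It assumes an attacker holds raw per-person location traces under pseudonyms (which is not what Ipsos Mori says it receives) plus a few outside observations of where a target was and when. The trace data, place names and the `matching_traces` function are all invented for illustration.

```python
def matching_traces(traces, known_points):
    """Return pseudonyms whose trace contains every known (place, hour) point."""
    known = set(known_points)
    return [pid for pid, points in traces.items() if known <= set(points)]


# Hypothetical pseudonymous traces: each point is (place, hour of day).
traces = {
    "u1": {("oxford_st", 9), ("camden", 13), ("soho", 18)},
    "u2": {("oxford_st", 9), ("camden", 13), ("brixton", 18)},
    "u3": {("oxford_st", 9), ("hackney", 13), ("soho", 18)},
}

# Each extra observation about the target narrows the candidate pool.
one_point = matching_traces(traces, [("oxford_st", 9)])
two_points = matching_traces(traces, [("oxford_st", 9), ("camden", 13)])
three_points = matching_traces(
    traces, [("oxford_st", 9), ("camden", 13), ("soho", 18)]
)
```

With one known point all three pseudonyms match, with two the pool shrinks to two, and with three only a single trace is left. Note what the attack needs, though: individual-level traces and independent knowledge of the target's whereabouts, neither of which is on offer in an aggregated report.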
There is a significant difference between a register of communications (who contacted whom and when) and a pool of anonymized data where the most fine-grained nugget of information that might be reverse-engineered would tell you that Person X visited the Gmail domain while within a 100-meter radius of the corner of Oxford Street and Tottenham Court Road. To assume equivalence between the two ideas is to ignore the elements of intent, will, data-crunching capacity and, frankly, competence. In short, there are far easier ways for the police to track individuals through their handsets, such as just going to the carrier and demanding to do so.
(The Sunday Times said sources claimed “officers had been enthusiastic about the potential for tracking users of pay-as-you-go phones,” but, quality of sources notwithstanding, I suspect those officers may have been slightly overestimating their own data-crunching powers. They may also have overlooked the fact that the operators would have no idea of their pay-as-you-go users’ age or gender, making it near-impossible to tease out an individual from the anonymized mass. Either way, they backed off once the story broke.)
And then there’s the matter of this data’s innocent utility. Of all the sources of “big data” that are both largely untapped and genuinely useful, mobile operators must be among the most potentially fruitful. In societies where everyone is carrying a phone, there can be no better way to establish the density and fluidity of traffic flows and footfall. This data is gold dust, not just for retailers, but also for town planners and councils. It shows us how our cities and roads really work, and it can help us make them more efficient and pleasant to live in or use.
I feel a bit sorry for EE in this particular case. After all, its rivals Telefonica (trading as O2) and Vodafone are also offering up their customer data for analytics purposes – Telefonica’s “Dynamic Insights” program is being carried out in partnership with market research firm GfK, while Voda launched its mobile analytics play just last Friday.
“Everyone is doing it” would be a lousy apology in itself, but I don’t think any of these carriers or their partners are doing anything wrong, as long as their datasets are suitably anonymized. If people could feasibly be personally identified from this data, the carriers and their market research partners would instantly find themselves on the wrong side of existing data protection legislation. The fines in the UK for this stuff are pretty paltry, but the companies would also quickly lose the trust of their customers, so there’s little motivation for the telcos and their partners to cross the line.
It’s great that people are concerned and watchful about their privacy, and long may they continue to be. However, this is a case where the potential benefits of the data are both great and realistically attainable, and where the downsides are so unfeasible as to be worth discounting, at least at this stage. It’s now up to the carriers to explain this to their customers in understandable and honest terms.
There will be great battles worth fighting in the war over our personal data and its exploitation. This ain’t one of them.