Blog Post

Will a Crackdown on Privacy Kill Big Data Innovation?

The issue of data privacy on the web gets a lot of attention, thanks to the practices of sites such as Facebook and Google, but the positive aspects of those companies’ data practices tend to get overlooked. Not only does data drive the overall experience of our favorite sites and services, but it also drives innovation in broadly valuable technologies such as Hadoop and advanced analytics tools. As the government, and policymakers in general, strive to establish a framework for online data practices, they’d be wise to look at the issue from all angles.

The McKinsey Global Institute released an interesting report on big data last week, identifying key strategies for specific vertical markets that could save them hundreds of billions of dollars. The report highlights industries (e.g., health care and the public sector) and technologies (e.g., Hadoop and data warehousing) that we’ve covered before and that already are big data stars, as well as one issue that is very important to the future success of big data efforts: finding the appropriate balance between consumer privacy and business innovation.

The authors don’t delve into too much detail on this topic, but I give them credit for mentioning it at all, because it’s a deep, multi-faceted issue that could fill an entire report of its own, and that has broad implications beyond the world of big data.

As the report’s authors note, policymakers will play an important role in enabling future big data advances, both technologically and strategically. They point out and briefly discuss six issues facing policymakers:

  1. Build human capital for big data
  2. Align incentives to promote data sharing for the greater good
  3. Develop policies that balance the interests of companies wanting to create value from data and citizens wanting to protect their privacy and security
  4. Establish effective intellectual property frameworks to ensure innovation
  5. Address technology barriers and accelerate R&D in targeted areas
  6. Ensure investments in underlying information and communication technology infrastructure

I’ve given this issue a lot of thought over the past few months, and I think No. 3 is the key issue — not just for the future of big data, but for the future of the web in general. Unless there’s a well-reasoned balance developed between consumer privacy and business interests, goals such as information sharing and an increased pace of innovation could fall victim to the federal government’s heavy hand. As I explained in January, Congress is considering its strategy for regulating online privacy, but it’s an issue strewn with pitfalls. Here are a couple of thoughts I’ve been mulling lately:

  • Proposed federal regulations could hamstring technological innovation. For example, two proposed federal regulations, the Federal Trade Commission’s Do Not Track policy (which has just been endorsed by several senators in the form of the “Do-Not-Track Online Act of 2011”) and the Department of Commerce’s Fair Information Practice Principles, have the potential to seriously hamper big data and analytics innovations, illustrating the importance of striking the right balance. The regulations are fairly complex in their current states, but they strive for two separate but interrelated goals, respectively: giving consumers the ability to proactively opt out of certain data-tracking practices, and giving consumers all the information, upfront and crystal-clear, about how sites are using their data. Both limit, to some degree, what sites can track and how they can do it, and both impose penalties for violations. My concern, and one echoed by Google in its recent opposition to California’s proposed Do Not Track legislation, is that customer data has driven the innovation of numerous key big data technologies by major web sites, including Hadoop (within Facebook and Yahoo, especially), NoSQL databases and many of Google’s tools and projects. McKinsey highlights many of these among the list of technologies enabling big data. Will putting companies’ analytics efforts at the mercy of consumers, and under the thumb of the federal government, reduce their desire to innovate because they fear penalties or because they simply don’t have the relevant data required to do so?
  • Social media and the personalized web could be jeopardized. This is directly related to the above concern, but is more wide-reaching. Social media sites such as Facebook, Twitter and Foursquare, and larger-scope web sites such as Google, innovate on big data technologies because their services rely on data. The only way to optimize and create a better user experience is to draw better insights into customers’ activities, interests and connections. And the only way (or, at least, the primary way) to make money from such services is via targeted advertising. It’s the data that drives Google’s huge advertising revenues, which pay for its myriad free services, and that has driven Facebook to an $80 billion valuation. I’m not suggesting Facebook or Google are going to fold in the face of proposed regulations, just that their services could suffer. Less data and more regulations mean less innovation and fewer risks taken. This might be a boon for privacy, but it’s a hindrance in the fast-moving web world, where major changes come from rewriting code as opposed to physically building a new project, and where services can be improved on the fly as issues arise.

Don’t get me wrong, consumers deserve more information and the federal government is right to attempt to give it to them, but everyone needs to get educated on the connection between data collection and usage and the benefits they provide. If consumers value their social media and personalized web experiences, and if the government is serious about pushing analytics as a major skill set for the next-generation economy, they need to consider the issue of big data in terms of its pros as well as in terms of its obvious cons such as privacy and security implications. It might be tempting to clamp down on data practices or to click “do not track” and shut off the personal-data firehose, but such decisions could have far greater implications than meets the eye.

Image courtesy of Flickr user PaulHorner.

11 Responses to “Will a Crackdown on Privacy Kill Big Data Innovation?”

  1. Mars is absolutely right. Strong regulation will force more innovation and more benefit to consumers. Businesses may have access to less information, but so what? They’ll be forced to innovate in good ways rather than bad ones to get what they believe they need.

    Derrick, your argument is fundamentally flawed. There are several premises that you accept as axiomatic which are simply wrong.

    1. Who the data belongs to.

    My personal data is mine, and only mine. It doesn’t belong to Google. It doesn’t belong to Facebook. It is a fundamental right of privacy that I control to whom and for what purpose I disclose anything, and who is allowed to keep and use information about me. The fact that Google collected it doesn’t give them the right to use it, and I’d prefer that in all cases, companies weren’t even allowed to ask for information that wasn’t absolutely required.

    I couldn’t give a woot that Google’s advertising model is impinged upon. They’re selling something that belongs to me, and the fact that they already do it doesn’t make it right.

    2. If I can do it, I should.

    We could install surveillance cameras in the homes, bedrooms and bathrooms of every place in the country that people inhabit. That would give us tons of “big data” about lifestyle, habits, when in the day people do things, how and why water and electricity are being used, how long it takes people to get dressed, how much garbage individuals produce, cleanliness, etc. All of this information could provide “positive benefits” such as:
    – insights into how to improve conservation efforts (green)
    – safety (I’ve fallen and I can’t get up; thieves are breaking into my house)
    – health (what habits encourage people to drink too much or eat junk food)
    – economizing (how to waste less of everything)
    – help business to optimize production and sales
    – create new products and services based on observations

    The list is probably endless of great new conveniences we could have if we subjected ourselves to this level of intrusion. And, there are probably any number of entrepreneurs and business people who would be happy to consume that data, no matter how icky it is to be spying on people 24×7 a la VERY BIG BROTHER.

    But sorry, we are a free society, and most of us want and protect our privacy. I don’t want the toilet paper police counting how many squares I used. I don’t want my boss to know that I had one drink too many last night. I don’t want the creep down the street to be able to see what my daughter is up to at all hours of the day and night.

    There is no benefit that compensates for what I’m giving up. And, I happen to feel the same way about all the information that is collected and stored about me now. It’s just not anybody else’s business, unless I choose to make it so. It isn’t my responsibility to ensure that new (illegitimate) business models can be endlessly developed by spinning my private information in thousands of ways.

    Haven’t we already learned what the scale of consequences and costs to individuals can be when Sony is unable to keep Playstation data secure, or TJ Maxx has an unprotected wireless network and is broadcasting unencrypted credit card data through the ether? Why would you ever assume that the breaches, mistakes and malfeasance wouldn’t continue to grow in scale if we don’t protect what matters, “benefits” be damned?

    3. All users of big data are benevolent.

    Of course you don’t explicitly say this, but it’s implicit in your whole thesis. In fact, even if 90% or more of the usage was for our own good, the fact is, evil people and evil companies exist (Enron, anyone?).

    And government is the worst. There has never been an opportunity afforded government in the name of the public good that hasn’t eventually been abused. The kind of data you think should be out there is crippling, and it absolutely will eventually be used to control our lives and crimp our freedoms over time. Just look at the Social Security Number, which was never to be used by anyone but the IRS for processing taxes but is now the principal vector of identity theft, and of bad decisions made by institutions everywhere about whether I’m creditworthy, insurable, etc.

    4. There is a balance to be struck between consumer interests and business interests.

    Why? This is the biggest fallacy of your whole argument. I don’t need to balance my interests with anyone as long as I’m not interfering with their basic rights and freedoms. That’s a fundamental principle of this country, and frankly should be a basic recognized human right everywhere.

    No one has a “need to know” my personal data. And businesses simply have an obligation to live within the boundaries and regulations that are prescribed. They don’t have any rights to balance their commercial interest in something that properly belongs to me with my interest to keep it private. That is simply absurd on the face of it.

    Bottom line: we’ve done without business snooping on everything we do for millennia. No harm has come to us because of it. Moreover, there is greater business benefit in full disclosure and full control being in the hands of the consumer. If I trust that my information will be protected and not misused, I’m a lot more likely to offer it in exchange for something I want. Today, I default to nobody gets anything because it’s just too dangerous. The stronger the restrictions, the better.

  2. Peter Quirk

    @Alex Robinson: Many sites are smart enough to detect you using multiple email addresses. Google, in particular, constantly prods you to link your multiple identities together on YouTube and some of its other properties. Identities get linked by your use of common data, including credit card numbers, IP addresses, MAC addresses, age information required by COPPA, zip codes requested for localizing the advertising, news and weather delivered to you, and secondary email addresses or mobile phone numbers requested to help you if your primary email profile is compromised. While any one of these may be insufficient to generate an exact match, multiple items together will usually generate a very close match.

    Signing up for something with a new throw-away email address is increasingly difficult. How many sites do you see that only allow you to participate through Facebook, OpenID or other cross-site authentication systems?

  3. An interesting article, however I don’t think keeping people’s personal data private is going to stop innovation. If people have the option to share data then the innovations can still happen, however the user is still in control of their data.

    My recommendation, while the regulations surrounding online privacy are unclear, is to take control of your own personal privacy by doing things like using a temporary email address rather than your real email address. It is easier than you might first think:

  4. Jack C

    Consumers should have the information to easily make an informed decision about the privacy trade-offs involved with a product. If a product has a compelling value, many consumers will continue to use it despite some losses of privacy.

    Let’s not fool ourselves here either. The fact of the matter is business has had a LONG time to get ahead of this issue and obviate the need for government intervention.

  5. Dante Lepreaux

    Some first thoughts: there seems to be an analogy with science and ethics here. There’s a lot of technological progress in fields such as genetics, the bio industry, et cetera that’s practically “stifled” because of the technologies involved and their possible unethical implication(s) for society. Just because you can, doesn’t always mean you should. Right?
    Furthermore, the industry itself is fully responsible for the current crackdown on privacy: opt-in should have been the default from the get-go, and first-time users should be given insight into the effects of their actions w.r.t. privacy settings.
    If they fail to come up with a fair and easy-to-understand code of conduct by self-regulation on short notice, then I think we should be thankful that (federal) governments are here to impose a code of conduct by law. After all, they seem to have had no trouble gathering behind the IAB and creating Online Ad Standards, so an IAB consumer-centred Privacy Code of Conduct shouldn’t be a problem either?

    • Derrick Harris

      To be clear, I don’t actually oppose federal regulation, I just want to ensure it’s done right. The issue of online data privacy is so wide-ranging and cross-industry, and every site has its own unique characteristics, that we probably shouldn’t expect any meaningful self-regulation.

      However, I think Google makes a good point about using data in new, but harmless, ways to create new features, and I don’t think we can deny that analytics and other big data efforts within large web companies have produced some very important technologies for the greater IT community. Whether it’s Do Not Track, FIPP or some other legislation/regulation, the trick is giving companies enough flexibility to innovate without bogging down creativity in red tape. In cloud computing, by comparison, many experts argued against standards too early on so that the industry could have some room to grow and evolve before adhering to any set requirements.

      I think the web makes this particularly difficult because of how fast software and systems can be written and amended compared with, for example, building a nuclear plant or cracking the genome, and because the primary natural resource, if you will, is personal data. For example, what’s not clear in any proposed regulations at this point is whether there’s a line between experimental and production features, and whether sites will have to obtain informed consent before even testing out new features that utilize personal data in new ways. Or, can Facebook develop a new database technology that stores and analyzes user data without first complying with regulations? Advances, or even single projects, in other fields can take years as the regulatory process plays out; do we really want this for the web?

      One might also point to the result of the FCC’s crackdown on indecency, which arguably was an overly prudent approach to airing anything edgy because of confusion over what might be deemed a violation and uncertainty over what the penalty might be.

      Regulations are very likely the right thing, but there’s a lot to consider before putting them on the books. Maybe the discussion will open up more once the FTC or some other agency actually puts some out for public comment.

  6. “Don’t get me wrong, consumers deserve more information and the federal government is right to attempt to give it to them, but everyone needs to get educated on the connection between data collection and usage and the benefits they provide.”

    And at the moment, the balance is probably about right. But what the advertising industry (of which the likes of Google and Facebook are a part) is railing against is restrictions on what they could do, but aren’t doing yet.

    People do by and large understand the current trade-off between privacy and data mining. But what they’re not prepared for is to allow companies the completely free rein they want with personal data.

    And if their business models don’t allow them to make money from that, they need to find new business models – that, after all, is what we constantly tell the music industry.

  7. Echo what Mars says below. I know I can go into any bar or shop in the world and buy a Coca Cola. I TRUST that it will taste right, be safe and I tip my hat to one of the great US brands. Facebook or any other social network can’t bully its way to that position. Ever. Opt In as a default should be law and if that means they have to take a step back to take a leap forwards then so be it.

    Just because 700m people like a world where they can be connected (congratulations Mark) it doesn’t mean the business model is the right one. If the assumption is that our digital home is the same as our physical one, then the money is in the commerce layer for Facebook not advertising which would be far greater in efficiency, richness and abundance if we were leasing our walls to advertisers with a payment to the house. If we were getting paid for it, then I’m sure we’d willingly tell people the things we like and don’t like and even perhaps register when our insurances were due, car leases expired etc. etc. That drives competition and benefits the consumer.

    I also think that this tech roadmap is suffocating aspiration because all brands are being diluted to crowd level and the media heads to the tech long tails are being so devalued. (but that’s another story)

    In the absence of an adjustment though, I worry we’re all missing an “ist” off “social.”

  8. This is silly. If anything, more privacy will drive innovation, not the other way around.

    Either the public is simply ‘cattle’ to be herded and delivered what is “best for them” or (and this is a radical idea…) we give people the freedom of privacy putting the onus back on biz to sell to us reasons to ‘Opt-In’ to their services with a clear understanding of that agreement, not some 80 page lawyer’d up document full of gobbly-goop nobody reads.

    Just because we as the public demand more privacy does not mean we will stop trusting biz with all our information, rather, I believe if we have greater control of our privacy, we would likely have greater trust in sharing with those who respect our privacy.

    • Derrick Harris

      I agree that consumers need choice, but I also know that the government has something of a track record of imposing stifling regulations. My concern isn’t so much with the ideas of the proposed regulations, but rather with the details of how they’re ultimately written and enforced, and with the dearth of any real education of everyday web users as to how their data is being used and why. Privacy control is good *and* data collection and analysis actually have many benefits to go along with the oft-publicized problems; that’s what I think needs to be clearer so policymakers and consumers can make the right decisions.

  9. Peter Quirk

    How about adding a requirement that web sites function when you block tracking? I’m running IE9 with the “Do-Not-Track” feature and many sites just don’t work until you enable tracking. If these companies get around the Do-Not-Track legislation by making the internet go dark, everyone loses.