Does your private data really need to be that private?

When it comes to medical or genomics data, the public good outweighs the benefits of keeping information private, said two academics speaking at the Big Data Privacy Workshop at MIT on Monday.

“I think most people fear death or the death of a loved one more than a loss of privacy,” said John Guttag, professor at MIT’s Computer Science and Artificial Intelligence Lab (CSAIL). In his view, patients or would-be patients would be well served to share their medical data (about hospital stays, treatments, procedures, etc.) in service of preventing things like Clostridium difficile (C. diff) infections.

Five percent of all U.S. patients suffer an infection unrelated to their admission, and of those infections, C. diff is one of the most common, affecting 200,000 people per year, he said.

U.S. Secretary of Commerce Penny Pritzker speaking at MIT Big Data Privacy Workshop.

Anonymized information, which is what most people point to when trying to paint big data analytics as non-threatening, won’t help researchers figure out how patients get infected or how to avoid future problems, Guttag said. Instead, scientists researching C. diff need personal, identifiable information: the patient’s zip code, hospital room number, names of roommates, who treated her and where, and dates of treatment.

That information can be used to help prevent future outbreaks. And if the right auditing mechanisms are in place, anyone who uses that data for a non-authorized purpose would be punished. Note: Others have already advocated for the donation of personal medical data for the public good.

White House Counselor John Podesta, who kicked off Monday’s event by phone from Washington, D.C. (his trip north was thwarted by weather: “big snow trumped big data,” he said), had a question for the panel: “We can’t wait to get privacy perfect to get going. What few things do we need to get right right now?”

Guttag said a uniform, standard process by which patients could give informed consent would be a good start. And he thinks something should be done about the Health Insurance Portability and Accountability Act (HIPAA), which was meant to keep patient data private but also to ensure secure sharing of that information between authorized parties.

“HIPAA is a problem and probably prevents useful things from happening — it would be great to pay attention to the tradeoffs. We underestimate our society — if people understood how valuable it would be to allow their data to be used for medical research, they would do it.”

Others said people who provide data should be protected from bad outcomes from non-condoned use of their information.

“My biggest concern is discrimination by algorithm, having someone make decisions about you based on your status profile — if you’re a risky driver, if you have a genetic predisposition to something,” said Sam Madden, an MIT CSAIL professor who specializes in mobile big data.

Safeguards must be put in place to either prevent or mitigate that possibility. It all boils down to transparency and an informed consumer. “We have to talk about people having visibility into what’s being collected and being able to say ‘I don’t want you to keep that data any more,'” Madden said.

Consumers beware

But for many consumers, the privacy horse is out of the barn, largely because of their own actions. Anyone who posts to Facebook (S FB) or Twitter (S TWTR) or any number of special-interest websites is handing over their data to aggregators, said Michael Stonebraker, an adjunct professor at CSAIL and the database brains behind Vertica (S HPQ), VoltDB, and Data Tamer.

“The question is tricky. We all use Waze to navigate traffic and it knows all about us. We’re volunteering data in return for personal benefit. Any governance of what Waze can do is a legal issue, not a technology issue,” Stonebraker said.

People also have to distinguish between privacy and “the illusion of privacy,” said Manolis Kellis, an associate professor at CSAIL.

“Every time you take your coat off you’re providing DNA data to someone,” he said. Data leakage is inevitable in the physical and virtual worlds, but “laws should protect us so we don’t have to hide our genomic data because we can be discriminated against,” Kellis noted.

Case in point: People can be tested for genetic predispositions to Alzheimer’s and other ailments, but many refuse to do so for fear that their insurers will cancel their coverage or jack up their premiums.

Trust? What trust?

The notion that individuals should hand over their medical data for research purposes is not new, and good arguments can be made for doing so.

But, in Monday’s session, U.S. Commerce Secretary Penny Pritzker and other speakers stressed the need for trust between consumers and businesses. And frankly, trust is a commodity in short supply these days given the Edward Snowden revelations of NSA data gathering and data breaches at Target and other retailers.

In response to a question, Podesta, who was brought back into President Obama’s inner circle in January in part to ride herd on data and privacy issues, said this work is separate from a review of U.S. intelligence surveillance practices.

Boiling all of this down, to me this means that even if you do trust medical researchers at MIT or Harvard or Stanford with your health data, you would be justified in worrying that the data could end up with someone else and be used for non-medical purposes.

There’s a ton of work to be done before Guttag’s vision of shared medical data can come to fruition.

If you want to hear more on big data and big data privacy, check out our Structure Data show in a few weeks.

Panelists (from left): Michael Stonebraker; John Guttag; Manolis Kellis; Sam Madden; Anant Agarwal.

6 Responses to “Does your private data really need to be that private?”

  1. “I think most people fear death or the death of a loved one more than a loss of privacy,”

    Replace that with “I think most people want to be safe from the terrorists rather than have their privacy”, and you basically have the US government’s stance on a similar matter. This is the kind of rubbish we hear from the people who want to collect our personal information.

    These arguments have a long, long way to go to be proven right. The onus is on whoever wants to collect the data to really prove there is a need to do so (is this really going to help us as a society or me as a person?).

    I’m certainly a layman on big data with regards to healthcare, but I’ve yet to see (and the article doesn’t tell me) what exactly they would do with my data (well, I’m not American, but you catch my drift), why it cannot be anonymised.

  2. @nina you are right and some of the academics on stage talked about the need to cordon off different types of data. For example, genomic and medical treatment data are totally different. The question is how much of this is a tech issue and how much a legal/policy issue.

  3. Barb – great article on the topic of medical information sharing and patient privacy rights. Based on my work at Accellion, we’ve all come to realize that not all data is created equal. Some patient information, such as the room a patient resides in and the doctors/nurses who treat a patient, is very different than the tests conducted and the diagnosis given. While it is desirable to share medical data with researchers so they can prevent the next outbreak of an infection, we don’t have enough standards and processes in place to ensure that data will remain with the parties who made promises to protect the data. Your data is only as secure as the weakest link in a party’s network.

    Furthermore, people who provide data need to be protected from bad outcomes from non-condoned use of their information. I certainly agree that a lot of work needs to be done before people will feel comfortable sharing their data for the greater good. Your academics share some compelling reasons to figure out the right mix of technology and legislation sooner rather than later.

  4. Madlyb

    Great look at both sides of the issue Barb.

    The only thing I would call out is that the vast majority of people do not understand how much data is being collected about them or more importantly how little pieces of data from a bunch of different sources can be put together to create very accurate profiles of you and your life and just like anything, it can be used for good or bad.

    If the road to hell is paved with good intentions, I think Data Science is going to be a high-speed bullet train, but we need to find ways to address these challenges because the rewards are quite high if we do.

    • I agree. most people should know more about how much they’re giving away when they use google maps, waze, twitter, facebook etc. I think that’s part of what stonebraker et al were saying here — get with the program people!