Facebook Data Deleted After Lawsuit Threat

Updated: A researcher who collected data from more than 210 million public Facebook profiles and used it to create a rich picture of connections among users of the social network has deleted the entire database after being threatened with a lawsuit by the company. Pete Warden, who says he had expressions of interest from more than 50 scientists who wanted to use the information in their research, writes in a blog post that he was asked by the company to destroy it because he didn’t ask the site’s permission to harvest it — and that since he doesn’t have the funds to contest a lawsuit, he complied. He writes:

As you can imagine I’m not very happy about this, especially since nobody ever alleged that my data gathering was outside the rules the web has operated by since crawlers existed. I followed their robots.txt directions, and was even helped by microformatting in the public profile pages. Literally hundreds of commercial search engines have followed the same path and have the same data. You can even pull identical information from Google’s cache if you don’t want to hit Facebook’s servers. So why am I destroying the data? This area has never been litigated and I don’t have enough money to be a test case.

Warden used the data in a variety of ways, including creating visualizations of the different connections among users of the social network both in the U.S. and in countries around the world. We highlighted some of that research in this post, which showed how Warden’s analysis had come up with seven distinct segments of the United States when it came to being connected with others through Facebook, including areas he described with colorful names such as “Stayathomia.” He also put together a site that allowed users to sort the data by different cities and countries and see the connections among them (the site still appears to be functioning).

Warden says he complied with the directives in the robots.txt file, which Facebook (like most other major sites) uses to tell crawlers and bots which parts of the site they may not harvest, but Facebook told New Scientist that the researcher breached the site’s terms of use. The threat may not stop the kind of research Warden was engaged in for long, however. In his blog post, he points out to “the researchers that I’ve disappointed” that there are a number of ways to harvest similar data from other sources, which he described in a separate blog post, including the ability to collect a large dataset from public information on Google Profiles.
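For readers unfamiliar with how robots.txt compliance works in practice, here is a minimal sketch using Python’s standard-library `urllib.robotparser`. The rules and URLs below are hypothetical illustrations, not Facebook’s actual robots.txt or Warden’s crawler; a compliant crawler simply checks each URL against the published rules before fetching it.

```python
from urllib.robotparser import RobotFileParser

# Parse a robots.txt body directly (rather than fetching one over the
# network) so the example is self-contained. These rules are made up
# for illustration only.
rules = """
User-agent: *
Disallow: /private/
Allow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# A compliant crawler asks before fetching each URL.
print(parser.can_fetch("*", "https://example.com/profile/12345"))   # → True
print(parser.can_fetch("*", "https://example.com/private/secret"))  # → False
```

As Warden notes, robots.txt is an honor system: nothing technically prevents a crawler from ignoring it, which is why compliance has long been the conventional line between acceptable and unacceptable scraping.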

Update: Andrew Noyes, manager of public policy communications at Facebook, said in an email that Warden “aggregated a large amount of data from over 200 million users without our permission, in violation of our terms. He also publicly stated he intended to make that raw data freely available to others.” Noyes also noted that Facebook’s statement of rights and responsibilities says that users agree not to collect users’ content or information “using automated means (such as harvesting bots, robots, spiders, or scrapers) without our permission.”

Related content from GigaOM Pro (sub req’d):

Why New Net Companies Must Shoulder More Responsibility

12 Responses to “Facebook Data Deleted After Lawsuit Threat”

  1. There is nothing wrong with doing research if it’s public information; there are privacy settings. And if people don’t want their info used in research, it shouldn’t be posted. I don’t think anyone was wrong here, just bored people looking for something to complain about.

  2. People just don’t understand what’s going on here. This is all about MONEY, and it’s clearly a demonstration of the ‘ugly side’ of money and power. Money gives you power, and power and money translate into nefarious intent and self-serving restrictions!

    When Facebook started there were none of the current restrictions and none of the bullying that we see now; the restrictions have even been extended to the actual users of Facebook, who ironically make up Facebook itself.

    When they started off, for instance, you could have as many friends as possible, but now users are being restricted in the number of friend requests they are permitted to make, which completely defeats the whole purpose of social networking.

    There are two aspects to social networking: ONE, networking with actual friends you already have, and TWO, making new friends. Some uppity users would say this new stance of Facebook’s is right in order to protect users’ privacy, but how many current Facebook users would raise their hands and honestly say they have never sent a friend request to a stranger? And if one thinks about it, how can a stranger who just wants to be your friend impinge on your privacy? All you have to do is not accept the request; there is no need for Facebook to put in these restrictions, or any restrictions for that matter. Let the users decide if they want to accept friend requests.

    I think the problem Facebook has with this is just the increased traffic, which costs them money. They would prefer you click on the adverts from complete strangers instead of using the site to make new friends. What some users don’t realise is that in actuality Facebook is the one willfully infringing their rights, to privacy and to their personal content, be it written words or photos! They let developers use your data and content in any way they choose and turn a blind eye because it brings in revenue!

    It’s all about money: the more money Facebook makes, the more they want, and the more they want, the more they will try to control users and small-business entrepreneurs who try to use data openly available to other multimillion-dollar companies.

    In conclusion, I believe that Facebook has become too big even to manage its own intentions and whims objectively and constructively, and like everything that becomes too huge or too tall, it will eventually fall, fall under its own weight!

    Going back to the original points in this thread: this is just another example of the ‘big dog’ taking advantage of the ‘small dogs’, especially if the ‘big dog’ sees that the ‘small dog’ is threatening Facebook’s current monopoly position on social networking.

    Facebook should no more own users’ data simply because it provides the space to store or upload it than the providers of free parking in cities own your car if you decide to use their car parks.

    Controlling people never works indefinitely. Social networking, and attitudes towards social networks, are constantly evolving, and Facebook should evolve with them by giving its users what they want, not what it thinks they need, or eventually users will be turned off by Facebook, like they eventually were turned off by hi5 and the many other now-archaic social networking sites of yesteryear!

  3. papsyface

    [Last comment didn’t seem to go through…]

    The post never mentioned the researcher having ‘no money’, but that he doesn’t have the _required_ amount of money to be a test case. If he were to go up against a multi-billion-dollar company in a grey area like this, it could end up costing both sides millions of dollars by the time everything is said and done. There’s no question that Facebook has that kind of money, but it’s pretty unlikely that he does.

    Anyway, how is the amount of money he has at all relevant to your comment? The amount of money someone has doesn’t define what they’re capable of. More often than not, it’s the opposite of that.

    In response to the actual post:

    I don’t think the guy did anything wrong. I also don’t think it’s fair that he’s being singled out. Do you really think that Yahoo, Google, Bing, AOL, HotBot, AllTheWeb or ANY of the other major search engines went to Facebook and asked for permission before indexing their site?

    I highly doubt it. I don’t remember any of them asking if they could index MY site. Even if I had the exact same agreement as Facebook does, my site would still be crawled to fetch that agreement. There’s no way that the operators of search engines manually read every single site’s agreement before indexing it; they simply follow robots.txt and that’s that.

    There’s something very wrong here. IF all major search engines were required to fully read Facebook’s terms of service and contact Facebook to get permission, the site very likely wouldn’t have even half of the users that it does.

    What is it that he did that was so wrong? The fact that he was honest about it? What if he intended to create his own search engine? I’m sure if he had the money to fund it (or the support of someone who does), he’d have a damn good chance of winning.

    I do think it’s interesting that the same company who retains your data indefinitely after you’ve “deleted” your account has a problem with him keeping the data he collected.

  4. I think we’re missing the point. This goes much further than what people imagine. The robots.txt file is just one of a few avenues that hackers have known about for years. The problem, though, is that it’s the basis of much of what is done on the net, and also the basis of what just can’t be changed on a whim. The laws out there will limit us, and as we all know, whoever we can’t control, we lock up or sue to death.