
A pair of slices from a massive scrape of Twitter’s API could be of great use to programmers and researchers alike, as long as users don’t mind. The company behind the mining effort, Infochimps, is trying to demonstrate and promote its data aggregation service while offering up some useful information to interested parties.

At the end of last year, Infochimps posted a heftier version of its scrape of Twitter, which was taken down at the behest of the micro-messaging site over user privacy concerns. By releasing curated, anonymized chunks of data, the company may avoid most of the user privacy concerns that arose last time around. Then again, it may not.

One of the sets, a “token count,” tallies how often particular tokens (individual hashtags, smileys and URLs) have been tweeted since March 2006. The other maps user ID strings between Twitter’s Search API and the standard Twitter API. The two APIs issue different ID numbers to the same user, which makes it annoying, if not impossible, for developers to tie data from both services to a single user.
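For readers curious what a per-month token count might look like in practice, here is a minimal Python sketch. It is an illustration only: the article does not describe how Infochimps tokenizes tweets, so the regular expressions, the `count_tokens` function name and the sample tweets are all assumptions made up for this example.

```python
import re
from collections import Counter

# Hypothetical patterns; the article does not say how Infochimps defines a token.
HASHTAG = re.compile(r"#\w+")
URL = re.compile(r"https?://\S+")
SMILEY = re.compile(r"[:;]-?[()DP]")


def count_tokens(tweets):
    """Tally tokens per (month, token) from (iso_date, text) pairs."""
    counts = Counter()
    for iso_date, text in tweets:
        month = iso_date[:7]  # "2009-11-16" -> "2009-11"
        # Pull URLs out first so characters inside a link are not
        # miscounted as hashtags or smileys.
        urls = URL.findall(text)
        rest = URL.sub(" ", text)
        hashtags = [h.lower() for h in HASHTAG.findall(rest)]
        smileys = SMILEY.findall(rest)
        for token in urls + hashtags + smileys:
            counts[(month, token)] += 1
    return counts


# Made-up sample data, not taken from the Infochimps sets.
sample = [
    ("2009-11-16", "Loving the new data sets :) #data http://example.com/a"),
    ("2009-11-17", "More #Data please :)"),
    ("2009-12-01", "Back to work :( #data"),
]
counts = count_tokens(sample)
for (month, token), n in sorted(counts.items()):
    print(month, token, n)
```

Swapping `iso_date[:7]` for a longer slice of a full timestamp would give a finer breakdown, such as the hourly version mentioned in the article.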

Infochimps says it hopes “to send a signal that this data is valuable and useful to real-time search engines, Twitter apps, and social media researchers.” It also hopes to “start a conversation about where value really lies in this type of data, [and] the various ownership and privacy issues that arise.” Given the complaints from Twitter the first time data was posted, it’s a smart move on the part of Infochimps to state its intentions up front and thoroughly anonymize the data. The company very much wants to avoid ill will or backlash from the Twitterati over the release of the data sets. There is precedent for caution: back in 2006, AOL Research released 20 million search keywords attached to user IDs for researchers to use. Several individuals were identified from the “anonymized” data, raising concerns over what sorts of data are acceptable to release.

Ownership and privacy aside, Infochimps is offering the “tokens” data set broken out by month for free and charging $9,500 for a version broken out by hour. The “ID/API mapping” data set is priced at $6,000.


  1. Twitter data update | blog.infochimps.org Monday, November 16, 2009

    [...] the Twitter data was a great success, and we thank Marshal Kirkpatrick at ReadWriteWeb (also) and Jordan Golson at GigaOm for their coverage. The community reaction has been overwhelming and energizing. We accomplished [...]

  2. Good thing I don’t use Twitter… ;-)
    More information on this can be found at Timstyle.de Mobile datenloesungen VPN, at least something to read.

  3. A podcast with Flip Kromer of InfoChimps… and the end of an era | Paul Miller – The Cloud of Data Thursday, December 17, 2009

    [...] Is Infochimps’ Aggregated Data a Boon to Researchers or a Privacy Nightmare? (gigaom.com) [...]

  4. A podcast with Flip Kromer of InfoChimps… and the end of an era | CloudAve Thursday, December 17, 2009

    [...] Is Infochimps’ Aggregated Data a Boon to Researchers or a Privacy Nightmare? (gigaom.com) [...]

  5. 10 Austin Startups You Should Meet While You’re at SXSW – GigaOM Tuesday, March 9, 2010

    [...] — I love this startup because I love anything that makes access to data easier. Infochimps aggregates and then licenses data sets in formats that folks can then use to create new apps, demographic models or whatever; public data [...]

  6. Microsoft Wants to Build Its Business With Data Thursday, May 13, 2010

    [...] explore Microsoft’s efforts as well as those of a startup called Infochimps, which is also building a data marketplace, in a research note over on GigaOM Pro (sub req’d) [...]

