
Twitter today open-sourced FlockDB, the code it used to build its database of users and manage their relationships to one another. The move comes shortly after Twitter released its Gizzard framework, which it uses to query the FlockDB distributed data store up to 10,000 times a second without creating a logjam.

The code was posted last night on GitHub, although as Twitter developer Nick Kallen writes (under a “warning” and a “what the hell is this?” label):

This is in the process of being packaged for “outside of twitter use”. It is very rough as code is being pushed around. please forgive the mess.

This is a distributed graph database. we use it to store social graphs (who follows whom, who blocks whom) and secondary indices at twitter.

Still, ahead of Twitter’s Chirp conference this week — and in the wake of moves that may alienate some of the popular client applications through which many access Twitter — the company has released code that may improve the web for all. In a GigaOM Pro piece published over the weekend (sub. req’d), Derrick Harris said:

Twitter’s newly open-sourced Gizzard tool seems to have promise, as well. By eliminating some pain from the often difficult sharding process, Gizzard makes it easier to build and manage distributed data stores that can handle ultra-high query volumes without getting bogged down. Like Google, Yahoo and Facebook before it, Twitter has played a role in evolving how we use the web, and software developed within its walls should be a hot commodity for present and future Twitter-inspired sites and products.
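To make the idea concrete, here is a minimal sketch of what a sharded, one-level "who follows whom" store might look like. It is not FlockDB's or Gizzard's actual code or API: the class name, method names, the in-memory shards and the modulo routing rule are all assumptions chosen for illustration.

```python
from collections import defaultdict


class ShardedFollowGraph:
    """Toy one-level social graph split across in-memory shards.

    Hypothetical illustration only; not the FlockDB or Gizzard API.
    """

    def __init__(self, num_shards=4):
        self.num_shards = num_shards
        # Each shard maps a source user ID to the set of user IDs it follows.
        self.shards = [defaultdict(set) for _ in range(num_shards)]

    def _shard_for(self, user_id):
        # Assumed routing rule: pick the shard by source ID, so one user's
        # whole follow list lives on a single shard.
        return self.shards[user_id % self.num_shards]

    def follow(self, source_id, destination_id):
        self._shard_for(source_id)[source_id].add(destination_id)

    def unfollow(self, source_id, destination_id):
        self._shard_for(source_id)[source_id].discard(destination_id)

    def following(self, source_id):
        # Only one shard is consulted, however many shards exist.
        return sorted(self._shard_for(source_id).get(source_id, ()))


graph = ShardedFollowGraph(num_shards=4)
graph.follow(1, 2)
graph.follow(1, 3)
graph.follow(6, 1)
print(graph.following(1))  # [2, 3]
```

Because each lookup touches a single shard, adding shards spreads the query load rather than multiplying it, which is the general property the sharding discussion above is pointing at. A real system would also need replication, shard migration and a reverse index for "who follows me", none of which is sketched here.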

Simply because of the number of users and the scale of its service, Twitter is solving problems that many other web-based startups hope they will have one day. So now I’m back to wondering if FlockDB and Gizzard will join the ranks of Hadoop or Cassandra as open-source solutions for managing data at web scale.

Image courtesy of Flickr user Tim Morgan

  1. FlockDB is an interesting new database if you have “one-level graphs” and need massive distributed scale. If you want a graph database that supports infinite-level graphs, is very mature (24/7 production since 2003) and has distribution in progress (first replication for high availability and read-scale, then auto-sharding with clustering algorithms), then feel free to check out Neo4j:

    http://neo4j.org

    -EE

  2. [...] developer’s conference in San Francisco, Twitter on Monday made two bold new moves: it threw open its database of user relationships, and introduced a new advertising [...]

  3. [...] and actions from user streams. The company has also made its developer resources much better and opened up some of the technologies it’s used to scale. Sarver said Twitter won’t try to treat [...]

  4. [...] Twitter developer’s conference in San Francisco, Twitter on Monday made two bold new moves: it threw open its database of user relationships, and introduced a new advertising [...]

  5. [...] Facebook and others are seeking ways to stay on top of the real-time flow of information and offering their own efforts to the open source community. [...]

  6. [...] and look at all those followers, and so on. It’s easy to see how the number of relationships can become incredibly large. But the COSI research proves it’s possible to get meaningful data out of that giant pool of [...]

  7. [...] Facebook and others are seeking ways to stay on top of the real-time flow of information and offering their own efforts to the open source [...]

