Twitter today open-sourced FlockDB, the code it uses to build its database of users and manage their relationships to one another. The move comes shortly after Twitter released its Gizzard framework, which it uses to query the FlockDB distributed data store up to 10,000 times a second without creating a logjam.
The code was posted last night on GitHub, although as Twitter developer Nick Kallen writes (under a “warning” and a “what the hell is this?” label):
This is in the process of being packaged for “outside of twitter use”. It is very rough as code is being pushed around. please forgive the mess.
This is a distributed graph database. we use it to store social graphs (who follows whom, who blocks whom) and secondary indices at twitter.
Still, ahead of Twitter’s Chirp conference this week — and in the wake of moves that may alienate some of the popular client applications through which many access Twitter — the company has released code that may improve the web for all. In a GigaOM Pro piece published over the weekend (sub. req’d), Derrick Harris said:
Twitter’s newly open-sourced Gizzard tool seems to have promise, as well. By eliminating some pain from the often difficult sharding process, Gizzard makes it easier to build and manage distributed data stores that can handle ultra-high query volumes without getting bogged down. Like Google, Yahoo and Facebook before it, Twitter has played a role in evolving how we use the web, and software developed within its walls should be a hot commodity for present and future Twitter-inspired sites and products.
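For readers unfamiliar with sharding, here is a rough illustration of the idea in Python. It is a toy sketch only, not code from Gizzard or FlockDB: the class name `ShardedFollowGraph`, the modulo routing rule and the in-memory dictionaries are all assumptions made for the example. It splits a "who follows whom" graph across several shards so that each lookup touches only one of them.

```python
from collections import defaultdict


class ShardedFollowGraph:
    """Toy in-memory follow graph split across a fixed number of shards.

    Hypothetical illustration only; not how Gizzard or FlockDB is implemented.
    """

    def __init__(self, num_shards=4):
        self.num_shards = num_shards
        # Each shard maps a user id to the set of user ids that user follows.
        self.shards = [defaultdict(set) for _ in range(num_shards)]

    def shard_for(self, user_id):
        # Assumed routing rule: integer user id modulo the shard count.
        return self.shards[user_id % self.num_shards]

    def follow(self, follower_id, followee_id):
        # The edge is stored on the shard that owns the follower.
        self.shard_for(follower_id)[follower_id].add(followee_id)

    def following(self, user_id):
        # Only one shard is consulted, however many shards exist.
        return sorted(self.shard_for(user_id).get(user_id, set()))


graph = ShardedFollowGraph(num_shards=4)
graph.follow(1, 2)
graph.follow(1, 7)
graph.follow(6, 1)
print(graph.following(1))  # [2, 7]
```

A real system also has to handle replication, moving data between shards and queries in the reverse direction (who follows a given user), which is the kind of pain the quote says Gizzard is meant to reduce.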
Simply because of the number of users and the scale of its service, Twitter is solving problems that many other web-based startups hope they will have one day. So now I’m back to wondering if FlockDB and Gizzard will join the ranks of Hadoop or Cassandra as open-source solutions for managing data at web scale.