Facebook has been on an open source roll lately, and on Thursday it announced its latest open source project: an embedded key-value store called RocksDB. The company uses it to power certain user-facing applications that would suffer too much from having to access an external database over the network, and to address the problem of underutilized IO performance on flash storage devices.
Facebook database engineer Dhruba Borthakur describes the design of and rationale behind RocksDB in some detail in a blog post, but the biggest factor leading to its creation might be the emergence of relatively inexpensive flash storage cards for servers (or, in Facebook’s case, custom-built servers packed entirely with flash).
“With the advent of flash storage, we are starting to see newer applications that can access data quickly by managing their own dataset on flash instead of accessing data over a network. These new applications are using what we call an embedded database.
“… When database requests are frequently served from memory or from very fast flash storage, network latency can slow the query response time. Accessing the network within a data center can take about 50 microseconds, as can fast-flash access latency. This means that accessing data over a network could potentially be twice as slow as an application accessing data locally.”
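The "twice as slow" figure follows from simple addition, and a few lines of Python make the arithmetic explicit. This is only a back-of-the-envelope sketch: the two 50-microsecond values come from the quote above, while the function names and the assumption that a remote read costs exactly one network access plus one flash read are illustrative, not something Borthakur spells out.

```python
# Rough figures taken from the quoted passage; both are order-of-magnitude values.
NETWORK_ACCESS_US = 50  # one network access within a data center
FLASH_READ_US = 50      # one read from fast flash


def embedded_read_us() -> int:
    """Embedded case: the application reads its own flash directly."""
    return FLASH_READ_US


def remote_read_us() -> int:
    """Remote case (assumed model): one network access plus the server's flash read."""
    return NETWORK_ACCESS_US + FLASH_READ_US


def slowdown() -> float:
    """How many times slower the remote read is than the embedded read."""
    return remote_read_us() / embedded_read_us()


if __name__ == "__main__":
    print(f"embedded: {embedded_read_us()} us, "
          f"remote: {remote_read_us()} us, "
          f"slowdown: {slowdown():.1f}x")
```

Under those assumptions the remote read takes 100 microseconds against 50 for the local one, which is where the factor of two comes from.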
RocksDB was designed with these new hardware realities in mind, so it can take full advantage of the IOPS potential of flash memory as well as the computing power of many-core servers, Borthakur explains. Facebook has posted the results of a benchmark test running on a Fusion-io-powered server on the RocksDB GitHub page, and claims it’s significantly faster than Google’s LevelDB embedded key-value store.
From a broader IT perspective, RocksDB signals that the shifts in storage and computing economics that made the big data movement possible are now making their way into web application development, albeit using a storage medium most organizations wouldn’t consider using for storing “big data.” Facebook is performance hungry, but it’s also cost-sensitive, and it wouldn’t be storing “close to a petabyte of data across different applications,” as Borthakur writes, if the cost to do so were out of control.
He offered a handful of application types an embedded database like RocksDB is suitable for, including:
1. A user-facing application that stores the viewing history and state of users of a website.
2. A spam-detection application that needs fast access.
3. A graph-search query that needs to scan a data set in realtime.
4. A cache for data from Hadoop, which lets an application query Hadoop data in realtime.
5. A message-queue that supports a high number of inserts and deletes.
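To give a feel for what "embedded" means in the first of those cases, here is a minimal Python sketch of a viewing-history store that lives inside the application process and writes to local storage, with no database server or network hop involved. It is not RocksDB and does not use RocksDB's API: the standard-library dbm.dumb module stands in as a generic embedded key-value store, and the ViewingHistory class, its method names and the sample user and item IDs are all hypothetical.

```python
import dbm.dumb
import json
import os
import tempfile


class ViewingHistory:
    """Hypothetical per-user viewing history kept in a local, in-process key-value store."""

    def __init__(self, path: str) -> None:
        # "c" opens the store for reading and writing, creating it if needed.
        self._db = dbm.dumb.open(path, "c")

    def record_view(self, user_id: str, item_id: str) -> None:
        """Append one viewed item to the list stored under the user's key."""
        key = user_id.encode("utf-8")
        history = json.loads(self._db[key]) if key in self._db else []
        history.append(item_id)
        self._db[key] = json.dumps(history).encode("utf-8")

    def history(self, user_id: str) -> list:
        """Return the user's viewed items in order, or an empty list if none."""
        key = user_id.encode("utf-8")
        return json.loads(self._db[key]) if key in self._db else []

    def close(self) -> None:
        self._db.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = ViewingHistory(os.path.join(tmp, "history"))
        store.record_view("user-1", "video-42")
        store.record_view("user-1", "video-7")
        print(store.history("user-1"))
        store.close()
```

Every read and write here is a local library call, which is the property the embedded approach relies on. A production store such as RocksDB adds the flash-aware and many-core optimizations described above, none of which this stand-in attempts.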
In fact, Facebook has been finding all sorts of new ways to utilize flash as a stepping stone between slow disks on one hand and expensive-but-fast RAM on the other.
Facebook is no doubt an early adopter of flash-heavy application architectures, but it’s also probably serving as a guiding light for other companies and their developers who want to achieve Facebook-like performance. As flash prices continue to drop — and now that Amazon Web Services is offering a whole suite of flash-backed instances on EC2 (the prices of which should also drop) — it’s conceivable we’re approaching an era of ever-better web and mobile applications that communicate with the network and the hard drive as little as possible.