
Calvin: A fast, cheap database that isn’t a database at all

A team of Yale researchers thinks it has developed the cure for Oracle and IBM dominance in the world of database performance, and it isn’t even technically a database. In a blog post Wednesday morning (and in a related research paper), team members Daniel Abadi and Alexander Thomson detail Calvin, a “transaction scheduling and replication coordination service” that they think can level the playing field between high-cost distributed relational databases and less-expensive, but limited, NoSQL and NewSQL databases.

The researchers aren’t dismissing either NoSQL or NewSQL, but rather attempting to address the type of use case on which the popular TPC-C database performance benchmark is based. That benchmark, which simulates an online retail application, requires ACID compliance, which NoSQL options can’t meet, and the ability to update records across database shards in the same transaction, something the authors claim NewSQL databases can’t do.

Why not just stick with Oracle Database and IBM DB2? Cost, especially at scale. As Abadi and Thomson point out in the blog, an Oracle system capable of handling 500,000 transactions per second costs $30 million in hardware and software expenditures.

So, what is Calvin? In a nutshell, it’s software that sits above a scale-out storage system and turns it into a transaction-processing system by capturing, scheduling and executing transactions. Here’s how Abadi and Thomson describe it in the blog post, although the paper goes into much more detail.

Calvin requires all transactions to be executed fully server-side and sacrifices the freedom to non-deterministically abort or reorder transactions on-the-fly during execution. In return, Calvin gets scalability, ACID-compliance, and extremely low-overhead multi-shard transactions over a shared-nothing architecture. In other words, Calvin is designed to handle high-volume OLTP throughput on sharded databases on cheap, commodity hardware stored locally or in the cloud. … Calvin allows user transaction code to access the data layer freely, using any data access language or interface supported by the underlying storage engine (so long as Calvin can observe which records user transactions access).
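To make the trade-off in that description concrete, here is a minimal Python sketch of deterministic scheduling. It is an illustration under stated assumptions, not Calvin's actual code or API: the names Txn, Sequencer, Replica and transfer are hypothetical, and the "shards" are plain in-memory dictionaries. The idea it tries to show is that if every transaction declares up front which records it touches, and every replica applies the same agreed-upon log in the same order, then replicas reach identical state even for multi-shard transactions, without negotiating commits on the fly.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A record is addressed by (shard id, key). All names here are hypothetical.
Key = Tuple[int, str]


@dataclass
class Txn:
    name: str
    keys: List[Key]  # records the transaction touches, declared up front
    logic: Callable[[Dict[Key, int]], Dict[Key, int]]  # reads -> writes


class Sequencer:
    """Fixes one global order for all submitted transactions."""

    def __init__(self) -> None:
        self.log: List[Txn] = []

    def submit(self, txn: Txn) -> int:
        self.log.append(txn)
        return len(self.log) - 1  # position in the global order


class Replica:
    """Holds every shard and applies the log strictly in order."""

    def __init__(self, num_shards: int, initial: Dict[Key, int]) -> None:
        self.shards: List[Dict[str, int]] = [dict() for _ in range(num_shards)]
        for (shard, key), value in initial.items():
            self.shards[shard][key] = value

    def apply(self, log: List[Txn]) -> None:
        for txn in log:
            reads = {(s, k): self.shards[s].get(k, 0) for (s, k) in txn.keys}
            writes = txn.logic(reads)
            for (s, k), value in writes.items():
                if (s, k) not in txn.keys:
                    raise ValueError(f"{txn.name} wrote an undeclared record")
                self.shards[s][k] = value


def transfer(name: str, src: Key, dst: Key, amount: int) -> Txn:
    def logic(reads: Dict[Key, int]) -> Dict[Key, int]:
        # An abort decided by the transaction's own logic is deterministic:
        # every replica sees the same reads and makes the same choice.
        if reads[src] < amount:
            return {}
        return {src: reads[src] - amount, dst: reads[dst] + amount}

    return Txn(name, [src, dst], logic)


alice: Key = (0, "alice")  # lives on shard 0
bob: Key = (1, "bob")      # lives on shard 1
initial = {alice: 100, bob: 20}

sequencer = Sequencer()
sequencer.submit(transfer("t1", alice, bob, 70))
sequencer.submit(transfer("t2", alice, bob, 50))  # insufficient funds: no-op
sequencer.submit(transfer("t3", bob, alice, 40))

replica_a = Replica(2, initial)
replica_b = Replica(2, initial)
replica_a.apply(sequencer.log)
replica_b.apply(sequencer.log)

print(replica_a.shards == replica_b.shards)
```

Each transfer here spans two shards, yet neither replica has to coordinate with the other while executing. What the sketch gives up is the same freedom the researchers describe giving up: a replica cannot reorder or abort a transaction for its own reasons once the order is fixed.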

Calvin, the researchers claim, can match Oracle’s 500,000 transaction-per-second performance running on commodity servers on Amazon EC2. The cost of the resources to run their benchmark was only $300. (Obviously, that figure doesn’t account for the cost of running the system continuously, potentially for years. Commodity physical hardware might be a better bet in the long term.)

Ultimately, the researchers conclude, for transactions that can execute entirely on the server side, Calvin could be the foundation for an end to the current OLTP regime. The world certainly is hungry for something that can do what Oracle and IBM can do, but that costs what NoSQL databases cost (i.e., nothing, often). And Abadi has some distributed database street cred (the HadoopDB project he led is the foundation of Hadapt’s Hadoop-and-data-warehouse hybrid), so, especially if it’s open sourced, one can’t dismiss Calvin out of hand.

Feature image courtesy of Shutterstock user Semisatch.

2 Responses to “Calvin: A fast, cheap database that isn’t a database at all”

  1. Doron Levari

    Great reading!
    Disclaimer: I work for ScaleBase.
    I agree with your perception about ACID. NoSQLs do not give ACID and NewSQLs should by all means give it, otherwise it’s not apples to apples with oldSQLs, right?
    ScaleBase is not a NewSQL; we enable scale-out over numerous standard MySQL databases, with distribution and parallel processing, and naturally we support full and complete ACID across all nodes. Our TPCC-like benchmark (DBT-2) in the cloud, AWS RDS, 1-14 database servers shows great linear scalability while maintaining ACID! See here: and address me directly (see my blog: for any more details.

    The discussion around RDBMS scale-out is very interesting; from what I’ve seen it has a lot to do with distribution of the data and also the computing power. Seems like Calvin does exactly that on distributed storage and commodity servers. Looking good!