A study released today by a team of leading database experts, among them Structure 09 speaker Michael Stonebraker, has been generating buzz for its assertion that clustered SQL database management systems (DBMS) actually perform significantly better for most tasks than does cloud golden child MapReduce. But how shocked should we be, really? After all, choosing a parallel data strategy is not an all-or-nothing proposition.
Google built MapReduce to handle its particular needs, which are a far cry from the needs of most businesses. Database analyst Curt Monash told Computerworld that the study just reinforced his belief that MapReduce is better for limited tasks like text searching or data mining (you know, the things Google does on an epic scale). For tasks that require relational database capabilities at web scale, database sharding has become a favorite practice. I’ve heard Google itself uses SQL, MapReduce and/or sharding depending on the task. Vendors like Aster Data Systems and Greenplum give customers the functionality of both MapReduce and SQL in one user-friendly package.
I think MapReduce (and its variants, like Hadoop) has received a lot of unnecessary adoration thanks to the fervor over cloud computing. Some people, it seems, associated Google, Yahoo and their web brethren with cloud computing, and thus surmised that in order to do cloud computing, you must do exactly what Google and Yahoo do. This, of course, is not the case. From a business perspective, cloud computing is just as much about saving money and making life easier as it is about doing massive amounts of computing. If you don’t have unique computing needs like the web giants, but just want to eliminate the joys of owning and managing machines, there are plenty of clustered SQL solutions available in the cloud. Like most things in life, it’s just a matter of finding the right tool for the job.