Cloudera says Impala is faster than Hive, which isn’t saying much

Source: Cloudera

Hadoop vendor Cloudera is singing the praises of its own SQL query engine, releasing on Monday the results of a benchmark that shows how Cloudera Impala compares to Apache Hive and a mystery proprietary database. As one might expect, Impala easily bested its competitors in the benchmarks (no vendor has ever, to my knowledge, released results highlighting its product’s inferiority), but Hive and SQL databases probably aren’t Impala’s real rivals.

Its more direct competition comes from other Hadoop vendors doing their own things to try to make Hadoop queries faster and more interactive. The choice right now isn’t to Hadoop or not to Hadoop; it’s which flavor of Hadoop to do. Companies that are using Hive are already using Hadoop, so that decision has been made. And even Cloudera, unless its stance has shifted drastically, acknowledges that Impala isn’t yet a replacement for a purpose-built data warehouse or relational database system.

(Although a future where Hadoop vendors do actively try to upset the database market would be interesting. Maybe we’ll get a sense of how realistic that is during sessions with the CEOs of Cloudera, Hortonworks and Pivotal at Structure Data in March.)

Impala vs. “DBMS-Y.” Source: Cloudera

If having some degree of interactive SQL queries is important to users, they’ll likely be comparing one Hadoop distribution to another on this front. So while Cloudera is smart to position the choice as being between Impala, Hive and DBMS-Y (“one of the top 5 commercial MPP query engines on the market,” a Cloudera spokesperson confirmed), the more relevant comparison is probably between Impala and the Hortonworks-backed Apache Stinger/Tez, Pivotal HD Hawq, Presto (on Qubole), the MapR-backed Apache Drill, Hadapt, IBM BigSQL, Shark … you get the point.

For what it’s worth, everyone is faster than Hive; that’s the whole point of all of these SQL-on-Hadoop technologies. How they compare with each other is harder to gauge, and that determination is probably best left to individual companies testing their own workloads as they make their buying decisions. With that caveat, here is a collection of benchmark tests showing the performance of various Hadoop query engines against Hive, relational databases and, sometimes, themselves.

Impala vs. Hive

Source: Cloudera

Stinger/Tez vs. Hive

Source: Hortonworks. Check out this blog post for more details.

Pivotal HD Hawq vs. Impala and Hive

Source: Pivotal. Check out this whitepaper for more details.

Shark vs. Impala, Hive and Amazon Redshift

Source: AMPlab (UC-Berkeley). Check out this blog post for more details.
