Hortonworks’ effort to speed up Hive is coming along nicely

Hortonworks is making progress on its mission (via a project called Stinger) to speed up SQL-like queries in Hadoop using Apache Hive. New features in the latest version of Hortonworks’ Hadoop distribution have made Hive queries tens of times faster in some instances, and the company is aiming for 100x improvements soon. Hortonworks has also added support for new SQL data types. Competitor Cloudera opted to forgo Hive in favor of its own Impala technology for interactive queries.

5 Responses to “Hortonworks’ effort to speed up Hive is coming along nicely”

  1. Amr Awadallah

    Derrick, I would like to clarify something: it is incorrect to say that Cloudera has forgone Hive. Cloudera continues to actively commit code to and support Hive (not to mention that Hive was the brainchild of Jeff Hammerbacher, who is one of Cloudera’s co-founders). What we decided (based on reading the Tenzing/Dremel tea leaves from Google) is that there are two different design points in this space: (1) a system focused on high-throughput, long-running batch jobs (that is Hive/Tenzing), and (2) a system focused on low-latency interactive queries (that is Impala/Dremel). We continue to support both systems; they are both important, but for different kinds of workload goals. The design points are very different, though. It is like designing a race car, which is really about finishing the race as quickly as possible at the highest speed regardless of fuel consumption (or wear and tear on the car), versus designing a commuter car, which is really about traveling very long distances at a consistent speed with the most efficient fuel consumption (and a low probability of breakage). It is very hard to design a car that meets both of these design goals at the same time, and similarly it is very hard to design a query system that can handle both low-latency interactive workloads and long-running, high-throughput jobs. The Stinger initiative will be able to significantly improve the bandwidth of Hive jobs (which will benefit Cloudera’s distro), but I doubt they will be able to get the latency of Hive to be competitive with Impala.

    Thus, it is my [biased] opinion that with Cloudera you are guaranteed to get the best of both worlds.


    — amr