Look, IBM is doing SQL on Hadoop, too

Maybe this is just news to me, but IBM has a SQL-on-Hadoop product in the works called Big SQL. The company announced the technology preview version in March (well under my radar and, from what I’ve seen, nearly everyone else’s radar), and is offering up a cloud-based demo environment for a select group of early users.

As a refresher, the big difference between SQL on Hadoop and the Hadoop connectors that were popular a couple years ago is that SQL-on-Hadoop products query the data where it resides — in HDFS or HBase — rather than pulling it into a relational database environment to analyze it. We have been talking for months about the emergence of a large SQL-on-Hadoop market, but IBM’s name was conspicuously absent from that discussion. The company has Hadoop software called BigInsights and lots of SQL expertise, so it only made sense that IBM would get into the game at some point.

Details on Big SQL are still pretty sparse save for a few high-level blog posts and an instructional video (embedded below), but it looks to take the standard approach, as Cloudera is doing with Impala, of enabling access through traditional tools via JDBC and ODBC drivers.
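To make the driver idea concrete, here is a minimal sketch of what "access through traditional tools" amounts to: application code issues ordinary SQL through a standard connection interface and never touches the storage layer directly. It uses Python's DB-API with an in-memory SQLite database as a stand-in. That stand-in is purely an assumption for illustration, since Big SQL itself is not involved here, and the `clicks` table and the `top_pages` helper are hypothetical names.

```python
import sqlite3


def top_pages(conn, min_hits):
    """Run a plain parameterized SQL query through a standard connection.

    The function only relies on the generic cursor/execute/fetchall
    interface, which is the role JDBC and ODBC drivers play for BI tools.
    Note: the "?" placeholder style is what sqlite3 uses; other drivers
    may expect a different parameter style.
    """
    cur = conn.cursor()
    cur.execute(
        "SELECT page, COUNT(*) AS hits "
        "FROM clicks "
        "GROUP BY page "
        "HAVING COUNT(*) >= ? "
        "ORDER BY hits DESC, page",
        (min_hits,),
    )
    return cur.fetchall()


# Stand-in backend (assumption): in-memory SQLite instead of a Hadoop engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clicks (user_id TEXT, page TEXT)")
conn.executemany(
    "INSERT INTO clicks VALUES (?, ?)",
    [
        ("u1", "/home"),
        ("u2", "/home"),
        ("u1", "/pricing"),
        ("u3", "/home"),
        ("u2", "/pricing"),
        ("u3", "/docs"),
    ],
)

result = top_pages(conn, 2)
print(result)
```

In a SQL-on-Hadoop setup the connection would come from the vendor's driver rather than `sqlite3`, but the calling code would look much the same, which is the appeal for people with existing SQL skills and tools.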

Ultimately, I think the advent of big data will enable some new querying techniques quite a bit different from the SQL queries we’ve come to know and love over the past couple of decades. But SQL is still the language du jour and might never go away, so there’s a lot of value to be had if people can put their SQL skills to work on data stored inside Hadoop or other environments, and if companies can work toward a nirvana where all the data is stored in a single place rather than across database environments.

That IBM got this message and got into the game isn’t surprising at all, but it is important. Lots of large companies buy IBM’s software. If IBM wants those customers to follow it into the world of big data and Hadoop, it has to give them the tools they need to use it.

[Embedded video: YouTube, ID DCWig4-h1F4]

4 Responses to “Look, IBM is doing SQL on Hadoop, too”

  1. didihaveaname

    Doesn’t make sense to me. You are still relating tables or similar things. Is SQL (Structured Query Language) best for this? I mean, taking unstructured data and simply structuring it into rows and columns… I don’t think so. Producing JSON/XML is a different ballgame… we know that SQL is not the tool for it.
    We had SQL because we had relational data.

  2. It’s interesting to see all this news coming out around SQL on Hadoop, as SQL is in Hadoop. Check out the Stinger initiative within the Apache Software Foundation. It improves Hive’s performance and better serves business intelligence use cases such as interactive data exploration, visualization and parameterized reporting.

    • LearningDBA

      Asking, but I think Hive and SQL are the same, right? Hive is SQL-like, but there are things you can’t do in it, like windowed aggregates and more. So Stinger is about making Hive faster. BigSQL is about bringing SQL to Hadoop (it can also decide to run a job in MR if the job is big enough to benefit; most products bypass MR because it’s slow, and most of the time that is likely the right call).