Ask a venture capitalist about big data and she will probably tell you about visualization. Only it won’t be visualization in the usual sense — the pretty charts and graphics that result from traditional business-intelligence software. Instead, it will be about visualization of the user interface. We’re talking about strikingly intuitive UIs that let users visually work with data using charts and tools instead of with algorithms and code. It’s hard work to do right — especially when you’re talking about massive data sets and complex computations — but the payoff could be huge.
Competition is so hot that if a startup has a good technology, it probably has its pick of investors. Some companies in the space are even turning away investors on a regular basis. What everyone realizes is that although big data is a big opportunity any way you slice it, it’s a huge opportunity if run-of-the-mill business users can leverage the underlying data sets as easily as data scientists can. Actually, the goal is to make it even easier for them than it is for data scientists.
Why UIs matter
The UI has to be the connection between big data and business users, period. Those employees know what data is important to them, and they might know how to work with a traditional BI tool and a spreadsheet. But they don’t know MapReduce, data integration, applied mathematics or any of the myriad other skills necessary to really do big data right. And depending on what they’re doing, they probably don’t need to. What they want is something that looks like a Google product but acts like Hadoop.
The alternative to a user-friendly UI is that the IT department or engineering team becomes the conduit between the business users and the data. Selling big data software is already difficult, because customers don’t know which budget it comes out of. It only complicates things further when business users have to go through IT to get answers. If they’re expected to act on data in near real time, while it’s still relevant, business users must be able to slice and dice their data at their own pace. In other words, the absence of a user-friendly UI can not only wreak budgetary havoc but also slow productivity.
The state of the art is still coming
How well the current collection of products boasting user-friendly big data UIs measures up depends a lot on who’s using them. For a lot of users, a spreadsheet might do just fine. On Wall Street, for example, all roads go through Excel. That’s what makes Microsoft’s efforts to build an ODBC driver for connecting Hadoop and Excel such a big deal, and it explains why Hadoop startup Datameer has raised nearly $12 million.
If companies are willing to house their data in the cloud, 1010data takes the spreadsheet to the next level. There’s also a world of products targeting SQL-savvy users and everyday coders who want to use their skills against big data sets. These range from the data-warehouse-Hadoop-hybrid Hadapt to cloud services such as Google BigQuery and Kontagent’s DataMine, which presents users with a gussied-up Apache Hive interface to mine social and mobile data. For coders, there are services such as the Infochimps Platform with its Ruby-based Wukong tool or Mortar Data and its Python-plus-Pig approach.
But spreadsheet- and SQL-based approaches are limited in function. Code-based approaches can be more powerful, but they’re limited by the small pool of users skilled enough to write the code. To really hit the broad range of potential users, you need something even easier. Right now, that’s where next-generation BI players such as Tableau come into play. Tableau CEO Christian Chabot has described his company’s product-development process as being akin to Apple’s, and you can see it in the result.
It’s not perfect, but Tableau definitely fills a gap in the analytics market. The product connects to numerous data stores — ranging from spreadsheets to data warehouses to Hadoop — and applies properties to the data therein based on their characteristics. If they know a little bit about data, novice users can then drag and drop values onto a palette and watch a chart take shape as the software automatically runs the appropriate functions in the background. In theory, the software also intelligently chooses the right chart type (and colors) for the given data set. Already, Tableau is proving useful among users who aren’t necessarily associated with BI, including doctors and journalists.
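To make that idea concrete, here is a minimal Python sketch of the kind of logic a tool in this category might run behind a drag-and-drop action. It is not Tableau's implementation. The function names (`classify_column`, `suggest_chart`, `aggregate`), the sample rows and the chart-selection rules are all assumptions made up for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample data; a real tool would read this from a data store.
ROWS = [
    {"region": "East", "day": date(2012, 3, 1), "sales": 120.0},
    {"region": "West", "day": date(2012, 3, 1), "sales": 80.0},
    {"region": "East", "day": date(2012, 3, 2), "sales": 100.0},
    {"region": "West", "day": date(2012, 3, 2), "sales": 95.0},
]


def classify_column(rows, name):
    """Assign a property to a column based on the values it holds."""
    values = [row[name] for row in rows]
    if all(isinstance(v, date) for v in values):
        return "date"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "measure"
    return "dimension"


def suggest_chart(rows, x, y):
    """Pick a chart type from the properties of the two chosen columns.

    These rules are illustrative assumptions, not any vendor's actual logic.
    """
    kinds = (classify_column(rows, x), classify_column(rows, y))
    if kinds == ("date", "measure"):
        return "line"
    if kinds == ("dimension", "measure"):
        return "bar"
    if kinds == ("measure", "measure"):
        return "scatter"
    return "table"


def aggregate(rows, group_by, measure):
    """Sum a measure per group, the function run 'in the background'."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_by]] += row[measure]
    return dict(totals)


if __name__ == "__main__":
    # Simulates dragging "region" and "sales" onto the palette.
    print(suggest_chart(ROWS, "region", "sales"))
    print(aggregate(ROWS, "region", "sales"))
```

The point of the sketch is that the user only names two columns. The typing, the aggregation and the choice of chart all happen without the user writing a query.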
Tools like Tableau and its primary competitor QlikTech (or the big-data-connected Pentaho and Jaspersoft) are still just BI tools. They make analytics simpler, but they don’t fundamentally change the experience. That is, although they can connect to big data sources such as Hadoop or Teradata, they are designed for structured data and don’t actually perform big data analysis beyond doing typical SQL-like queries of the type Apache Hive enables.
A UI so simple a history major could use it
Where Tableau leaves off, startups such as Platfora and ClearStory step in. They’re still early in their development, but they have grand goals. When Platfora founder and CEO Ben Werther explained his company’s ultimate goal upon its launch in Sept. 2011, he said it was to create an interface so intuitive that a history major (presumably with no computer science background) could use Hadoop to query large data sets and answer complicated questions.
Platfora is building an analytics product that sits between a Hadoop cluster and the user and provides an entirely new way of analyzing data. It’s a visual UI that Werther says is two generations beyond Hive, something that makes child’s play out of both queries and MapReduce workloads.
It’s difficult to say much more about what Platfora is up to, though, because the company is still working hard on actually building the product. The team Platfora has put together illustrates both the complexity of engineering the product and the type of user experience it hopes to enable. Werther himself comes from commercial Cassandra entity DataStax, and other key employees come from places such as Google, IBM and Stanford. He has three Ph.D.s on staff and a visual designer who has worked on a laundry list of blockbuster movies. They need to figure out the perfect UI but also perfect the connection between the UI and Hadoop to make the back end interact with the front end seamlessly.
When it’s complete, Platfora will be sold as enterprise software that sits atop a company’s Hadoop cluster and empowers business users to do things they couldn’t do before. They will still have their Teradata and Oracle databases to perform their legacy analytics tasks, but they will also have Platfora there to really dig into the mountains of unstructured data sitting inside Hadoop.
Slicing and dicing through the cloud
ClearStory is another startup trying to make analytics a visual experience, only it’s not limited to big data. Its product is a cloud-based service that connects to a large number of internal (e.g., relational databases, Hadoop) and external (e.g., data markets, social media, licensed data) data sources, and it lets users analyze and combine them at will. It was inspired by the co-founders’ time at Aster Data Systems, where they said customers would incorrectly attempt to use Aster as a hub for all of their data and then throw a BI layer on top of it for visualization.
The service is still in the design stages and is being put through its paces by beta users, but it aims to let anyone who isn’t an expert start slicing, dicing and comparing multiple data sources in a guided manner. They can analyze internal sources against external sources and discover trends in real time. A publisher, for example, might be able to compare page views against Twitter trends to determine which stories to promote, or a marketer might be able to find that certain products sell better in cities with certain characteristics.
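The publisher example boils down to joining an internal data source with an external one and ranking the result. Here is a minimal Python sketch of that comparison. It is not based on ClearStory's product. The data, the `promotion_candidates` function and the buzz-to-traffic threshold are all hypothetical.

```python
# Hypothetical internal data: page views per story, tagged by topic.
PAGE_VIEWS = [
    {"story": "Hadoop funding round", "topic": "hadoop", "views": 1200},
    {"story": "Tablet sales slump", "topic": "tablets", "views": 5400},
    {"story": "Cloud price cuts", "topic": "cloud", "views": 900},
]

# Hypothetical external data: recent Twitter mentions per topic.
TWITTER_MENTIONS = {"hadoop": 4800, "cloud": 450, "tablets": 2700}


def promotion_candidates(page_views, mentions, min_ratio=2.0):
    """Return stories whose topic gets far more buzz than traffic.

    A high mentions-to-views ratio is treated here as a sign that a
    story is under-promoted. The threshold is an arbitrary assumption.
    """
    candidates = []
    for row in page_views:
        if row["views"] == 0:
            continue  # no traffic baseline to compare against
        ratio = mentions.get(row["topic"], 0) / row["views"]
        if ratio >= min_ratio:
            candidates.append((row["story"], round(ratio, 2)))
    # Highest ratio first, so the strongest candidate leads the list.
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for story, ratio in promotion_candidates(PAGE_VIEWS, TWITTER_MENTIONS):
        print(story, ratio)
```

A visual tool would hide all of this behind a few clicks, but the underlying operation is the same: match records across sources, compute a comparison and sort.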
As with Platfora, getting ClearStory’s front end to interact smoothly with the service’s back end is a big challenge. Another is figuring out the small stuff, such as how many screens users have to go through to achieve any given result. That kind of attention to detail, co-founder and CEO Sharmila Mulligan told me, is critical to creating a turnkey experience that lets everyday business users “in the first day of use, be able to have an impact.”
Making data accessible to the masses means making sure it’s as easy, relatively speaking, as using Instagram.
What’s next: a secretary crunching numbers
When average users can actually delve into data and run complex analyses without calling for help, and when business units can purchase software or services with minimal consideration for how they affect IT’s budgets or time (they’ll still need to connect to Hadoop and other internal data stores, of course), it will be a mini-revolution in the big data movement.
It will also be just a first step. Platfora and ClearStory are just the first, and the most high-profile, attempts to rethink the process of analyzing data in the big data age; both are slated for some level of public release in the fall. If companies are going to become truly data-centric and let data help guide them at every level of the business, analytics must become commonplace at every level. The easier and more robust that experience becomes, the faster that will happen.