Weekly Update

Cloud BI competition heats up while Apache projects move up

The holiday season news drought in data and analytics is definitely over. A bunch of announcements from commercial vendors, and others from the open source world, have hit in the last week, and two hit just yesterday.

Let’s start with Microsoft which, hot off its Windows 10 and HoloLens reveal, has turned its attention to machine learning and BI. Microsoft dropped its first bombshells last week, announcing on Tuesday that it would acquire text analysis firm Equivio and on Friday that it would acquire Revolution Analytics. Revolution Analytics is the key commercial backer behind the open source R project and is the developer of its own R distribution, called Revolution R, which is specially designed to run in situ in distributed environments like Hadoop clusters and MPP data warehouses. Such co-location of R and analytics infrastructure minimizes data movement and improves performance by parallelizing complex computing tasks among nodes in the cluster.

Power pricing
The acquisitions are huge news on their own, but on Tuesday, Microsoft upped the ante, announcing that the revamped version of its cloud-based Power BI suite, currently in free preview, would emerge from that preview with two tiers of service. The fine print is a bit confusing, but if I’ve understood it correctly, one of those tiers, Power BI Pro, will be billed at only $9.99/user/month; the other tier will be completely free of charge. When you add to these extremely low pricing levels the fact that the new version of Power BI has eliminated all of its dependencies on (but not integration with) Excel, SharePoint and Office 365, suddenly Microsoft’s cloud BI play is lean, agile and very affordable.

Compare this to the pricing announced when Power BI launched last year, when fees were as high as $52/user/month and went absolutely no lower than the $20/user/month add-on for customers already paying big bucks for Office 365 E3 or E4 subscriptions.  What Microsoft had done previously was to bring an enterprise licensing mentality to the pricing of a cloud product, and seemingly had the modest goal of announcing it had a cloud BI offering.  What Microsoft has done now is to up its cloud BI game with the goal of winning.
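To put the old and new price points side by side, here is a minimal arithmetic sketch. The three per-user monthly prices are the ones quoted above; the 25-person team and the function names are hypothetical, chosen only for illustration.

```python
from decimal import Decimal

# Per-user monthly prices quoted in the text (USD).
PRICES = {
    "Power BI Pro (new)": Decimal("9.99"),
    "Office 365 add-on (launch)": Decimal("20"),
    "Highest quoted fee (launch)": Decimal("52"),
}


def annual_cost(price_per_user_month, users):
    """Yearly bill for a team at a flat per-user monthly price."""
    return price_per_user_month * users * 12


def savings_vs(old_price, new_price):
    """Fractional price reduction when moving from old_price to new_price."""
    return (old_price - new_price) / old_price


if __name__ == "__main__":
    team = 25  # hypothetical team size, not from the announcement
    for name, price in PRICES.items():
        print(f"{name}: ${annual_cost(price, team):,.2f}/year")
    new = PRICES["Power BI Pro (new)"]
    for name in ("Office 365 add-on (launch)", "Highest quoted fee (launch)"):
        print(f"Reduction vs {name}: {savings_vs(PRICES[name], new):.0%}")
```

On these numbers, the Pro tier comes in at roughly half the old add-on price and about a fifth of the highest launch price.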

Wisdom of crowds
And speaking of cloud BI, one of the best known pure plays in that space, GoodData, had an announcement of its own today. On the surface at least, the subject of the announcement seems to be very innovative technology.  Essentially, GoodData has used its pure play multi-tenant BI infrastructure to its advantage, and to the advantage of its customers. What GoodData calls its “Insights Engine” will now provide users with a guided experience in the Analytic Designer environment as they build visualizations.

The guidance is delivered in the form of smart defaults, based on best practices gleaned from observations of how its customers build effective analyses on their own.  Novice users can simply pick what attributes and metrics they want to analyze, and the Insights Engine will make intelligent decisions as to how that data should be visualized.

GoodData is essentially crowdsourcing BI design to help less advanced users overcome the burden of choice and the tyranny of the blank page. Any guided experience during visualization and dashboard design is a very good thing; for all intents and purposes, it’s self-service training. Combine that with the guidance being based on a model that is constantly and automatically improved, based on the wisdom of the users themselves, and it’s a pretty slick solution.

And GoodData’s Data Explorer, which allows for self-service dataset acquisition and creation, without the need for IT support, rounds out the offering — which GoodData calls “Insights as a Service” — very nicely.  Ironically, Power Query, which performs the same function in Microsoft’s Power BI, used Data Explorer as its codename.

Instead of playing catch-up, and just trying to achieve parity with on-premises Enterprise BI suites, GoodData is using its unique model to create value and competitive distinction. An important question is how dynamic and automated the best practice models really are. If there’s a large degree of automation, then GoodData has built a real virtuous cycle: the assistance keeps getting better at helping users, which enables customers to become more skilled, which in turn improves the models and benefits GoodData itself.

Apache pomp and circumstance
On Tuesday, the Apache Software Foundation announced that two more Hadoop-related projects — namely Samza and BookKeeper — have graduated to Top Level Project (TLP) Status.

Samza is a streaming data processing system that works on top of Apache Kafka and Hadoop’s YARN resource manager.  Samza is also “pluggable,” allowing Kafka and/or YARN to be swapped out for other publish/subscribe and execution engines.
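To illustrate what “pluggable” means here, the following is a rough Python sketch of the general idea, not Samza’s actual API (Samza itself is a JVM project). The `MessageSource` interface, `InMemorySource` and `run_task` are all hypothetical names: the processing logic depends only on an abstract message source, so the system behind it can be swapped without touching the task.

```python
from typing import Callable, Iterable, Iterator, List, Protocol


class MessageSource(Protocol):
    """Hypothetical interface: anything that can hand over a stream of messages."""

    def messages(self) -> Iterator[str]: ...


class InMemorySource:
    """Stand-in for a publish/subscribe system such as a Kafka topic."""

    def __init__(self, items: Iterable[str]) -> None:
        self._items = list(items)

    def messages(self) -> Iterator[str]:
        return iter(self._items)


def run_task(source: MessageSource, process: Callable[[str], str]) -> List[str]:
    """Apply `process` to each message, one at a time, in arrival order."""
    return [process(msg) for msg in source.messages()]


if __name__ == "__main__":
    clicks = InMemorySource(["home", "pricing", "signup"])
    print(run_task(clicks, str.upper))
```

Any other class with a `messages()` method could be passed to `run_task` in place of `InMemorySource`, which is the swap-out property the project describes.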

Apache Storm, another stream processing project, uses Apache ZooKeeper for the coordination of distributed processes. Apache BookKeeper was until recently a sub-project of ZooKeeper. BookKeeper, which has now also reached TLP status, is a replicated logging service, logging events that take place across nodes, with the ability to play them back in guaranteed order.
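The idea of a replicated log with ordered playback can be sketched in a few lines of Python. This is a toy, in-memory illustration under simplifying assumptions (one writer, replicas as plain lists, no failures); the `ReplicatedLog` class and its methods are hypothetical and are not BookKeeper’s API.

```python
class ReplicatedLog:
    """Toy stand-in for a replicated, ordered log (not the BookKeeper API)."""

    def __init__(self, replicas=3):
        # Each replica is a plain list; a real service would use separate nodes.
        self._replicas = [[] for _ in range(replicas)]
        self._next_id = 0

    def append(self, event):
        """Assign the next sequence id and write the entry to every replica."""
        entry_id = self._next_id
        for replica in self._replicas:
            replica.append((entry_id, event))
        self._next_id += 1
        return entry_id

    def replay(self, from_id=0, replica=0):
        """Yield events in append order from one replica, starting at from_id."""
        for entry_id, event in self._replicas[replica]:
            if entry_id >= from_id:
                yield event


if __name__ == "__main__":
    log = ReplicatedLog()
    for event in ("node-a started", "node-b started", "node-a checkpoint"):
        log.append(event)
    # Any replica returns the same events in the same order.
    print(list(log.replay(from_id=1, replica=2)))
```

Because every entry gets a monotonically increasing id before it is copied to the replicas, playback from any copy yields the same sequence, which is what makes the log useful for recovery.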

With so many Apache TLPs in the extended Hadoop family, the ecosystem can only grow.  The fact that these two projects (especially BookKeeper) are so focused and infrastructural just provides more evidence that 2015 will be a fit-and-finish year for Hadoop’s Enterprise-readiness.