During my week in San Francisco for the Gigaom Structure conference last week, and continuing into the early part of this week, a slew of important announcements were made in the Big Data and analytics space. While the volume of announcements would normally be a lot to cover, we can save some time by breaking the announcements into groups: data visualization favorites on new platforms; support for Hadoop 2.0 and YARN; new products; as well as alliances and acquisitions.
On the new platforms front, Tableau Desktop and Tableau Public, which had previously been available only for Windows PCs, are now available for the Mac; and Roambi, a mobile data visualization platform, which had been iOS-exclusive, will be available on Android starting next month.
Tableau’s release comes as part of a new version (8.2) of the venerable data discovery package, so Windows users benefit too, and Tableau achieves parity on both platforms. Tableau 8.2 sports a new “Story Points” facility which mashes up presentation and dashboard features, and also includes a redesigned data connection experience, updated maps and improvements to Tableau Server.
Roambi Analytics for Android will support the Card, Catalist, Layers and Superlist views, with support for additional visualizations, and the Roambi Flow publishing product, to be added “throughout the remainder of the year.”
Hadoop 2.0 gains adoption
Two new releases, announced this past week, add support for Hadoop 2.0 and YARN, Hadoop’s cluster management layer, which decouples resource management from the high-overhead MapReduce processing algorithm.
Pentaho announced version 5.1 of its Business Analytics platform, which includes full support for YARN. It also includes support for in-place processing of data with MongoDB, and a new Data Science Pack that provides interfaces from Pentaho Data Integration to the R programming language, and to Weka, Pentaho’s open source data mining engine.
RainStor, which has for some time provided advanced data compression for the Hadoop Distributed File System (HDFS), has released a new Archive Application, part of its RainStor 6 release, that is fully YARN-compatible. This application works much faster than its predecessor, since it no longer needs to use MapReduce under the covers.
There are two news items in the new products arena. The first is the release of “Driven,” an application performance management (APM) tool geared specifically to Big Data applications, from Concurrent. Concurrent is best known for its Cascading product, which provides a developer platform for building such Big Data applications, be they on a single server node or an entire Hadoop cluster.
The other announcement here is, I must admit, not exactly about a new product. It concerns Aerospike, a previously proprietary in-memory NoSQL database that the company announced is now available as open source. While Aerospike will not (at least not yet) be rolled out as an Apache Software Foundation project, it will nonetheless be an Apache-licensed project, which may put it on par with NoSQL databases like MongoDB, Cassandra and HBase.
Closing out this week’s array of industry developments are an acquisition and two partnerships. Specifically, rising predictive analytics star RapidMiner has acquired Budapest-based Radoop; Splunk has formed a partnership with Syncsort; and GigaSpaces has announced it’s teaming up with SanDisk.
RapidMiner makes an eponymous open source data mining/machine learning package that is visually-oriented, eliminating the need for complex programming to build predictive data models. Radoop also makes an eponymous product, focused on Hadoop analytics functionality, that is also visually-oriented and is “powered by” RapidMiner itself, making the union quite logical.
What’s also logical is performing what Splunk calls “operational intelligence” on top of mainframe machine data. Splunk’s alliance with Syncsort does just that, specifically providing for collection and analysis of data from IBM z/OS systems.
Finally, GigaSpaces Technologies, the company behind an in-memory data grid called XAP, has announced a partnership with major flash memory player SanDisk, based around the latter’s ZetaScale software. The result is a new version of the product: XAP MemoryXtend, which allows XAP to use flash-based SSD storage at what the company calls “near the speed of DRAM memory.” This allows for the processing of larger datasets, at a lower cost basis than purely DRAM-based systems.
Taken together, this past week’s sprawl of announcements shows that analytics products are moving solidly across platforms, beyond MapReduce, into memory and into the predictive analytics arena. Why so much activity in one fine week in June? Maybe to get ready for a slower summer…but I doubt we’ll have one.