
Summary:

Terracotta is trying to bring real-time analytics to the masses (of Java users, at least) by letting Ehcache users query data stored in the product’s in-memory cache. With Ehcache Search, customers can perform real-time queries against terabytes of data stored in their transactional caches.


Terracotta is bringing real-time analytics to the masses (of Java users, at least) by letting Ehcache users query data stored in the product’s in-memory cache. With Terracotta’s new Ehcache Search product, customers can perform simple queries in real time against up to terabytes of data stored in their transactional caches, without having to install any new servers or purchase new appliances. The approach won’t replace the data warehouse, but it could have a significant effect on the future of analytics software development.

The open-source Ehcache is to Java what memcached is to dynamic web languages: it lets developers store certain data in-memory to avoid the inherent latency of interacting with the database every time an application needs to serve a piece of data. This setup is great for transactional workloads, but any analysis generally still requires a trip to the database. For queries that could benefit from real-time results, that latency can become troublesome, especially if the database is being bombarded, and slowed down, by large numbers of requests. Enter Ehcache Search.
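To make the caching idea above concrete, here is a minimal Java sketch of the cache-aside pattern with a naive in-memory search added on top. It is an illustration under stated assumptions, not Ehcache’s API: a `ConcurrentHashMap` stands in for the cache, a hypothetical `loader` function stands in for the slow database, and the names `InMemoryCache`, `search` and `CacheSearchDemo` are made up for this example.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Toy cache-aside wrapper. A ConcurrentHashMap stands in for the cache;
// this illustrates the pattern only and is NOT Ehcache's API.
class InMemoryCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;               // hypothetical slow database lookup
    private final AtomicInteger loadCount = new AtomicInteger();

    InMemoryCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Serve reads from memory; call the loader only on a cache miss.
    V get(K key) {
        return entries.computeIfAbsent(key, k -> {
            loadCount.incrementAndGet();
            return loader.apply(k);
        });
    }

    // Naive "search": a linear scan over cached values, with no database round trip.
    List<V> search(Predicate<V> criteria) {
        return entries.values().stream()
                .filter(criteria)
                .collect(Collectors.toList());
    }

    // Number of times the backing store was consulted.
    int loads() {
        return loadCount.get();
    }
}

class CacheSearchDemo {
    public static void main(String[] args) {
        // Hypothetical backing store: store id -> units on hand.
        Map<String, Integer> database = Map.of("store-1", 12, "store-2", 3, "store-3", 40);
        InMemoryCache<String, Integer> cache = new InMemoryCache<>(database::get);

        cache.get("store-1");
        cache.get("store-2");
        cache.get("store-1"); // repeat read, served from memory

        // Query only what is already cached: stores with fewer than 10 units.
        List<Integer> lowStock = cache.search(units -> units < 10);
        System.out.println("loads=" + cache.loads() + " lowStock=" + lowStock);
        // prints: loads=2 lowStock=[3]
    }
}
```

Only the first read of each key reaches the loader; the repeat read and the search are answered from memory. The linear scan in `search` is simply the easiest thing to write and says nothing about how Ehcache Search actually evaluates queries.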

According to Terracotta CEO Amit Pandey, one early customer that manages logistics for a fast-food chain was able to cut query latency from nearly a minute to under a second. Desperate for better performance, the company had been considering Oracle’s high-performance Exadata Database Machine, but it didn’t need all that complexity and didn’t really want to pay the high price or deal with the 12-month installation process, either. Because Ehcache is software-only, it took only a month to install it, load the desired data into the memory of the existing application servers and start using the product.

But, as even Pandey acknowledges, databases and data warehouses aren’t going away. They’re still necessary for complex queries, especially against huge volumes of data that simply cannot be stored in-memory. A Terracotta sales rep, however, might be quick to point out that the line is blurring. When Ehcache is used in combination with Terracotta’s BigMemory product, users can officially store up to a terabyte of data in-memory (Pandey says users are storing up to 4 TB), and the company plans to enrich the analytics capabilities within the next 18 months. Presently, Ehcache Search is available in both open-source and enterprise editions, and BigMemory is available solely as an enterprise edition.

This blend of transactional and analytical environments doesn’t start with Terracotta, however, and it likely won’t end with it. SAP is already selling its High-Performance Analytics Appliance (HANA), which relies on in-memory processing to let customers “instantly explore and analyze all of their transactional and analytical data,” and I have to think other vendors with their hands in both pots (e.g., Oracle and IBM) will roll out their own offerings as well. Pandey thinks they might even roll out lightweight versions in the same vein as the open-source Ehcache Search, but said that will require strong customer demand. Considering those companies’ reliance on Java, and that Ehcache has a footprint in hundreds of thousands of Java applications, Terracotta might be the company that makes Oracle, IBM and SAP customers see the light.

If that happens, it could represent a real shift in the advanced analytics market, similar to the freemium trend we’re currently seeing in the SaaS space. Vendors such as Terracotta, EMC (via Greenplum), and Jaspersoft and Pentaho are already approaching free, open-source analytics from different perspectives (real-time analytics, analytics databases and BI, respectively), but getting huge software vendors to give away advanced features, even to some degree, would be a change in the landscape that few could have imagined just several years ago.

Image courtesy of Flickr user jpctalbot.


  1. Just PR BS. In most cases, OLAP queries are CPU-bound, and network-I/O-bound when running in a cluster (disk I/O is not a bottleneck, at least for columnar databases).

  2. > “I have to think other vendors with their hands in both pots (e.g., Oracle and IBM will roll out their own offerings, as well.”

    Wow... you’re only 4 years late on that prediction! Oracle Coherence (acquired by Oracle in 2007) was already doing all of this (search, analysis) and so much more (parallel query, map/reduce, real-time risk calculations, etc.)

    Peace,

    Cameron Purdy | Oracle Coherence

    1. Cameron,

      I stand corrected. I think all my discussions with you about Coherence focused on transactional performance rather than analytics. Still waiting for that free edition, though ;-)

  3. The key to real-time analytics is the ability to provide extensive query and processing power (the equivalent of stored procedures) to manage big data streams.

    Both GigaSpaces and Coherence have provided extended query and processing capabilities for years. Ehcache/Terracotta is therefore years behind the curve in that regard.

    What’s new in that regard is providing standard SQL/JPA queries on top of in-memory storage, which brings the best of both worlds (speed and scale without forcing a complete rewrite). The other part is support for schemaless data structures (which enable continuous changes to the data without bringing the system down).

    Nati Shalom | GigaSpaces

  4. Spelling error, though slightly ironic:

    lightweight versions in the same vain as the open source

    should read “vein”, though I suspect vanity motivates them and their efforts may be in vain.
