GigaOm Radar for Data Lakes and Lakehouses v2.0

Table of Contents

  1. Executive Summary
  2. Market Categories and User Segments
  3. Decision Criteria Comparison
  4. GigaOm Radar
  5. Solution Insights
  6. Analyst’s Outlook
  7. About Andrew Brust

1. Executive Summary

Data lakehouses are platforms intended to combine the flexibility of data lakes with the governance, structural optimizations, and query processing technologies of data warehouses. Proponents consider this combination the optimal blend for analytics.

There are a number of key technologies enabling the data lakehouse paradigm. These include:

  • Open table formats—such as Delta Lake, Apache Iceberg, and Apache Hudi, along with the Apache Parquet columnar data file format that typically underlies all three—are intended to bring structure to data lakes, aid query performance, and facilitate atomicity, consistency, isolation, and durability (ACID) guarantees.
  • Analytics query engines allow analytics to be performed across a broad, distributed variety of data without having to apply extensive transformations to that data first.
  • Query accelerations—including in-memory caching, indexing, and vector processing on CPUs—are key techniques used by such engines to optimize query processing.
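
To make the first of these technologies more concrete, the following is a minimal sketch, in Python, of the general idea behind a table format's commit log: data files are immutable, and each commit is a small log entry that adds or removes files from the table. It is a toy illustration only and does not reproduce how Delta Lake, Apache Iceberg, or Apache Hudi are actually implemented. The class name `ToyTableLog`, the `_log` directory, and the file names are all hypothetical.

```python
import json
import os
import tempfile


class ToyTableLog:
    """Hypothetical, simplified commit log over immutable data files."""

    def __init__(self, root):
        self.log_dir = os.path.join(root, "_log")
        os.makedirs(self.log_dir, exist_ok=True)

    def _versions(self):
        # Each log entry is named <zero-padded version>.json
        names = [n for n in os.listdir(self.log_dir) if n.endswith(".json")]
        return sorted(int(n.split(".")[0]) for n in names)

    def commit(self, add=(), remove=()):
        versions = self._versions()
        next_version = versions[-1] + 1 if versions else 0
        path = os.path.join(self.log_dir, f"{next_version:020d}.json")
        # Mode "x" raises FileExistsError if the entry already exists, so two
        # writers cannot both claim the same version number. A real format
        # would also keep readers from seeing half-written entries.
        with open(path, "x") as f:
            json.dump({"add": list(add), "remove": list(remove)}, f)
        return next_version

    def snapshot(self, version=None):
        # Replay the log up to `version` to find the files that make up the table.
        files = set()
        for v in self._versions():
            if version is not None and v > version:
                break
            with open(os.path.join(self.log_dir, f"{v:020d}.json")) as f:
                entry = json.load(f)
            files -= set(entry["remove"])
            files |= set(entry["add"])
        return sorted(files)


root = tempfile.mkdtemp()
log = ToyTableLog(root)
log.commit(add=["part-0.parquet", "part-1.parquet"])            # version 0
log.commit(add=["part-2.parquet"], remove=["part-0.parquet"])   # version 1

print(log.snapshot())    # ['part-1.parquet', 'part-2.parquet']
print(log.snapshot(0))   # ['part-0.parquet', 'part-1.parquet']
```

Because readers resolve the table by replaying committed log entries, a change becomes visible all at once when its entry appears, and earlier versions remain readable. This is the intuition behind the ACID guarantees and snapshot-style reads that open table formats aim to provide.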

For organizations that have stretched the versatility of data warehouses or struggled with the performance of first-generation data lakes, the modern data lakehouse provides an elegant solution. It functions as a single platform that can store and manage widely varied types of data and can still enable diverse and powerful analytics over that data.

Business Imperative
An organization’s data provides no value unless meaningful insights can be derived from it. To obtain the most value, organizations must both control their data and extract meaning from it. This is especially challenging with big data, which is characterized by larger volumes, increased varieties, higher velocities, and a greater number of sources. Data lakehouses evolved to meet this need. They provide powerful, reliable, and versatile systems that enable organizations to manage their data and facilitate analytics to power their operations and strategic decisions.

This is our second year evaluating the data lake and lakehouse space in the context of our Key Criteria and Radar reports. This report builds on our previous analysis and considers how the market has evolved over the last year.

This GigaOm Radar report examines 10 of the top data lake and lakehouse solutions and compares offerings against the capabilities (table stakes, key features, and emerging features) and non-functional requirements (business criteria) outlined in the companion Key Criteria report. Together, these reports provide an overview of the market, identify leading data lake and lakehouse offerings, and help decision-makers evaluate these solutions so they can make a more informed investment decision.

GIGAOM KEY CRITERIA AND RADAR REPORTS

The GigaOm Key Criteria report provides a detailed decision framework for IT and executive leadership assessing enterprise technologies. Each report defines relevant functional and non-functional aspects of solutions in a sector. The Key Criteria report informs the GigaOm Radar report, which provides a forward-looking assessment of vendor solutions in the sector.