Table of Contents
- Market Categories and User Segments
- Key Criteria Comparison
- GigaOm Radar
- Vendor Insights
- Analyst’s Take
- About Andrew Brust
Organizations need to manage large volumes of data stored in different formats (structured, semi-structured, and unstructured) without having to rely on proprietary software, as is the case with data warehouses. Data lakes let organizations store and query large amounts of data easily, with very little maintenance or imposed structure.
As a result, many data lakes are compatible with a wide range of file formats, including CSV (comma-separated values) and Parquet, as well as newer table formats like Delta Lake and Iceberg. Additionally, many data lakes (and the query engines built to analyze the large-scale datasets within them) build on an underlying open source technology, support open file formats, and handle security and governance through integration with additional open source technologies, such as Apache Ranger and Apache Atlas.
The past, present, and future of data lakes are intertwined with those of the data warehouse. Both solutions originated with attempts to find a single optimal solution to enterprise data management. Additionally, over the past year, the term “lakehouse” has moved from a novel, somewhat esoteric moniker into the mainstream. A lakehouse attempts to blend the capabilities of data warehouses and data lakes by implementing query engine features designed to bring the optimizations and performance of a data warehouse to a data lake. Proponents of this architecture describe it as an optimal blend of the two approaches.
Today, there is a wide range of opinions, philosophies, and marketing biases within the industry regarding the relationship between data lakes and data warehouses. Some vendors, like Snowflake, are proponents of a data-warehouse-only approach. Others—like Microsoft, Google, and Oracle—provide users with the choice of some combination of a data lake, lakehouse, and/or data warehouse within the same product offering. Still others—like Databricks, Cloudera, and Dremio—stick to lakehouse offerings exclusively but emphasize that they are elegant hybrids of data lake and warehouse technology, obviating the need for a combination.
Regardless of the specific technology or label (lake, lakehouse, or warehouse), the most important factor for organizations to focus on when selecting a product is the use case it must address. To that end, this report aims to help organizations select the solution that best suits their needs.
This GigaOm Radar report highlights key data lake and lakehouse vendors and equips IT decision-makers with the information needed to select the best fit for their business and use case requirements. In the corresponding GigaOm report “Key Criteria for Evaluating Data Lake and Lakehouse Solutions,” we describe in more detail the capabilities and metrics that are used to evaluate vendors in this market.
How to Read this Report
This GigaOm report is one of a series of documents that helps IT organizations assess competing solutions in the context of well-defined features and criteria. For a fuller understanding, consider reviewing the following reports:
- Key Criteria report: A detailed market sector analysis that assesses the impact that key product features and criteria have on top-line solution characteristics (such as scalability, performance, and TCO) that drive purchase decisions.
- GigaOm Radar report: A forward-looking analysis that plots the relative value and progression of vendor solutions along multiple axes based on strategy and execution. The Radar report includes a breakdown of each vendor’s offering in the sector.
- Solution Profile: An in-depth vendor analysis that builds on the framework developed in the Key Criteria and Radar reports to assess a company’s engagement within a technology sector. This analysis includes forward-looking guidance around both strategy and product.