Table of Contents
- Executive Summary
- The Data Lakehouse
- Platforms Tested
- Field Test Setup
- Field Test Results
- Total Cost of Ownership
- About Starburst
- About William McKnight
- About Jake Dolezal
1. Executive Summary
Recently, several architectural patterns have emerged that decentralize most components of the enterprise analytics architecture. Data lakes are a large part of that advancement.
A GigaOm field test was devised to determine the differences between two popular enterprise data architectural patterns: a modern cloud data warehouse based on a Snowflake architecture and a modern data lakehouse with a Starburst-based architecture. The test comprised multiple components measuring differences in performance and capability, as well as the time and effort required to migrate to each system from a legacy environment.
We evaluated price-performance across four scenarios:
- Snowflake: Complete migration to Snowflake
- Starburst option 1: Lake adoption and on-premises federation
- Starburst option 2: On-premises migration and cloud federation
- Starburst option 3: Lakehouse adoption in its entirety
In our field tests, the Starburst options required between 47% and 67% less migration effort than Snowflake, shortening time-to-insight and accelerating the analytical work that drives business decisions and financial impact.
For calculating post-migration effort and three-year total cost of ownership (TCO), we further divided the Snowflake scenario into two options: a migrate semi-structured option, covering the initial process and cost of transferring and organizing semi-structured data types into the Snowflake platform, and an additional compute option, covering the supplemental computational capacity Snowflake needs for the more resource-intensive processing of semi-structured data. Breaking down Snowflake usage this way gives an organization a clearer picture of TCO over a three-year period. See Table 1.
Table 1. Costs by Migration Type
| Migration Type | Post-Migration Effort Cost | 3-Year TCO |
|---|---|---|
| Snowflake Migrate Semi-Structured | $1,898,354 | $3,366,800 |
| Snowflake Additional Compute for Semi-Structured | $1,290,385 | $3,278,344 |
| Starburst Option 1: Lake Adoption & On-Premises Federation | $645,192 | $1,597,748 |
| Starburst Option 2: On-Premises Migration & Cloud Federation | $542,548 | $1,522,620 |
| Starburst Option 3: Full Lakehouse Adoption | $762,500 | $1,853,549 |

Source: GigaOm 2023
The results of this study show that the Starburst options are the most economical, while both Snowflake options are the costliest. Adopting Starburst in a data lakehouse approach minimizes both the cost of transitioning to the architecture and the associated long-term cost.
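To illustrate the size of the gap, the Table 1 figures can be compared directly. The following is a minimal sketch; the dollar amounts are taken from Table 1, and the comparison (each Starburst option's three-year TCO against the cheaper of the two Snowflake options) is our own framing, not a calculation from the report itself.

```python
# Post-migration effort cost and 3-year TCO per scenario, from Table 1 (USD)
scenarios = {
    "Snowflake migrate semi-structured": (1_898_354, 3_366_800),
    "Snowflake additional compute": (1_290_385, 3_278_344),
    "Starburst option 1": (645_192, 1_597_748),
    "Starburst option 2": (542_548, 1_522_620),
    "Starburst option 3": (762_500, 1_853_549),
}

# Cheapest Snowflake 3-year TCO serves as the comparison baseline
snowflake_best_tco = min(
    tco for name, (_, tco) in scenarios.items() if name.startswith("Snowflake")
)

for name, (_, tco) in scenarios.items():
    if name.startswith("Starburst"):
        savings = 1 - tco / snowflake_best_tco
        print(f"{name}: {savings:.0%} lower 3-year TCO than the cheapest Snowflake option")
```

Even against the less expensive Snowflake option, every Starburst scenario comes in well under half to roughly 57% of the Snowflake TCO.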
Our online analytical processing (OLAP) and online transaction processing (OLTP) source data were derived from the TPC-DS benchmark, with the 24 tables of the TPC-DS schema divided between the two source databases.
We believe our legacy source systems effectively represent the current state of many businesses, most of which are considering migrating to a more modern architecture. This field test is intended to provide a glimpse into the available options.
In addition, we compared the performance of raw JSON loaded into a Snowflake VARIANT column against Starburst's federated queries, representing common workloads such as customer analytics, log analytics, clickstream analytics, and security analytics.
Our use cases entail lift-and-shift migrations. We defined an acceptable level of performance in advance: completing the TPC-DS 99-query set in under 15 minutes, with a geometric mean of less than five seconds per query for a single user. In our experience with businesses conducting efficient migrations, this level of performance meets user requirements. Using these criteria and a pattern observed in enterprise evaluations, we determined the minimal cost-based compute infrastructure each platform required to meet these performance thresholds.
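The two thresholds above can be expressed as a short check. This is a minimal sketch; the function name and the example timings are hypothetical illustrations, not measured results from the field test.

```python
import math

def passes_thresholds(query_seconds, total_limit_s=15 * 60, geomean_limit_s=5.0):
    """Check the single-user criteria: the full TPC-DS 99-query run must
    finish within 15 minutes, and the geometric mean of the per-query
    times must stay under five seconds."""
    total = sum(query_seconds)
    # Geometric mean computed in log space to avoid overflow on long runs
    geomean = math.exp(sum(math.log(t) for t in query_seconds) / len(query_seconds))
    return total <= total_limit_s and geomean <= geomean_limit_s

# Hypothetical timings for 99 queries: a handful of slow queries dominate
# the total runtime but barely move the geometric mean.
timings = [2.0] * 90 + [30.0] * 9
print(passes_thresholds(timings))  # → True (450 s total, geomean ≈ 2.6 s)
```

The geometric mean is the natural aggregate here because it dampens the influence of a few outlier queries, which is why both a total-runtime cap and a geometric-mean cap are needed to characterize acceptable performance.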