Table of Contents
- Executive Summary
- Log Analytics Platforms Tested
- Field Test Setup
- Field Test Results
- Conclusion
- Disclaimer
- Appendix
- About Microsoft
- About William McKnight
- About Jake Dolezal
- About GigaOm
- Copyright
1. Executive Summary
The number of connected devices, including the machines, sensors, and cameras that make up the Internet of Things (IoT), continues to grow rapidly. By 2025, we can expect tens of billions of IoT devices to produce zettabytes of data, even as machine-generated data outstrips that generated by humans. At the same time, data-driven organizations are generating expanding volumes of log-like data and relying on analytic databases to load, store, and analyze that data at high speed to derive insights and take timely action.
This report focuses on the performance of three popular, cloud-enabled, enterprise-ready log analytics platforms: Microsoft Azure Data Explorer (part of Azure Synapse Analytics), Google BigQuery, and Snowflake. Due to cost limitations with Elasticsearch and AWS OpenSearch, we could not run our tests on either of those platforms. Microsoft invited GigaOm to measure the performance of the Azure Data Explorer engine and compare it with its leading competitors in the log analytics space. The tests we designed are intended to simulate a set of basic scenarios that answer fundamental business questions an organization in nearly any industry might encounter in its log analytics.
In this report, we tested complex workloads against 100TB of data at two concurrency levels: a single user and 50 concurrent users. The testing was conducted using comparable hardware configurations on Microsoft Azure and Google Cloud.
Of course, testing platforms across cloud vendors is challenging. Configurations can favor one cloud vendor over another in feature availability, virtual machine processor generations, memory amounts, optimal input/output storage configurations, network latencies, software and operating system versions, and the testing workload. Our testing demonstrates a slice of potential configurations and workloads.
As the sponsor of the report, Microsoft selected the specific configuration of its platform that it wanted to test. GigaOm then selected the configuration for Snowflake closest to the ADX configuration in terms of CPU and memory. Google BigQuery is offered “as-is” with no configuration choices other than slot commitments (discussion provided).
We leave the test’s fairness for the reader to determine. We strongly encourage you to look past marketing messages and discern its value for yourself. We hope this report is informative and helpful in uncovering some of the challenges and nuances involved in platform selection.
The parameters to replicate this test are provided in this document. We developed the testing suite from scratch because no existing benchmark framework met our needs. The testing application and the queries used are part of a code repository on GitHub.
Overall, the test results were insightful and revealed the query execution performance of the three platforms tested. Some of the highlights include:
- Azure Data Explorer (ADX) outperformed Google BigQuery and Snowflake on all 19 test queries with a single user and 18 of 19 with 50 concurrent users.
- ADX completed all 19 queries in under 1 second with a single user, while the average execution time on BigQuery and Snowflake was 15 and 13 seconds per query, respectively [1].
- ADX completed all 19 queries in under 10 seconds with 50 concurrent users.
- BigQuery and Snowflake each had 8 queries (different ones) that did not complete within the 2-minute timeout [2].
- We found the cost of data ingestion and query execution on ADX to be significantly lower than on Google BigQuery, Snowflake, and Elasticsearch/OpenSearch.
- We found it infeasible to test Elasticsearch and OpenSearch with this workload due to slow data loading speeds, expansion of the on-disk data volume (rather than compression), and the inability to scale hot-tier (SSD) storage independently or in a cost-effective manner.
The parameters to replicate this test are provided. You are encouraged to compile your own representative queries, data sets, and data sizes, and to test compatible configurations applicable to your requirements.
[1] There were several performance outliers. The geometric mean of query execution time for BigQuery and Snowflake was approximately 4.5 seconds per query.
[2] Measured at the 50th percentile, that is, the 25th of the 50 concurrent queries to complete (or be canceled).
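The notes above contrast an arithmetic average with a geometric mean and report the 50-user results at the 50th percentile. The following Python sketch illustrates both ideas. It is not taken from the GigaOm test application: the function name `kth_to_complete`, the treatment of canceled runs as lasting exactly the timeout, and all timing values are assumptions made up for illustration and are not results from this report.

```python
import statistics

TIMEOUT_S = 120.0  # the report's 2-minute timeout, in seconds


def kth_to_complete(elapsed_s, k):
    """Elapsed time of the k-th run to finish (1-based).

    Assumption: a canceled run is recorded as lasting exactly TIMEOUT_S.
    """
    capped = sorted(min(t, TIMEOUT_S) for t in elapsed_s)
    return capped[k - 1]


# Hypothetical single-user timings (seconds) with one slow outlier.
single_user = [2.0, 2.0, 2.0, 2.0, 64.0]

# The arithmetic mean is pulled up by the outlier; the geometric mean is not.
arithmetic = statistics.mean(single_user)
geometric = statistics.geometric_mean(single_user)

# Hypothetical 50-user run of one query: 24 fast runs, one slower run,
# and 25 runs that exceeded the timeout and were canceled.
fifty_users = [5.0] * 24 + [7.0] + [300.0] * 25
median_run = kth_to_complete(fifty_users, 25)

if __name__ == "__main__":
    print(f"arithmetic mean: {arithmetic:.1f} s")
    print(f"geometric mean:  {geometric:.1f} s")
    print(f"25th of 50 to complete: {median_run:.1f} s")
```

In this made-up sample a single 64-second query moves the arithmetic mean well above the geometric mean, which is the kind of gap the first note describes. The 25th-to-complete value equals the timeout only when fewer than 25 of the 50 runs finish in time.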