The fundamental underpinning of an organization is its transactions. It must do them well, with integrity and performance. Not only has transaction volume soared of late, but the level of granularity in the transaction details has also reached new heights. Fast transactions greatly improve the efficiency of a high-volume business. Performance is incredibly important.
There are a variety of databases available to the transactional application. Ideally, any database would have the required capabilities; however, depending on the application’s scale and the chosen cloud, some database solutions can be prone to delays. Recent trends in information management see organizations shifting their focus to cloud-based solutions. In the past, the only clear choice for most organizations has been on-premises data using on-premises hardware. However, costs of scale are chipping away the notion that this remains the best approach for some, in not all, of a company’s transactional needs. The factors driving operational and analytical data projects to the cloud are many. Still, the advantages, like data protection, high availability, and scale, are realized with infrastructure as a service (IaaS) deployment. In many cases, a hybrid approach serves as an interim step for organizations migrating to a modern, capable cloud architecture.
This report outlines the results from a GigaOm Transactional Field Test, derived from the industry-standard TPC Benchmark™ E (TPC-E), to compare two IaaS cloud database offerings:
Both are installations of Microsoft SQL Server, and we tested Red Hat Enterprise Linux OS.
The results of the GigaOm Transactional Field Test are valuable to all operational functions of an organization such as human resource management, production planning, material management, financial supply chain management, sales and distribution, financial accounting and controlling, plant maintenance, and quality management. The underlying data for many of these departments today are in SQL Server, which is also frequently the source for operational interactive business intelligence (BI).
With the Azure feature of local cache, Microsoft SQL Server on Microsoft Azure Virtual Machines (VM) indicated 3x better performance over AWS when tested on RedHat Enterprise Linux (RHEL) 8.2. SQL Server on Microsoft Azure Virtual Machines (VM) had up to 68% better price-performance when comparing both on-demand and pay-as-you-go rates.
Testing hardware and software across cloud vendors is very challenging. Configurations favor one cloud vendor over another in feature availability, virtual machine processor generations, memory amounts, storage configurations for optimal input/output, network latencies, software and operating system versions, and the benchmarking workload itself. Our testing demonstrates a narrow slice of potential configurations and workloads.
As the sponsor of the report, Microsoft selected the particular Azure configuration it desired to test. GigaOm selected the AWS instance configuration closest in terms of CPU, memory, and disk configuration. There were tradeoffs which resulted in an input-output operations per second (IOPS) disadvantage to AWS (which we discuss in the report).
We leave the issue of fairness for the reader to determine. We strongly encourage you, as the reader, to look past marketing messages and discern for yourself what is of value. We hope this report is informative and helpful in uncovering some of the challenges and nuances of platform selection.
In the same spirit of the TPC, price-performance is intended to be a normalizer of performance results across different configurations. Of course, this has its shortcomings, but at least one can determine “what you pay for and configure is what you get.”
The parameters to replicate this test are provided. You are encouraged to compile your own representative queries, data sets, and data sizes and test compatible configurations applicable to your requirements.
The parameters to replicate this test are provided. We used the BenchCraft tool, which was audited by a TPC-approved auditor, who reviewed all updates to BenchCraft. All the information required to reproduce the results are documented in the TPC-E specification. BenchCraft implements the requirements documented in Clauses 3, 4, 5, and 6 of the benchmark specification. There is nothing in BenchCraft that alters the performance of TPC-E or this TPC-E derived workload.
The scale factor in TPC-E is defined as the number of required customer rows per single transactions per second. We did change the number of Initial Trading Days (ITD). The default value is 300, which is the number of 8-hour business days to populate the initial database. For these tests, we used an ITD of 30 days rather than 300. This reduces the size of the initial database population in the larger tables. The overall workload behaves identically with an ITD of 300 or 30 as far as the transaction profiles are concerned. Since the ITD was reduced to 30, any results obtained would not be compliant with the TPC-E specification and, therefore, not comparable to published results. This is the basis for the standard disclaimer that this is a workload derived from TPC-E.
However, BenchCraft is just one way to run TPC-E. All the information necessary to recreate the benchmark is available at TPC.org (this test used the latest version 1.14.0). Just change the ITD, as mentioned above.
We have provided enough information in the report for anyone to reproduce this test. Again, you are encouraged to compile your own representative queries, data sets, data sizes, and test compatible configurations applicable to your requirements.
Relational databases are a cornerstone of an organization’s data ecosystem. While alternative SQL platforms are growing with the data deluge, and have their place, today, workload platforming decision-makers highly consider and usually choose the relational database. This is for a good reason. Since 1989, Microsoft SQL Server has proliferated to near-ubiquity as the relational Server of choice for the original database use case—On-Line Transaction Processing (OLTP)—and beyond. Now SQL Server is available on fully-functional infrastructure offered as a service, taking complete advantage of the cloud. These infrastructure as a service (IaaS) cloud offerings provide predictable costs, cost savings, fast response times, and strong non-functionals.
As our testing confirms, the major difference between SQL Server on Azure and SQL Server on AWS is the storage I/O performance.
Microsoft SQL Server on Azure Virtual Machines Storage Options
For operationally-intensive, business-critical workloads, Azure recommends either Premium Managed Disk or Ultra Disk. While Ultra Disk is Azure’s high-end disk, we chose to test Premium Managed Disks. Premium Managed Disks are high-performance SSDs designed to support I/O intensive workloads and provide high throughput and low latency, but with a balanced cost compared to Ultra Disk. Premium SSD Managed Disks are provisioned as a persistent disk with configurable size and performance characteristics. They can also be detached and reattached to different virtual machines.
The cost of Premium SSD Managed Disks depends on the number and size of the disks selected, and by the number of outbound data transfers. These disk sizes provide different IOPS, throughput (MB/second), and monthly price per GiB. Several persistent disks attached to a VM can support petabytes of storage per VM. Premium four disk configurations can achieve up to 80,000 IOPS and 1,600 MB per second disk throughput per VM—which translates to less than one millisecond latency for read operations with applications which can take advantage of read caching. Premium SSD Managed Disks are supported by DS-series, FS-series, and GS-series VMs. The largest single disk is the P80 with 32TB of storage, IOPS up to 20,000, and 900 MB per second of throughput.
For additional performance, Azure offers local cache options of Read/Write and Read-only cache. The local cache is a specialized component that stores data, typically in memory, so that it can be accessed more quickly. Read cache attempts to reduce read latency, while with write cache data to be written to permanent storage is queued in the cache. This feature is not available on all disk types, nor is it available for temporary storage disks. Also, not all applications can leverage cache. According to Microsoft:
Caching uses specialized, and sometimes expensive, temporary storage that has faster read and write performance than permanent storage. Because cache storage is often limited, decisions need to be made as to what data operations will benefit most from caching. But even where the cache can be made widely available, such as in Azure, it’s still important to know the workload patterns of each disk before deciding which caching type to use.
Use of write cache could cause data loss. Specifically in regards to write caching, Microsoft cautions:
If you are using Read/Write caching, you must have a proper way to write the data from cache to persistent disks. For example, SQL Server handles writing cached data to the persistent storage disks on its own. Using Read/Write cache with an application that does not handle persisting the required data can lead to data loss, if the VM crashes.
Additionally, for the SQL Server temporary database (tempdb), we used a high-performance NVMe solid state drive. NVMe solid state drives (SSD) deliver high random I/O performance with very low latency.
Microsoft SQL Server on Amazon Web Services Elastic Cloud Compute (AWS EC2) Instances Storage Options
Amazon Web Services offers Elastic Block Store (EBS) as an easy to use, high performance block storage service designed for use with Amazon Elastic Compute Cloud (EC2). EBS supports a broad range of workloads, such as relational and non-relational databases, enterprise applications, containerized applications, big data analytics engines, and file systems. With EBS, AWS customers can choose from four different volume types to balance optimal price and performance. You can achieve single-digit, millisecond-latency for high performance database workloads.
Amazon EBS has two different types of solid state drives General Purpose SSD (gp2) and Provisioned IOPS SSD (io1). Provisioned IOPS disks are more akin to Azure Ultra Disk, so we chose General Purpose SSD (gp2) volumes to balance price and performance for these workloads. AWS recommends this drive type for most workloads. With an IOPS ceiling of only 16,000, it tends to underperform unless relegated to system boot volumes, virtual desktops, low-latency applications, and development/test environments. However, if the drive is attached to an AWS Nitro-based instance, it’s IOPS ceiling doubles to 32,000.
For the test, we chose one of AWS’s Nitro-based instances to give AWS the best possible performance using gp2 volumes. Like Azure, for the SQL Server temporary database (tempdb), we used a high-performance NVMe solid state drive.
One of our main objectives in this benchmark is to test an I/O intensive workload on Amazon and Azure’s speed-cost balanced SSD volume types head-to-head—to understand both the performance and price-per-performance differences of the two leading cloud vendor’s SQL Server offerings. AWS does not offer the local cache feature available in Azure, and we desired to see the difference in performance when Read-only cache was enabled on Azure data disks compared to AWS without the benefit of the local cache.
The GigaOm Transactional Field Test is a workload derived from the well-recognized industry-standard TPC Benchmark™ E (TPC-E). The workload was modified, i.e. transaction mix, from the standard TPC-E benchmark for ease of benchmarking and as such, the results generated are not comparable to official TPC Results. From tpc.org:
TPC Benchmark™ E (TPC-E) is an OLTP workload. It is a mixture of read-only and update intensive transactions that simulate the activities found in complex OLTP application environments. The database schema, data population, transactions, and implementation rules have been designed to be broadly representative of modern OLTP systems. The benchmark exercises a breadth of system components associated with such environments.
The TPC-E benchmark simulates the transactional workload of a brokerage firm with a central database that executes transactions related to the firm’s customer accounts. The data model consists of 33 tables, 27 of which have the 50 foreign key constraints. The results of TPC-E are valuable to all operational functions of an organization, many driven by SQL Server and frequently the source for operational interactive business intelligence (BI).
The data sets used in the benchmark were generated based on the information provided in the TPC Benchmark™ E (TPC-E) specification. For this testing, we used the database scaled for 1,000,000 customers. This scaling determined the initial data volume of the database. For example, a total of 1,000,000 customers is multiplied by 17,280 to determine the number of rows in the TRADE table: 17,280,000,000. All of the other tables were scaled according to the TPC-E specification and rules. On our Azure virtual machines and AWS EC2 instances, we allocated 4TB of storage—which was more than enough for this workload. Besides the scale factor of 1,000,000 customers, the test offers a few other “knobs” we turned in order to determine the database engine’s maximum throughput capability for both AWS and Azure. See Table 1.
Table 1: Configuration Changes to Maximize Throughput for Both Microsoft Azure and AWS
We completed three runs per test on each platform. Each test run lasted a duration of at least two hours each. We then took the average transactions per second for the last 30 minutes of the test runs. Also, a full backup was restored to reset the database back to its original state between each run. The results are shared in the Field Test Results section.
Selecting and sizing the compute and storage for comparison can be challenging, particularly across two different cloud vendors’ offerings. There are various offerings between AWS and Azure for transaction-heavy workloads. As you will see in Table 2, at the time of testing and publication, there was not an exact match across the offerings in neither processors nor memory.
We considered the variety of offerings on AWS and selected the memory-optimized R5 family. We have used the R5 family in other testing, and we believed it to be a solid performer. It is also described very similar to the Azure offering. R5 is described as “optimized for memory-intensive and latency-sensitive database workloads, including data analytics, in-memory databases, and high-performance production workloads.”
On the Azure side, we expect mission-critical-minded customers to gravitate towards Ev4 family which are described as “ideal for memory-intensive enterprise applications.” We opted for the ESv4 series which offers premium storage disks, as opposed to the standard disk offered with the Ev4 series. Thus, we decided on R5d for AWS RDS and E64ds_v4 for Azure SQL VM. Our approach was to find the “nearest neighbor” best fit. The challenge was selecting a balance of both CPU and memory. R5d.16xlarge on AWS has 64 vCPUs and 512 GiB memory. Azure offers a 64 core instance in E64ds_v4, but it only has 504 GiB of memory, which is slightly less than the r5d.16xlarge. This was our best, most diligent effort at selecting compute hardware compatibility for our testing.
In terms of storage, our objective was to test a data disk with at least 4TB of storage and the same number of maximum IOPS for both Azure Premium Disks and AWS General Purpose (gp2). Read-only cache on Azure was not available for a volume size as large as 4TB, so instead we deployed four (4) 1TB disks, each with Read-cache enabled, and combined them using RAID0 disk striping on RHEL. Striping is not required to enable Caching for a disk on Azure—caching is available for a single disk. Striping the disks was recommended to us by Azure because of the design of the platform. Azure Premium Storage is offered in pre-fixed sizes and IOPS caps. Thus, striping of equal size disks is the publicly documented and recommended best practice to achieve desired size/IOPS configuration. Each Azure 1TB disk has 5,000 maximum IOPS, giving a total of 20,000 IOPS when they were striped together.
On AWS, we mimicked the same storage configuration of four (4) gp2 volumes striped with RAID0. However, since AWS gp2 volumes have a fixed 3 IOPS per 1 GB of storage, 4TB of total disk would only have 12,288 maximum IOPS. Thus, we increased the capacity of each volume to 1.667 TB, giving the striped RAID0 array a capacity of 6.7 TB and a maximum potential IOPS of 20,000–the same as Azure.
Any chance of data loss (a potential downside to RAID0 disk striping) of SQL Server is recommended to be protected by having an asynchronous Disaster Recovery replica or by restoring from a backup.
Another configuration difference to note that may have impacted our results was for both the Azure virtual machine and AWS EC2 instance we used the locally attached temporary storage for the SQL Server ‘tempdb’ database. The SQL Server tempdb stores internal objects created by the database engine, such as, work tables for sorts, spools, hash joins, hash aggregates, and intermediate results. Having the tempdb on local temporary storage usually means higher I/O performance.
Both the Azure and AWS configurations had 20,000 IOPS available through the storage configurations. We don’t know for sure how much the Read Cache helps Azure. We wanted both AWS and Azure to have the same 4TB of storage. As previously stated, with AWS gp2 drives, you can’t choose the IOPS. It’s fixed. You get 3 IOPS per 1 GB of disk. So in order to give AWS 20,000 IOPS, we allocated 6.7TB of storage, while Azure only had 4TB. Unfortunately, due to the different configuration profiles of AWS and Azure, it is impossible to get both storage capacity and maximum IOPS equivalent for a test.
Results may vary across different configurations and again, you are encouraged to compile your own representative queries, data sets, data sizes, and test compatible configurations applicable to your requirements. All told, our testing included two different database environments. For more information on storage type and IOPS, please reference the footnote.
Table 2: Configurations Used for Tests
|Operating System||Red Hat Enterprise Linux Server 8.2 <X64>||Red Hat Enterprise Linux Server 8.2 <X64>|
|Database||Microsoft SQL Server 2019 15.0.4053.23-2 (X64) Aug 04 2020 Enterprise Edition||Microsoft SQL Server 2019 15.0.4053.23-2 (X64) Aug 04 2020 Enterprise Edition|
|Region||US West 2||West US|
|Storage Configuration||4x 1.667TB gp2 data + 1x 1TB gp2 log||4x P30 1TB RAID0 data (Read Only Cache) + 1x P40 1TB log (No Cache)|
|Data Disk IOPS||20,000||20,000|
|Log Disk IOPS||3,072||5,000|
For these tests, we tested the platforms with the following tuned SQL Server configurations—identical settings for both Azure and AWS:
Table 3: OS Kernel Settings
Table 4: SQL Server Settings
This section analyzes the transactions per second (tps) from the fastest of the five runs of each of the three GigaOm Transactional Field Tests described in the table below. A higher tps is better—meaning more transactions are being processed every second.
Figure 1. Azure Premium SSD Managed Disks vs. AWS EBS General Purpose SSD (gp2) in Transactions per Second
Azure SQL Server 2019 Enterprise on RHEL 8.2 best tps was more than triple the best tps of AWS SQL Server 2019 Enterprise on RHEL 8.2.
It is worth noting that on one of the three runs, AWS did achieve 449 transactions per second. However, the transaction distribution for Market Feed and Trade Result transactions were too low–making it an invalid run according to the TPC-E standards.
We sought to understand the gap in performance–given the configurations we chose. In reviewing the performance of AWS, we uncovered some interesting findings that may begin to explain the differences we saw. We even sought advice and confirmation from AWS, and they did not recommend any different configuration or provide any conflicting results in the telemetry they observed from their side.
In terms of the Benchcraft tests, the tool does provide output of all transactions that occurred and their timeline. We combined this with Linux SAR CPU and IOSTAT output. The following charts detail transactions per second, IOPS, and CPU utilization we measured during AWS’s best run.
Figure 2: AWS Transactions Per Second, Data Disk IOPS, and CPU Utilization
As you can see, the CPU on AWS EC2 instance spent an average of 38% of its time waiting for I/O bandwidth to become available.
Also from the Benchcraft output, we were given the response times for the different transaction types. The performance metric, transactions per second, is taken from Trade Result transactions.
Table 3: Azure vs. AWS TPC-E Transaction Details
On the AWS side, the average response times are high for other important transactions, like Market Feed and Trade Order. Trade Order averages almost 1 second with a standard deviation 0.76. This behavior indicates a backing up–ultimately slowing down the overall Trade Result metric.
To be diligent in our analysis, we performed some other testing to ensure it was not the Benchcraft tool itself, which was the bottleneck. Even though Benchcraft does not throttle transactions (when No Pacing is set like it was for these tests.) We used the Linux utility FIO (Flexible I/O Tester).
Our findings do point to performance below our expectations and may explain why AWS did not perform as well on the TPC-E test.
On a sequential Log Disk block device performance test, we observed:
This does not seem like stellar write performance for an NVMe device.
On a 100% random read block devices performance with 8 threads on our Data Disk RAID0 array, we observed:
While the IOPS achieved the 20,000 stated in the AWS specification, the throughput was lower than we expected for software raided NVMe devices.
On a random 60% read/40% write performance with 8 threads on our Data Disk RAID0 array, we observed:
Both throughput and latency for reads and writes also seem high.
While we can not know for certain what ultimately caused the performance gap between these two platforms, these statistics do offer some light.
The price-performance metric is price/throughput (tps). This is defined as the cost of running each of the cloud platforms continuously for three years divided by the transactions per second throughput uncovered in the previous tests 1-3. The calculation is as follows:
Price Per Performance = $/transactions per second (tps) =
[(Compute with on-demand SQL Server license Hourly Rate × 24 hours/day × 365 days/year × 3 years)
+ (Data disk(s) monthly cost per TB × 4 TB × 12 months × 3 years)
+ (Log disk monthly cost per TB × 1 TB × 12 months × 3 years)] ÷ tps
When evaluating price-per-performance, the lower the number, the better. This means you get more compute power, storage I/O, and capacity for your budget.
We performed this calculation across two different pricing structures:
The prices in table x were at the time of testing and reflect the US West 2 region on AWS and West US region on Azure. The compute prices include both the actual AWS EC2/Azure VM hardware itself and the license costs of the operating system and Microsoft SQL Server Enterprise Edition. We also included Azure Hybrid Benefit versus AWS License Mobility rates for existing SQL Server license holders.
Note: Prices do not include support costs for either Azure or AWS.
Each platform has different pricing options. Buyers should evaluate all of their pricing choices, not just the ones presented in this paper.
Figure 4: Price-performance, Azure vs. AWS on RHEL 8.2, Pay-As-You-Go and a 3-Year Commitment Pricing Without SQL Server License Mobility (lower means less cost to complete same workload)
AWS’ price-performance is just less than double the price-performance of Azure SQL Server on RHEL without license mobility.
Figure 5: Price-performance, Azure vs. AWS on RHEL 8.2, Pay-As-You-Go and 3-Year Commitment Pricing, With SQL Server License Mobility (lower means less cost to complete same workload)
AWS’ price-performance is more than double the price-performance of Azure SQL Server on RHEL with license mobility.
This report outlines the results from a GigaOm Transactional Field Test to compare the same SQL Server infrastructure as a service (IaaS) offering or two cloud vendors: Microsoft SQL Server on Amazon Web Services (AWS) Elastic Cloud Compute (EC2) instances and Microsoft SQL Server Microsoft on Azure Virtual Machines (VM).
We have learned that the database, along with the cloud, and the storage, matters to latency which is the killer for important transactional applications. Microsoft Azure presents a powerful cloud infrastructure offering for the modern transactional workload.
Microsoft SQL Server on Microsoft Azure Virtual Machines (VM) showed 3x better performance on RHEL 8.2 than Microsoft SQL Server on Amazon Web Services (AWS) Elastic Cloud Compute (EC2) in our configurations, which include similar maximum IOPS. This performance difference is seemingly due to enabling local cache on the Azure disks and the additional IOPS on the Azure disks compared to the AWS gp2 volumes of the same size. Microsoft SQL Server on Azure Virtual Machines (VM) had up to 68% better price-performance when comparing on-demand to pay-as-you-go rates.
Keep in mind, tests are configured to get the best from each platform according to publicly documented best practices. Optimizations on both platforms would be possible as their offering evolve or internal tests point to different configurations.
Performance is important but it is only one criterion for a business-critical database platform selection. This test is a point-in-time check into specific performance. There are numerous other factors to consider in selection across factors of Administration, Integration, Workload Management, User Interface, Scalability, Vendor, Reliability, and numerous other criteria. It is also our experience that performance changes over time and is competitively different for different workloads. Also, a performance leader can hit up against the point of diminishing returns and viable contenders can quickly close the gap. The benchmark setup was informed by the TPC Benchmark™ E (TPC-E) specification. The workload was derived from TPC-E and is not an official TPC benchmark nor may the results be compared to official TPC-E publications.
GigaOm runs all of its performance tests to strict ethical standards. The results of the report are the objective results of the application of queries to the simulations described in the report. The report clearly defines the selected criteria and process used to establish the field test. The report also clearly states the data set sizes, the platforms, the queries, etc. used. The reader is left to determine for themselves how to qualify the information for their individual needs. The report does not make any claim regarding the third-party certification and presents the objective results received from the application of the process to the criteria as described in the report. The report strictly measures performance and does not purport to evaluate other factors that potential customers may find relevant when making a purchase decision.
This is a sponsored report. Microsoft chose the competitors, the test, and the Microsoft configuration. GigaOm chose the most compatible configurations for the other tested platform and ran the testing workloads. Choosing compatible configurations is subject to judgment. We have attempted to describe our decisions in this paper.
Microsoft (Nasdaq “MSFT” @microsoft) enables digital transformation for the era of an intelligent cloud and an intelligent edge. Its mission is to empower every person and every organization on the planet to achieve more.
Microsoft offers SQL Server on Azure. To learn more about Azure SQL Database visit https://azure.microsoft.com/en-us/services/sql-database/.
William McKnight is a former Fortune 50 technology executive and database engineer. An Ernst & Young Entrepreneur of the Year finalist and frequent best practices judge, he helps enterprise clients with action plans, architectures, strategies, and technology tools to manage information.
Currently, William is an analyst for GigaOm Research who takes corporate information and turns it into a bottom-line-enhancing asset. He has worked with Dong Energy, France Telecom, Pfizer, Samba Bank, ScotiaBank, Teva Pharmaceuticals, and Verizon, among many others. William focuses on delivering business value and solving business problems utilizing proven approaches in information management.
Jake Dolezal is a contributing analyst at GigaOm. He has two decades of experience in the information management field, with expertise in analytics, data warehousing, master data management, data governance, business intelligence, statistics, data modeling and integration, and visualization. Jake has solved technical problems across a broad range of industries, including healthcare, education, government, manufacturing, engineering, hospitality, and restaurants. He has a doctorate in information management from Syracuse University.
GigaOm provides technical, operational, and business advice for IT’s strategic digital enterprise and business initiatives. Enterprise business leaders, CIOs, and technology organizations partner with GigaOm for practical, actionable, strategic, and visionary advice for modernizing and transforming their business. GigaOm’s advice empowers enterprises to successfully compete in an increasingly complicated business atmosphere that requires a solid understanding of constantly changing customer demands.
GigaOm works directly with enterprises both inside and outside of the IT organization to apply proven research and methodologies designed to avoid pitfalls and roadblocks while balancing risk and innovation. Research methodologies include but are not limited to adoption and benchmarking surveys, use cases, interviews, ROI/TCO, market landscapes, strategic trends, and technical benchmarks. Our analysts possess 20+ years of experience advising a spectrum of clients from early adopters to mainstream enterprises.
GigaOm’s perspective is that of the unbiased enterprise practitioner. Through this perspective, GigaOm connects with engaged and loyal subscribers on a deep and meaningful level.