High-Volume Data Replication v1.0

Evaluating Fivetran HVR and Qlik Replicate

Table of Contents

  1. Executive Summary
  2. Platform Summary
  3. Test Setup
  4. Performance Test Results
  5. Total Cost of Ownership
  6. Conclusion
  7. Appendix
  8. Disclaimer
  9. About Fivetran
  10. About William McKnight
  11. About Jake Dolezal

1. Executive Summary

This report was commissioned by Fivetran.

Whether used for operational or analytical purposes, databases are the backbone of how many businesses run, supporting everything from collecting consumer behavior data on your website to processing IoT data across your supply chain, and much more. Accessing and replicating massive volumes of database content is key to business success, and the responsibility for managing this crucial element of your infrastructure falls to data leaders and their teams.

Ensuring your solution for database replication can keep up with your business is a pressing need for every data leader across every industry and company size. In this report, we investigate two major vendors in database replication and put them to the test in terms of speed and cost.

Behind the Scenes: How It Works
The process of identifying and recording modifications to data in a database and delivering those changes to a downstream system or process in near real time is known as change data capture (CDC), the technique that underpins continuous data replication.

Data is extracted from a source, optionally transformed, and then loaded into a target repository, such as a data lake or data warehouse. Capturing every transaction in the source database and promptly applying it to the target keeps the two systems synchronized, which enables dependable replication between on-premises sources and the cloud with minimal or no downtime.

CDC is a highly effective method for moving data across technologies and is essential to modern cloud architectures. Because changes flow continuously rather than in periodic batch loads, CDC keeps downstream data fresh for analytics and data science use cases, and enterprise data architectures rely on it to power continuous data transport between systems. Log-based CDC, the method used in this report, captures changes from the database's transaction log (for Oracle, the redo log) rather than by querying the tables themselves, and replicates them downstream.
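The log-based CDC flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how Fivetran HVR or Qlik Replicate are implemented: an in-memory list of committed change records stands in for the Oracle redo log, a Python dict keyed by primary key stands in for the Snowflake target table, and the names ChangeRecord and apply_changes are hypothetical. Each record carries an SCN (Oracle's System Change Number) so that changes are applied in log order and a saved checkpoint lets replication resume without re-applying or skipping changes.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class ChangeRecord:
    """Hypothetical committed change read from a simulated transaction log."""
    scn: int                        # position in the log; increases with each change
    op: str                         # "INSERT", "UPDATE", or "DELETE"
    key: int                        # primary key of the affected row
    row: Optional[Dict[str, Any]]   # full new row image; None for DELETE


def apply_changes(log: List[ChangeRecord],
                  target: Dict[int, Dict[str, Any]],
                  checkpoint: int) -> int:
    """Replay records newer than `checkpoint` onto `target` in SCN order.

    Returns the highest SCN applied, which the caller saves so the next
    run resumes exactly where this one stopped.
    """
    for rec in sorted(log, key=lambda r: r.scn):
        if rec.scn <= checkpoint:
            continue  # already replicated by an earlier run
        if rec.op in ("INSERT", "UPDATE"):
            target[rec.key] = dict(rec.row)  # upsert of the full row image
        elif rec.op == "DELETE":
            target.pop(rec.key, None)        # tolerate rows already removed
        else:
            raise ValueError(f"unknown operation: {rec.op}")
        checkpoint = rec.scn
    return checkpoint


# Usage: the first run replicates everything after checkpoint 0.
log = [
    ChangeRecord(101, "INSERT", 1, {"name": "alice", "balance": 10}),
    ChangeRecord(102, "INSERT", 2, {"name": "bob", "balance": 5}),
    ChangeRecord(103, "UPDATE", 1, {"name": "alice", "balance": 25}),
    ChangeRecord(104, "DELETE", 2, None),
]
target: Dict[int, Dict[str, Any]] = {}
checkpoint = apply_changes(log, target, checkpoint=0)
print(checkpoint, target)  # 104 {1: {'name': 'alice', 'balance': 25}}

# A later run resumes from the saved checkpoint; records 101-104 are skipped.
log.append(ChangeRecord(105, "INSERT", 3, {"name": "carol", "balance": 7}))
checkpoint = apply_changes(log, target, checkpoint)
print(checkpoint, sorted(target))  # 105 [1, 3]
```

Tracking the checkpoint is what allows a replicator to restart after an interruption, and writing full row images as upserts means that replaying the same records in order leaves the target in the same final state.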

Using the competing technologies Fivetran HVR and Qlik Replicate, our scenario assessed replication latency and total cost of ownership (TCO) when syncing 50 GB to 200 GB per hour of change data from a source Oracle database to a target Snowflake data warehouse, using log-based CDC on the source. These tests simulate scenarios that large enterprises commonly encounter when using log-based CDC technologies.
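For a sense of scale, the tested change data rates can be converted into sustained throughput. This worked conversion assumes decimal units (1 GB = 1,000 MB) and a constant rate across the hour.

```latex
\begin{aligned}
\frac{50\ \text{GB}}{1\ \text{h}}  &= \frac{50{,}000\ \text{MB}}{3{,}600\ \text{s}}  \approx 13.9\ \text{MB/s} \\
\frac{200\ \text{GB}}{1\ \text{h}} &= \frac{200{,}000\ \text{MB}}{3{,}600\ \text{s}} \approx 55.6\ \text{MB/s} \\
200\ \text{GB/h} \times 24\ \text{h} &= 4{,}800\ \text{GB} = 4.8\ \text{TB of change data per day}
\end{aligned}
```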

At 200 GB/hour, Fivetran HVR produced 27x lower latency and proved 63% less costly than Qlik Replicate.

In this study, we found significant differences in replication latency and total cost of ownership between Fivetran HVR and Qlik Replicate.

  • Fivetran HVR showed a flat, linear trend in replication latency as volumes increased, while Qlik Replicate's replication latency grew at an accelerating rate as redo log change data volumes increased. The gap between the two widened with greater change data volumes: at 200 GB/hour, Fivetran HVR showed 27 times lower latency than Qlik Replicate.
  • Total cost of ownership (TCO) calculations reveal that Fivetran HVR is less expensive than Qlik Replicate at every tested volume. Over one year of use at 200 GB/hour, Fivetran HVR is 63% less expensive than Qlik Replicate.
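The two headline ratios can be restated as worked arithmetic. Here C denotes annual TCO and L denotes replication latency at 200 GB/hour, with a subscript for each platform; these symbols are introduced only for this illustration, and "27 times lower latency" is read as Qlik Replicate's latency being 27 times Fivetran HVR's.

```latex
\begin{aligned}
C_{\text{HVR}} &= (1 - 0.63)\, C_{\text{Qlik}} = 0.37\, C_{\text{Qlik}}
  &&\Rightarrow\quad \frac{C_{\text{Qlik}}}{C_{\text{HVR}}} = \frac{1}{0.37} \approx 2.7 \\
L_{\text{Qlik}} &= 27\, L_{\text{HVR}}
  &&\Rightarrow\quad L_{\text{HVR}} = \frac{L_{\text{Qlik}}}{27} \approx 0.037\, L_{\text{Qlik}}
\end{aligned}
```

In other words, at 200 GB/hour Qlik Replicate's annual TCO is roughly 2.7 times that of Fivetran HVR, and Fivetran HVR's latency is about 96% lower than Qlik Replicate's.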

Full content available to GigaOm Subscribers.
