Table of Contents
- Summary
- Differences Between Pulsar and Kafka
- Performance Test
- Cost Scenarios
- Conclusion
- Disclaimer
- About DataStax
- About William McKnight
- About Jake Dolezal
- About GigaOm
- Copyright
1. Summary
As machine learning moves into the enterprise, it brings the ability to make use of every available bit of data, so organizations must process ever-growing volumes of data in real time.
A number of applications now under development make autonomous, real-time decisions about where data is produced, consumed, analyzed, and acted upon. As a result, the technology itself is making pragmatic, tactical decisions.
However, if data is not captured within a certain window of time, its value is lost, and the decision or action it should have triggered never occurs.
Fortunately, there are technologies designed to handle large volumes of time-sensitive streaming data. Known by names like streaming, messaging, live feeds, real-time, and event-driven, this category of data needs special attention because delayed processing can erode its value. A sudden price change, a critical threshold being met, an anomaly detected, a rapidly changing sensor reading, or an outlier in a log file can each be of immense value to a decision-maker or a process, but only if the alert arrives in time to affect the outcome.
This report focuses on real-time data and how autonomous systems can be fed at scale while delivering reliable performance. To shed light on this challenge, we assess and benchmark two leading streaming data technologies: Apache Kafka™ and Apache Pulsar™. Both solutions process massive amounts of streaming data generated by social media, logging systems, clickstreams, Internet-of-Things devices, and more. However, they also differ in important ways, including throughput and cost, which our hands-on testing uncovers.
Our findings? Across the three real-world scenarios we devised, Luna Streaming, DataStax's production-ready distribution of Pulsar with administration and monitoring tools and 24/7 enterprise support, produced higher average throughput and lower costs in every testing workload. Kafka and Pulsar are both streaming solutions, but with Kafka you will be challenged on throughput, which could result in a potential 81% greater overall cost. The results offer a compelling case for Pulsar/Luna Streaming to support organizations' growing streaming needs.