The team that created Kafka is leaving LinkedIn to build a company around it

Three LinkedIn engineers led by Jay Kreps — the person behind a good deal of LinkedIn’s recent infrastructure advances — are leaving the company to start their own business, called Confluent. The new company is centered on Apache Kafka, the open source real-time messaging technology that Kreps and his co-founders, Neha Narkhede and Jun Rao, created and developed. They have raised $6.9 million in venture capital from Benchmark, LinkedIn and Data Collective.

Kreps describes Kafka as a “central nervous system” for LinkedIn and the other companies that already use it, managing the streams of information that feed into it from various applications, processing each piece of data and then sending it where it needs to go next. That might be Apache Storm, DataTorrent or, in the case of LinkedIn, Samza (which Kreps also built) for stream processing; Hadoop for batch processing; or simply a database that serves the data up later.
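To make that pattern a little more concrete, here is a minimal sketch of an application publishing events into Kafka with the standard Java producer client. The broker address, topic name and payload are illustrative assumptions, not details of LinkedIn’s actual setup.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ActivityEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical broker address; in practice this points at the Kafka cluster.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The application appends its events to a topic; Kafka persists the stream
            // so any number of downstream systems can read it independently.
            producer.send(new ProducerRecord<>("page-views", "user-42", "{\"page\":\"/home\"}"));
        }
    }
}
```

The key design point is on the consuming side: a stream processor, a Hadoop loader and a database writer can each subscribe to the same topic and read at their own pace, which is what lets Kafka act as the routing layer Kreps describes.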

Unlike traditional enterprise messaging software, Kreps explained, Kafka is built to handle all the data flowing through a company, and to do it in near real time.

With Confluent, he and his co-founders hope to help companies outside the web build the types of real-time platforms that Kafka anchors at places including LinkedIn, Netflix, Uber and Verizon. Kreps said Confluent has talked to many of the thousands of Kafka users to learn about their adoption and usage patterns, and to figure out what they typically need to build around Kafka to make it really work. There’s no product yet, but those best practices around deployment and technology will help inform whatever Confluent ends up building.

L to R: Jun Rao, Jay Kreps, Neha Narkhede. Source: Confluent

Kreps acknowledged he initially wondered whether non-web companies would be interested in a technology like Kafka (we’ve already seen other web-born big data startups, such as Continuuity, change course), but seeing how widely it has been adopted in fields such as financial services and telecommunications helped change his mind. In March, I covered a Huntsville, Alabama-based company called Synapse Wireless that used Kafka to power a sensor network for tracking the hygiene practices of hospital personnel.

“I think the need is absolutely there,” Kreps said.

The recent integration of Kafka into multiple Hadoop distributions shouldn’t hurt either, at least in terms of ensuring the technology works with the data store that’s becoming the focal point of many big data environments. Kafka support from companies such as Hortonworks and Cloudera could also help seed a potential customer base for Confluent, and perhaps expand the development pool beyond Confluent and LinkedIn.

How Kafka fits into the Netflix data pipeline. Source: Netflix

Kreps thinks it’s a safer bet to build a company around a messaging technology like Kafka than around an open-source stream-processing technology like Apache Storm, because messaging is a more foundational component of advanced data-processing architectures. He remembers joining LinkedIn when it had only batch processes, and how excited everyone was when a startup came pitching a stream-processing system — until they realized LinkedIn didn’t have the architecture in place to support it.

Maybe they could have turned daily jobs into hourly jobs, he said, “but at that point you might as well load it into your data warehouse.”

“The big gap in most companies today is there’s very little data available in real time at all,” Kreps explained. Once companies get the right technology stack in place, though, they can start looking at building internet-of-things or other sensor-based applications, or really anything that requires getting lots of data from lots of sources into the backend systems that need it.

“It actually opens up a whole range of use cases,” Kreps said, “that are otherwise not really available.”
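As a rough illustration of the sensor-style use case Kreps mentions, here is a minimal sketch of a service pulling readings off a Kafka topic with the standard Java consumer client. The broker address, topic and consumer-group names are hypothetical.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SensorReadingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker address
        props.put("group.id", "sensor-dashboard");        // hypothetical consumer group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("sensor-readings")); // hypothetical topic
            while (true) {
                // Fetch whatever readings have arrived since the last poll.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("sensor %s -> %s%n", record.key(), record.value());
                }
            }
        }
    }
}
```

Because each consumer group tracks its own position in the log, the same sensor stream could simultaneously feed a real-time dashboard like this one and a batch load into Hadoop.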

1 Comment

John Haddad

This is great validation of the need for real-time streaming data. At Informatica, we’re seeing this technology take off for predictive maintenance in Manufacturing and Oil & Gas, fraud detection in Financial Services, pricing optimization in Insurance (e.g. telemetry data), and machine log analysis (e.g. clickstream). Last year we introduced Informatica Vibe Data Stream to collect real-time machine device data and stream it into Hadoop, a CEP engine, or NoSQL systems like Cassandra, etc. You can download a free trial of Vibe Data Stream at http://infa.media/1vQHO8N if you’d like to try it out.
