Apache Kafka

Apache Kafka is a distributed event-streaming platform built around a deceptively simple idea: a durable, append-only log of records that many producers write to and many consumers read from independently, each at its own pace. What began as a messaging system became a general substrate for moving and storing streams of events, and the log abstraction it popularised reshaped how event-driven and data-integration systems are built.

The organising idea is the log: an ordered, append-only sequence of records. Producers append; each consumer tracks its own position (an offset) and reads forward; records are retained and can be re-read. Because the log is durable and replayable rather than a transient queue, it serves at once as messaging, as a system of record, and as the seam between systems.
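
The mechanics described above can be sketched in a few lines. This is a toy in-memory model, not Kafka's client API: the names Log, Consumer, append, read, poll, and seek are hypothetical, chosen for illustration, and the sketch ignores durability, partitioning, retention limits, and networking. It shows only that reading does not remove records, and that each consumer owns its own offset.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Log:
    """Hypothetical in-memory stand-in for one ordered, append-only log."""
    _records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        """Append a record to the end and return the offset it was stored at."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> List[Any]:
        """Return up to max_records starting at offset; nothing is removed."""
        return self._records[offset:offset + max_records]


@dataclass
class Consumer:
    """Hypothetical reader that tracks its own position in a shared Log."""
    log: Log
    offset: int = 0

    def poll(self, max_records: int = 10) -> List[Any]:
        """Read forward from the current offset and advance past what was read."""
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch

    def seek(self, offset: int) -> None:
        """Move the position, for example back to 0 to replay the log."""
        self.offset = offset


if __name__ == "__main__":
    log = Log()
    for event in ["signup", "login", "purchase"]:
        log.append(event)

    fast = Consumer(log)
    slow = Consumer(log)

    print(fast.poll())               # fast consumer reads everything available
    print(slow.poll(max_records=1))  # slow consumer reads one record at its own pace
    fast.seek(0)                     # rewind: the records are still there
    print(fast.poll())
```

Because the offset lives with the consumer and not with the log, adding a reader or rewinding one has no effect on any other reader, which is the property that separates a log from a queue that deletes on delivery.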

Origin

Kafka was built at LinkedIn around 2010 by Jay Kreps, Neha Narkhede, and Jun Rao to carry the company’s high-volume activity and operational data through a single pipeline, a load that existing messaging systems could not scale to handle. It was open-sourced in early 2011 and became a top-level Apache project in October 2012. Kreps named it after the writer Franz Kafka, reasoning that a system “optimised for writing” should bear a writer’s name, and because he liked the author’s work. In 2014 the three creators founded Confluent, the company built around Kafka. The conceptual case for the design was set out in Kreps’ widely read 2013 essay, “The Log: What every software engineer should know about real-time data’s unifying abstraction.”

See also: Jay Kreps · Apache Avro · Martin Kleppmann