
Apache Avro

Apache Avro is a data serialization system and remote-procedure-call framework. Schemas are written in JSON; data is encoded in a compact, untagged binary form; and — this is the move that organises everything else — the schema is carried alongside the data rather than assumed by the code that reads it. A reader always has both the schema the data was written with and the schema it expects, and reconciles the two at read time. Because the schema is always present, no generated, compiled class is required to read or write a record. The current release is 1.12.1 (October 2025); the canonical reference is the Avro specification.
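To make the read-time reconciliation concrete, here is a minimal sketch in plain Python (standard library only, not the official Avro library). It covers two primitive types and flat records, and it leaves out most of what the specification defines, such as type promotion, aliases, unions, and nested types. The `User` schemas, their field names, and the helper functions `encode` and `decode` are assumptions made up for illustration.

```python
import io
import json

# Schemas are plain JSON. Both schemas below are hypothetical examples.
WRITER_SCHEMA = json.loads("""
{"type": "record", "name": "User", "fields": [
  {"name": "name", "type": "string"},
  {"name": "age",  "type": "long"}
]}
""")

READER_SCHEMA = json.loads("""
{"type": "record", "name": "User", "fields": [
  {"name": "name",  "type": "string"},
  {"name": "email", "type": "string", "default": "unknown"}
]}
""")


def write_long(out, n):
    """Write a 64-bit integer as a zigzag, variable-length value."""
    n = (n << 1) ^ (n >> 63)  # zigzag: small magnitudes get short codes
    while n & ~0x7F:
        out.write(bytes([(n & 0x7F) | 0x80]))  # high bit set: more bytes follow
        n >>= 7
    out.write(bytes([n]))


def read_long(inp):
    """Read a zigzag, variable-length integer."""
    shift, acc = 0, 0
    while True:
        b = inp.read(1)[0]
        acc |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1)  # undo zigzag


def write_string(out, s):
    """Write a string as a length prefix followed by its UTF-8 bytes."""
    data = s.encode("utf-8")
    write_long(out, len(data))
    out.write(data)


def read_string(inp):
    return inp.read(read_long(inp)).decode("utf-8")


WRITERS = {"long": write_long, "string": write_string}
READERS = {"long": read_long, "string": read_string}


def encode(schema, record):
    """Concatenate field values in schema order: no names, no tags."""
    out = io.BytesIO()
    for field in schema["fields"]:
        WRITERS[field["type"]](out, record[field["name"]])
    return out.getvalue()


def decode(writer_schema, reader_schema, data):
    """Parse with the writer's schema, then project onto the reader's."""
    inp = io.BytesIO(data)
    # The bytes are untagged, so only the writer's schema says how to walk them.
    written = {f["name"]: READERS[f["type"]](inp) for f in writer_schema["fields"]}
    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in written:
            result[name] = written[name]       # field present in both schemas
        elif "default" in field:
            result[name] = field["default"]    # reader-only field: use default
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return result  # writer-only fields (here: age) are simply dropped


payload = encode(WRITER_SCHEMA, {"name": "Ada", "age": 36})
record = decode(WRITER_SCHEMA, READER_SCHEMA, payload)
print(payload, record)
```

The payload here is five bytes long and holds no field names or tags, which is why the reader cannot parse it without the writer's schema. The reader's own schema then decides what the application sees: `age` is ignored, and `email` is filled from its default.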

This page gives the through-line; the sub-pages give the structured depth, each linking out to the specification for exhaustive detail rather than reproducing it.

Design philosophy

Four commitments distinguish Avro from the binary serialization formats it grew up beside, Apache Thrift and Protocol Buffers:

Origin

Avro was created by Doug Cutting — the creator of Lucene and Hadoop — and proposed as a Hadoop sub-project in 2009. Hadoop needed a serialization format and a wire protocol for moving data between processes written in different languages, and the existing options required code generation and stable field tags that sat awkwardly with Hadoop’s dynamic, multi-language environment. Avro’s answer was to keep the schema with the data and resolve differences at read time. It graduated to a top-level Apache project in May 2010 and is now used well beyond its origin — most visibly in the Apache Kafka ecosystem, in Apache Spark, and across data lakes and pipelines where schemas change over time.

See also: Domain-Specific Languages · Apache Kafka · Doug Cutting