avsc Container Files
Avro container files — self-describing binary files that embed the writer schema with the data. The container is how data travels with its own carrier description.
What Containers Are
An Avro container file (also called Object Container File or OCF) bundles:
- A header — magic bytes, writer schema, sync marker, optional codec
- Data blocks — sequences of records encoded with the writer schema
The writer schema travels with the data. Any reader with a compatible reader schema can decode the contents. No external registry needed.
This is Avro’s native answer to the question: how does data describe itself? The container embeds the carrier description. The reader brings the meaning lens.
Reading
const avro = require('avsc')

avro.createFileDecoder('./data.avro')
  .on('metadata', (type) => {
    // type is the writer's schema — what the data is
  })
  .on('data', (val) => {
    // val is a decoded record
  })
With a reader schema (schema evolution):
const readerType = avro.Type.forSchema({ ... })

avro.createFileDecoder('./data.avro', { readerType })
  .on('data', (val) => {
    // val decoded through the reader's lens
  })
Writing
const schema = {
  type: 'record',
  name: 'Entry',
  fields: [
    { name: 'key', type: 'string' },
    { name: 'value', type: 'bytes' }
  ]
}

const encoder = avro.createFileEncoder('./out.avro', schema)
encoder.write({ key: '/path', value: Buffer.from('data') })
encoder.write({ key: '/other', value: Buffer.from('more') })
encoder.end()
Header Extraction
Synchronous header reading without streaming the entire file:
const header = avro.extractFileHeader('./data.avro')
// header.meta['avro.schema'] — parsed writer schema
// header.meta['avro.codec'] — compression codec
// header.sync — 16-byte sync marker
Stream Classes
The underlying stream implementations, available for custom pipelines:
const { BlockDecoder, BlockEncoder, RawDecoder, RawEncoder } = avro.streams
| Class | Direction | Description |
|---|---|---|
| BlockDecoder | Read | Decodes container-format input |
| BlockEncoder | Write | Encodes records into container blocks |
| RawDecoder | Read | Decodes raw Avro binary (no container framing) |
| RawEncoder | Write | Encodes records as raw Avro binary |
Block streams handle the container format — header, sync markers, block boundaries. Raw streams handle bare record sequences — no container overhead.
All stream classes use bare-stream Duplex in the bare-for-pear fork.
Container Structure
[magic: 4 bytes "Obj\x01"]
[header: avro record]
  meta: map<string, bytes>
    "avro.schema": writer schema as JSON
    "avro.codec": "null" | "deflate" | "snappy"
  sync: 16 bytes (random marker)
[block]*
  count: long (number of records)
  size: long (byte size of encoded records)
  data: bytes (encoded records, possibly compressed)
  sync: 16 bytes (must match header)
The sync marker detects corruption and enables random access — seek to an arbitrary offset, scan forward to the next sync marker, and resume reading blocks from there.
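The scan itself needs nothing Avro-specific. A sketch with plain Buffers — the marker and block bytes below are stand-ins, not real container contents:

```javascript
// 16-byte marker, as it would come from the container header.
const sync = Buffer.from('0123456789abcdef')

// Stand-in "file": two fake blocks, each terminated by the sync marker.
const file = Buffer.concat([
  Buffer.from('block-one-bytes'), sync,
  Buffer.from('block-two-bytes'), sync
])

// From an arbitrary offset, scan forward to the next marker; the bytes
// just past it are a safe place to resume decoding blocks.
function nextBlockOffset(buf, marker, from) {
  const at = buf.indexOf(marker, from)
  return at === -1 ? -1 : at + marker.length
}

const resume = nextBlockOffset(file, sync, 5)
// file.slice(resume) starts at the second block
```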