avsc Serialization
Avro binary encoding as implemented in avsc. Compact, self-describing where needed, and deterministic.
Binary Encoding
Avro uses a compact binary format. No field names in the output, no type tags on primitives. The schema is the codec — encoder and decoder must agree on the schema to communicate.
This is the carrier principle made physical. The binary stream carries content. The schema — present elsewhere — provides meaning.
const avro = require('avsc')

const type = avro.Type.forSchema({
  type: 'record',
  name: 'Entry',
  fields: [
    { name: 'key', type: 'string' },
    { name: 'value', type: 'bytes' }
  ]
})
const buf = type.toBuffer({ key: '/path', value: Buffer.from('data') })
// buf is compact binary — no field names, no framing
const val = type.fromBuffer(buf)
// val is { key: '/path', value: <Buffer 64 61 74 61> }
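To make "no field names, no framing" concrete, the bytes for the example record can be laid out by hand. This is a minimal sketch that follows the Avro binary rules described on this page; it does not call avsc, and the variable name `expected` is only illustrative.

```javascript
// Hand-built bytes for { key: '/path', value: Buffer.from('data') }.
// Illustrative only: this mirrors the Avro encoding rules, not avsc internals.
const expected = Buffer.concat([
  Buffer.from([0x0a]),          // zigzag varint for length 5 (5 << 1 = 10)
  Buffer.from('/path', 'utf8'), // key: 5 UTF-8 bytes
  Buffer.from([0x08]),          // zigzag varint for length 4 (4 << 1 = 8)
  Buffer.from('data', 'utf8')   // value: 4 raw bytes
])

console.log(expected.length)          // 11
console.log(expected.toString('hex')) // 0a2f706174680864617461
```

Eleven bytes carry both fields. Nothing in the output names `key` or `value`; only the schema tells a reader that a string comes first and a bytes value second.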
The Tap
avsc’s internal binary reader/writer. A cursor over a byte buffer that reads and writes Avro primitives in sequence.
The Tap is not public API in the conventional sense, but it is the mechanism behind all encoding and decoding. Understanding it clarifies how Avro binary works.
const { Tap } = require('avsc/lib/utils')
const buf = new Uint8Array(1024)
const tap = new Tap(buf)
// Write
tap.writeLong(42)
tap.writeString('hello')
// Read
tap.pos = 0
tap.readLong() // 42
tap.readString() // 'hello'
Integers use zigzag varint encoding — small values take fewer bytes. Strings and bytes are length-prefixed with a varint.
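The zigzag varint scheme can be sketched in a few standalone functions. The helper names below (`zigzag`, `encodeLong`, `decodeLong`, `encodeString`) are hypothetical and are not the Tap's methods; the sketch uses BigInt so it stays correct across the full 64-bit range.

```javascript
// Zigzag maps signed integers to unsigned ones so that small magnitudes
// stay small: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
function zigzag (n) {
  const v = BigInt(n)
  return v >= 0n ? v << 1n : ((-v) << 1n) - 1n
}

function unzigzag (z) {
  return (z & 1n) ? -((z + 1n) >> 1n) : z >> 1n
}

// Varint: 7 payload bits per byte, least significant group first,
// high bit set on every byte except the last.
function encodeLong (n) {
  let z = zigzag(n)
  const bytes = []
  while (z >= 0x80n) {
    bytes.push(Number(z & 0x7fn) | 0x80)
    z >>= 7n
  }
  bytes.push(Number(z))
  return Uint8Array.from(bytes)
}

function decodeLong (bytes, pos = 0) {
  let z = 0n
  let shift = 0n
  let b
  do {
    b = bytes[pos++]
    z |= BigInt(b & 0x7f) << shift
    shift += 7n
  } while (b & 0x80)
  return { value: unzigzag(z), pos } // value is a BigInt
}

// Strings are a varint byte length followed by the UTF-8 bytes.
function encodeString (s) {
  const utf8 = new TextEncoder().encode(s)
  const len = encodeLong(utf8.length)
  const out = new Uint8Array(len.length + utf8.length)
  out.set(len, 0)
  out.set(utf8, len.length)
  return out
}

console.log(Array.from(encodeLong(42))) // [ 84 ]
console.log(Array.from(encodeLong(-1))) // [ 1 ]
console.log(Array.from(encodeLong(64))) // [ 128, 1 ]
```

Values between -64 and 63 fit in one byte; each further 7 bits of magnitude costs one more byte.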
Schema Fingerprints
Every schema has a deterministic fingerprint — a hash of its canonical JSON representation. Used for schema identification without transmitting the full schema.
const fp = type.fingerprint('md5')
// 16-byte Buffer — the schema's identity
In mycelium, schema fingerprints are used in the RPC handshake — client and server compare fingerprints to determine if they share a protocol.
The platform.js module provides getHash(), which uses bare-crypto for MD5 computation.
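The idea behind a fingerprint comparison can be sketched as follows. This is not the getHash() implementation: it uses Node's `crypto` module as a stand-in for bare-crypto, the helper `fingerprintOf` is hypothetical, and the canonical schema string is written by hand on the assumption that it matches Avro's canonical form for the Entry schema.

```javascript
const crypto = require('crypto') // Node's crypto, standing in for bare-crypto

// Assumed canonical form of the Entry schema: no whitespace,
// attributes in the fixed order name, type, fields.
const canonical = '{"name":"Entry","type":"record","fields":[' +
  '{"name":"key","type":"string"},{"name":"value","type":"bytes"}]}'

// Hypothetical helper, not part of avsc.
function fingerprintOf (canonicalSchema, algorithm = 'md5') {
  return crypto.createHash(algorithm).update(canonicalSchema).digest()
}

const local = fingerprintOf(canonical)
const remote = fingerprintOf(canonical) // e.g. received from the other peer

console.log(local.length)         // 16
console.log(local.equals(remote)) // true
```

Because the hash input is a canonical string, two peers that hold the same schema with different whitespace or attribute order should still arrive at the same 16 bytes, provided both canonicalize before hashing.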
Encoding Characteristics
| Aspect | Detail |
|---|---|
| Null | 0 bytes |
| Boolean | 1 byte |
| Int | 1–5 bytes (zigzag varint) |
| Long | 1–10 bytes (zigzag varint) |
| Float | 4 bytes (IEEE 754) |
| Double | 8 bytes (IEEE 754) |
| Bytes | varint length + raw bytes |
| String | varint length + UTF-8 bytes |
| Record | concatenated field encodings |
| Array | blocks of [count, items…], terminated by 0 |
| Map | blocks of [count, [key, value]…], terminated by 0 |
| Union | varint branch index + branch encoding |
| Enum | varint symbol index |
| Fixed | raw bytes (size from schema) |
No framing, no delimiters, no padding. The encoding is fully determined by the schema. Two records of the same type concatenated in a buffer are distinguishable only because the schema says where one ends and the next begins.
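A small hand-written reader shows how the schema alone recovers record boundaries. This is a sketch for the Entry schema only (string key, bytes value); `lenByte`, `writeEntry`, and `readEntry` are hypothetical helpers, and lengths are assumed to be below 64 so that each length prefix fits in a single varint byte.

```javascript
// One-byte zigzag varint for small non-negative lengths.
function lenByte (n) {
  if (n >= 64) throw new RangeError('sketch handles lengths below 64 only')
  return Buffer.from([n << 1])
}

function writeEntry ({ key, value }) {
  const k = Buffer.from(key, 'utf8')
  return Buffer.concat([lenByte(k.length), k, lenByte(value.length), value])
}

// Reads one Entry starting at pos and returns the position after it.
function readEntry (buf, pos) {
  const keyLen = buf[pos++] >> 1
  const key = buf.toString('utf8', pos, pos + keyLen)
  pos += keyLen
  const valueLen = buf[pos++] >> 1
  const value = buf.subarray(pos, pos + valueLen)
  pos += valueLen
  return { entry: { key, value }, pos }
}

// Two records back to back, with nothing between them.
const stream = Buffer.concat([
  writeEntry({ key: '/a', value: Buffer.from('one') }),
  writeEntry({ key: '/b', value: Buffer.from('two') })
])

const first = readEntry(stream, 0)
const second = readEntry(stream, first.pos)

console.log(first.entry.key, first.pos)   // /a 7
console.log(second.entry.key, second.pos) // /b 14
```

The second record is found only because reading the first one, field by field, leaves the cursor at byte 7. A reader with a different schema would misinterpret the same bytes.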
JSON Encoding
avsc also supports JSON encoding for debugging and interop:
const jsonStr = type.toString(val) // JSON-encoded string for val
const decoded = type.fromString(jsonStr)
JSON encoding includes type tags for unions and uses string representations for bytes. Larger than binary but human-readable.
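The two JSON conventions mentioned above can be illustrated without avsc. This is a sketch of the general Avro JSON rules as understood here, with hypothetical helpers (`bytesToJson`, `bytesFromJson`, `wrapUnion`) and a hypothetical `note` field of type `['null', 'string']` that is not part of the Entry schema.

```javascript
// Bytes become a string with one character per byte (code points 0-255).
const bytesToJson = (buf) => buf.toString('latin1')
const bytesFromJson = (str) => Buffer.from(str, 'latin1')

// A non-null union value is wrapped in an object keyed by its branch type.
const wrapUnion = (branch, value) => (value === null ? null : { [branch]: value })

const json = JSON.stringify({
  key: '/path',
  value: bytesToJson(Buffer.from([0x64, 0x61, 0x74, 0x61])),
  note: wrapUnion('string', 'hi') // hypothetical ['null', 'string'] field
})

console.log(json)
// {"key":"/path","value":"data","note":{"string":"hi"}}

// Arbitrary bytes survive the round trip through the string form.
const raw = Buffer.from([0x00, 0x80, 0xff])
console.log(bytesFromJson(bytesToJson(raw)).equals(raw)) // true
```

The union wrapper is the "type tag": the JSON form has to name the branch explicitly, where the binary form only writes a branch index.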