CompactBinaryData: A Lean Binary Serialization Format for Developers

#json #webdev #programming

In the world of data serialization, JSON reigns supreme for its readability, but its verbosity can be a bottleneck for bandwidth-constrained applications like IoT or high-performance APIs. Enter CompactBinaryData (CBD), an open-source binary serialization format designed to be leaner, faster, and JSON-compatible. Let’s dive into why CBD is a game-changer and how you can start using it.

What is CBD?

CBD, detailed in its GitHub repo, is a lightweight alternative to JSON, BSON, and MessagePack. According to the project, its output is 40-60% smaller than JSON on datasets with repetitive keys, thanks to dictionary compression and a 3-bit type system. Unlike gzipped JSON, CBD needs no decompression step before parsing, which the project reports makes it 20-30% faster to parse than gzipped JSON. It supports the JSON data model (objects, arrays, strings, numbers, booleans, null) with lossless round-tripping, so data can move between CBD and JSON-based systems without loss.

The format’s design is simple yet powerful:

Header: A 5-byte structure with a magic number (0xCBD1), version, and dictionary size.
Dictionary: Stores repetitive keys as UTF-8 strings, replaced by 1-byte IDs in the data.
Data: Uses a 3-bit type system and variable-length encoding (varints) for compact numbers and strings.
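
To make the layout more concrete, here is a minimal Python sketch of two of these building blocks: packing a 5-byte header and encoding an unsigned varint. The field widths (2-byte magic, 1-byte version, 2-byte dictionary size, big-endian) and the LEB128-style varint are assumptions for illustration only; the CBD repo is the authority on the actual byte layout.

```python
import struct

MAGIC = 0xCBD1  # magic number from the format description


def pack_header(version: int, dict_size: int) -> bytes:
    # Assumed split of the 5 bytes: 2-byte magic, 1-byte version,
    # 2-byte dictionary size, big-endian. Not taken from the CBD spec.
    return struct.pack(">HBH", MAGIC, version, dict_size)


def encode_varint(n: int) -> bytes:
    # LEB128-style unsigned varint: 7 payload bits per byte,
    # high bit set on every byte except the last.
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> tuple:
    # Returns (value, position of the next unread byte).
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


print(pack_header(1, 4).hex())   # cbd1010004
print(encode_varint(30).hex())   # 1e
print(encode_varint(300).hex())  # ac02
```

Small values such as 30 fit in a single byte, which is where most of the savings on numbers would come from under this kind of encoding.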

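For example, the JSON object {"name":"John","age":30,"scores":[95,87,92],"active":true} takes 58 bytes as minified UTF-8 text, while CBD encodes it in 46 bytes. The saving is modest here because each key appears only once; the dictionary pays off when the same keys repeat across many records.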
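
The JSON side of this comparison is easy to measure yourself with Python's standard library. The count depends on whitespace, so the sketch below shows both the minified form and the default `json.dumps` output.

```python
import json

data = {"name": "John", "age": 30, "scores": [95, 87, 92], "active": True}

# separators=(",", ":") drops the spaces json.dumps inserts by default.
compact = json.dumps(data, separators=(",", ":")).encode("utf-8")
default = json.dumps(data).encode("utf-8")

print(len(compact))  # 58
print(len(default))  # 67
```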

Why Choose CBD?

CBD shines where size and speed matter:

Compactness: Dictionary compression reduces redundant keys, ideal for IoT or data-heavy APIs.
Performance: No decompression overhead, unlike gzipped JSON, and competitive with MessagePack.
Extensibility: Reserved type codes allow for future data types like dates or binary blobs.
Debugging: Offers a human-readable mode for easy inspection.
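
To see why a key dictionary helps with repetitive data, here is a toy Python sketch of the idea. It is not CBD's encoder: the helper names are hypothetical, it only handles flat records, and the byte counts ignore quotes, length prefixes, and values.

```python
def build_key_dictionary(records):
    # Assign each distinct key a small integer ID in first-seen order.
    key_ids = {}
    for record in records:
        for key in record:
            if key not in key_ids:
                key_ids[key] = len(key_ids)
    if len(key_ids) > 256:
        raise ValueError("1-byte IDs can address at most 256 keys")
    return key_ids


def replace_keys(records, key_ids):
    # Swap each string key for its ID; values are left untouched.
    return [{key_ids[k]: v for k, v in record.items()} for record in records]


records = [
    {"sensor_id": i, "temperature": 20 + i, "humidity": 40} for i in range(100)
]
key_ids = build_key_dictionary(records)
compacted = replace_keys(records, key_ids)

# Key text is repeated in every record without a dictionary,
# and stored once (plus a 1-byte ID per use) with one.
key_text_bytes = sum(len(k.encode("utf-8")) for k in key_ids)
without_dictionary = key_text_bytes * len(records)
with_dictionary = key_text_bytes + len(key_ids) * len(records)

print(key_ids)             # {'sensor_id': 0, 'temperature': 1, 'humidity': 2}
print(without_dictionary)  # 2800
print(with_dictionary)     # 328
```

The gap grows with the number of records, which fits the claim that the format works best on datasets with repetitive keys.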

The project's benchmarks report output 40-60% smaller than JSON and parsing 20-30% faster than gzipped JSON, which makes CBD a good fit for resource-constrained environments.

Getting Started

Using CBD is straightforward with its Python implementation. Install it via:

git clone https://github.com/makalin/CBD.git
cd CBD
pip install -r requirements.txt

Here’s a quick example to serialize and deserialize data:

from cbd import CBD

data = {"name": "John", "age": 30, "scores": [95, 87, 92], "active": True}
binary_data = CBD.serialize(data)
original_data = CBD.deserialize(binary_data)
print(original_data)  # {'name': 'John', 'age': 30, 'scores': [95, 87, 92], 'active': True}

CBD also supports format conversion (JSON, MessagePack, BSON) and includes a benchmark suite to compare performance.
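
If you want a baseline of your own before running the project's suite, a small harness like the following measures size and round-trip time for JSON and gzipped JSON using only the standard library. This is a rough sketch, not the CBD benchmark suite: the `measure` helper is hypothetical, and the commented-out CBD row assumes the `serialize`/`deserialize` calls shown above.

```python
import gzip
import json
import time


def measure(name, encode, decode, payload, repeats=200):
    # Returns (name, encoded size in bytes, average round-trip seconds).
    blob = encode(payload)
    start = time.perf_counter()
    for _ in range(repeats):
        decode(encode(payload))
    elapsed = (time.perf_counter() - start) / repeats
    return name, len(blob), elapsed


def json_encode(obj):
    return json.dumps(obj, separators=(",", ":")).encode("utf-8")


def json_decode(blob):
    return json.loads(blob)


def gzip_json_encode(obj):
    return gzip.compress(json_encode(obj))


def gzip_json_decode(blob):
    return json.loads(gzip.decompress(blob))


# 100 records sharing the same keys, i.e. the repetitive case.
payload = [{"name": "John", "age": 30, "scores": [95, 87, 92], "active": True}] * 100

results = [
    measure("json", json_encode, json_decode, payload),
    measure("gzip+json", gzip_json_encode, gzip_json_decode, payload),
    # With the cbd package importable, a third row could be added, e.g.:
    # measure("cbd", CBD.serialize, CBD.deserialize, payload),
]
for name, size, seconds in results:
    print(f"{name:10s} {size:6d} bytes  {seconds * 1e6:10.1f} us per round trip")
```

Timings vary by machine and payload, so treat the numbers as relative rather than absolute.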

What’s Next?

CBD’s roadmap includes JavaScript and Rust libraries, streaming support, and custom data types. Its MIT-licensed codebase welcomes contributions—check out CONTRIBUTING.md to get involved.

For a deeper dive into CBD’s binary format, benchmarks, and use cases, read my Medium article: Unpacking CompactBinaryData (CBD). Explore the project on GitHub and share your thoughts below—how would you use CBD in your projects?

CompactBinaryData: Serialize smarter, not harder.