In the world of data serialization, JSON reigns supreme for its readability, but its verbosity can be a bottleneck for bandwidth-constrained applications like IoT or high-performance APIs. Enter CompactBinaryData (CBD), an open-source binary serialization format designed to be leaner, faster, and JSON-compatible. Let’s dive into why CBD is a game-changer and how you can start using it.
What is CBD?
CBD, detailed in its GitHub repo, is a lightweight alternative to JSON, BSON, and MessagePack. According to the project’s benchmarks, it is 40-60% smaller than JSON for datasets with repetitive keys, thanks to dictionary compression and a 3-bit type system. Because a CBD payload is parsed directly with no decompression step, the project reports parsing that is 20-30% faster than gzipped JSON. It supports JSON-like structures (objects, arrays, strings, numbers, booleans, null) with lossless round-tripping, so data can move between CBD and existing JSON-based systems unchanged.
The format’s design is simple yet powerful:
- Header: A 5-byte structure with a magic number (0xCBD1), version, and dictionary size.
- Dictionary: Stores repetitive keys as UTF-8 strings, replaced by 1-byte IDs in the data.
- Data: Uses a 3-bit type system and variable-length encoding (varints) for compact numbers and strings.
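To make the header and varint ideas concrete, here is a minimal sketch in Python. It is not the project’s implementation. The field widths (2-byte magic, 1-byte version, 2-byte dictionary size), the big-endian byte order, and the LEB128-style varint are assumptions chosen to match the 5-byte description above, and the function names are hypothetical.

```python
import struct

MAGIC = 0xCBD1  # magic number from the format description


def pack_header(version, dict_size):
    # Assumed layout: 2-byte magic, 1-byte version, 2-byte dictionary size,
    # big-endian with no padding (2 + 1 + 2 = 5 bytes).
    return struct.pack(">HBH", MAGIC, version, dict_size)


def encode_varint(n):
    """Encode a non-negative integer using 7 data bits per byte, low bits first."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)  # high bit clear: last byte
            return bytes(out)


def decode_varint(buf, pos=0):
    """Decode one varint starting at pos; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


print(pack_header(1, 3).hex())   # cbd1010003
print(encode_varint(30).hex())   # 1e
print(encode_varint(300).hex())  # ac02
```

Small numbers such as 30 fit in a single byte under this scheme, which is where the size savings for typical numeric fields come from.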
For example, a JSON object like {"name":"John","age":30,"scores":[95,87,92],"active":true} is encoded in just 46 bytes with CBD, compared to 58 bytes for the same object as minified JSON.
Why Choose CBD?
CBD shines where size and speed matter:
- Compactness: Dictionary compression reduces redundant keys, ideal for IoT or data-heavy APIs.
- Performance: No decompression overhead, unlike gzipped JSON, and competitive with MessagePack.
- Extensibility: Reserved type codes allow for future data types like dates or binary blobs.
- Debugging: Offers a human-readable mode for easy inspection.
Benchmarks show CBD is 40-60% smaller than JSON and 20-30% faster than gzipped JSON, making it perfect for resource-constrained environments.
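The compactness point is easiest to see with a small sketch of dictionary compression. The code below is an illustration only, not CBD’s encoder: it handles flat records, the function names and sample sensor data are hypothetical, and it assumes each key ID costs exactly one byte, as the format description states.

```python
def build_key_dictionary(records):
    """Map each distinct key to a small integer ID, in first-seen order."""
    key_ids = {}
    for record in records:
        for key in record:
            if key not in key_ids:
                key_ids[key] = len(key_ids)
    return key_ids


def replace_keys(records, key_ids):
    """Rewrite each record so its keys are IDs instead of strings."""
    return [{key_ids[k]: v for k, v in record.items()} for record in records]


# Hypothetical IoT readings with the same three keys in every record
readings = [
    {"sensor": "a1", "temperature": 21.5, "humidity": 40},
    {"sensor": "a2", "temperature": 22.1, "humidity": 42},
    {"sensor": "a3", "temperature": 20.9, "humidity": 39},
]

key_ids = build_key_dictionary(readings)
compact = replace_keys(readings, key_ids)

# Bytes spent on key strings when every record repeats them
repeated = sum(len(k.encode("utf-8")) for r in readings for k in r)
# Bytes spent when each key is stored once, plus a 1-byte ID per use
stored_once = sum(len(k.encode("utf-8")) for k in key_ids)
id_bytes = sum(len(r) for r in readings)

print(compact[0])
print(repeated, "bytes of repeated keys vs", stored_once + id_bytes, "with a dictionary")
```

The saving grows with the number of records, since the key strings are paid for once while each additional record adds only one byte per field.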
Getting Started
Using CBD is straightforward with its Python implementation. Install it via:
git clone https://github.com/makalin/CBD.git
cd CBD
pip install -r requirements.txt
Here’s a quick example to serialize and deserialize data:
from cbd import CBD
data = {"name": "John", "age": 30, "scores": [95, 87, 92], "active": True}
# Encode the dictionary to CBD's binary form, then decode it back
binary_data = CBD.serialize(data)
original_data = CBD.deserialize(binary_data)
print(original_data)  # {'name': 'John', 'age': 30, 'scores': [95, 87, 92], 'active': True}
CBD also supports format conversion (JSON, MessagePack, BSON) and includes a benchmark suite to compare performance.
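If you want to verify the round-trip and compare sizes yourself, a small helper like the one below may be useful. It is a sketch, not part of the CBD library: check_round_trip and the JSON stand-in codec are hypothetical names, and the JSON codec is there only so the example runs without CBD installed.

```python
import json


def check_round_trip(data, serialize, deserialize):
    """Serialize, deserialize, and confirm the result equals the input.

    Returns the size of the encoded payload in bytes.
    """
    encoded = serialize(data)
    decoded = deserialize(encoded)
    if decoded != data:
        raise ValueError("round trip changed the data")
    return len(encoded)


# Stand-in codec (minified JSON as UTF-8 bytes) so the sketch is runnable
def json_serialize(obj):
    return json.dumps(obj, separators=(",", ":")).encode("utf-8")


def json_deserialize(payload):
    return json.loads(payload.decode("utf-8"))


data = {"name": "John", "age": 30, "scores": [95, 87, 92], "active": True}
json_size = check_round_trip(data, json_serialize, json_deserialize)
print("JSON:", json_size, "bytes")
```

With CBD installed, calling check_round_trip(data, CBD.serialize, CBD.deserialize) should give the CBD size for the same data, assuming CBD.serialize returns a bytes-like object as the example above suggests.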
What’s Next?
CBD’s roadmap includes JavaScript and Rust libraries, streaming support, and custom data types. Its MIT-licensed codebase welcomes contributions—check out CONTRIBUTING.md to get involved.
For a deeper dive into CBD’s binary format, benchmarks, and use cases, read my Medium article: Unpacking CompactBinaryData (CBD). Explore the project on GitHub and share your thoughts below—how would you use CBD in your projects?
CompactBinaryData: Serialize smarter, not harder.