Sharvit Kashikar

Posted on Jun 24

I Built a Database from Scratch in Go - Here's What I Learned About B+ Trees, ACID Transactions, and 1,800 ops/sec Performance 🚀

#go #opensource #database #showdev

Ever wondered how databases actually work under the hood? I spent months building one from scratch to find out. Here's my journey and what I discovered.

The "Why" Behind FiloDB 🤔

Like many developers, I used databases daily but never really understood the magic happening inside. SQLite, PostgreSQL, MySQL - they just worked. But how?

So I decided to build my own database engine in Go. Not a wrapper around existing libraries, not a simple key-value store, but a full relational database with:

B+ Tree storage engine
ACID transactions
Memory-mapped I/O
Concurrent reads
SQL-like interface
Cross-platform support

The result? FiloDB - a lightweight database that achieves 1,800+ operations per second with sub-millisecond latency.

The Architecture That Actually Works 🏗️

The B+ Tree Implementation 🌳

The heart of FiloDB is its B+ tree implementation. Unlike simple binary trees, B+ trees are optimized for disk storage:

type BNode struct {
    data []byte // Raw node data
}

type BTree struct {
    root uint64   // Root node page number
    get  func(uint64) BNode // Page getter
    new  func(BNode) uint64 // Page allocator  
    del  func(uint64)       // Page deallocator
}

Why B+ trees?

O(log n) performance for search, insert, delete
Sequential access for range queries
Disk-friendly with configurable page sizes
Cache efficient with high branching factor

Memory-Mapped I/O Performance 💾

Instead of traditional read/write system calls, FiloDB uses memory-mapped files:

// Platform-specific mmap implementation
func (kv *KV) ExtendMmap(npages int) error {
    if kv.mmap.total >= npages*BTREE_PAGE_SIZE {
        return nil
    }

    // Extend the memory mapping
    chunk := int64(npages) * BTREE_PAGE_SIZE
    kv.mmap.total = int(chunk)
    kv.mmap.chunks = kv.mmap.chunks[:0]
    return mmap_init(kv)
}

The performance impact is huge:

Direct memory access - no system call overhead
OS-level caching - let the kernel handle page management
Simplified code - treat files like memory arrays

Real Performance Numbers 📊

I didn't just build it - I benchmarked it extensively:

=== FiloDB Performance Benchmark ===
Insert Performance: ~1,813 ops/sec
Query Performance:  ~1,848 ops/sec  
Storage Efficiency: 0.88 KB per record
Average Latency:    <1ms per operation

How does it compare?

Database	Insert (ops/sec)	Query (ops/sec)	Use Case
FiloDB	~1,800 ⚡	~1,850 ⚡	Educational, Small Apps
SQLite	1,000-10,000	10,000+	Embedded Applications
PostgreSQL	5,000-50,000	50,000+	Production Applications

Not bad for an educational database!

ACID Transactions: Theory Meets Practice 🔒

Implementing ACID compliance taught me more about databases than any textbook:

func (db *DB) Begin(tx *DBTX) {
    tx.read = db.kv.get
    tx.write = db.kv.set
    tx.del = db.kv.del

    // Copy-on-write for isolation
    tx.pages = make(map[uint64]*BNode)
}

func (db *DB) Commit(tx *DBTX) error {
    // Atomically write all changes
    for pgid, node := range tx.pages {
        db.kv.set(pgid, node.data)
    }
    return db.kv.Sync()
}

The four pillars in action:

Atomicity: All-or-nothing commits
Consistency: Data validation at every step
Isolation: Copy-on-write for concurrent reads
Durability: Sync to disk on commit

The Most Challenging Parts 😅

1. Page Management

Managing disk pages and free space efficiently is harder than it looks. I implemented a free list to track available pages:

type FreeList struct {
    head uint64   // First free page
    total uint64  // Total free pages
}

2. Concurrent Safety

Allowing multiple readers while maintaining consistency required careful lock management and copy-on-write semantics.

3. Cross-Platform mmap

Memory mapping works differently on Linux, macOS, and Windows. I ended up with platform-specific implementations:

filodb_mmap_unix.go - Linux/macOS
filodb_mmap_windows.go - Windows
filodb_mmap_darwin.go - macOS optimizations

What I Learned (And You Can Too) 🎓

Database Internals Aren't Magic

Once you understand B+ trees, page management, and transactions, databases become much less mysterious.

Go is Perfect for Systems Programming

Memory safety without garbage collection overhead
Excellent standard library (golang.org/x/sys)
Great tooling for performance analysis
Simple concurrency model

Performance Comes from Architecture

The biggest performance gains came from:

B+ tree design (O(log n) operations)
Memory-mapped I/O (zero-copy access)
Page-based storage (disk-friendly)
Efficient locking (minimal contention)

Documentation Matters

I spent almost as much time on documentation as code. The result? A project that others can actually learn from.

Try FiloDB Yourself 🚀

Want to explore database internals? FiloDB is perfect for learning:

# Clone and build
git clone https://github.com/sharvitKashikar/FiloDB.git
cd FiloDB
go build -o filodb

# Create your first table
./filodb
> create
Enter table name: users
Enter columns: id,name,email
Enter types: 1,2,2  # INT64, BYTES, BYTES

What's Next? 🔮

FiloDB is just the beginning. I'm working on:

REST API for remote access
Query optimization for better performance
Replication for high availability
JSON queries for modern applications

The project is live on Peerlist where I'm sharing the development journey, and I'd love your feedback!

Resources & Links 📚

🔗 GitHub: https://github.com/sharvitKashikar/FiloDB
📊 Peerlist: https://peerlist.io/sharvit/project/filodblightweight-relational-database-system

What would you build if you wanted to understand how technology really works?

Drop a comment below - I'd love to hear about your "build it from scratch" projects!

Tags that describe this project:
🏷️ #golang #database #opensource #tutorial #performance #systems #buildInPublic #DatabaseInternals

If you're interested in:

Database internals and B+ trees
Systems programming in Go
Building projects from scratch
Performance optimization
Open source development

Then this project is for you! Give it a ⭐ on GitHub and let's connect!

Building FiloDB taught me that the best way to understand complex systems is to build them yourself. Sometimes the journey is more valuable than the destination.

DEV Community