DEV Community

Sharvit Kashikar
Sharvit Kashikar

Posted on

I Built a Database from Scratch in Go - Here's What I Learned About B+ Trees, ACID Transactions, and 1,800 ops/sec Performance 🚀

Ever wondered how databases actually work under the hood? I spent months building one from scratch to find out. Here's my journey and what I discovered.

The "Why" Behind FiloDB 🤔

Like many developers, I used databases daily but never really understood the magic happening inside. SQLite, PostgreSQL, MySQL - they just worked. But how?

So I decided to build my own database engine in Go. Not a wrapper around existing libraries, not a simple key-value store, but a full relational database with:

  • B+ Tree storage engine
  • ACID transactions
  • Memory-mapped I/O
  • Concurrent reads
  • SQL-like interface
  • Cross-platform support

The result? FiloDB - a lightweight database that achieves 1,800+ operations per second with sub-millisecond latency.

The Architecture That Actually Works 🏗️

Image description

The B+ Tree Implementation 🌳

The heart of FiloDB is its B+ tree implementation. Unlike simple binary trees, B+ trees are optimized for disk storage:

type BNode struct {
    data []byte // Raw node data
}

type BTree struct {
    root uint64   // Root node page number
    get  func(uint64) BNode // Page getter
    new  func(BNode) uint64 // Page allocator  
    del  func(uint64)       // Page deallocator
}
Enter fullscreen mode Exit fullscreen mode

Why B+ trees?

  • O(log n) performance for search, insert, delete
  • Sequential access for range queries
  • Disk-friendly with configurable page sizes
  • Cache efficient with high branching factor

Memory-Mapped I/O Performance 💾

Instead of traditional read/write system calls, FiloDB uses memory-mapped files:

// Platform-specific mmap implementation
func (kv *KV) ExtendMmap(npages int) error {
    if kv.mmap.total >= npages*BTREE_PAGE_SIZE {
        return nil
    }

    // Extend the memory mapping
    chunk := int64(npages) * BTREE_PAGE_SIZE
    kv.mmap.total = int(chunk)
    kv.mmap.chunks = kv.mmap.chunks[:0]
    return mmap_init(kv)
}
Enter fullscreen mode Exit fullscreen mode

The performance impact is huge:

  • Direct memory access - no system call overhead
  • OS-level caching - let the kernel handle page management
  • Simplified code - treat files like memory arrays

Real Performance Numbers 📊

I didn't just build it - I benchmarked it extensively:

=== FiloDB Performance Benchmark ===
Insert Performance: ~1,813 ops/sec
Query Performance:  ~1,848 ops/sec  
Storage Efficiency: 0.88 KB per record
Average Latency:    <1ms per operation
Enter fullscreen mode Exit fullscreen mode

How does it compare?

Database Insert (ops/sec) Query (ops/sec) Use Case
FiloDB ~1,800 ~1,850 Educational, Small Apps
SQLite 1,000-10,000 10,000+ Embedded Applications
PostgreSQL 5,000-50,000 50,000+ Production Applications

Not bad for an educational database!

ACID Transactions: Theory Meets Practice 🔒

Implementing ACID compliance taught me more about databases than any textbook:

func (db *DB) Begin(tx *DBTX) {
    tx.read = db.kv.get
    tx.write = db.kv.set
    tx.del = db.kv.del

    // Copy-on-write for isolation
    tx.pages = make(map[uint64]*BNode)
}

func (db *DB) Commit(tx *DBTX) error {
    // Atomically write all changes
    for pgid, node := range tx.pages {
        db.kv.set(pgid, node.data)
    }
    return db.kv.Sync()
}
Enter fullscreen mode Exit fullscreen mode

The four pillars in action:

  • Atomicity: All-or-nothing commits
  • Consistency: Data validation at every step
  • Isolation: Copy-on-write for concurrent reads
  • Durability: Sync to disk on commit

The Most Challenging Parts 😅

1. Page Management

Managing disk pages and free space efficiently is harder than it looks. I implemented a free list to track available pages:

type FreeList struct {
    head uint64   // First free page
    total uint64  // Total free pages
}
Enter fullscreen mode Exit fullscreen mode

2. Concurrent Safety

Allowing multiple readers while maintaining consistency required careful lock management and copy-on-write semantics.

3. Cross-Platform mmap

Memory mapping works differently on Linux, macOS, and Windows. I ended up with platform-specific implementations:

  • filodb_mmap_unix.go - Linux/macOS
  • filodb_mmap_windows.go - Windows
  • filodb_mmap_darwin.go - macOS optimizations

What I Learned (And You Can Too) 🎓

Database Internals Aren't Magic

Once you understand B+ trees, page management, and transactions, databases become much less mysterious.

Go is Perfect for Systems Programming

  • Memory safety without garbage collection overhead
  • Excellent standard library (golang.org/x/sys)
  • Great tooling for performance analysis
  • Simple concurrency model

Performance Comes from Architecture

The biggest performance gains came from:

  1. B+ tree design (O(log n) operations)
  2. Memory-mapped I/O (zero-copy access)
  3. Page-based storage (disk-friendly)
  4. Efficient locking (minimal contention)

Documentation Matters

I spent almost as much time on documentation as code. The result? A project that others can actually learn from.

Try FiloDB Yourself 🚀

Want to explore database internals? FiloDB is perfect for learning:

# Clone and build
git clone https://github.com/sharvitKashikar/FiloDB.git
cd FiloDB
go build -o filodb

# Create your first table
./filodb
> create
Enter table name: users
Enter columns: id,name,email
Enter types: 1,2,2  # INT64, BYTES, BYTES
Enter fullscreen mode Exit fullscreen mode

What's Next? 🔮

FiloDB is just the beginning. I'm working on:

  • REST API for remote access
  • Query optimization for better performance
  • Replication for high availability
  • JSON queries for modern applications

The project is live on Peerlist where I'm sharing the development journey, and I'd love your feedback!

Resources & Links 📚

What would you build if you wanted to understand how technology really works?

Drop a comment below - I'd love to hear about your "build it from scratch" projects!


Tags that describe this project:
🏷️ #golang #database #opensource #tutorial #performance #systems #buildInPublic #DatabaseInternals

If you're interested in:

  • Database internals and B+ trees
  • Systems programming in Go
  • Building projects from scratch
  • Performance optimization
  • Open source development

Then this project is for you! Give it a ⭐ on GitHub and let's connect!

Building FiloDB taught me that the best way to understand complex systems is to build them yourself. Sometimes the journey is more valuable than the destination.

Top comments (0)