Ever wondered how databases actually work under the hood? I spent months building one from scratch to find out. Here's my journey and what I discovered.
The "Why" Behind FiloDB 🤔
Like many developers, I used databases daily but never really understood the magic happening inside. SQLite, PostgreSQL, MySQL - they just worked. But how?
So I decided to build my own database engine in Go. Not a wrapper around existing libraries, not a simple key-value store, but a full relational database with:
- B+ Tree storage engine
- ACID transactions
- Memory-mapped I/O
- Concurrent reads
- SQL-like interface
- Cross-platform support
The result? FiloDB - a lightweight database that achieves 1,800+ operations per second with sub-millisecond latency.
The Architecture That Actually Works 🏗️
The B+ Tree Implementation 🌳
The heart of FiloDB is its B+ tree implementation. Unlike simple binary trees, B+ trees are optimized for disk storage:
type BNode struct {
data []byte // Raw node data
}
type BTree struct {
root uint64 // Root node page number
get func(uint64) BNode // Page getter
new func(BNode) uint64 // Page allocator
del func(uint64) // Page deallocator
}
Why B+ trees?
- O(log n) performance for search, insert, delete
- Sequential access for range queries
- Disk-friendly with configurable page sizes
- Cache efficient with high branching factor
Memory-Mapped I/O Performance 💾
Instead of traditional read/write system calls, FiloDB uses memory-mapped files:
// Platform-specific mmap implementation
func (kv *KV) ExtendMmap(npages int) error {
if kv.mmap.total >= npages*BTREE_PAGE_SIZE {
return nil
}
// Extend the memory mapping
chunk := int64(npages) * BTREE_PAGE_SIZE
kv.mmap.total = int(chunk)
kv.mmap.chunks = kv.mmap.chunks[:0]
return mmap_init(kv)
}
The performance impact is huge:
- Direct memory access - no system call overhead
- OS-level caching - let the kernel handle page management
- Simplified code - treat files like memory arrays
Real Performance Numbers 📊
I didn't just build it - I benchmarked it extensively:
=== FiloDB Performance Benchmark ===
Insert Performance: ~1,813 ops/sec
Query Performance: ~1,848 ops/sec
Storage Efficiency: 0.88 KB per record
Average Latency: <1ms per operation
How does it compare?
Database | Insert (ops/sec) | Query (ops/sec) | Use Case |
---|---|---|---|
FiloDB | ~1,800 ⚡ | ~1,850 ⚡ | Educational, Small Apps |
SQLite | 1,000-10,000 | 10,000+ | Embedded Applications |
PostgreSQL | 5,000-50,000 | 50,000+ | Production Applications |
Not bad for an educational database!
ACID Transactions: Theory Meets Practice 🔒
Implementing ACID compliance taught me more about databases than any textbook:
func (db *DB) Begin(tx *DBTX) {
tx.read = db.kv.get
tx.write = db.kv.set
tx.del = db.kv.del
// Copy-on-write for isolation
tx.pages = make(map[uint64]*BNode)
}
func (db *DB) Commit(tx *DBTX) error {
// Atomically write all changes
for pgid, node := range tx.pages {
db.kv.set(pgid, node.data)
}
return db.kv.Sync()
}
The four pillars in action:
- Atomicity: All-or-nothing commits
- Consistency: Data validation at every step
- Isolation: Copy-on-write for concurrent reads
- Durability: Sync to disk on commit
The Most Challenging Parts 😅
1. Page Management
Managing disk pages and free space efficiently is harder than it looks. I implemented a free list to track available pages:
type FreeList struct {
head uint64 // First free page
total uint64 // Total free pages
}
2. Concurrent Safety
Allowing multiple readers while maintaining consistency required careful lock management and copy-on-write semantics.
3. Cross-Platform mmap
Memory mapping works differently on Linux, macOS, and Windows. I ended up with platform-specific implementations:
-
filodb_mmap_unix.go
- Linux/macOS -
filodb_mmap_windows.go
- Windows -
filodb_mmap_darwin.go
- macOS optimizations
What I Learned (And You Can Too) 🎓
Database Internals Aren't Magic
Once you understand B+ trees, page management, and transactions, databases become much less mysterious.
Go is Perfect for Systems Programming
- Memory safety without garbage collection overhead
- Excellent standard library (
golang.org/x/sys
) - Great tooling for performance analysis
- Simple concurrency model
Performance Comes from Architecture
The biggest performance gains came from:
- B+ tree design (O(log n) operations)
- Memory-mapped I/O (zero-copy access)
- Page-based storage (disk-friendly)
- Efficient locking (minimal contention)
Documentation Matters
I spent almost as much time on documentation as code. The result? A project that others can actually learn from.
Try FiloDB Yourself 🚀
Want to explore database internals? FiloDB is perfect for learning:
# Clone and build
git clone https://github.com/sharvitKashikar/FiloDB.git
cd FiloDB
go build -o filodb
# Create your first table
./filodb
> create
Enter table name: users
Enter columns: id,name,email
Enter types: 1,2,2 # INT64, BYTES, BYTES
What's Next? 🔮
FiloDB is just the beginning. I'm working on:
- REST API for remote access
- Query optimization for better performance
- Replication for high availability
- JSON queries for modern applications
The project is live on Peerlist where I'm sharing the development journey, and I'd love your feedback!
Resources & Links 📚
- 🔗 GitHub: https://github.com/sharvitKashikar/FiloDB
- 📊 Peerlist: https://peerlist.io/sharvit/project/filodblightweight-relational-database-system
What would you build if you wanted to understand how technology really works?
Drop a comment below - I'd love to hear about your "build it from scratch" projects!
Tags that describe this project:
🏷️ #golang #database #opensource #tutorial #performance #systems #buildInPublic #DatabaseInternals
If you're interested in:
- Database internals and B+ trees
- Systems programming in Go
- Building projects from scratch
- Performance optimization
- Open source development
Then this project is for you! Give it a ⭐ on GitHub and let's connect!
Building FiloDB taught me that the best way to understand complex systems is to build them yourself. Sometimes the journey is more valuable than the destination.
Top comments (0)