Yashodhan Singh
I Tried Vibe Coding a Semantic Layer From Scratch

Before this project, we already had reporting—dozens of fixed reports written in raw SQL. But our users kept coming back with one-off requests:

“Can I see this by region instead of by customer?”

“Can we compare Q1 vs Q2 just for this segment?”

“What about filtering this down to fleet customers only?”

Each request meant hand-tweaking SQL, validating the output, and adding yet another fork to our growing report list.

Our team is lean. We don’t have many senior engineers. And honestly, I had never done data engineering before.
I barely knew what “grain” meant, let alone how to design a scalable reporting system.

But I was determined to solve the problem.
And I had a secret weapon: long conversations with AI.


“Isn’t This Already Solved?”

In theory? Yes.
Big BI vendors offer semantic layers. Big tech has internal tools. There are open source solutions like Cube.js and Apache Superset.

But those solutions assume:

  • A team of experienced data engineers
  • Complex org-wide needs across many datasets
  • Dedicated data modeling cycles

We had none of that.

What we had was:

  • A well-scoped schema
  • A stream of custom user requests
  • And the need for a solution with a low learning curve and high adaptability

So I decided to build something tailored: simple, durable, and easy for anyone to own, not just me.


Phase 1: Learning the Terrain with AI

To get started, I did what I always do when entering unfamiliar territory: I asked a lot of questions.

I fed AI:

  • The full schema
  • All business-defined metrics and dimensions
  • Role-based access rules and exceptions
  • Legacy SQL logic from old reports

I asked about:

  • How to resolve join paths safely
  • What a non-additive metric is, and what makes certain metrics non-additive
  • When fan-out happens and how to prevent it
  • Whether a derived metric should be nested or flattened

And I got answers that made sense. Not code—but conceptual clarity.
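
Fan-out was the concept that clicked hardest for me. Here's a minimal sketch of it, using a toy SQLite schema I made up for illustration: a one-to-many join silently duplicates fact rows, so an additive measure gets inflated.

```python
import sqlite3

# Toy data: one order worth 100, shipped in two parcels.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, amount INTEGER);
    CREATE TABLE shipments (order_id INTEGER, carrier TEXT);
    INSERT INTO orders VALUES (1, 100);
    INSERT INTO shipments VALUES (1, 'UPS'), (1, 'FedEx');
""")

# Fan-out: the one-to-many join duplicates the order row,
# so SUM(amount) reports 200 instead of 100.
inflated = conn.execute("""
    SELECT SUM(o.amount)
    FROM orders o
    JOIN shipments s ON s.order_id = o.order_id
""").fetchone()[0]

# Fix: aggregate each fact at its own grain before (or instead of) joining.
correct = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

print(inflated, correct)  # 200 100
```

That difference between 200 and 100 is exactly the "inflated metrics" failure mode that bit me later in Phase 2.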

That’s how I went from knowing almost nothing about semantic layers…

To understanding the building blocks of semantic architecture.


Phase 2: Vibe Coding My Way into Failure

Armed with that new understanding, I did what most devs would do: I built fast.

  • Prompt-driven join builders
  • Schema-agnostic query engines
  • Filters and dimensions as abstract objects

It felt powerful—until it wasn’t.


Soon, I hit:

  • Grain mismatches that caused inflated metrics
  • Derived KPIs breaking due to missing dependencies
  • Filters applied too early, too late, or not at all

It was clever code. But unstable.

Great for a demo. Bad for production. Worse for handoff.

If the failure was just a skill issue, I'd gladly accept that and improve. But I wanted to be sure whether it really was a skill issue, or whether I had been chasing shadows all along.


Phase 3: Burning It Down and Rebuilding with Contracts

So I reset. From scratch. Again.

This time:

  • Measures explicitly declare their grain and aggregation
  • Dimensions define their valid join paths and fact compatibility
  • Filters are parsed in context—validated before execution
  • Derived metrics rely on dependency graphs and outer aggregations
  • Security rules are baked into the query structure by default
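
To make the idea of "contracts" concrete, here's a rough sketch of what declaring grain, aggregation, and join compatibility can look like. The class and field names are my illustration, not the actual code from the project.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Measure:
    name: str
    grain: str              # the level the fact is stored at, e.g. "order"
    aggregation: str        # "sum", "avg", ... ("none" for non-additive)
    depends_on: tuple = ()  # dependency-graph edges for derived metrics

@dataclass(frozen=True)
class Dimension:
    name: str
    join_paths: dict = field(default_factory=dict)  # fact table -> join path

    def compatible_with(self, fact: str) -> bool:
        # A dimension is only valid for facts it explicitly declares a path to.
        return fact in self.join_paths

# Example declarations (hypothetical names):
revenue = Measure("revenue", grain="order", aggregation="sum")
margin_pct = Measure("margin_pct", grain="order", aggregation="none",
                     depends_on=("revenue", "cost"))
region = Dimension("region",
                   {"orders": "orders.customer_id = customers.id"})
```

The point of the frozen dataclasses is that a query can be validated against these declarations before any SQL is generated: an incompatible dimension or a missing dependency fails loudly at request time, not quietly in the numbers.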

I wrote:

  • A join resolver for composing minimal, valid paths
  • A filter parser that understands nesting and table origin
  • A query compiler that builds safe, predictable, performant SQL
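
A join resolver like this can be surprisingly small. Here's a sketch of one approach, assuming a hand-declared join graph (the tables and conditions below are made up): breadth-first search naturally finds the minimal path, and anything unreachable is rejected instead of guessed.

```python
from collections import deque

# Hypothetical join graph: table -> {neighbor: join condition}.
JOIN_GRAPH = {
    "orders":    {"customers": "orders.customer_id = customers.id"},
    "customers": {"orders":  "orders.customer_id = customers.id",
                  "regions": "customers.region_id = regions.id"},
    "regions":   {"customers": "customers.region_id = regions.id"},
}

def resolve_join_path(start: str, target: str) -> list:
    """BFS over the join graph; returns the minimal list of join conditions."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, path = queue.popleft()
        if table == target:
            return path
        for neighbor, condition in JOIN_GRAPH.get(table, {}).items():
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [condition]))
    raise ValueError(f"No join path from {start} to {target}")

path = resolve_join_path("orders", "regions")  # two hops via customers
```

Because the graph is declared rather than inferred, an impossible join raises an error instead of producing a Cartesian product.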

Now, AI was my code companion:

  • Helping draft utility functions
  • Validating logic edge cases
  • Explaining odd query plan behaviors

It wasn’t building the system.

I was.

But it helped me learn faster and write better.


Where Things Stand Now

The system is still in development—but the foundation is strong.

Once done, the layer will:

  • Accept any compatible dimensions + metrics
  • Build dynamic, grain-safe SQL with reusable CTEs
  • Enforce multi-tenant access at the query layer
  • Return compatibility metadata for frontend UI guards
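
The compatibility-metadata idea is simple to sketch: if each metric knows its fact table, and each fact knows which dimensions it can reach, the valid dimensions for a selection are just the intersection. The metric and dimension names below are invented for the example.

```python
# Hypothetical catalog: which fact table each metric lives on,
# and which dimensions are reachable from each fact.
METRIC_FACTS = {"revenue": "orders", "stock_level": "inventory"}
FACT_DIMENSIONS = {
    "orders":    {"region", "segment", "quarter"},
    "inventory": {"warehouse", "quarter"},
}

def compatibility(selected_metrics: list) -> dict:
    """Dimensions valid for ALL selected metrics (intersection across facts)."""
    dim_sets = [FACT_DIMENSIONS[METRIC_FACTS[m]] for m in selected_metrics]
    valid = set.intersection(*dim_sets) if dim_sets else set()
    return {"valid_dimensions": sorted(valid)}

compatibility(["revenue", "stock_level"])  # only "quarter" works for both
```

A frontend can call this before running anything, greying out incompatible dimensions instead of letting users submit a query that would fail or fan out.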

It’s not fully feature-complete yet—but the hardest parts are solved:

  • ✅ Data correctness
  • ✅ Structural safety
  • ✅ Extensibility
  • ✅ Teachability

The entire thing is built with handoff in mind—so that even a junior developer can understand it, extend it, and trust it.


What’s Next: An AI Chatbot on Top of the Semantic Layer

Once this layer is complete, I want to explore building an AI-powered chatbot on top of it.

Something that can take a question like:

“Compare RO sales growth by segment over the last two quarters.”

…and:

  • Parse the natural language
  • Map it to known dimensions, filters, and metrics
  • Validate the request
  • Run it safely through the semantic engine
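
The "parse and map" steps might start as something as naive as a synonym table, validated against the semantic layer's known fields. This is a deliberately crude sketch with invented names, just to show where validation would sit in the pipeline:

```python
# Hypothetical synonym map from user phrasing to structured field names.
SYNONYMS = {
    "ro sales": "ro_sales_amount",
    "segment":  "customer_segment",
}
KNOWN_METRICS = {"ro_sales_amount"}
KNOWN_DIMENSIONS = {"customer_segment"}

def parse_question(text: str) -> dict:
    """Match known phrases, then validate them against the semantic catalog."""
    text = text.lower()
    matched = {f for phrase, f in SYNONYMS.items() if phrase in text}
    metrics = sorted(matched & KNOWN_METRICS)
    dimensions = sorted(matched & KNOWN_DIMENSIONS)
    if not metrics:
        # An invalid request can *sound* valid; reject before compiling SQL.
        raise ValueError("No recognizable metric in question")
    return {"metrics": metrics, "dimensions": dimensions}

parse_question("Compare RO sales growth by segment over the last two quarters")
```

Real intent parsing would need far more than substring matching, but the shape holds: whatever the NL layer extracts must pass through the same validation contracts as a hand-built query.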

But I know this will bring a whole new set of challenges:

  • Ambiguity in user intent
  • Fuzzy synonyms vs structured fields
  • Invalid requests that sound perfectly valid
  • Performance and security trade-offs in real time

In many ways, it’ll be like starting over—with unknown unknowns waiting at every turn.
And honestly? I’m looking forward to it.


Closing: From Blank Slate to Ownership

I started this project with zero background in data engineering.

What I had was:

  • Curiosity
  • A well-scoped schema
  • Real pain points
  • And AI as my co-pilot

Along the way, I learned to:

  • Respect grain
  • Handle fan-out
  • Design for clarity
  • Build with the next developer in mind

It’s still in progress.
But now I see the shape of the system, and I know how to complete it.

From vibes → validation → vision—with a whole lot of learning in between.
That’s what this project gave me.
