Yashodhan Singh
I Tried Vibe Coding a Semantic Layer From Scratch

Before this project, we already had reporting—dozens of fixed reports written in raw SQL. But our users kept coming back with one-off requests:

“Can I see this by region instead of by customer?”

“Can we compare Q1 vs Q2 just for this segment?”

“What about filtering this down to fleet customers only?”

Each request meant hand-tweaking SQL, validating the output, and adding yet another fork to our growing report list.

Our team is lean. We don’t have many senior engineers. And honestly, I had never done data engineering before.
I barely knew what “grain” meant, let alone how to design a scalable reporting system.

But I was determined to solve the problem.
And I had a secret weapon: long conversations with AI.


“Isn’t This Already Solved?”

In theory? Yes.
Big BI vendors offer semantic layers. Big tech has internal tools. There are open source solutions like Cube.js and Apache Superset.

But those solutions assume:

  • A team of experienced data engineers
  • Complex org-wide needs across many datasets
  • Dedicated data modeling cycles

We had none of that.

What we had was:

  • A well-scoped schema
  • A stream of custom user requests
  • And the need for a solution with a low learning curve and high adaptability

So I decided to build something tailored: simple, durable, and easy for anyone to own, not just me.


Phase 1: Learning the Terrain with AI

To get started, I did what I always do when entering unfamiliar territory: I asked a lot of questions.

I fed AI:

  • The full schema
  • All business-defined metrics and dimensions
  • Role-based access rules and exceptions
  • Legacy SQL logic from old reports

I asked about:

  • How to resolve join paths safely
  • What a non-additive metric is, and what makes certain metrics non-additive
  • When fan-out happens and how to prevent it
  • Whether a derived metric should be nested or flattened

And I got answers that made sense. Not code—but conceptual clarity.
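
Fan-out was the concept that clicked hardest for me. Here's a minimal sketch of it, using a toy SQLite schema I made up for illustration: a one-to-many join silently duplicates fact rows, so an additive measure gets inflated.

```python
import sqlite3

# Toy data: one order worth 100, shipped in two parcels.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, amount INTEGER);
    CREATE TABLE shipments (order_id INTEGER, carrier TEXT);
    INSERT INTO orders VALUES (1, 100);
    INSERT INTO shipments VALUES (1, 'UPS'), (1, 'FedEx');
""")

# Fan-out: the one-to-many join duplicates the order row,
# so SUM(amount) reports 200 instead of 100.
inflated = conn.execute("""
    SELECT SUM(o.amount)
    FROM orders o
    JOIN shipments s ON s.order_id = o.order_id
""").fetchone()[0]

# Fix: aggregate each fact at its own grain before (or instead of) joining.
correct = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

print(inflated, correct)  # 200 100
```

That difference between 200 and 100 is exactly the "inflated metrics" failure mode that bit me later in Phase 2.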

That’s how I went from knowing almost nothing about semantic layers…

To understanding the building blocks of semantic architecture.


Phase 2: Vibe Coding My Way into Failure

Armed with that new understanding, I did what most devs would do: I built fast.

  • Prompt-driven join builders
  • Schema-agnostic query engines
  • Filters and dimensions as abstract objects

It felt powerful—until it wasn’t.


Soon, I hit:

  • Grain mismatches that caused inflated metrics
  • Derived KPIs breaking due to missing dependencies
  • Filters applied too early, too late, or not at all

It was clever code. But unstable.

Great for a demo. Bad for production. Worse for handoff.

If the failure was just a skill issue, I'd gladly accept that and improve. But I wanted to be sure whether it really was a skill issue, or whether I had been chasing shadows all along.


Phase 3: Burning It Down and Rebuilding with Contracts

So I reset. From scratch. Again.

This time:

  • Measures explicitly declare their grain and aggregation
  • Dimensions define their valid join paths and fact compatibility
  • Filters are parsed in context—validated before execution
  • Derived metrics rely on dependency graphs and outer aggregations
  • Security rules are baked into the query structure by default
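
To make the idea of "contracts" concrete, here's a rough sketch of what declaring grain, aggregation, and join compatibility can look like. The class and field names are my illustration, not the actual code from the project.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Measure:
    name: str
    grain: str              # the level the fact is stored at, e.g. "order"
    aggregation: str        # "sum", "avg", ... ("none" for non-additive)
    depends_on: tuple = ()  # dependency-graph edges for derived metrics

@dataclass(frozen=True)
class Dimension:
    name: str
    join_paths: dict = field(default_factory=dict)  # fact table -> join path

    def compatible_with(self, fact: str) -> bool:
        # A dimension is only valid for facts it explicitly declares a path to.
        return fact in self.join_paths

# Example declarations (hypothetical names):
revenue = Measure("revenue", grain="order", aggregation="sum")
margin_pct = Measure("margin_pct", grain="order", aggregation="none",
                     depends_on=("revenue", "cost"))
region = Dimension("region",
                   {"orders": "orders.customer_id = customers.id"})
```

The point of the frozen dataclasses is that a query can be validated against these declarations before any SQL is generated: an incompatible dimension or a missing dependency fails loudly at request time, not quietly in the numbers.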

I wrote:

  • A join resolver for composing minimal, valid paths
  • A filter parser that understands nesting and table origin
  • A query compiler that builds safe, predictable, performant SQL
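
A join resolver like this can be surprisingly small. Here's a sketch of one approach, assuming a hand-declared join graph (the tables and conditions below are made up): breadth-first search naturally finds the minimal path, and anything unreachable is rejected instead of guessed.

```python
from collections import deque

# Hypothetical join graph: table -> {neighbor: join condition}.
JOIN_GRAPH = {
    "orders":    {"customers": "orders.customer_id = customers.id"},
    "customers": {"orders":  "orders.customer_id = customers.id",
                  "regions": "customers.region_id = regions.id"},
    "regions":   {"customers": "customers.region_id = regions.id"},
}

def resolve_join_path(start: str, target: str) -> list:
    """BFS over the join graph; returns the minimal list of join conditions."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, path = queue.popleft()
        if table == target:
            return path
        for neighbor, condition in JOIN_GRAPH.get(table, {}).items():
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [condition]))
    raise ValueError(f"No join path from {start} to {target}")

path = resolve_join_path("orders", "regions")  # two hops via customers
```

Because the graph is declared rather than inferred, an impossible join raises an error instead of producing a Cartesian product.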

Now, AI was my code companion:

  • Helping draft utility functions
  • Validating logic edge cases
  • Explaining odd query plan behaviors

It wasn’t building the system.

I was.

But it helped me learn faster and write better.


Where Things Stand Now

The system is still in development—but the foundation is strong.

Once done, the layer will:

  • Accept any compatible dimensions + metrics
  • Build dynamic, grain-safe SQL with reusable CTEs
  • Enforce multi-tenant access at the query layer
  • Return compatibility metadata for frontend UI guards
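
The compatibility-metadata idea is simple to sketch: if each metric knows its fact table, and each fact knows which dimensions it can reach, the valid dimensions for a selection are just the intersection. The metric and dimension names below are invented for the example.

```python
# Hypothetical catalog: which fact table each metric lives on,
# and which dimensions are reachable from each fact.
METRIC_FACTS = {"revenue": "orders", "stock_level": "inventory"}
FACT_DIMENSIONS = {
    "orders":    {"region", "segment", "quarter"},
    "inventory": {"warehouse", "quarter"},
}

def compatibility(selected_metrics: list) -> dict:
    """Dimensions valid for ALL selected metrics (intersection across facts)."""
    dim_sets = [FACT_DIMENSIONS[METRIC_FACTS[m]] for m in selected_metrics]
    valid = set.intersection(*dim_sets) if dim_sets else set()
    return {"valid_dimensions": sorted(valid)}

compatibility(["revenue", "stock_level"])  # only "quarter" works for both
```

A frontend can call this before running anything, greying out incompatible dimensions instead of letting users submit a query that would fail or fan out.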

It’s not fully feature-complete yet—but the hardest parts are solved:

  • ✅ Data correctness
  • ✅ Structural safety
  • ✅ Extensibility
  • ✅ Teachability

The entire thing is built with handoff in mind—so that even a junior developer can understand it, extend it, and trust it.


What’s Next: An AI Chatbot on Top of the Semantic Layer

Once this layer is complete, I want to explore building an AI-powered chatbot on top of it.

Something that can take a question like:

“Compare RO sales growth by segment over the last two quarters.”

…and:

  • Parse the natural language
  • Map it to known dimensions, filters, and metrics
  • Validate the request
  • Run it safely through the semantic engine
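
The "parse and map" steps might start as something as naive as a synonym table, validated against the semantic layer's known fields. This is a deliberately crude sketch with invented names, just to show where validation would sit in the pipeline:

```python
# Hypothetical synonym map from user phrasing to structured field names.
SYNONYMS = {
    "ro sales": "ro_sales_amount",
    "segment":  "customer_segment",
}
KNOWN_METRICS = {"ro_sales_amount"}
KNOWN_DIMENSIONS = {"customer_segment"}

def parse_question(text: str) -> dict:
    """Match known phrases, then validate them against the semantic catalog."""
    text = text.lower()
    matched = {f for phrase, f in SYNONYMS.items() if phrase in text}
    metrics = sorted(matched & KNOWN_METRICS)
    dimensions = sorted(matched & KNOWN_DIMENSIONS)
    if not metrics:
        # An invalid request can *sound* valid; reject before compiling SQL.
        raise ValueError("No recognizable metric in question")
    return {"metrics": metrics, "dimensions": dimensions}

parse_question("Compare RO sales growth by segment over the last two quarters")
```

Real intent parsing would need far more than substring matching, but the shape holds: whatever the NL layer extracts must pass through the same validation contracts as a hand-built query.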

But I know this will bring a whole new set of challenges:

  • Ambiguity in user intent
  • Fuzzy synonyms vs structured fields
  • Invalid requests that sound perfectly valid
  • Performance and security trade-offs in real time

In many ways, it’ll be like starting over—with unknown unknowns waiting at every turn.
And honestly? I’m looking forward to it.


Closing: From Blank Slate to Ownership

I started this project with zero background in data engineering.

What I had was:

  • Curiosity
  • A well-scoped schema
  • Real pain points
  • And AI as my co-pilot

Along the way, I learned to:

  • Respect grain
  • Handle fan-out
  • Design for clarity
  • Build with the next developer in mind

It’s still in progress.
But now I see the shape of the system, and I know how to complete it.

From vibes → validation → vision—with a whole lot of learning in between.
That’s what this project gave me.
