For the past few weeks, I’ve been using Cursor AI to help me build a mobile-first app with a brand new stack. I went in optimistic, ready to lean into the vibe coding workflow I’ve been exploring in this series.
In practice, it hasn’t felt magical. Not yet.
To be fair, I’ve thrown a lot at Cursor. I’m working with React Native, Expo, Supabase, and GraphQL, almost all of which are new to me (GraphQL less so). So I expected a learning curve. What I didn’t expect was to feel like I was constantly fighting the assistant that’s meant to help me.
Every little thing I try to implement turns into a negotiation. Sometimes the AI nails it. But more often, it veers off course. For example, I’ve had to repeatedly remind Cursor that this is a mobile app—on multiple occasions, it generated web-only code. When I asked it to refactor utility functions into a new module, it moved the code but forgot to update all the references. These aren’t rare slip-ups. They’re part of my daily workflow.
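To make the "web-only code" failure mode concrete, here's a hedged sketch (the names are invented for illustration, not from my actual project). Code that reaches for the browser-only `localStorage` global will type-check and compile happily, then crash at runtime in React Native, where no such global exists. A guarded factory makes the dependency explicit:

```typescript
// Minimal string key/value store interface for the example.
type StringStore = {
  get(key: string): string | null;
  set(key: string, value: string): void;
};

function createStore(): StringStore {
  // Looking the global up via `globalThis` avoids a ReferenceError
  // in environments (React Native, Node) where it isn't defined.
  const webStorage = (globalThis as any).localStorage;
  if (webStorage) {
    return {
      get: (k) => webStorage.getItem(k),
      set: (k, v) => webStorage.setItem(k, v),
    };
  }
  // Fallback: an in-memory Map. A real React Native app would plug in
  // something like AsyncStorage here instead.
  const mem = new Map<string, string>();
  return {
    get: (k) => mem.get(k) ?? null,
    set: (k, v) => {
      mem.set(k, v);
    },
  };
}
```

The point isn't this particular wrapper; it's that the AI kept emitting the unguarded web path, and I only caught it because I knew to look.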
I'm using "Auto" for the model, so I can’t tell if I’m hitting the limits of the model, not being precise enough in my prompts, or just not giving the right context. And maybe that’s the real issue: vibe coding seems to require a certain prompting fluency that I haven’t yet mastered.
That’s not to say the approach is flawed. In fact, I’ve seen others get fantastic results with it—much better than what I’m managing. When it works, it really does feel like the future: scaffolding components, wiring up queries, producing clean type definitions. But the inconsistency has left me wondering whether I’m getting in my own way.
So I’m not ready to call vibe coding overrated. What I will say is that it’s not effortless, and it’s not plug-and-play. At least not for me. It requires iteration, vigilance, and a deep enough understanding of your tools to know when the AI is leading you astray.
I’m still hopeful. Maybe with more practice—and better prompting—I’ll start seeing the magic. Until then, vibe coding feels less like autopilot and more like a bumpy ride with an eager co-pilot who sometimes grabs the wrong controls.
Top comments (4)
We're finding the same thing. The LLM gets pretty confused once the project grows complex, and you need a lot of hacks to work around the context window so it knows enough about the project.
Can I ask if you have been using multi agent instances to cross check and supervise the coding?
No, I haven't even thought about that, tbh. Can you elaborate?
I will say that supervision is absolutely needed, though. Just last night I asked the agent to implement a new GraphQL resolver, following the patterns established in other units. The generated output implemented an interface, which decidedly did not fit the pattern of the other resolvers, and that scattered type errors throughout the code. It wasn't a huge deal to fix, but if I hadn't been paying attention, it could have caused problems later.
Sure, happy to elaborate on my experience. My partner is working on a highly complex project that the context window can't fully digest, so the AI never gets a full picture of it. To cope, he uses one model instance as a Supervisor that keeps the bigger picture: it prepares instructions, reviews the work, and updates the documentation when a task is complete. The Supervisor instructs an Executor instance, which undertakes the specific task at hand without blowing through the context window too quickly. The Supervisor also acts as a review and bug-fix layer during development, which helps reduce the common errors of vibe coding.
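In pseudocode-ish TypeScript, the split looks roughly like this. The `Model` type is a hypothetical stand-in for however each role calls its LLM; nothing here is any tool's actual API.

```typescript
// A model is just "prompt in, text out" for the purposes of this sketch.
type Model = (prompt: string) => string;

type Task = { description: string; acceptanceCriteria: string[] };

// The Supervisor holds the big-picture project summary. It writes a narrow
// prompt per task and reviews the result, so the Executor never needs the
// whole repo in context.
class Supervisor {
  constructor(private model: Model, private projectSummary: string) {}

  prepareInstructions(task: Task): string {
    return this.model(
      `Context: ${this.projectSummary}\n` +
        `Write a focused implementation prompt for: ${task.description}`
    );
  }

  review(task: Task, output: string): boolean {
    const verdict = this.model(
      `Criteria: ${task.acceptanceCriteria.join("; ")}\n` +
        `Output: ${output}\nReply PASS or FAIL.`
    );
    return verdict.includes("PASS");
  }
}

// The Executor only ever sees the narrow instructions, which keeps its
// context window from filling up with unrelated project detail.
class Executor {
  constructor(private model: Model) {}
  execute(instructions: string): string {
    return this.model(instructions);
  }
}
```

The loop is: Supervisor prepares instructions → Executor implements → Supervisor reviews against the acceptance criteria before the task is marked done.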
Oh ok I think that makes more sense. I assume the Supervisor model is outside of Cursor? And that both models have full access to the source/repository?
I have started a workflow that somewhat resembles that. I talked to ChatGPT about this project for a month before the first line of code was written, and I've continued to engage it during this exercise, so it has all of the project's historical context (well, scattered across various chats). I'm using it to create implementation plans: defining features and subtasks, acceptance criteria, etc. I then ask ChatGPT to generate a prompt for Cursor to achieve/implement a given subset of subtasks. I literally started this like two days ago, and the output in Cursor has already improved.
However, in my case the repo is private, so none of the LLMs can use it for context directly outside the IDE/Cursor.