The potential of AI to transform software development is undeniable, but what happens when you actually put it to the test? I decided to run a focused internal experiment using Claude 3.5 Sonnet embedded within the Windsurf IDE to build a small internal application, Scopic People.
The goal wasn’t to create a production-ready system, but to understand how AI could assist real developers under real constraints: limited time, basic requirements, and a constrained scope.
I also wanted to explore how prompting strategies, tooling setup, and task structure impacted development output and productivity.
The result? A ~90% reduction in development time compared to a traditional estimate of 80–100 development hours plus overhead.
In this post, I will walk you through the exact setup, the tools I used, how I structured the experiment, and the takeaways that shaped my conclusions.
Note: This article is based on my whitepaper: AI-Powered Development: Promise and Perils
Tools I Used: Claude 3.5 Sonnet + Windsurf
To explore how AI could accelerate development, I paired Claude 3.5 Sonnet with Windsurf, a conversational IDE designed for prompt-based workflows.
Claude 3.5 Sonnet
I used Claude 3.5 Sonnet to generate code for frontend components, backend logic, authentication, and data integration. The model showed strong performance on structured tasks but was highly dependent on prompt clarity. Broad or vague instructions often led to inefficiencies or looping behavior.
Windsurf IDE
Windsurf served as the development environment, enabling inline prompting and output management directly in the codebase. The platform supported structured workflows, allowed quick iterations, and minimized context switching - key factors in the time savings.
The Setup and Process
I approached the project as a greenfield build - starting from scratch with no existing code. The tool was developed in vanilla PHP with no frameworks, using Windsurf and Claude 3.5 Sonnet exclusively.
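To give a flavor of what "no frameworks" meant in practice, the entry point of a build like this can be as simple as the sketch below. This is purely illustrative - a minimal no-framework front controller, not the actual Scopic People source:

```php
<?php
// index.php - illustrative front controller for a no-framework PHP app
// (a sketch of the general approach, not the actual Scopic People code)

session_start();

$page = $_GET['page'] ?? 'home';

// Whitelist of allowed pages so arbitrary files can't be included
$routes = [
    'home'  => __DIR__ . '/pages/home.php',
    'login' => __DIR__ . '/pages/login.php',
];

if (!isset($routes[$page])) {
    http_response_code(404);
    echo 'Page not found';
    exit;
}

require $routes[$page];
```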
My process was structured around iterative prompting:
Tasks were broken into small steps.
Natural language instructions were entered via Windsurf’s Cascade interface.
AI-generated code was reviewed and either accepted or refined.
Every accepted change was committed to Git, enabling version control and easy rollback.
This cycle continued until the entire tool was completed, including authentication, UI, role-based access, caching, and database containerization.
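As an example of one of those building blocks, a caching layer in vanilla PHP can be as small as a pair of file-based helpers. The sketch below shows the general technique under my own assumptions - it is not the code Claude actually generated for the project:

```php
<?php
// cache.php - illustrative file-based cache helpers in vanilla PHP
// (a sketch of the technique, not the project's actual caching code)

function cache_get(string $key, int $ttl = 300)
{
    $file = sys_get_temp_dir() . '/cache_' . md5($key);
    if (!is_file($file) || time() - filemtime($file) > $ttl) {
        return null; // missing or expired entry
    }
    return unserialize(file_get_contents($file));
}

function cache_set(string $key, $value): void
{
    $file = sys_get_temp_dir() . '/cache_' . md5($key);
    file_put_contents($file, serialize($value), LOCK_EX);
}

// Usage: cache an expensive lookup for five minutes
$employees = cache_get('employees');
if ($employees === null) {
    $employees = ['placeholder']; // e.g. the result of an API or DB call
    cache_set('employees', $employees);
}
```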
The Results: Time, Output, and Intervention
After completing the development of Scopic People, I compared the results against traditional benchmarks to evaluate whether the AI-assisted workflow delivered real value.
I looked at three key areas: how much time was saved, the quality of the output, and where human developers still had to step in.
Time Savings
The traditional estimate for building Scopic People was 80–100 development hours, plus 80% overhead for planning, QA, and leadership - totaling approximately 144–180 hours.
Using Claude 3.5 Sonnet and Windsurf, I completed the same scope in just 9 hours.
That’s a ~90% reduction in development time (9 hours versus 80–100), and an estimated 75–80% overall productivity gain once the remaining planning, QA, and review overhead is factored in.
Additionally, within the same amount of time I managed to add things beyond the original specs - such as database-driven admin access instead of hardcoded roles.
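To illustrate that change, here is a rough before/after sketch - hardcoded roles versus a database-driven check via PDO. The table and column names are hypothetical, not the actual project schema:

```php
<?php
// Before: admin access hardcoded in the source
$admins = ['alice@example.com', 'bob@example.com'];
$isAdmin = in_array($_SESSION['email'] ?? '', $admins, true);

// After: admin access driven by the database
// (hypothetical `users` table with an `is_admin` flag - names are assumptions)
$pdo  = new PDO('mysql:host=db;dbname=scopic_people', 'app', 'secret');
$stmt = $pdo->prepare('SELECT is_admin FROM users WHERE email = ?');
$stmt->execute([$_SESSION['email'] ?? '']);
$isAdmin = (bool) $stmt->fetchColumn();

if (!$isAdmin) {
    http_response_code(403);
    exit('Admins only');
}
```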
Code Quality & Final Output
Despite the time savings, code quality remained strong. The AI-produced code:
Met all defined requirements
Followed a logical structure with sensible abstractions
Was readable, functional, and extensible
Where I Still Had to Step In
While the AI generated most of the code, human oversight was essential. I intervened to:
Break complex tasks into smaller prompts
Refine instructions when Claude entered repetition loops
Manually explore the Zoho People API and provide endpoint info for integration (see the sketch after this list)
Decide when to skip AI prompts and implement small changes manually
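For context on that integration step, a minimal Zoho People request from vanilla PHP looks roughly like the sketch below. The endpoint path and the Zoho-oauthtoken header follow the pattern I recall from Zoho's documentation, but treat both as assumptions and verify them against the current API docs:

```php
<?php
// zoho_people.php - illustrative fetch of employee records from Zoho People
// Endpoint path and auth header are assumptions - verify against Zoho's docs.

$accessToken = getenv('ZOHO_ACCESS_TOKEN'); // obtained via Zoho's OAuth flow
$url = 'https://people.zoho.com/people/api/forms/employee/getRecords';

$ch = curl_init($url);
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPHEADER     => ['Authorization: Zoho-oauthtoken ' . $accessToken],
    CURLOPT_TIMEOUT        => 10,
]);

$response = curl_exec($ch);
if ($response === false) {
    throw new RuntimeException('Zoho People request failed: ' . curl_error($ch));
}
curl_close($ch);

$records = json_decode($response, true);
```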
The most efficient approach proved to be a hybrid one: letting AI handle structure, boilerplate, and logic - but stepping in for fine-tuning or domain-specific decisions.
Was It Worth It?
Yes - under the right conditions.
Claude 3.5 Sonnet significantly accelerated development, but only when used with clear, structured prompts and frequent review. Success wasn’t about letting AI take over - it was about how I worked with it.
What I found:
Vague instructions led to confusion or looping
Specific, step-by-step prompts yielded fast, accurate output
Direct manual edits were sometimes faster for small tweaks
Used properly, AI was not a replacement but a powerful collaborator that amplified developer productivity.
Conclusion: What I’d Recommend to Other Teams
This experiment wasn’t meant to replace traditional development. It was a proof of concept for how AI tools can streamline workflows when used thoughtfully.
Key takeaways from the experiment:
Break work into discrete tasks – large prompts overwhelm LLMs
Review each iteration – catch issues early
Use version control – recover easily from errors
Don’t force AI into every decision – edit manually where faster
Choose the right tools – Windsurf + Claude 3.5 made prompting seamless
For teams testing AI in development, start with contained, well-scoped projects. The biggest gains came not from raw AI output, but from structured workflows that paired AI capabilities with human judgment.
See what actually worked (and what didn’t) when I used AI to build a real app - prompts, time savings, tools, and all.
Check out the whitepaper!