Week 37: Testing whether better prompts actually matter
(and why I've been overthinking my AI conversations)
The Experiment
I've become completely systematic about prompt engineering. While most people wing it with AI tools, I've built a ChatGPT project dedicated to crafting perfect prompts. Here's how nerdy this gets: I'll feed my rough question into my prompt optimization framework (based on numerous guides), let it ask me clarifying questions until we've refined everything, then use that polished prompt. I even have a separate system for image prompts. Peak optimization/nerd behaviour, I know.
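For anyone curious what that loop looks like outside the ChatGPT UI, here's a rough sketch of the same idea against the OpenAI API. The instruction text, the model name, and the "READY:" convention are all my illustration for this post, not the actual project setup:

```python
# Sketch of a prompt-refinement loop: the model asks clarifying questions,
# you answer, and it eventually emits a polished prompt.
# Illustrative only -- instructions, model name, and "READY:" marker are assumptions.
from openai import OpenAI

client = OpenAI()

REFINER_INSTRUCTIONS = (
    "You help me turn rough questions into well-structured prompts. "
    "Ask clarifying questions one at a time. When you have enough, "
    "reply with 'READY:' followed by the final polished prompt."
)

def refine_prompt(rough_question: str) -> str:
    messages = [
        {"role": "system", "content": REFINER_INSTRUCTIONS},
        {"role": "user", "content": rough_question},
    ]
    while True:
        reply = client.chat.completions.create(
            model="gpt-4o",  # placeholder model name
            messages=messages,
        ).choices[0].message.content
        if reply.startswith("READY:"):
            return reply.removeprefix("READY:").strip()
        # Otherwise it's a clarifying question: answer it and keep going.
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user", "content": input(f"{reply}\n> ")})
```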
I also have custom instructions set up in most of my AI tools. The kind of detailed preferences that tell Claude exactly how I like my responses structured, what tone to use, and what background context to consider. Between the prompt guidelines and the custom instructions, I've basically built a whole system around getting better AI responses.
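Mechanically, custom instructions are just a standing system prompt that gets prepended to every conversation. A minimal sketch with the Anthropic SDK, where the instruction text and model name are placeholders I made up rather than my real setup:

```python
# Custom instructions as a standing system prompt -- placeholders for illustration.
import anthropic

client = anthropic.Anthropic()

CUSTOM_INSTRUCTIONS = (
    "Structure answers as: short summary, then detail, then next steps. "
    "Tone: direct, no filler. Assume I value trade-off analysis over hype."
)

def ask_claude(question: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model name
        max_tokens=1024,
        system=CUSTOM_INSTRUCTIONS,  # the same role custom instructions play in the app
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text
```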
But what's been bugging me: am I actually getting better results from all this extra effort? Or am I just making myself feel more in control while getting the same mediocre outputs I'd get from typing "help me with this thing"?
So I decided to test it properly. I wanted to figure out whether spending time on prompt engineering actually delivers better results, or if custom instructions can do the heavy lifting for lazy prompts.
The Process
I designed what felt like a proper scientific experiment 🧪:
Four different approaches:
- Basic prompt, no custom instructions
- Improved prompt (run through my guidelines framework), no custom instructions
- Basic prompt plus custom instructions
- Improved prompt plus custom instructions (the "everything optimized" setup)
Three different types of questions to test:
- A technical question
- A business strategy question
- A creative question about AI and human creativity
I figured if prompt engineering really mattered, I'd see consistent improvements across different types of thinking.
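If you want to run the same grid yourself, it's just approaches crossed with question types, with a score per cell. Here's a toy harness showing the shape of the comparison; ask() and score() are hypothetical stand-ins for however you generate and grade responses, not my actual setup:

```python
# Toy harness for the 4-approaches x 3-question-types grid.
# ask() and score() are hypothetical stand-ins: swap in your own
# generation call and grading rubric before trusting any numbers.
import random
from statistics import mean

APPROACHES = [
    "basic prompt",
    "improved prompt",
    "basic prompt + custom instructions",
    "improved prompt + custom instructions",
]
QUESTION_TYPES = ["technical", "business strategy", "creative"]

def ask(approach: str, question_type: str) -> str:
    # Placeholder: call your model of choice here.
    return f"response for {question_type} via {approach}"

def score(response: str) -> float:
    # Placeholder: replace with a real rubric (my scores were percentages).
    return random.uniform(50, 90)

results = {
    approach: {qt: score(ask(approach, qt)) for qt in QUESTION_TYPES}
    for approach in APPROACHES
}
for approach, scores in results.items():
    print(f"{approach}: avg {mean(scores.values()):.0f}")
```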
The Outcome
The results were clearer than I expected—and honestly, a bit surprising.
The winner by a mile: My systematic prompt guidelines. They scored an average of 81% across all three question types, crushing the basic prompts (62%) and even beating the "everything optimized" approach (73%) by eight points.
Here's what really stood out: the improved prompt handled technical questions brilliantly, delivering structured analysis with actual metrics (like "50-150ms local vs 600-1200ms cloud latency"). For the business strategy question, it created a comprehensive blueprint with detailed revenue streams and a clear implementation roadmap. Even for the creative question about AI and human creativity, it produced a well-researched essay with psychological foundations.
Custom instructions were... fine. When paired with improved prompts, they actually slightly decreased performance. When paired with basic prompts, results were all over the place—great for creativity (80%) but weak for business analysis (65%).
The technical question showed the biggest improvement gap: 18 points between basic and improved prompts. Apparently, analytical tasks really do benefit from clear structure and specific requirements.
What This Actually Means
I went into this expecting that custom instructions + improved prompts would be the clear winner. More optimization = better results, right? But it turns out that good prompting technique trumps everything else.
The improved prompts worked because my guidelines framework does three things consistently:
- It forces me to clarify what I'm actually asking before the model ever sees the question.
- It spells out the structure I want the answer delivered in.
- It states specific requirements (metrics, deliverables, constraints) instead of leaving them implied.
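To make that concrete, here's the flavour of the gap between the two styles. Both prompts are invented for illustration, not the ones from the test:

```python
# Illustrative only: a "basic" prompt vs. one with context, structure, and requirements.
basic_prompt = "Should I run this model locally or in the cloud?"

improved_prompt = """You are advising a small engineering team.
Context: we're choosing between local and cloud inference for a latency-sensitive feature.
Task: compare the two options on latency, cost, and operational overhead.
Format: a short recommendation first, then a table of trade-offs with rough numbers.
Requirements: cite typical latency ranges and state your assumptions explicitly."""
```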
Custom instructions, while handy for setting general preferences, couldn't compensate for vague or poorly structured questions. And when layered on top of already-good prompts, they seemed to add unnecessary complexity without much benefit.
Key Takeaway
Stop overthinking the system and start improving the conversation. Well-crafted individual prompts consistently outperform complex instruction layering or elaborate setups. The time you spend writing clearer, more specific questions pays off immediately.
Pro Tips for Beginners:
- Put your effort into the prompt itself before you fiddle with custom instructions or elaborate setups.
- Be specific: say what structure, level of detail, and deliverables you want.
- Let the model ask you clarifying questions before you settle on the final wording.
- Don't stack custom instructions on top of an already-good prompt; in my test that added complexity without adding value.
What's Next?
I'm revisiting my custom instructions and testing what actually gives me better responses. One thing I do want is an educated opinion in the answers, but I'll keep tweaking to see what I like best (and hopefully get better results).
Want to Try It Yourself?
Glad to see that my no-custom-instructions approach with ChatGPT seems to hold up. I'm still in favour of Cursor rules, though: same concept, slightly different intent.