Week 37: Testing whether better prompts actually matter
(and why I've been overthinking my AI conversations)
The Experiment
I've become completely systematic about prompt engineering. While most people wing it with AI tools, I've built a ChatGPT project dedicated to crafting perfect prompts. Here's how nerdy this gets: I'll feed my rough question into my prompt optimization framework (based on numerous guides), let it ask me clarifying questions until we've refined everything, then use that polished prompt. I even have a separate system for image prompts. Peak optimization/nerd behaviour, I know.
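For anyone curious what that loop looks like outside the ChatGPT UI, here's a rough sketch of the same idea against the OpenAI API. The instruction text, the model name, and the "READY:" convention are all my illustration for this post, not the actual project setup:

```python
# Sketch of a prompt-refinement loop: the model asks clarifying questions,
# you answer, and it eventually emits a polished prompt.
# Illustrative only -- instructions, model name, and "READY:" marker are assumptions.
from openai import OpenAI

client = OpenAI()

REFINER_INSTRUCTIONS = (
    "You help me turn rough questions into well-structured prompts. "
    "Ask clarifying questions one at a time. When you have enough, "
    "reply with 'READY:' followed by the final polished prompt."
)

def refine_prompt(rough_question: str) -> str:
    messages = [
        {"role": "system", "content": REFINER_INSTRUCTIONS},
        {"role": "user", "content": rough_question},
    ]
    while True:
        reply = client.chat.completions.create(
            model="gpt-4o",  # placeholder model name
            messages=messages,
        ).choices[0].message.content
        if reply.startswith("READY:"):
            return reply.removeprefix("READY:").strip()
        # Otherwise it's a clarifying question: answer it and keep going.
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user", "content": input(f"{reply}\n> ")})
```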
I also have custom instructions set up in most of my AI tools. The kind of detailed preferences that tell Claude exactly how I like my responses structured, what tone to use, and what background context to consider. Between the prompt guidelines and the custom instructions, I've basically built a whole system around getting better AI responses.
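Mechanically, custom instructions are just a standing system prompt that gets prepended to every conversation. A minimal sketch with the Anthropic SDK, where the instruction text and model name are placeholders I made up rather than my real setup:

```python
# Custom instructions as a standing system prompt -- placeholders for illustration.
import anthropic

client = anthropic.Anthropic()

CUSTOM_INSTRUCTIONS = (
    "Structure answers as: short summary, then detail, then next steps. "
    "Tone: direct, no filler. Assume I value trade-off analysis over hype."
)

def ask_claude(question: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model name
        max_tokens=1024,
        system=CUSTOM_INSTRUCTIONS,  # the same role custom instructions play in the app
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text
```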
But what's been bugging me: am I actually getting better results from all this extra effort? Or am I just making myself feel more in control while getting the same mediocre outputs I'd get from typing "help me with this thing"?
So I decided to test it properly. I wanted to figure out whether spending time on prompt engineering actually delivers better results, or if custom instructions can do the heavy lifting for lazy prompts.
The Process
I designed what felt like a proper scientific experiment 🧪:
Four different approaches:
- Basic prompt, no custom instructions
- Improved prompt (run through my guidelines framework), no custom instructions
- Basic prompt plus custom instructions
- Improved prompt plus custom instructions (the "everything optimized" setup)
Three different types of questions to test:
- A technical question
- A business strategy question
- A creative question about AI and human creativity
I figured if prompt engineering really mattered, I'd see consistent improvements across different types of thinking.
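If you want to run the same grid yourself, it's just approaches crossed with question types, with a score per cell. Here's a toy harness showing the shape of the comparison; ask() and score() are hypothetical stand-ins for however you generate and grade responses, not my actual setup:

```python
# Toy harness for the 4-approaches x 3-question-types grid.
# ask() and score() are hypothetical stand-ins: swap in your own
# generation call and grading rubric before trusting any numbers.
import random
from statistics import mean

APPROACHES = [
    "basic prompt",
    "improved prompt",
    "basic prompt + custom instructions",
    "improved prompt + custom instructions",
]
QUESTION_TYPES = ["technical", "business strategy", "creative"]

def ask(approach: str, question_type: str) -> str:
    # Placeholder: call your model of choice here.
    return f"response for {question_type} via {approach}"

def score(response: str) -> float:
    # Placeholder: replace with a real rubric (my scores were percentages).
    return random.uniform(50, 90)

results = {
    approach: {qt: score(ask(approach, qt)) for qt in QUESTION_TYPES}
    for approach in APPROACHES
}
for approach, scores in results.items():
    print(f"{approach}: avg {mean(scores.values()):.0f}")
```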
The Outcome
The results were clearer than I expected—and honestly, a bit surprising.
The winner by a mile: My systematic prompt guidelines. They scored an average of 81% across all three question types, crushing the basic prompts (62%) and even beating the "everything optimized" approach (73%) by eight points.
Here's what really stood out: the improved prompt handled technical questions brilliantly, delivering structured analysis with actual metrics (like "50-150ms local vs 600-1200ms cloud latency"). For the business strategy question, it created a comprehensive blueprint with detailed revenue streams and a clear implementation roadmap. Even for the creative question about AI and human creativity, it produced a well-researched essay with psychological foundations.
Custom instructions were... fine. When paired with improved prompts, they actually slightly decreased performance. When paired with basic prompts, results were all over the place—great for creativity (80%) but weak for business analysis (65%).
The technical question showed the biggest improvement gap: 18 points between basic and improved prompts. Apparently, analytical tasks really do benefit from clear structure and specific requirements.
What This Actually Means
I went into this expecting that custom instructions + improved prompts would be the clear winner. More optimization = better results, right? But it turns out that good prompting technique trumps everything else.
The improved prompts worked because my guidelines framework does three things consistently:
- It forces me to clarify what I'm actually asking before the model ever sees the question.
- It spells out the structure I want the answer delivered in.
- It states specific requirements (metrics, deliverables, constraints) instead of leaving them implied.
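To make that concrete, here's the flavour of the gap between the two styles. Both prompts are invented for illustration, not the ones from the test:

```python
# Illustrative only: a "basic" prompt vs. one with context, structure, and requirements.
basic_prompt = "Should I run this model locally or in the cloud?"

improved_prompt = """You are advising a small engineering team.
Context: we're choosing between local and cloud inference for a latency-sensitive feature.
Task: compare the two options on latency, cost, and operational overhead.
Format: a short recommendation first, then a table of trade-offs with rough numbers.
Requirements: cite typical latency ranges and state your assumptions explicitly."""
```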
Custom instructions, while handy for setting general preferences, couldn't compensate for vague or poorly structured questions. And when layered on top of already-good prompts, they seemed to add unnecessary complexity without much benefit.
Key Takeaway
Stop overthinking the system and start improving the conversation. Well-crafted individual prompts consistently outperform complex instruction layering or elaborate setups. The time you spend writing clearer, more specific questions pays off immediately.
Pro Tips for Beginners:
- Put your effort into the prompt itself before you fiddle with custom instructions or elaborate setups.
- Be specific: say what structure, level of detail, and deliverables you want.
- Let the model ask you clarifying questions before you settle on the final wording.
- Don't stack custom instructions on top of an already-good prompt; in my test that added complexity without adding value.
What's Next?
I'm revisiting my custom instructions and testing what actually gives me better responses. One thing I do want is an educated opinion in the answers, but I'll keep tweaking to see what I like best (and hopefully get better results).
Want to Try It Yourself?
Glad to see that my no-custom-instructions approach with ChatGPT seems to hold up. I'm still in favour of Cursor rules, though: same concept, slightly different intent.