Boost Your Coding Agent: Understand Its Reasoning with 3 Simple Prompts

TL;DR: Use the custom prompts from this article and the linked repository to have the agent (1) plan, (2) implement, and (3) review any code before considering it complete.

These are straightforward, proven client-side prompt engineering techniques. This approach consistently improves results, regardless of the LLM used.

You’ll need to apply them manually, each in a separate conversation. Some LLM Agents already use these techniques, but most don’t — if you’re using one that does, you’re probably already aware of it.

Cooperation is nature’s default.

Current landscape of LLM Coding Agents

If you’ve recently used any coding agent — whether in an IDE like Cursor or cloud-based like Codex — you’ve likely noticed significant variability in the agent’s output. Sometimes they nail it, providing effective solutions effortlessly. Yet, when tackling complex tasks, they often get bogged down, overcomplicating solutions unnecessarily.

In recent months, we’ve witnessed a proliferation of cloud-based LLM agents marketed as intelligent development copilots. However, in practice, most are mere wrappers around foundational models like Claude or GPT-4. They primarily offer integrations with tools such as GitHub, Slack, or Notion, coupled with a sleek UI, but add minimal actual intelligence to the core functionality. These “Claude-in-the-cloud” agents rely entirely on the base models for reasoning and execution, using the orchestration layer primarily for convenience rather than enhancing capabilities. Essentially, they’re a cloud version of the Claude that is already present in your favourite IDE.

In order to provide better solutions, these agents should adopt proven prompt engineering strategies, like those implemented by open-source projects such as SWE-agent and Refact.ai. Client-side prompting techniques used by these advanced systems ensure two key outcomes: (1) the correct problem is identified, and (2) the solution is implemented effectively. Employing even a basic manual version of these techniques with readily available agents provides valuable insight into the high-level workings of LLMs, significantly enhancing the quality of the output.

It’s really fascinating how using these super simple steps results in better outputs immediately. I expect that the workflow I’m detailing in this article will soon become a de facto standard, widely adopted by all the major players, and second nature to all developers using LLM Agents for work. If you’ve used LLM Agents already, you might have noticed that these steps work nicely on their own as well — but since each step gives the LLM a chance to improve the code, combining them leads to the best possible results. I’m pretty sure that once LLMs get cheap enough, even more complex algorithms will become standard offerings from major providers.

Usual flow nowadays: a single LLM conversation handles everything.

Context: LLMs as pseudo-humans

LLMs are trained on a large knowledge base of human texts. This makes them “behave” a lot like humans when solving problems: misunderstanding the task, getting lost in the details, cutting corners, etc. Luckily, we as humans have already solved these problems — it’s called a software development team. Multiple people working together on a problem. For example, it’s no wonder that code reviews work so well even when the reviewer has about the same expertise as the coder: we’re only human, and we make mistakes that another set of eyes can often catch. Now that we (humans) have created machines that mirror our way of thinking, it’s time to make them solve problems in a similar fashion. Some of this can be handled directly on the Agent level — but most of the time, a fresh set of LLM eyes is more effective. (This could change in the future, and I expect that a lot of research is already focused on solving this problem. What I’m talking about is something developers can do today to make better use of Coding Agents.)

Some Existing Papers

Here are some recently published papers, each introducing a strong client-side prompting technique:

Self-Refine: Iterative Refinement with Self-Feedback (2023)
Presents Self-Refine, an approach where the LLM first generates an answer, then critiques its own output and iteratively refines it. The same model produces both the feedback and the revised answers across multiple rounds. This method requires no additional training or data — just the model itself at inference time. Across seven tasks (ranging from dialogue to math), Self-Refine improved solution quality by ~20% on average, even boosting GPT-4’s performance.
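
To make this concrete, here is a minimal sketch of a Self-Refine loop. It assumes a hypothetical call_llm helper that wraps whichever model API or agent you actually use, and the prompts and stopping rule are deliberately simplified.

```python
# Minimal Self-Refine sketch: the same model generates an answer, critiques
# it, and revises it for a few rounds. `call_llm` is a hypothetical helper
# standing in for whichever model API or agent you actually use.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this up to your LLM API of choice")

def self_refine(task: str, max_rounds: int = 3) -> str:
    answer = call_llm(f"Solve the following task:\n{task}")
    for _ in range(max_rounds):
        feedback = call_llm(
            f"Task:\n{task}\n\nCandidate answer:\n{answer}\n\n"
            "Critique this answer. If it is already correct and complete, "
            "reply with exactly: DONE"
        )
        if feedback.strip() == "DONE":
            break
        answer = call_llm(
            f"Task:\n{task}\n\nPrevious answer:\n{answer}\n\n"
            f"Feedback:\n{feedback}\n\n"
            "Rewrite the answer so that it addresses all of the feedback."
        )
    return answer
```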

Self-Consistency Improves Chain-of-Thought Reasoning (2023)
Proposes a decoding strategy called self-consistency: instead of relying on a single chain of thought, the model samples multiple reasoning paths and selects the most consistent answer. This ensemble-of-thought approach significantly boosts reasoning accuracy on math and commonsense tasks (e.g., +17.9% on GSM8K).
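
A similarly rough sketch of self-consistency, reusing the hypothetical call_llm helper from the previous snippet: sample several independent chains of thought and keep the most common final answer.

```python
from collections import Counter

# Minimal self-consistency sketch, reusing the hypothetical call_llm helper
# from the previous snippet. It assumes call_llm samples with a non-zero
# temperature, so independent calls produce different reasoning paths.

def self_consistency(question: str, n_samples: int = 5) -> str:
    answers = []
    for _ in range(n_samples):
        response = call_llm(
            f"{question}\n\nThink step by step, then give the final answer "
            "on the last line in the form 'ANSWER: <value>'."
        )
        lines = response.strip().splitlines()
        final = lines[-1] if lines else ""
        answers.append(final.removeprefix("ANSWER:").strip())
    # The answer reached by the most reasoning paths wins the vote.
    return Counter(answers).most_common(1)[0][0]
```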

Tree of Thoughts: Deliberate Problem Solving with LLMs (2023)
Introduces Tree-of-Thoughts (ToT), a framework that generalizes chain-of-thought prompting into a tree search. The LLM explores multiple reasoning branches, self-evaluates them, and can backtrack or look ahead as needed. This deliberate search strategy dramatically improved performance on tasks requiring planning (e.g., solving 74% of “Game of 24” puzzles compared to 4% with standard CoT).
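
And a compact sketch of the Tree-of-Thoughts idea, again with the same hypothetical call_llm helper: expand a few candidate next steps per branch, let the model score each branch, and keep only the most promising ones at every depth. Real ToT implementations are more sophisticated; this only shows the shape of the search.

```python
# Compact Tree-of-Thoughts sketch, again using the hypothetical call_llm
# helper: expand a few candidate next steps per branch, score each branch
# with the model itself, and keep only the best ones at every depth.

def tree_of_thoughts(task: str, breadth: int = 3, depth: int = 3) -> str:
    frontier = [""]  # partial reasoning traces
    for _ in range(depth):
        candidates = []
        for partial in frontier:
            for _ in range(breadth):
                step = call_llm(
                    f"Task:\n{task}\n\nReasoning so far:\n{partial}\n\n"
                    "Propose the single next reasoning step."
                )
                candidates.append(f"{partial}\n{step}".strip())
        scored = []
        for trace in candidates:
            rating = call_llm(
                f"Task:\n{task}\n\nReasoning:\n{trace}\n\n"
                "Rate how promising this reasoning is from 1 to 10. "
                "Reply with a number only."
            )
            try:
                scored.append((int(rating.strip()), trace))
            except ValueError:
                scored.append((1, trace))  # unparseable rating -> lowest score
        scored.sort(key=lambda pair: pair[0], reverse=True)
        frontier = [trace for _, trace in scored[:breadth]]
    return call_llm(
        f"Task:\n{task}\n\nReasoning:\n{frontier[0]}\n\nGive the final answer."
    )
```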

There’s a lot more research out there related to LLMs and problem solving — these are just the tip of the iceberg. But as you can see,

there’s a recurring theme in all of them: use multiple LLMs, make them work together like a team, and aim for consensus.

This can be achieved on different levels and in different ways, but the result always seems to be the same: more LLMs (and more computation) lead to better-quality outputs.

Some existing tools using the mentioned techniques

SWE-Agent
This is the only open-source framework for solving programming tasks that I’m going to mention here, but if you take a look at the top contenders on https://www.swebench.com/, you’ll find plenty more. These frameworks use a lot of the client-side prompting techniques mentioned earlier (and even add more!) to get the best results possible for programming-related tasks. It’s no surprise that LLMs on their own don’t take the top spots in these benchmarks — virtually all top-performing systems rely heavily on a framework in addition to an agent to get the job done the best way possible.

Refact.ai
Refact.ai recently took the top spot on SWE-bench, and it also offers plugins for VSCode and other IDEs. It’s great! This is exactly the kind of tooling that shows how much more effective LLMs can be when they’re guided by a structured framework. It’s not just about asking the model to code — it’s about orchestrating the entire workflow: planning, editing, reviewing, and iterating. That’s where the real performance gains come from.

Umm… so?

Despite their proven effectiveness, surprisingly few developers or companies are actively integrating these client-side prompting techniques into their workflows. One possible reason is that major tech companies focus heavily on pushing easy-to-use tools broadly into every developer’s hands, often prioritizing accessibility over optimal performance. Moreover, these advanced prompting approaches usually require additional API calls, increasing both execution time and cost significantly if billed per usage.

We’ve already established that using client-side prompting techniques clearly improves the quality of LLM output on programming tasks. To use these techniques, one has to set up separate open-source tools, which takes time and effort — not to mention the additional cost of using LLMs via API for these more expensive computations.

I did all that, then realized that I simply enjoy IDE-based workflows more. Having a flat monthly fee for Cursor — and me being a cheapskate — also motivated me to get better programming results using just my IDE. So I came up with a much simpler workflow based on the previously mentioned, widely adopted techniques. This approach can be easily adapted to multiple repos, IDEs, or LLM Agents, and is overall just much simpler to use.

Improved flow: each step is a separate LLM conversation.

The 3 steps:

1. Plan

You are a world-class software engineer with decades of experience. You are given a task that is related to the current project. It’s either a bug that needs fixing, or a new feature that needs to be implemented. Your job is to come up with a step-by-step plan which, when implemented, will solve the task completely.

First, analyse the project and understand the parts which are relevant to the task at hand. Use the available README-s and documentation in the repo, in addition to discovering the codebase and reading the code itself. Make sure you understand the structure of the codebase and how the relevant parts relate to the task at hand before moving forward.

Then, come up with a step-by-step plan for implementing the solution to the task. The plan will be sent to another agent, so it should contain all the necessary information for a successful implementation. Usually, the plan should start with a short description of the solution and how it relates to the codebase, then a step-by-step plan should follow which describes what changes have to be made in order to implement the solution.

Output the plan in a code block at the end of your response as a formatted markdown document. Do not implement any changes. Another agent will take over from there.

This is the task that needs to be solved: {{Add your task description here.}}

This step mirrors the human software development workflow — even if we don’t consciously realize it, we usually have an idea of the solution or come up with a plan before actually starting to code. This step makes that explicit — it allows the human developer to intervene early if the Agent’s plan seems completely off. If you often run into the issue of the plans being totally wrong, I’d suggest giving the AI Agent more context about the codebase — see the section at the end of the article.

In any case, the plan the Agent comes up with should be good enough in most cases, and by making the Agent think about this before starting to code, we ensure it doesn’t get lost in the details early on — a common source of bad solutions.

2. Code

You are a world-class software engineer with decades of experience. You are given a task that is related to the current project, which is either a bug that needs fixing, or a new feature that needs to be implemented. You are also given a step-by-step plan created by another expert in the field which describes a solution for the task in detail. Your job is to use the step-by-step plan to provide a comprehensive solution to the task at hand.

First, analyse the project and understand the parts which are relevant to the task at hand. Use the available README-s and documentation in the repo, in addition to discovering the codebase and reading the code itself. Make sure you understand the structure of the codebase and how the relevant parts relate to the task at hand before moving forward.

Then, implement the step-by-step plan by following the steps described there. Make sure to implement all the necessary changes described in the plan in order to solve the original problem. Before considering your job complete, make sure that the task is in fact solved fully, completely and without any bugs or issues. Do not commit your changes to git. Another agent will take over from here.

This is the task that needs to be solved: {{Add your task description here.}}

This is the step-by-step plan provided by an expert: {{Add the output of the previous step here.}}

This is a straightforward step, and one that’s mostly expected from the Agent. Make sure this prompt is used in a separate conversation from the previous step, as the thinking contents of the earlier step can influence the implementation of the plan — if something goes weird in the thinking step, it can cause issues in the output. More tokens in the conversation also generally make LLMs perform worse.

3. Review

You are a world-class software engineer with decades of experience. You are given a task that is related to the current project, which is either a bug that needs fixing, or a new feature that needs to be implemented. The codebase also contains uncommitted code changes created by other experts that should solve the task at hand. Your job is to review the uncommitted code, fix all issues, and make sure it is ready to merge.

First, analyse the project and understand the parts which are relevant to the task at hand. Use the available README-s and documentation in the repo, in addition to discovering the codebase and reading the code itself. Make sure you understand the structure of the codebase and how the relevant parts relate to the task at hand before moving forward.

Then, do a code review on the uncommitted changes in the repo. Be ruthless and raise all concerns.

Then, fix all the concerns you raised in order to finish the implementation of the task. Before considering your job complete, make sure that the task is in fact solved fully, completely and without any bugs or issues.

When you are done with the changes, create a short list that summarizes the problems you found and the fixes you added to the code.

This is the task that needs to be solved: {{Add your task description here.}}

Interestingly, this step often turns out to be surprisingly beneficial. Most of the time, it will catch issues — sometimes just minor ones related to edge cases, but occasionally real bugs that need to be addressed. Make sure this is also a new LLM conversation — a fresh set of eyes helps ensure the implementation and thinking steps don’t influence the code review.
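
If you’d rather script the workflow than copy-paste between chat windows, the three steps chain together naturally. The following is a rough sketch, not a ready-made integration: run_agent_conversation is a hypothetical helper that opens a brand-new agent session with access to your repository, and the *_PROMPT constants stand for the three templates above with their placeholders turned into format fields.

```python
# Rough sketch of wiring the three prompts together, each in its own fresh
# conversation. run_agent_conversation is a hypothetical helper that opens a
# brand-new agent session with access to the repository and returns its final
# message; the *_PROMPT constants stand for the three templates above, with
# the "{{Add ... here.}}" placeholders turned into {task} / {plan} fields.

PLAN_PROMPT = "You are a world-class software engineer ... {task}"
CODE_PROMPT = "You are a world-class software engineer ... {task} ... {plan}"
REVIEW_PROMPT = "You are a world-class software engineer ... {task}"

def run_agent_conversation(prompt: str) -> str:
    raise NotImplementedError("wire this up to your coding agent of choice")

def solve_task(task: str) -> str:
    # 1. Plan: a fresh conversation produces the step-by-step plan; this is
    #    the point where a human can sanity-check the direction.
    plan = run_agent_conversation(PLAN_PROMPT.format(task=task))

    # 2. Code: a second, separate conversation implements the plan. The
    #    changes stay uncommitted in the working tree.
    run_agent_conversation(CODE_PROMPT.format(task=task, plan=plan))

    # 3. Review: a third conversation reviews and fixes the uncommitted
    #    changes, then summarizes what it found.
    return run_agent_conversation(REVIEW_PROMPT.format(task=task))
```

Even if you keep doing this manually in chat windows, the structure is identical: the only state carried between conversations is the plan text and the uncommitted changes sitting in the repo.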

This was 100% correct. If only Claude knew who wrote the code in the first place…

This approach might seem a bit pedestrian at first

…and yeah, in a way, it is. You’re basically just copy-pasting outputs across different chat windows and fixing issues as they come up.

But it works. Surprisingly well. And if you’re using an IDE with a flat monthly subscription, it’s also super economical.

Right now is the perfect time to do this.

AI Agents out there vary wildly — some already automate parts of this and even get better results, but those usually run on API-based pricing, which gets expensive fast. The tools most people use today offer unlimited interactions for a fixed price. So for those users, adding these simple steps into the workflow can seriously boost the Agent’s performance — without costing anything extra.

Eventually, I’m sure these kinds of techniques will be baked right into the big platforms. But for now, the focus seems to be on making these tools widely accessible, not necessarily performant. That’s why this three-step workflow is such a solid upgrade — it’s simple, effective, and gives you a taste of where AI coding tools are headed.

Could not resist generating an AI image in an article about AI.

+1 tip:

To work effectively with LLM agents — whether local or cloud-based — it is critical to provide them with context. One of the most effective ways to do this is by including an AI-README.md file in your repository. This document serves as a high-level guide tailored specifically for AI agents, outlining the structure, conventions, and critical background knowledge they need to operate reliably within your codebase.

For frontend projects, this should include the folder structure, state management architecture (e.g., Redux, Zustand, or React Context), styling conventions (e.g., Tailwind, CSS Modules, or Emotion), and any design system in use. For backend codebases, it should cover database schema and access layers, preferred design patterns, service architecture, and authentication flows. Most importantly, the AI-README.md must clearly define the mandatory checks that every feature or bugfix must pass before it can be considered complete — such as unit test coverage, linting rules, integration with CI/CD, or adherence to API contracts.
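
To make this concrete, a minimal AI-README.md skeleton might look something like the following; the sections and tool names are illustrative (a frontend-flavoured example), so adapt them to your own stack.

```markdown
# AI-README

## Project overview
One short paragraph on what this app/service does and who uses it.

## Structure
- `src/components/` — UI components
- `src/state/` — state management (e.g., Redux or Zustand)
- `src/api/` — API clients and contracts

## Conventions
- Styling: Tailwind only; do not introduce new CSS files.
- State: server data lives in the store, not in local component state.
- Reuse existing design system components before adding new ones.

## Definition of done
- Unit tests added or updated; the full test suite passes.
- Linting passes with no new warnings.
- CI is green and existing API contracts are unchanged.
```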

By anchoring the agent’s understanding with this document, you create a foundation for far more reliable planning, implementation, and review stages. It also dramatically reduces hallucination risk and improves the agent’s ability to follow implicit team norms that aren’t otherwise visible in the code alone. Think of it as giving your junior dev their onboarding packet — only this one speaks JSON.

Hope this was an interesting read, and you managed to improve your LLM Agent — or at least learned something new in the process. Let me know if I missed anything either here or at the GitHub repo, and feel free to clap this article or star the repo to help the algorithms spread the word. : ) Happy coding!

I got the cover photo from here: https://willbl.com/land-of-giants/ Amazing photography!
