Introduction
ReAct agents are tricky to implement correctly, and in this article I will show how to do it using a cybersecurity AI Agent example that can find vulnerabilities in a provided web target. Today I will explain:
- How to use tokens efficiently in ReAct agents
- How to force ReAct agents to use tools efficiently and not be too lazy
Subscribe to my Substack to not miss my new articles 😊
Theory
Before we jump to the implementation, let's first define what an AI Agent is and how to build one.
AI Agent
An AI Agent is a popular architecture pattern built from an LLM, a loop, and actions (tools) that the LLM can perform. The LLM acts as the brain that decides what to do next, replacing the hand-written decision logic that software relied on previously.
Traditional automation breaks when conditions change. Agents adapt. That's the real value - resilience, not just "LLM with extra steps."
Here is a very basic AI Agent architecture:
- The user executes the AI Agent.
- The LLM makes its own decision to call a tool to perform an action.
- The tool returns a result to the LLM, allowing it to make a new decision. This loops until the LLM decides a result can be provided to the user or certain conditions are met.
- The LLM produces the final result for the agent.
AI Agents can be built with different patterns, which to me are similar to classic software patterns like Factory, Singleton, or Strategy. In this article I focus on the simplest one - the ReAct pattern.
ReAct Agent Pattern
ReAct is a pattern for AI Agents with these steps:
- Reason - LLM thinks about data or tool results
- Act - LLM calls tools to perform some actions
- Observe - LLM handles results of tool execution
- LLM provides final result
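A minimal, framework-free sketch of these steps in Python looks like this (the Decision type and the llm/tools callables are illustrative placeholders, not part of this project):
from dataclasses import dataclass
from typing import Callable

@dataclass
class Decision:
    tool_name: str | None = None   # tool the LLM wants to call, if any
    arguments: dict | None = None  # arguments for that tool call
    answer: str | None = None      # final answer when no tool is requested

def react_loop(
    llm: Callable[[list[str]], Decision],
    tools: dict[str, Callable[[dict], str]],
    task: str,
    max_steps: int = 10,
) -> str | None:
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        decision = llm(history)                                            # Reason
        if decision.tool_name is None:
            return decision.answer                                         # final result
        observation = tools[decision.tool_name](decision.arguments or {})  # Act
        history.append(f"{decision.tool_name}: {observation}")             # Observe
    return llm(history + ["Provide the final answer now."]).answer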
Project Requirements
To learn how to build a ReAct Agent, I decided to build a vulnerability scanning AI Agent. It will accept a web service URL as input and provide a vulnerability report as output.
I used this technology stack to build it:
- Python
- LangGraph
System Design
- User specifies a target to scan.
- scan_node asks the LLM to reason about the context and make a decision about the next step.
- scan_node uses tools to perform Web Target scanning if the LLM decides to do so.
- tools scan the Web Target.
- scan_node calls summary_node if the LLM decides that no additional tool usage is required.
- summary_node provides context about the scanning results to the LLM.
- LLM produces the summary output.
So it's a basic ReAct pattern with an extra node for summary generation. This approach produces a higher-quality summary than consuming the raw ReAct pattern output directly.
Implementation
Short-term memory - graph state
Code available on GitHub.
To implement short-term memory I used this graph state:
import operator
from typing import Annotated

from langgraph.graph import MessagesState

# ReActUsage, ToolsUsage, Tools, ToolResult and Target are custom state types
# defined in the repository (agent_core.state).
class ReActAgentState(MessagesState):
usage: ReActUsage
tools_usage: ToolsUsage
tools: Tools
results: Annotated[list[ToolResult], operator.add]
target: Target
It contains:
- tools_usage - tracks current tool usage and checks that it doesn't exceed the limits.
- usage - tracks the depth of graph recursion to prevent hitting the recursion limit.
- tools - a dynamic list of tools that users can specify during graph execution, which makes this graph reusable.
- results - a list of tool execution results, used to reduce LLM token usage.
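The helper types used in this state (ReActUsage, ToolsUsage, Tools, ToolResult, Target) are defined in the repository and not shown in the article. To make the following snippets easier to read, here is a minimal sketch of the two most important ones, based only on how they are used below; the exact shapes are assumptions, the real definitions are on GitHub:
from dataclasses import dataclass, field

@dataclass
class ToolResult:
    result: str                        # raw tool output
    tool_name: str | None = None
    tool_arguments: dict | None = None
    tool_call_id: str | None = None

    def to_dict(self) -> dict:
        return {
            "tool_name": self.tool_name,
            "tool_arguments": self.tool_arguments,
            "result": self.result,
        }

@dataclass
class ToolsUsage:
    limits: dict[str, int]                              # max calls allowed per tool name
    used: dict[str, int] = field(default_factory=dict)  # calls made so far per tool name

    def increment_usage(self, name: str) -> None:
        self.used[name] = self.used.get(name, 0) + 1

    def is_limit_reached(self, names: list[str]) -> bool:
        # "done with tools" once every listed tool has hit its limit
        return all(self.used.get(n, 0) >= self.limits.get(n, 0) for n in names)

    def to_dict(self) -> dict:
        return {"limits": self.limits, "used": self.used}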
In a ReAct agent implementation we allow the LLM to call tools, and the LLM then has to parse the tool results during the reason step to decide what to do next. The problem with the default approach in LangGraph:
- Call a tool.
- Receive the tool result as a message.
- Perform reasoning.
This history of tool executions is kept until the current graph execution reaches the end, which makes token usage enormously high. To reduce it, I decided to save tool executions in the state.results field and pass them to the LLM only when needed, not on every LLM call.
Common node - ReActNode
Code available on GitHub.
Concrete nodes inherit from this common ReAct node, which I introduced to avoid duplicating work in the future:
import json
import logging
from abc import ABC, abstractmethod

from langchain_core.language_models import LanguageModelInput
from langchain_core.messages import BaseMessage, SystemMessage
from langchain_core.runnables import Runnable

from agent_core.state import ReActAgentState

system_prompt = """
You are an agent that should act as specified in escaped content <BEHAVIOR></BEHAVIOR>.
TOOLS AVAILABLE TO USE:
{tools}
TOOLS USAGE LIMITS:
{tools_usage}
TOOLS CALLING LIMITS:
{calling_limits}
PREVIOUS TOOLS EXECUTION RESULTS:
{tools_results}
<BEHAVIOR>
{behavior}
</BEHAVIOR>
"""
class ReActNode[StateT: ReActAgentState](ABC):
def __init__(self, llm_with_tools: Runnable[LanguageModelInput, BaseMessage]):
self.llm_with_tools = llm_with_tools
def __call__(self, state: StateT) -> dict:
prompt = system_prompt.format(
tools=json.dumps(state["tools"].to_dict()),
tools_usage=json.dumps(state["tools_usage"].to_dict()),
calling_limits=json.dumps(state["usage"].to_dict()),
tools_results=json.dumps([r.to_dict() for r in state.get("results", [])]),
behavior=self.get_system_prompt(state),
)
system_message = SystemMessage(prompt)
res = self.llm_with_tools.invoke([system_message])
logging.debug(
"[ReActNode] Executed LLM request: state = %s, response = %s", state, res
)
return {"messages": [res]}
@abstractmethod
def get_system_prompt(self, state: StateT) -> str:
pass
The StateT type is generic, which means any subclass can specify its own custom state, making this node very flexible. In the __call__ method I build a prompt that controls tool usage. Since the tools field is dynamic, I don't need to hardcode a tool usage guide inside my system prompt - I can generate it dynamically based on the tools field state.
Subclasses of ReActNode should implement the get_system_prompt method, which returns a node-specific prompt; the subclasses don't have to care about the common logic already implemented in ReActNode.
Core of the system - scan_node
Code available on GitHub.
The scan_node node is a subclass of the ReActNode class:
from typing import override
from langchain_core.language_models import LanguageModelInput
from langchain_core.messages import AIMessage, BaseMessage, SystemMessage
from langchain_core.runnables import Runnable
from agent_core.node import ReActNode
from scan_agent.state import ScanAgentState
SCAN_BEHAVIOR_PROMPT = "Omitted for simplicity. Full prompt available on GitHub."
class ScanNode(ReActNode[ScanAgentState]):
def __init__(self, llm_with_tools: Runnable[LanguageModelInput, BaseMessage]):
super().__init__(llm_with_tools=llm_with_tools)
@override
def get_system_prompt(self, state: ScanAgentState) -> str:
target = state.get("target", {})
target_url = getattr(target, "url", "Unknown") if target else "Unknown"
target_description = (
getattr(target, "description", "No description provided")
if target
else "No description provided"
)
return SCAN_BEHAVIOR_PROMPT.format(
target_url=target_url, target_description=target_description
)
It basically just provides the scan-specific task prompt.
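The real SCAN_BEHAVIOR_PROMPT is omitted above and lives on GitHub. Purely as an illustration of the shape such a prompt can take (this is not the actual prompt), it only needs to expose the target_url and target_description placeholders used by get_system_prompt:
# Illustrative sketch only - the actual behavior prompt is on GitHub.
SCAN_BEHAVIOR_PROMPT = """
You are a web security scanning agent.

TARGET URL: {target_url}
TARGET DESCRIPTION: {target_description}

Explore the target with the available tools, look for common issues
(exposed endpoints, missing auth, injection points) and collect evidence
for every finding. Stop when you have enough information for a report.
"""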
Control Edge - ToolRouterEdge
Code available on GitHub.
To control graph execution I used a dynamic routing edge:
import logging
from dataclasses import dataclass
from langchain_core.messages import AIMessage
from agent_core.state import ReActAgentState
@dataclass
class ToolRouterEdge[StateT: ReActAgentState]:
origin_node: str
end_node: str
tools_node: str
def __call__(self, state: StateT) -> str:
"""Route based on tool calls and limits"""
last_message = state["messages"][-1]
usage = state["usage"]
tools_usage = state["tools_usage"]
tools = state["tools"]
tools_names = [t.name for t in tools.tools]
if usage.is_limit_reached():
logging.info(
"Limit is reached, routing to end node: usage = %s, end_node = %s",
usage,
self.end_node,
)
return self.end_node
if isinstance(last_message, AIMessage) and last_message.tool_calls:
logging.info("Routing to tools node: %s", self.tools_node)
return self.tools_node
if not tools_usage.is_limit_reached(tools_names):
logging.info(
"Limit is not reached: tools = %s, usage = %s, origin_node = %s",
tools_names,
tools_usage,
self.origin_node,
)
return self.origin_node
logging.info(
"ToolRouterEdge: No tool calls found in the last message. "
"Usage limit reached. Routing to end node: %s. "
"Last message: %s",
self.end_node,
last_message,
)
return self.end_node
This edge decides which node to call next based on the LLM's decision and the current tool usage.
Usually in the ReAct pattern the LLM decides which tools to call, but the LLM is not a deterministic system: sometimes it gets lazy and uses a tool only once, or never uses it at all, and we never know when it will behave like that. To avoid such cases, I decided to use the LLM only as a "decision engine" and control tool calling from good old code with if-else statements:
| Problem | Solution |
| --- | --- |
| LLM constantly calls tools | If tool usage exceeds the limit, stop using tools even if the LLM decided to call one, and go to the end_node |
| LLM doesn't call tools | If the LLM decides not to use any tool but tool usage hasn't exceeded the limit, restart the previous node to force the LLM to decide which tool to use until the limit is reached |
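As a quick sanity check of the routing logic, here is a hedged example exercising the tools branch; it assumes the agent_core package from the repository is importable (as in the test script below) and that a fresh ReActUsage has not reached its limit:
from langchain_core.messages import AIMessage

from agent_core.edge import ToolRouterEdge
from agent_core.state import ReActAgentState, ReActUsage, Tools, ToolsUsage
from agent_core.tool import CURL_TOOL

router = ToolRouterEdge[ReActAgentState](
    origin_node="scan_node",
    end_node="summary_node",
    tools_node="scan_tools",
)

state = {
    "messages": [
        # the LLM asked for a tool call, so the router should send us to the tools node
        AIMessage(content="", tool_calls=[{"name": CURL_TOOL.name, "args": {}, "id": "call_1"}]),
    ],
    "usage": ReActUsage(limit=25),                      # assumed: fresh usage, limit not reached
    "tools_usage": ToolsUsage(limits={CURL_TOOL.name: 5}),
    "tools": Tools(tools=[CURL_TOOL]),
}

assert router(state) == "scan_tools"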
Tools Results Processor - ProcessToolResultsNode
Code available on GitHub.
As mentioned above, to fight the high LLM token usage I introduced a special field in the state:
class ReActAgentState(MessagesState):
results: Annotated[list[ToolResult], operator.add]
This field holds the tool execution results. To populate it, I need to parse the LangGraph messages list in the ProcessToolResultsNode class:
from langchain_core.messages import (
AIMessage,
AnyMessage,
ToolMessage,
)
from agent_core.state import ReActAgentState, ToolResult
import logging
class ProcessToolResultsNode[StateT: ReActAgentState]:
def __call__(self, state: StateT) -> dict:
messages = state["messages"]
tools_usage = state["tools_usage"]
new_results = []
results = state.get("results", [])
call_id_to_result = {
result.tool_call_id: result for result in results if result.tool_call_id
}
reversed_messages = list(reversed(messages))
for msg in reversed_messages:
if isinstance(msg, ToolMessage):
if msg.tool_call_id not in call_id_to_result:
if msg.name is not None:
tools_usage.increment_usage(msg.name)
new_results.append(
ToolResult(
result=str(msg.content),
tool_name=msg.name,
tool_arguments=self._find_tool_call_args(
reversed_messages, msg.tool_call_id
),
tool_call_id=msg.tool_call_id,
)
)
logging.debug(
"ProcessToolResultsNode: Processed tool results: %s",
new_results,
)
return {
"results": list(reversed(new_results)),
"tools_calls": tools_usage,
}
def _find_tool_call_args(
self, messages: list[AnyMessage], tool_call_id: str
) -> dict | None:
for msg in messages:
if isinstance(msg, AIMessage):
for tool_call in msg.tool_calls:
if tool_call.get("id") == tool_call_id:
return tool_call.get("args")
This basically associates the tool results with their tool-request messages and saves them in the graph state for further processing.
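To make the data flow concrete, here is a small, hedged example of what this node extracts from a message history (again assuming agent_core is importable; the tool name and output are made up):
from langchain_core.messages import AIMessage, ToolMessage

from agent_core.node import ProcessToolResultsNode
from agent_core.state import ReActAgentState, ToolsUsage

state = {
    "messages": [
        AIMessage(
            content="",
            tool_calls=[{"name": "curl", "args": {"url": "http://localhost:8000/users"}, "id": "call_1"}],
        ),
        ToolMessage(content="HTTP/1.1 200 OK ...", tool_call_id="call_1", name="curl"),
    ],
    "tools_usage": ToolsUsage(limits={"curl": 5}),
    "results": [],
}

update = ProcessToolResultsNode[ReActAgentState]()(state)
tool_result = update["results"][0]
print(tool_result.tool_name)       # curl
print(tool_result.tool_arguments)  # {'url': 'http://localhost:8000/users'}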
Summary Generation
Code available on GitHub.
After the scan finishes or the limits are reached, we need to generate a summary that is easy for people, or the next AI Agent in the chain, to consume:
import json
from langchain_core.language_models import BaseChatModel
from langchain_core.messages import SystemMessage
from scan_agent.state import ScanAgentState
from scan_agent.state.scan_agent_state import ScanAgentSummary
SUMMARY_BEHAVIOR_PROMPT = "Omitted for simplicity. Full prompt available on GitHub."
class SummaryNode:
def __init__(self, llm: BaseChatModel):
self.structured_llm = llm.with_structured_output(ScanAgentSummary)
def __call__(self, state: ScanAgentState) -> dict:
target = state["target"]
system_prompt = SUMMARY_BEHAVIOR_PROMPT.format(
target_url=target.url,
target_description=target.description,
target_type=target.type,
tool_results=json.dumps([r.to_dict() for r in state.get("results", [])]),
)
prompt_messages = [SystemMessage(content=system_prompt), state["messages"][-1]]
summary = self.structured_llm.invoke(prompt_messages)
return {"summary": summary}
Graph
Code available on GitHub.
To build a graph I used this code:
from langchain_openai import ChatOpenAI
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, START, StateGraph
from langgraph.graph.state import CompiledStateGraph
from langgraph.prebuilt import ToolNode
from agent_core.edge import ToolRouterEdge
from agent_core.node import ProcessToolResultsNode
from agent_core.tool import ffuf_directory_scan, curl_tool
from scan_agent.node import ScanNode
from scan_agent.node.summary_node import SummaryNode
from scan_agent.state import ScanAgentState
def create_scan_graph() -> CompiledStateGraph:
llm = ChatOpenAI(model="gpt-4.1-2025-04-14", temperature=0.3)
tools = [ffuf_directory_scan, curl_tool]
llm_with_tools = llm.bind_tools(tools, parallel_tool_calls=True)
scan_node = ScanNode(llm_with_tools=llm_with_tools)
summary_node = SummaryNode(llm=llm)
process_tool_results_node = ProcessToolResultsNode[ScanAgentState]()
tools_router = ToolRouterEdge[ScanAgentState](
origin_node="scan_node",
end_node="summary_node",
tools_node="scan_tools",
)
builder = StateGraph(ScanAgentState)
builder.add_node("scan_node", scan_node)
builder.add_node("summary_node", summary_node)
builder.add_node("scan_tools", ToolNode(tools))
builder.add_node("process_tool_results_node", process_tool_results_node)
builder.add_edge(START, "scan_node")
builder.add_edge("scan_tools", "process_tool_results_node")
builder.add_edge("process_tool_results_node", "scan_node")
builder.add_edge("summary_node", END)
builder.add_conditional_edges("scan_node", tools_router)
return builder.compile(checkpointer=MemorySaver())
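To double-check that the wiring matches the system design, the compiled graph can be rendered as a Mermaid diagram:
# Print a Mermaid diagram of the compiled graph to verify nodes and edges.
graph = create_scan_graph()
print(graph.get_graph().draw_mermaid())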
Testing
To test my scan agent, I asked Claude Code to develop a vulnerable REST API with FastAPI and launched it locally. The code of that service is available here.
I then executed my agent against the target with this script:
import uuid
from datetime import timedelta
from langchain_core.runnables.config import RunnableConfig
from agent_core.graph import run_graph
from agent_core.state import ReActUsage, Target, Tools, ToolsUsage
from agent_core.tool import CURL_TOOL, FFUF_TOOL
import logging
logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
state = {
"target": Target(
url="http://localhost:8000", description="Local REST API target", type="web"
),
"usage": ReActUsage(limit=25),
"tools_usage": ToolsUsage(
limits={
FFUF_TOOL.name: 2,
CURL_TOOL.name: 5,
}
),
"tools": Tools(tools=[FFUF_TOOL, CURL_TOOL]),
}
thread_id = str(uuid.uuid4())[:8]
config = RunnableConfig(
max_concurrency=10,
recursion_limit=25,
configurable={"thread_id": thread_id},
)
print(f"🚀 Starting improved event processing with thread ID: {thread_id}")
print("=" * 80)
# graph is the compiled graph from create_scan_graph() shown in the Graph section
# (the import is omitted here because it depends on the repository layout).
graph = create_scan_graph()
# run_graph is awaited, so this script must run inside an async context
# (for example a notebook, or wrapped in asyncio.run()).
event = await run_graph(graph, state, config)
I got pretty solid results:
My agent found critical vulnerabilities, and that was just a scan agent, not an attack agent (which I will cover in the next article). I was very happy with how well this agent worked. LLMs really can make unexpected decisions that were practically impossible to code in the previous software era.
Summary
I built a simple agent to perform cybersecurity scanning and it worked surprisingly well. Modern LLMs let software engineers build systems that were simply impossible before. I'm excited to build more agents and solve real-world problems.
Main insights for ReAct agent development:
- To minimize LLM token usage, you need to save tool output in the graph state instead of simply using a list of messages.
- To guarantee sufficient tool usage, you need to control it from the source code instead of relying on the LLM decision.
In the next article I will explain how to combine multiple AI Agents with LangGraph to perform a complete cybersecurity assessment of a system.
Subscribe to my Substack to not miss my new articles 😊