Vitalii Honchar

Posted on • Originally published at vitaliihonchar.com

How to Build a ReAct AI Agent for Cybersecurity Scanning with Python and LangGraph

Introduction

ReAct agents are tricky to implement correctly, and in this article I will show how to do it using a cybersecurity AI Agent example that can find vulnerabilities in any provided web target. Today I will explain:

  • How to use tokens efficiently in ReAct agents
  • How to force ReAct agents to use tools efficiently and not be too lazy

Subscribe to my Substack to not miss my new articles 😊

Theory

Before we jump into the implementation, let's first define what an AI Agent is and how to build one.

AI Agent

An AI Agent is a currently popular architecture pattern that combines an LLM, a loop, and actions (tools) that the LLM can perform. The LLM acts as a brain that decides what to do next, replacing the hardcoded decision logic that software used previously.

Traditional automation breaks when conditions change. Agents adapt. That's the real value - resilience, not just "LLM with extra steps."

Here is a very basic AI Agent architecture:

[Diagram: basic AI Agent architecture]

  1. The user runs the AI Agent.
  2. The LLM decides on its own which tool to call to perform an action.
  3. The tool returns its result to the LLM, which then makes the next decision. This loop continues until the LLM decides that a result can be returned to the user or a stop condition is met.
  4. The LLM produces the final result for the agent.

There are different patterns for building AI Agents, and to me they resemble classic software design patterns like Factory, Singleton, or Strategy. In this article I focus on the simplest one - the ReAct pattern.

ReAct Agent Pattern

[Diagram: ReAct agent pattern]

ReAct is a pattern for AI Agents with these steps (a minimal sketch follows the list):

  1. Reason - the LLM thinks about the input data or previous tool results
  2. Act - the LLM calls tools to perform actions
  3. Observe - the LLM processes the results of tool execution
  4. The LLM provides the final result
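To make the pattern concrete before we build a custom graph, here is a minimal sketch using LangGraph's prebuilt ReAct helper. The http_get tool and the model choice are illustrative assumptions, not part of the scanning agent built below.

from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent


@tool
def http_get(url: str) -> str:
    """Fetch a URL and return the beginning of the response body (illustrative tool)."""
    import urllib.request

    with urllib.request.urlopen(url) as resp:
        return resp.read().decode(errors="replace")[:2000]


# The prebuilt helper wires up the Reason -> Act -> Observe loop for us.
agent = create_react_agent(ChatOpenAI(model="gpt-4o-mini"), tools=[http_get])
result = agent.invoke(
    {"messages": [("user", "Check what http://localhost:8000 returns")]}
)
print(result["messages"][-1].content)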

Project Requirements

To learn how to build a ReAct Agent, I decided to build a vulnerability scanning AI Agent. It will accept a web service URL as input and provide a vulnerability report as output.

[Diagram: scan agent input (web service URL) and output (vulnerability report)]

I used this technology stack to build it:

  • Python
  • LangGraph

System Design

[Diagram: scan agent system design]

  1. The user specifies a target to scan.
  2. scan_node asks the LLM to reason about the context and decide on the next step.
  3. scan_node uses tools to scan the Web Target if the LLM decides to do so.
  4. tools scan the Web Target.
  5. scan_node calls summary_node once the LLM decides that no additional tool usage is required.
  6. summary_node provides the context about scanning results to the LLM.
  7. The LLM produces the summary output.

So it's the basic ReAct pattern, but with an extra node for summary generation. This approach produces a higher-quality summary than consuming the ReAct output directly.

Implementation

Short-term memory - graph state

Code available on GitHub

To implement short-term memory I used this graph state:

import operator
from typing import Annotated

from langgraph.graph import MessagesState

# ReActUsage, ToolsUsage, Tools, ToolResult, and Target are project types
# defined in agent_core.state in the repository.


class ReActAgentState(MessagesState):
    usage: ReActUsage
    tools_usage: ToolsUsage
    tools: Tools
    results: Annotated[list[ToolResult], operator.add]
    target: Target

It contains:

  • tools_usage - tracks current tool usage and checks that it doesn't exceed the limits.
  • usage - tracks the depth of graph recursion to avoid hitting LangGraph's recursion limit.
  • tools - a dynamic list of tools that users can pass in at graph execution time, which makes the graph reusable.
  • results - the list of tool execution results, used to reduce LLM token usage.
  • target - the web target to scan.

In a ReAct agent implementation, the LLM calls tools and then has to read the tool results during its reasoning step to decide what to do next. The default approach in LangGraph looks like this:

  1. Call a tool.
  2. Receive the tool result as a message.
  3. Reason over the message history.

The problem is that this history of tool executions stays in the message list until the current graph run reaches the end, which makes token usage enormously high. To reduce it, I save each tool execution in the state.results field and pass the results to the LLM only when they are needed, not on every LLM call.
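The ToolResult type itself lives in the repository and is not shown in the article; a plausible shape, sketched here purely as an assumption so the later snippets are easier to follow, is a small dataclass:

from dataclasses import dataclass


@dataclass
class ToolResult:
    """One tool execution: what was called, with which arguments, and what it returned."""

    result: str
    tool_name: str | None = None
    tool_arguments: dict | None = None
    tool_call_id: str | None = None

    def to_dict(self) -> dict:
        # Serialized form injected into the system prompt on the next reasoning step.
        return {
            "tool_name": self.tool_name,
            "tool_arguments": self.tool_arguments,
            "result": self.result,
        }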

Common node - ReActNode

Code available on GitHub.

Concrete agent nodes inherit from a common ReActNode base class, which I introduced to avoid duplicating work in future agents:

import json
import logging
from abc import ABC, abstractmethod

from langchain_core.language_models import LanguageModelInput
from langchain_core.messages import BaseMessage, SystemMessage
from langchain_core.runnables import Runnable

from agent_core.state import ReActAgentState

system_prompt = """
You are an agent that should act as specified in escaped content <BEHAVIOR></BEHAVIOR>.

TOOLS AVAILABLE TO USE:
{tools}

TOOLS USAGE LIMITS:
{tools_usage}

TOOLS CALLING LIMITS:
{calling_limits}

PREVIOUS TOOLS EXECUTION RESULTS:
{tools_results}

<BEHAVIOR>
{behavior}
</BEHAVIOR>
"""


class ReActNode[StateT: ReActAgentState](ABC):
    def __init__(self, llm_with_tools: Runnable[LanguageModelInput, BaseMessage]):
        self.llm_with_tools = llm_with_tools

    def __call__(self, state: StateT) -> dict:
        prompt = system_prompt.format(
            tools=json.dumps(state["tools"].to_dict()),
            tools_usage=json.dumps(state["tools_usage"].to_dict()),
            calling_limits=json.dumps(state["usage"].to_dict()),
            tools_results=json.dumps([r.to_dict() for r in state.get("results", [])]),
            behavior=self.get_system_prompt(state),
        )
        system_message = SystemMessage(prompt)

        res = self.llm_with_tools.invoke([system_message])

        logging.debug(
            "[ReActNode] Executed LLM request: state = %s, response = %s", state, res
        )
        return {"messages": [res]}

    @abstractmethod
    def get_system_prompt(self, state: StateT) -> str:
        pass

The StateT type parameter is generic, so any subclass can plug in its own state, which makes this node very flexible. In the __call__ method I build a prompt that controls tool usage. Since the tools field is dynamic, I don't need to hardcode a tool usage guide into the system prompt; I can generate it dynamically from the tools field.

Subclasses of ReActNode implement the get_system_prompt method, which returns a node-specific prompt; they don't need to care about the common logic already implemented in ReActNode.

Core of the system - scan_node

Code available on GitHub.

The scan_node node is a subclass of the ReActNode class:

from typing import override

from langchain_core.language_models import LanguageModelInput
from langchain_core.messages import AIMessage, BaseMessage, SystemMessage
from langchain_core.runnables import Runnable

from agent_core.node import ReActNode
from scan_agent.state import ScanAgentState

SCAN_BEHAVIOR_PROMPT = "Omitted for simplicity. Full prompt available on GitHub."


class ScanNode(ReActNode[ScanAgentState]):
    def __init__(self, llm_with_tools: Runnable[LanguageModelInput, BaseMessage]):
        super().__init__(llm_with_tools=llm_with_tools)

    @override
    def get_system_prompt(self, state: ScanAgentState) -> str:
        target = state.get("target", {})
        target_url = getattr(target, "url", "Unknown") if target else "Unknown"
        target_description = (
            getattr(target, "description", "No description provided")
            if target
            else "No description provided"
        )

        return SCAN_BEHAVIOR_PROMPT.format(
            target_url=target_url, target_description=target_description
        )

It basically just provides the scan-specific task prompt.

Control Edge - ToolRouterEdge

Code available on GitHub.

To control graph execution I used a dynamic routing edge:

import logging
from dataclasses import dataclass

from langchain_core.messages import AIMessage

from agent_core.state import ReActAgentState


@dataclass
class ToolRouterEdge[StateT: ReActAgentState]:
    origin_node: str
    end_node: str
    tools_node: str

    def __call__(self, state: StateT) -> str:
        """Route based on tool calls and limits"""
        last_message = state["messages"][-1]
        usage = state["usage"]
        tools_usage = state["tools_usage"]
        tools = state["tools"]
        tools_names = [t.name for t in tools.tools]

        if usage.is_limit_reached():
            logging.info(
                "Limit is reached, routing to end node: usage = %s, end_node = %s",
                usage,
                self.end_node,
            )
            return self.end_node

        if isinstance(last_message, AIMessage) and last_message.tool_calls:
            logging.info("Routing to tools node: %s", self.tools_node)
            return self.tools_node

        if not tools_usage.is_limit_reached(tools_names):
            logging.info(
                "Limit is not reached: tools = %s, usage = %s, origin_node = %s",
                tools_names,
                tools_usage,
                self.origin_node,
            )
            return self.origin_node

        logging.info(
            "ToolRouterEdge: No tool calls found in the last message. "
            "Usage limit reached. Routing to end node: %s. "
            "Last message: %s",
            self.end_node,
            last_message,
        )
        return self.end_node

This edge decides which node to call next based on the LLM's decision and the current tool usage.

Usually in the ReAct pattern the LLM decides which tools to call, but an LLM is not a deterministic system: sometimes it gets lazy and uses a tool only once, or never at all, and we can't predict when it will behave like this. To handle such cases, I use the LLM only as a "decision engine" and control tool calling from good old code with if-else statements:

  • Problem: the LLM constantly calls tools. Solution: if tool usage exceeds the limit, stop calling tools even if the LLM decided to do so and route to the end_node.
  • Problem: the LLM doesn't call tools. Solution: if the LLM decides not to use any tool but tool usage hasn't exceeded the limit, rerun the previous node to force the LLM to pick a tool, until the usage limit is reached.
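The ReActUsage and ToolsUsage helpers used by the router are defined in the repository. A rough sketch of what they could look like, under the assumption that they are simple counters (the real increment logic for ReActUsage lives elsewhere in the repo):

from dataclasses import dataclass, field


@dataclass
class ReActUsage:
    """Caps the number of reasoning iterations of the graph."""

    limit: int
    used: int = 0  # assumed to be incremented elsewhere in the repository

    def is_limit_reached(self) -> bool:
        return self.used >= self.limit

    def to_dict(self) -> dict:
        return {"limit": self.limit, "used": self.used}


@dataclass
class ToolsUsage:
    """Per-tool call limits, e.g. a mapping like {'curl_tool': 5}."""

    limits: dict[str, int] = field(default_factory=dict)
    usage: dict[str, int] = field(default_factory=dict)

    def increment_usage(self, tool_name: str) -> None:
        self.usage[tool_name] = self.usage.get(tool_name, 0) + 1

    def is_limit_reached(self, tool_names: list[str]) -> bool:
        # The router keeps looping back to scan_node until every tool hit its limit.
        return all(self.usage.get(n, 0) >= self.limits.get(n, 0) for n in tool_names)

    def to_dict(self) -> dict:
        return {"limits": self.limits, "usage": self.usage}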

Tools Results Processor - ProcessToolResultsNode

Code available on GitHub.

As mentioned above, to fight the high LLM token usage I introduced a special field in the state:

class ReActAgentState(MessagesState):
    results: Annotated[list[ToolResult], operator.add]

This field holds the tool execution results. To populate it, I parse the LangGraph message list in the ProcessToolResultsNode class:

from langchain_core.messages import (
    AIMessage,
    AnyMessage,
    ToolMessage,
)

from agent_core.state import ReActAgentState, ToolResult
import logging


class ProcessToolResultsNode[StateT: ReActAgentState]:
    def __call__(self, state: StateT) -> dict:
        messages = state["messages"]
        tools_usage = state["tools_usage"]
        new_results = []

        results = state.get("results", [])

        call_id_to_result = {
            result.tool_call_id: result for result in results if result.tool_call_id
        }

        reversed_messages = list(reversed(messages))
        for msg in reversed_messages:
            if isinstance(msg, ToolMessage):
                if msg.tool_call_id not in call_id_to_result:
                    if msg.name is not None:
                        tools_usage.increment_usage(msg.name)

                    new_results.append(
                        ToolResult(
                            result=str(msg.content),
                            tool_name=msg.name,
                            tool_arguments=self._find_tool_call_args(
                                reversed_messages, msg.tool_call_id
                            ),
                            tool_call_id=msg.tool_call_id,
                        )
                    )

        logging.debug(
            "ProcessToolResultsNode: Processed tool results: %s",
            new_results,
        )
        return {
            "results": list(reversed(new_results)),
            "tools_calls": tools_usage,
        }

    def _find_tool_call_args(
        self, messages: list[AnyMessage], tool_call_id: str
    ) -> dict | None:
        for msg in messages:
            if isinstance(msg, AIMessage):
                for tool_call in msg.tool_calls:
                    if tool_call.get("id") == tool_call_id:
                        return tool_call.get("args")

This associates the latest tool results with the tool-call messages that requested them and saves them in the graph state for later processing.
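For context, here is the message pairing this node relies on, with made-up values: the AIMessage carries the tool call with an id, and the matching ToolMessage carries the result under the same tool_call_id.

from langchain_core.messages import AIMessage, ToolMessage

ai_msg = AIMessage(
    content="",
    tool_calls=[
        {"name": "curl_tool", "args": {"url": "http://localhost:8000"}, "id": "call_1"}
    ],
)
tool_msg = ToolMessage(
    content="HTTP/1.1 200 OK ...", name="curl_tool", tool_call_id="call_1"
)
# ProcessToolResultsNode walks the messages, finds tool_msg, and looks up ai_msg
# via the shared tool_call_id to recover the original tool arguments.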

Summary Generation

Code available on GitHub.

After scanning finishes or the limits are reached, we need to generate a summary that is easy to consume for people or for the next AI Agent in the chain:

import json
from langchain_core.language_models import BaseChatModel
from langchain_core.messages import SystemMessage

from scan_agent.state import ScanAgentState
from scan_agent.state.scan_agent_state import ScanAgentSummary

SUMMARY_BEHAVIOR_PROMPT = "Omitted for simplicity. Full prompt available on GitHub."

class SummaryNode:
    def __init__(self, llm: BaseChatModel):
        self.structured_llm = llm.with_structured_output(ScanAgentSummary)

    def __call__(self, state: ScanAgentState) -> dict:
        target = state["target"]

        system_prompt = SUMMARY_BEHAVIOR_PROMPT.format(
            target_url=target.url,
            target_description=target.description,
            target_type=target.type,
            tool_results=json.dumps([r.to_dict() for r in state.get("results", [])]),
        )

        prompt_messages = [SystemMessage(content=system_prompt), state["messages"][-1]]
        summary = self.structured_llm.invoke(prompt_messages)

        return {"summary": summary}
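The ScanAgentSummary schema is defined in the repository and not shown in the article. Since with_structured_output expects a schema, a plausible sketch (an assumption, not the real model) could look like this:

from pydantic import BaseModel, Field


class Finding(BaseModel):
    title: str
    severity: str = Field(description="low / medium / high / critical")
    evidence: str
    recommendation: str


class ScanAgentSummary(BaseModel):
    target_url: str
    overall_risk: str
    findings: list[Finding]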

Graph

Code available on GitHub.

To build a graph I used this code:

from langchain_openai import ChatOpenAI
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import END, START, StateGraph
from langgraph.graph.state import CompiledStateGraph
from langgraph.prebuilt import ToolNode

from agent_core.edge import ToolRouterEdge
from agent_core.node import ProcessToolResultsNode
from agent_core.tool import ffuf_directory_scan, curl_tool
from scan_agent.node import ScanNode
from scan_agent.node.summary_node import SummaryNode
from scan_agent.state import ScanAgentState


def create_scan_graph() -> CompiledStateGraph:
    llm = ChatOpenAI(model="gpt-4.1-2025-04-14", temperature=0.3)
    tools = [ffuf_directory_scan, curl_tool]
    llm_with_tools = llm.bind_tools(tools, parallel_tool_calls=True)

    scan_node = ScanNode(llm_with_tools=llm_with_tools)
    summary_node = SummaryNode(llm=llm)
    process_tool_results_node = ProcessToolResultsNode[ScanAgentState]()

    tools_router = ToolRouterEdge[ScanAgentState](
        origin_node="scan_node",
        end_node="summary_node",
        tools_node="scan_tools",
    )

    builder = StateGraph(ScanAgentState)

    builder.add_node("scan_node", scan_node)
    builder.add_node("summary_node", summary_node)
    builder.add_node("scan_tools", ToolNode(tools))
    builder.add_node("process_tool_results_node", process_tool_results_node)

    builder.add_edge(START, "scan_node")
    builder.add_edge("scan_tools", "process_tool_results_node")
    builder.add_edge("process_tool_results_node", "scan_node")
    builder.add_edge("summary_node", END)

    builder.add_conditional_edges("scan_node", tools_router)

    return builder.compile(checkpointer=MemorySaver())
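The ffuf_directory_scan and curl_tool tools come from the repository and are not shown in the article. As a rough idea of what such a tool could look like (an assumption, not the real implementation), the curl tool might wrap the CLI with LangChain's @tool decorator:

import subprocess

from langchain_core.tools import tool


@tool
def curl_tool(url: str, method: str = "GET") -> str:
    """Send an HTTP request to the target URL and return status line, headers, and body."""
    # Requires the curl binary on PATH; output is fed back to the LLM as the tool result.
    completed = subprocess.run(
        ["curl", "-s", "-i", "-X", method, url],
        capture_output=True,
        text=True,
        timeout=30,
    )
    return completed.stdout or completed.stderr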

Testing

To test my scan agent, I asked Claude Code to build a vulnerable REST API with FastAPI and launched it locally. The code of that service is available here.
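For flavor, an intentionally vulnerable endpoint in such a test service might look like this (an illustrative assumption; the real test service is in the linked repository):

import sqlite3

from fastapi import FastAPI

app = FastAPI()


@app.get("/users")
def find_user(username: str) -> list:
    conn = sqlite3.connect("users.db")
    # Deliberately vulnerable: the query string is interpolated directly,
    # which allows SQL injection via the username parameter.
    rows = conn.execute(
        f"SELECT id, username FROM users WHERE username = '{username}'"
    ).fetchall()
    conn.close()
    return rows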

Then I ran my agent against the specified target with this script:

import uuid
from datetime import timedelta

from langchain_core.runnables.config import RunnableConfig

from agent_core.graph import run_graph
from agent_core.state import ReActUsage, Target, Tools, ToolsUsage
from agent_core.tool import CURL_TOOL, FFUF_TOOL
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")

state = {
    "target": Target(
        url="http://localhost:8000", description="Local REST API target", type="web"
    ),
    "usage": ReActUsage(limit=25),
    "tools_usage": ToolsUsage(
        limits={
            FFUF_TOOL.name: 2,
            CURL_TOOL.name: 5,
        }
    ),
    "tools": Tools(tools=[FFUF_TOOL, CURL_TOOL]),
}
thread_id = str(uuid.uuid4())[:8]
config = RunnableConfig(
    max_concurrency=10,
    recursion_limit=25,
    configurable={"thread_id": thread_id},
)

print(f"🚀 Starting improved event processing with thread ID: {thread_id}")
print("=" * 80)

event = await run_graph(graph, state, config)

I got pretty solid results:

[Screenshot: scan results with the discovered vulnerabilities]

My agent found critical vulnerabilities, and that was just a scan agent, not an attack agent (which I will cover in the next article). I was very happy with how well it worked. LLMs really can make unexpected decisions that would have been practically impossible to hardcode in the previous era of software.

Summary

I built a simple agent for cybersecurity scanning and it worked amazingly well. Modern LLMs give software engineers the power to build systems that were simply impossible to build before. I'm excited to build more agents and solve real-world problems.

Main insights for ReAct agent development:

  • To minimize LLM token usage, save tool output in the graph state instead of keeping it only in the list of messages.
  • To guarantee sufficient tool usage, control it from regular code instead of relying on the LLM's decision.

In the next article I will explain how to combine multiple AI Agents with LangGraph to perform a complete cybersecurity assessment of a system.

Subscribe to my Substack to not miss my new articles 😊
