Agent in a desktop app

In this post I'll set up a very simple AI agent in an Electron app and then show its updates in a highly interactive chat interface built with React.

Stack

For this project I'll be using the following libraries:

  • Electron
  • React
  • Turborepo
  • Model Context Protocol (MCP) SDK
  • LangChain libraries - LangGraph, MCP Adapters, etc.

Dev setup for desktop app

This part is mostly boilerplate, so bear with me. If you're already familiar with setting up a desktop app inside a monorepo, you can skip ahead to the section on setting up the agent.

Electron & React

Electron React Boilerplate (ERB) is a pretty easy way to start building a desktop app in no time. It uses electron-builder under the hood to power the build and publish steps of the application, and Webpack to build the frontend. Just clone the repo and you're ready to go. For this setup, let's name the app agent-desktop-app.

Turborepo

The first thing I did was set up a monorepo. The idea was to build a collection of MCP servers as separate packages that can be used by any MCP client. Initially, until I had built my own client, I tested these servers with Claude Desktop. This is the folder structure we'll have:

agentic-ui
├── apps
│   └── agent-desktop-app
├── docs
│   ├── 0-vibe-coding-context.md
│   └── design-decisions
├── package.json
├── packages
│   ├── eslint-config
│   ├── typescript-config
│   └── ui
├── pnpm-lock.yaml
├── pnpm-workspace.yaml
├── README.md
├── servers
│   ├── mcp-playwright-server
│   └── my-mcp-server
└── turbo.json

Setting up Turborepo is pretty straightforward: just follow their installation guide, then add our agent-desktop-app inside apps, move the GitHub Actions workflows to the root of the monorepo, and remove the package-lock.json file (pnpm generates pnpm-lock.yaml instead).

Styling

Adding Tailwind for styling was an extra step, since ERB ships with Sass support by default and uses Webpack.

Just install tailwindcss, autoprefixer, postcss, and postcss-loader, and update the config files:

//apps/agent-desktop-app/tailwind.config.js
/** @type {import('tailwindcss').Config} */
export default {
  content: ['./src/renderer/**/*.{js,jsx,ts,tsx}', './src/index.html'],
  theme: {
    extend: {},
  },
  plugins: [],
};
//apps/agent-desktop-app/.erb/configs/postcss.config.js
export default {
  plugins: {
    tailwindcss: {},
    autoprefixer: {},
  },
};
//apps/agent-desktop-app/.erb/configs/webpack.config.renderer.dev.ts (same change in .prod.ts)
//other rules
     {
        test: /\.s?(a|c)ss$/,
        use: [
          MiniCssExtractPlugin.loader,
          'css-loader',
          {
            loader: 'postcss-loader',
            options: {
              postcssOptions: {
                config: path.resolve(__dirname, 'postcss.config.js'),
              },
            },
          },
          'sass-loader',
        ],
        exclude: /\.module\.s?(c|a)ss$/,
      },
//other rules

Now that our basic dev setup is done, let's move on to setting up our agent.

Setting up agent

Now to the most fun part. I'll be using the Electron app as the MCP host application. Under the hood, in the main process, we can create a simple agent using LangGraph's functional API, which gives much more control over agent execution than the graph-based approach, especially for beginners.

Agent loop

Apart from the basic loop that runs the agent until the task is completed, I've included a tool the agent can call to get help from the user at any point, and also an outside control to pause execution in case the user wants to add more context.

import { Connection, MultiServerMCPClient } from '@langchain/mcp-adapters';
import {
  StructuredToolInterface,
  tool as langchainTool,
} from '@langchain/core/tools';
import { type ToolCall } from '@langchain/core/messages/tool';
import {
  AIMessage,
  BaseMessageLike,
  ToolMessage,
  HumanMessage,
  BaseMessage,
  isAIMessage,
  SystemMessage,
} from '@langchain/core/messages';
import { ChatOpenAI, ChatOpenAICallOptions } from '@langchain/openai';
import { ChatVertexAI } from '@langchain/google-vertexai';
import { ChatAnthropic, ChatAnthropicCallOptions } from '@langchain/anthropic';
import { ChatOllama } from '@langchain/ollama';
import { z } from 'zod';
import { addMessages, entrypoint, Pregel, task } from '@langchain/langgraph';
import log from 'electron-log';
import { Observable, Subject, firstValueFrom, race } from 'rxjs';
import { buildSystemPrompt, buildUserPrompt } from './prompt';
// getDefaultServersConfig and getMcpServer are project helpers (sketched in
// the "Downstream communication" section below); this module path is a
// placeholder.
import { getDefaultServersConfig, getMcpServer } from './servers-config';

const prettyPrintMessage = (message: BaseMessage) => {
  log.info('='.repeat(30), `${message.getType()} message`, '='.repeat(30));
  log.info(message.content);
  if (isAIMessage(message) && message.tool_calls?.length) {
    log.info(JSON.stringify(message.tool_calls, null, 2));
  }
};

type UserInputRequest = {
  type: 'get_user_help';
  question: string;
};

type Update = {
  type: 'tool_call' | 'tool_result';
  message?: any;
  toolName?: string;
  toolArgs?: Record<string, any>;
  isError?: boolean;
}

export type OnUpdate = (update: Update) => void;
export type OnUserInputRequest = (request: UserInputRequest) => Promise<string>;
type ChatModel = ChatVertexAI | ChatOpenAI | ChatAnthropic | ChatOllama;

const openAiOptions: ChatOpenAICallOptions = {
  tool_choice: 'required',
};

const anthropicOptions: ChatAnthropicCallOptions = {
  tool_choice: 'any',
};

export class Agent {
  private mcp: MultiServerMCPClient | null = null;

  private tools: StructuredToolInterface[] = [];

  private agent: Pregel<any, any, any, any, any> | null = null;

  static completed = true;

  private paused = false;

  private pauseResolver: ((value: void | PromiseLike<void>) => void) | null =
    null;

  private model: ChatModel | null = null;

  private pauseSubject: Subject<void> = new Subject<void>();

  async connectToDefaultServers() {
    try {
      const mcpServers: Record<string, Connection> = {};

      for (const server of getDefaultServersConfig()) {
        mcpServers[server.name] = getMcpServer(server);
      }

      if (this.mcp) {
        await this.mcp.close();
      }

      log.info(
        'connecting to mcpServers:',
        JSON.stringify(mcpServers, null, 2),
      );

      this.mcp = new MultiServerMCPClient({
        mcpServers,
        prefixToolNameWithServerName: false,
        additionalToolNamePrefix: '',
        throwOnLoadError: true,
      });

      this.tools = await this.mcp.getTools();
      // Ensure tools is initialized as an array even if getTools returns undefined
      if (!this.tools) {
        this.tools = [];
        log.warn(
          'MCP getTools returned undefined, initializing empty tools array',
        );
      }

      log.info(
        'Connected to server with tools:',
        this.tools.map(({ name }) => name),
      );
      return true;
    } catch (e) {
      log.error('Failed to connect to default servers:', e);

      return false;
    }
  }

  async execute(
    onUpdate?: (update: Update) => void,
    onUserInputRequest?: OnUserInputRequest,
  ) {
    const result = await this.connectToDefaultServers();
    if (!result) {
      throw new Error('Failed to connect to default servers');
    }
    this.createAgent(onUpdate, onUserInputRequest);

    if (!this.agent) {
      throw new Error('Agent not initialized');
    }

    let messages: BaseMessageLike[] = [];

    // Create initial messages with user query
    messages = [
      new SystemMessage(buildSystemPrompt()),
      new HumanMessage(
        buildUserPrompt(),
      ),
    ];

    // Stream responses from agent
    const stream = await this.agent.stream(messages);

    // This loop is only used to stream updates somewhere.
    // If you want to modify the functionality these are your options:
    // 1. Modify the tools passed to the agent (beginner)
    // 2. Modify the callModel or callTool functions (intermediate)
    // 3. Add nodes to the graph to handle the updates differently (advanced)
    for await (const update of stream) {
      for (const [taskName, message] of Object.entries(update)) {
        if (taskName === 'agent') {
          // Skip agent task messages
          continue;
        }
        prettyPrintMessage(message as BaseMessage);
      }
    }
  }

  async pause() {
    this.paused = true;
    this.pauseSubject.next();

    return new Promise((resolve) => {
      this.pauseResolver = resolve;
    });
  }

  createAgent(onUpdate?: OnUpdate, onUserInputRequest?: OnUserInputRequest) {
    const tools = this.getAllTools(onUpdate, onUserInputRequest);

    const toolsByName = Object.fromEntries(
      tools.map((tool) => [tool.name, tool]),
    );

    const callModel = task('callModel', async (messages: BaseMessageLike[]) => {
      const messagesWithUpdatedPlan = messages.map((message) => {
        if (message instanceof SystemMessage) {
          return new SystemMessage(buildSystemPrompt());
        }
        return message;
      });

      const kwargs =
        this.model instanceof ChatOpenAI ? openAiOptions : anthropicOptions;
      if (!this.model) {
        throw new Error('Model not initialized');
      }

      // Create an observable for the model invocation
      const modelObservable = new Observable<AIMessage>((subscriber) => {
        this.model!.bindTools(tools, kwargs as any)
          .invoke(messagesWithUpdatedPlan)
          .then((response) => {
            subscriber.next(response);
            subscriber.complete();
            return response;
          })
          .catch((error) => {
            subscriber.error(error);
            return error;
          });
      });

      // Create a pause observable that emits when pause is triggered
      const pauseObservable = new Observable<never>((subscriber) => {
        const subscription = this.pauseSubject.subscribe(() => {
          subscriber.error(new Error('Operation paused by user'));
        });

        return () => subscription.unsubscribe();
      });

      try {
        // Race between model completion and pause
        const response = await firstValueFrom(
          race(modelObservable, pauseObservable),
        );
        log.info('model response =======>', JSON.stringify(response, null, 2));
        return response;
      } catch (e) {
        if (this.paused) {
          log.info('Model call paused by user');
          return new AIMessage({
            content: 'paused',
          });
        }
        throw e;
      }
    });

    const callTool = task(
      'callTool',
      async (toolCall: ToolCall): Promise<AIMessage> => {
        if (onUpdate) {
          onUpdate({
            type: 'tool_call',
            toolName: toolCall.name,
            toolArgs: toolCall.args,
          });
        }
        let observation = null;
        let isError = false;
        const tool = toolsByName[toolCall.name];

        // Create an observable for the tool invocation
        const toolObservable = new Observable<string>((subscriber) => {
          tool
            .invoke(toolCall.args, { timeout: 60000 })
            .then((result) => {
              subscriber.next(result);
              subscriber.complete();
              return result;
            })
            .catch((error) => {
              subscriber.error(error);
              return error;
            });
        });

        // Create a pause observable that emits when pause is triggered
        const pauseObservable = new Observable<never>((subscriber) => {
          const subscription = this.pauseSubject.subscribe(() => {
            subscriber.error(new Error('Operation paused by user'));
          });

          return () => subscription.unsubscribe();
        });

        try {
          // Race between tool completion and pause
          observation = await firstValueFrom(
            race(toolObservable, pauseObservable),
          );
        } catch (e) {
          if (this.paused) {
            log.info('Toolcall paused by user');
            observation = 'Toolcall paused';
          } else {
            observation = `Error in tool call: ${e}, try something else maybe?`;
            isError = true;
          }
        }
        if (onUpdate) {
          onUpdate({
            type: 'tool_result',
            toolName: toolCall.name,
            toolArgs: toolCall.args,
            message: observation,
            isError,
          });
        }
        return new ToolMessage({
          content: observation,
          tool_call_id: toolCall.id || '',
        });
      },
    );

    this.agent = entrypoint('agent', async (messages: BaseMessageLike[]) => {
      let currentMessages = messages;

      let llmResponse = await callModel(currentMessages);

      // eslint-disable-next-line no-constant-condition
      while (true) {
        if (
          llmResponse.content === 'paused' ||
          !llmResponse.tool_calls?.length
        ) {
          break;
        }

        currentMessages = addMessages(currentMessages, [llmResponse]);

        // Execute tools one by one
        for (const toolCall of llmResponse.tool_calls) {
          const toolResult = await callTool(toolCall);
          currentMessages = addMessages(currentMessages, [toolResult]);
        }

        // paused when running tool calls
        if (this.paused) {
          const userResponse = await this.afterPause(
            toolsByName.get_user_help,
            onUpdate,
          );
          currentMessages = addMessages(currentMessages, [userResponse]);
        }

        // Call model again
        llmResponse = await callModel(currentMessages);

        if (llmResponse.content === 'paused') {
          const userResponse = await this.afterPause(
            toolsByName.get_user_help,
            onUpdate,
          );

          currentMessages = addMessages(currentMessages, [userResponse]);
          llmResponse = await callModel(currentMessages);
        }
      }

      return llmResponse;
    });
  }

  async afterPause(
    askUserTool: StructuredToolInterface,
    onUpdate?: OnUpdate,
  ): Promise<HumanMessage> {
    this.pauseResolver?.();
    this.paused = false;

    if (onUpdate) {
      onUpdate({
        type: 'tool_call',
        toolName: 'get_user_help',
        toolArgs: {
          question: 'Execution paused.',
        },
      });
    }

    const userResponse = await askUserTool.invoke({
      question: 'Execution paused.',
    });

    return new HumanMessage(userResponse);
  }

  getAllTools(
    onUpdate?: OnUpdate,
    onUserInputRequest?: OnUserInputRequest,
  ): StructuredToolInterface[] {

    const askUser = langchainTool(
      async (args) => {
        if (!onUserInputRequest) {
          throw new Error('onUserInputRequest is not set');
        }
        const userResponse = await onUserInputRequest({
          type: 'get_user_help',
          question: args.question,
        });
        return userResponse;
      },
      {
        name: 'get_user_help',
        description: 'Ask the user for input',
        schema: z.object({
          question: z.string().describe('The question to ask the user'),
        }),
      },
    );


    return [
      askUser,
      ...this.tools,
    ];
  }

  getAvailableTools(): StructuredToolInterface[] {
    return this.tools;
  }

  async cleanup() {
    // Close all server connections
    try {
      if (this.mcp) {
        await this.mcp.close();
      }
    } catch (e) {
      log.error(`Error closing MCP client:`, e);
    }

    // Clean up the pause subject
    try {
      this.pauseSubject.complete();
    } catch (e) {
      log.error(`Error cleaning up pause subject:`, e);
    }

    this.agent = null;
    this.tools = [];
  }
}

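To actually drive this agent from the app, the main process needs to expose it over IPC. Here's a minimal sketch of that wiring; the channel names (agent:run, agent:update and friends) are placeholders I picked for illustration, not something ERB or the code above prescribes.

// main process (sketch) -- channel names are placeholders
import { ipcMain, BrowserWindow } from 'electron';
import { Agent } from './agent';

const agent = new Agent();

ipcMain.handle('agent:run', async (event) => {
  const win = BrowserWindow.fromWebContents(event.sender);

  await agent.execute(
    // Forward every agent update to the renderer's chat UI
    (update) => win?.webContents.send('agent:update', update),
    // Ask the renderer a question and wait for the user's answer
    (request) =>
      new Promise<string>((resolve) => {
        ipcMain.once('agent:user-input-response', (_e, answer: string) =>
          resolve(answer),
        );
        win?.webContents.send('agent:user-input-request', request);
      }),
  );
});

// Let the UI pause the agent mid-run
ipcMain.handle('agent:pause', () => agent.pause());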

Upstream communication with chat interface

The way we traditionally update a user interface in a React application is through state updates. For this project, I'll use useReducer to store the state and ask the agent to generate store updates that can be dispatched directly to update the UI. With a bit of prompting magic, what we get is a UI whose state updates happen as though a user were taking the actions.
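For the renderer to receive these updates, the preload script has to expose the IPC channels. A minimal sketch of the window.electron.agent bridge used below, reusing the same placeholder channel names as the main-process wiring above:

// preload (sketch) -- exposes the agent channels to the renderer
import { contextBridge, ipcRenderer, IpcRendererEvent } from 'electron';

contextBridge.exposeInMainWorld('electron', {
  agent: {
    run: () => ipcRenderer.invoke('agent:run'),
    pause: () => ipcRenderer.invoke('agent:pause'),
    respondToUserInput: (answer: string) =>
      ipcRenderer.send('agent:user-input-response', answer),
    onMessageUpdate: (callback: (update: unknown) => void) => {
      const listener = (_event: IpcRendererEvent, update: unknown) =>
        callback(update);
      ipcRenderer.on('agent:update', listener);
      // Return an unsubscribe function for useEffect cleanup
      return () => ipcRenderer.removeListener('agent:update', listener);
    },
  },
});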

For example, for a task-runner agent that works through a list of steps, the UI updates look like this:

//main thread
// add a tool in agent which updates UI
const informUserAboutUpdates = langchainTool(
      async (args) => {
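        // Forward the update to the renderer here, e.g.
        // mainWindow.webContents.send('agent:update', args);
        // (channel name is a placeholder matching the wiring sketch above)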

        return 'User has been informed about the updates';
      },
      {
        name: 'inform_user_about_all_updates',
        description:
          'Inform the user about the step being run currently. Should be called as proactively as possible. When you start a step, what sub-step are you going to do next? A sub-step can be smaller parts within the step. The user should know what is happening next even inside the step if step is complex.',
        schema: z.object({
          stepNumber: z
            .number()
            .describe(
              'The index of the step currently being run, starts from 1',
            ),
          step: z
            .string()
            .describe(
              'The step currently being run. A short, concise one-liner.',
            ),
          summary: z
            .string()
            .describe(
              'A summary of the step',
            ),
          status: z
            .enum(['passed', 'failed', 'in_progress'])
            .describe(
              'The overall status of the step',
            ),
        }),
      },
    );
// renderer thread
// listen to updates from agent and update UI using dispatch
...
const [messages, dispatch] = useReducer(messagesReducer, []);

useEffect(() => {
    const unsubscribe = window.electron.agent.onMessageUpdate((message) => {
      dispatch(message);
    });
    // clean up the listener on unmount
    return unsubscribe;
}, []);
...

... render messages the way you want
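The messagesReducer itself isn't shown above. A minimal sketch, assuming we simply append every agent update as a chat message:

// renderer: minimal messagesReducer sketch -- the action shape mirrors
// the Update type emitted by the agent in the main process
type AgentUpdate = {
  type: 'tool_call' | 'tool_result';
  message?: any;
  toolName?: string;
  toolArgs?: Record<string, any>;
  isError?: boolean;
};

function messagesReducer(
  state: AgentUpdate[],
  action: AgentUpdate,
): AgentUpdate[] {
  switch (action.type) {
    case 'tool_call':
    case 'tool_result':
      // Append each agent update as a new chat message
      return [...state, action];
    default:
      return state;
  }
}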

Downstream communication with MCP server and tools

We're already calling connectToDefaultServers in our agent to connect to the MCP servers.

The configuration can simply be an array of objects, each referencing an MCP server and bringing in its set of tools:

    {
      name: 'memory',
      command: nodePath,
      args: [
        // path to '@modelcontextprotocol/server-memory',
      ],
    },
    {
      name: 'sequential-thinking',
      command: 'node',
      args: [
        // path to '@modelcontextprotocol/server-sequential-thinking',
      ],
    },
    {
      name: 'playwright',
      command: 'node',
      args: [
        '//path to local playwright MCP server'
      ],
    },
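For completeness, the getDefaultServersConfig and getMcpServer helpers called in connectToDefaultServers could look something like this; the stdio connection shape comes from @langchain/mcp-adapters, and the paths are placeholders:

import { Connection } from '@langchain/mcp-adapters';

type ServerConfig = { name: string; command: string; args: string[] };

// Returns the array of server configs shown above
export function getDefaultServersConfig(): ServerConfig[] {
  return [
    {
      name: 'memory',
      command: 'node',
      args: ['/path/to/server-memory/dist/index.js'], // placeholder path
    },
    // ...the other servers from the array above
  ];
}

// Maps a config entry to an MCP connection; stdio means the host
// spawns the server as a child process and talks over stdin/stdout
export function getMcpServer(server: ServerConfig): Connection {
  return {
    transport: 'stdio',
    command: server.command,
    args: server.args,
  };
}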

This sets up launching and communicating with the MCP servers. Other functionality, like reading resources and prompts from an MCP server, is also possible in this framework now. I haven't covered it here, but I can write about it in a separate post if anyone is interested.

Conclusion

This concludes the article. With this simple approach we get a desktop app with a highly interactive UI powered by an agent, everything in JavaScript inside the Electron app. It can connect to any MCP server and be put to all sorts of use cases, with endless possibilities.
