LlamaIndex File Chat Workflow with A2A Protocol

This sample demonstrates a conversational agent built with LlamaIndex Workflows and exposed through the A2A protocol. It showcases file upload and parsing, conversational interactions with support for multi-turn dialogue, streaming responses/updates, and in-line citations.

Source code: https://github.com/sing1ee/a2a_llama_index_file_chat (A2A LlamaIndex file chat with OpenRouter)

How It Works

This sample uses LlamaIndex Workflows with OpenRouter to build a conversational agent that can ingest uploaded files, parse them, and answer questions about their content. The A2A protocol provides a standardized interface to the agent, allowing clients to send requests and receive real-time updates.

(Diagram: the A2A LlamaIndex workflow)
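
To make the flow concrete, here is a minimal sketch of a parse-then-chat workflow using LlamaIndex's Workflow API. The event and step names are illustrative, not the sample's actual code:

# Minimal parse-then-chat workflow sketch (illustrative names, not the
# sample's actual code).
from llama_index.core.workflow import (
    Event,
    StartEvent,
    StopEvent,
    Workflow,
    step,
)


class ParseEvent(Event):
    """Carries parsed document text from the parse step to the chat step."""
    text: str


class ParseAndChat(Workflow):
    @step
    async def parse(self, ev: StartEvent) -> ParseEvent:
        # The real sample calls LlamaParse here; this sketch passes text through.
        return ParseEvent(text=ev.document)

    @step
    async def chat(self, ev: ParseEvent) -> StopEvent:
        # The real sample sends the parsed text to the LLM as context.
        return StopEvent(result=f"Answered using {len(ev.text)} chars of context")


# usage: result = await ParseAndChat(timeout=60).run(document="...file text...")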

Key Features

  • File Upload: Clients can upload files and parse them to provide context to the chat
  • Multi-turn Conversations: Agent can request additional information when needed
  • Real-time Streaming: Provides status updates during processing
  • Push Notifications: Support for webhook-based notifications
  • Conversational Memory: Maintains context across interactions in the same session
  • LlamaParse Integration: Uses LlamaParse to parse files accurately (see the minimal call sketched after the note below)

NOTE: This sample agent accepts multimodal inputs, but at the time of writing the sample UI only supports text input. The UI is expected to become multimodal in the future to handle this and other use cases.
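
For reference, a minimal LlamaParse call looks roughly like this (assuming the llama-parse package is installed and LLAMA_CLOUD_API_KEY is set in the environment):

# Minimal LlamaParse sketch; reads LLAMA_CLOUD_API_KEY from the environment.
from llama_parse import LlamaParse

parser = LlamaParse(result_type="markdown")  # "markdown" or "text"
documents = parser.load_data("./attention.pdf")
print(documents[0].text[:500])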

Prerequisites

  • Python 3.12 or higher
  • uv
  • Access to an LLM and an API key (the current code assumes the OpenRouter API)
  • A LlamaParse API key (get one for free)

Setup & Running

  1. Clone and navigate to the project directory:
   git clone https://github.com/sing1ee/a2a_llama_index_file_chat
   cd a2a_llama_index_file_chat
  2. Create a virtual environment and install dependencies:
   uv venv
   uv sync
  3. Create an environment file with your API keys:
   echo "OPENROUTER_API_KEY=your_api_key_here" >> .env
   echo "LLAMA_CLOUD_API_KEY=your_api_key_here" >> .env

Getting API keys: create an OpenRouter API key at https://openrouter.ai and a LlamaParse API key at https://cloud.llamaindex.ai.

  4. Run the agent:
   # Using uv
   uv run a2a-file-chat

   # Or activate the virtual environment and run directly
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
   python -m a2a_file_chat

   # With custom host/port
   uv run a2a-file-chat --host 0.0.0.0 --port 8080
  5. In a separate terminal, run the A2A client CLI:

Download a file to parse, or use a file of your own. For example:

   curl -L https://arxiv.org/pdf/1706.03762 -o attention.pdf
   git clone https://github.com/google-a2a/a2a-samples.git
   cd a2a-samples/samples/python/hosts/cli
   uv run . --agent http://localhost:10010

The client prints the agent card; you can then enter something like the following:

   ======= Agent Card ========
   {"name":"Parse and Chat","description":"Parses a file and then chats with a user using the parsed content as context.","url":"http://localhost:10010/","version":"1.0.0","capabilities":{"streaming":true,"pushNotifications":true,"stateTransitionHistory":false},"defaultInputModes":["text","text/plain"],"defaultOutputModes":["text","text/plain"],"skills":[{"id":"parse_and_chat","name":"Parse and Chat","description":"Parses a file and then chats with a user using the parsed content as context.","tags":["parse","chat","file","llama_parse"],"examples":["What does this file talk about?"]}]}
   =========  starting a new task ======== 

   What do you want to send to the agent? (:q or quit to exit): What does this file talk about?
   Select a file path to attach? (press enter to skip): ./attention.pdf

Technical Implementation

  • LlamaIndex Workflows: Uses a custom workflow to parse the file and then chat with the user
  • Streaming Support: Provides incremental updates during processing
  • Serializable Context: Maintains conversation state between turns; it can optionally be persisted to Redis, MongoDB, disk, etc. (see the sketch after this list)
  • Push Notification System: Webhook-based updates with JWK authentication
  • A2A Protocol Integration: Full compliance with A2A specifications
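
As a sketch of what persisting that context could look like, assuming the Context.to_dict/from_dict serialization API in llama_index.core.workflow (a JSON file stands in here for Redis or MongoDB, and the msg keyword is a hypothetical workflow input):

# Sketch: persist workflow conversation state between turns.
import json

from llama_index.core.workflow import Context


async def run_turn(workflow, message: str, state_path: str = "ctx.json"):
    # Restore prior state for this session, if any.
    try:
        with open(state_path) as f:
            ctx = Context.from_dict(workflow, json.load(f))
    except FileNotFoundError:
        ctx = None  # first turn: the workflow creates a fresh context

    handler = workflow.run(ctx=ctx, msg=message)
    result = await handler

    # Save the updated context so the next turn remembers this exchange.
    with open(state_path, "w") as f:
        json.dump(handler.ctx.to_dict(), f)
    return result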

Limitations

  • Only supports text-based output
  • LlamaParse is free for the first 10K credits (~3333 pages with basic settings)
  • Memory is session-based and in-memory, and therefore not persisted between server restarts
  • Inserting the entire document into the context window does not scale to larger files. For effective RAG over one or more files, you may want to run retrieval against a vector DB or cloud DB; LlamaIndex integrates with many of them (a minimal sketch follows this list)
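
As a rough sketch of that retrieval setup, using LlamaIndex's in-memory VectorStoreIndex (a dedicated vector DB integration follows the same pattern; the file name is illustrative):

# Sketch: retrieve only relevant chunks instead of stuffing the whole
# document into the context window.
from llama_index.core import Document, VectorStoreIndex

parsed_text = open("parsed_output.md").read()  # e.g. saved LlamaParse output

index = VectorStoreIndex.from_documents([Document(text=parsed_text)])
query_engine = index.as_query_engine(similarity_top_k=3)
print(query_engine.query("What does this file talk about?"))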

Examples

Synchronous request

Request:

POST http://localhost:10010
Content-Type: application/json

{
  "jsonrpc": "2.0",
  "id": 11,
  "method": "tasks/send",
  "params": {
    "id": "129",
    "sessionId": "8f01f3d172cd4396a0e535ae8aec6687",
    "acceptedOutputModes": [
      "text"
    ],
    "message": {
      "role": "user",
      "parts": [
        {
          "type": "text",
          "text": "What does this file talk about?"
        },
        {
            "type": "file",
            "file": {
                "bytes": "...",
                "name": "attention.pdf"
            }
        }
      ]
    }
  }
}

Response:

{
  "jsonrpc": "2.0",
  "id": 11,
  "result": {
    "id": "129",
    "status": {
      "state": "completed",
      "timestamp": "2025-04-02T16:53:29.301828"
    },
    "artifacts": [
      {
        "parts": [
          {
            "type": "text",
            "text": "This file is about XYZ... [1]"
          }
        ],
        "metadata": {
            "1": ["Text for citation 1"]
        }
        "index": 0,
      }
    ],
  }
}
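For example, the request above can be sent from Python roughly like this (using httpx; any HTTP client works, and the base64-encoded file replaces the elided "bytes" value):

# Sketch: send the synchronous tasks/send request from Python.
import base64

import httpx

with open("attention.pdf", "rb") as f:
    pdf_b64 = base64.b64encode(f.read()).decode()

payload = {
    "jsonrpc": "2.0",
    "id": 11,
    "method": "tasks/send",
    "params": {
        "id": "129",
        "sessionId": "8f01f3d172cd4396a0e535ae8aec6687",
        "acceptedOutputModes": ["text"],
        "message": {
            "role": "user",
            "parts": [
                {"type": "text", "text": "What does this file talk about?"},
                {"type": "file", "file": {"bytes": pdf_b64, "name": "attention.pdf"}},
            ],
        },
    },
}

resp = httpx.post("http://localhost:10010", json=payload, timeout=120.0)
print(resp.json()["result"]["status"]["state"])  # "completed" on success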

Multi-turn example

Request - Seq 1:

POST http://localhost:10010
Content-Type: application/json

{
  "jsonrpc": "2.0",
  "id": 11,
  "method": "tasks/send",
  "params": {
    "id": "129",
    "sessionId": "8f01f3d172cd4396a0e535ae8aec6687",
    "acceptedOutputModes": [
      "text"
    ],
    "message": {
      "role": "user",
      "parts": [
        {
          "type": "text",
          "text": "What does this file talk about?"
        },
        {
            "type": "file",
            "file": {
                "bytes": "...",
                "name": "attention.pdf"
            }
        }
      ]
    }
  }
}

Response - Seq 2:

{
  "jsonrpc": "2.0",
  "id": 11,
  "result": {
    "id": "129",
    "status": {
      "state": "completed",
      "timestamp": "2025-04-02T16:53:29.301828"
    },
    "artifacts": [
      {
        "parts": [
          {
            "type": "text",
            "text": "This file is about XYZ... [1]"
          }
        ],
        "metadata": {
            "1": ["Text for citation 1"]
        }
        "index": 0,
      }
    ],
  }
}

Request - Seq 3:

POST http://localhost:10010
Content-Type: application/json

{
  "jsonrpc": "2.0",
  "id": 11,
  "method": "tasks/send",
  "params": {
    "id": "130",
    "sessionId": "8f01f3d172cd4396a0e535ae8aec6687",
    "acceptedOutputModes": [
      "text"
    ],
    "message": {
      "role": "user",
      "parts": [
        {
          "type": "text",
          "text": "What about thing X?"
        }
      ]
    }
  }
}

Response - Seq 4:

{
  "jsonrpc": "2.0",
  "id": 11,
  "result": {
    "id": "130",
    "status": {
      "state": "completed",
      "timestamp": "2025-04-02T16:53:29.301828"
    },
    "artifacts": [
      {
        "parts": [
          {
            "type": "text",
            "text": "Thing X is ... [1]"
          }
        ],
        "metadata": {
            "1": ["Text for citation 1"]
        }
        "index": 0,
      }
    ],
  }
}
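Note that the follow-up request (Seq 3) reuses the same sessionId with a new task id and no file part: the agent answers from the conversational memory built up in the first turn, so the file does not need to be re-uploaded.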

Streaming example

Request:

{
  "jsonrpc": "2.0",
  "id": 11,
  "method": "tasks/send",
  "params": {
    "id": "129",
    "sessionId": "8f01f3d172cd4396a0e535ae8aec6687",
    "acceptedOutputModes": [
      "text"
    ],
    "message": {
      "role": "user",
      "parts": [
        {
          "type": "text",
          "text": "What does this file talk about?"
        },
        {
            "type": "file",
            "file": {
                "bytes": "...",
                "name": "attention.pdf"
            }
        }
      ]
    }
  }
}

Response:

stream event => {"jsonrpc":"2.0","id":"367d0ba9af97457890261ac29a0f6f5b","result":{"id":"373b26d64c5a4f0099fa906c6b7342d9","status":{"state":"working","message":{"role":"agent","parts":[{"type":"text","text":"Parsing document..."}]},"timestamp":"2025-04-15T16:05:18.283682"},"final":false}}

stream event => {"jsonrpc":"2.0","id":"367d0ba9af97457890261ac29a0f6f5b","result":{"id":"373b26d64c5a4f0099fa906c6b7342d9","status":{"state":"working","message":{"role":"agent","parts":[{"type":"text","text":"Document parsed successfully."}]},"timestamp":"2025-04-15T16:05:24.200133"},"final":false}}

stream event => {"jsonrpc":"2.0","id":"367d0ba9af97457890261ac29a0f6f5b","result":{"id":"373b26d64c5a4f0099fa906c6b7342d9","status":{"state":"working","message":{"role":"agent","parts":[{"type":"text","text":"Chatting with 1 initial messages."}]},"timestamp":"2025-04-15T16:05:24.204757"},"final":false}}

stream event => {"jsonrpc":"2.0","id":"367d0ba9af97457890261ac29a0f6f5b","result":{"id":"373b26d64c5a4f0099fa906c6b7342d9","status":{"state":"working","message":{"role":"agent","parts":[{"type":"text","text":"Inserting system prompt..."}]},"timestamp":"2025-04-15T16:05:24.204810"},"final":false}}

stream event => {"jsonrpc":"2.0","id":"367d0ba9af97457890261ac29a0f6f5b","result":{"id":"373b26d64c5a4f0099fa906c6b7342d9","status":{"state":"working","message":{"role":"agent","parts":[{"type":"text","text":"LLM response received, parsing citations..."}]},"timestamp":"2025-04-15T16:05:26.084829"},"final":false}}

stream event => {"jsonrpc":"2.0","id":"367d0ba9af97457890261ac29a0f6f5b","result":{"id":"373b26d64c5a4f0099fa906c6b7342d9","artifact":{"parts":[{"type":"text","text":"This file discusses the Transformer, a novel neural network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely [1]. The document compares the Transformer to recurrent and convolutional layers [2], details the model architecture [3], and presents results from machine translation and English constituency parsing tasks [4]."}],"metadata":{"1":["The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data."],"2":["In this section we compare various aspects of self-attention layers to the recurrent and convolutional layers commonly used for mapping one variable-length sequence of symbol representations (x1, ..., xn) to another sequence of equal length (z1, ..., zn), with xi, zi ∈ Rd, such as a hidden layer in a typical sequence transduction encoder or decoder. Motivating our use of self-attention we consider three desiderata.",""],"3":["# 3 Model Architecture"],"4":["# 6   Results"]},"index":0,"append":false}}}

stream event => {"jsonrpc":"2.0","id":"367d0ba9af97457890261ac29a0f6f5b","result":{"id":"373b26d64c5a4f0099fa906c6b7342d9","status":{"state":"completed","timestamp":"2025-04-15T16:05:26.111314"},"final":true}}

You can see that the workflow produced an artifact with in-line citations, and that the source text for those citations is included in the artifact's metadata. If we send more messages in the same session, the agent remembers the previous turns and continues the conversation.
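
Assuming the server streams these JSON-RPC results as server-sent events (matching the stream event => lines above), a minimal consumer might look like this:

# Sketch: consume streaming task updates as server-sent events.
import json

import httpx


def stream_task(payload: dict, url: str = "http://localhost:10010"):
    with httpx.stream("POST", url, json=payload, timeout=None) as resp:
        for line in resp.iter_lines():
            if not line.startswith("data:"):
                continue  # skip SSE comments and keep-alive blanks
            yield json.loads(line[len("data:"):].strip())


# usage (payload as in the streaming request above):
#   for event in stream_task(payload):
#       result = event["result"]
#       print(result.get("status", {}).get("state"))
#       if result.get("final"):
#           break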

Learn More

  • Source code: https://github.com/sing1ee/a2a_llama_index_file_chat
  • A2A samples: https://github.com/google-a2a/a2a-samples
