Welcome to this comprehensive guide where you'll learn how to build your very own AI agent from scratch! We'll start with the basics, understand what an AI agent is, and then progressively add capabilities like tools for performing actions and Retrieval Augmented Generation (RAG) for accessing external knowledge. Finally, we'll wrap it all up with a simple but functional web user interface.
This tutorial is designed for beginners, so we'll break down complex concepts into easy-to-understand steps with plenty of code examples.
What you'll build:
An AI agent that can:
- Chat with you.
- Use a "calculator" tool to perform mathematical calculations.
- Access a small knowledge base to answer questions about specific topics (RAG).
- Interact through a web UI.
Prerequisites:
- Basic understanding of Python.
- Familiarity with HTML, CSS, and JavaScript is helpful but not strictly required for the UI part, as we'll provide the code.
- A willingness to learn and experiment!
Let's dive in!
Part 0: What is an AI Agent?
Think of an AI agent as a smart assistant that can understand your requests, make decisions, and perform actions to achieve a goal. Unlike a simple chatbot that just responds based on its training data, an agent can be more proactive and interact with its environment.
Core Components of Our Agent:
- The Brain (Large Language Model - LLM): This is the core intelligence. We'll use an LLM (like Google's Gemini) to understand language, reason, and generate responses.
- Tools: These are external functions or APIs that the agent can use to perform actions that the LLM itself cannot do well (e.g., precise calculations, accessing real-time information, interacting with other systems).
- Knowledge Base (RAG): This allows the agent to access and use information that wasn't part of its original training data, making its responses more accurate and up-to-date for specific domains.
- User Interface (UI): A way for you to interact with your agent.
Part 1: Setting Up Your Development Environment
First, let's get your computer ready.
1. Python Installation:
If you don't have Python installed, download it from python.org (Version 3.7+ recommended). Make sure to check the box "Add Python to PATH" during installation.
2. Create a Project Directory:
Create a new folder for your project; let's call it my_ai_agent.
3. Virtual Environment (Recommended):
It's good practice to use a virtual environment to manage project dependencies.
Open your terminal or command prompt, navigate to your my_ai_agent directory, and run:
python -m venv venv
Activate the virtual environment:
- On Windows:
venv\Scripts\activate
- On macOS/Linux:
source venv/bin/activate
You should see (venv) at the beginning of your command prompt line.
4. Install Necessary Python Libraries:
We'll need Flask (for the web server backend) and Requests (to make HTTP requests to the Gemini API).
pip install Flask requests python-dotenv
- Flask: A micro web framework for Python.
- requests: A simple HTTP library for Python.
- python-dotenv: To manage environment variables (like API keys).
5. Get a Gemini API Key:
To use Google's Gemini LLM, you'll need an API key.
- Go to Google AI Studio and create an API key.
- Create a file named .env (note the leading dot) in your my_ai_agent project root.
- Add your API key to this file:
GEMINI_API_KEY=YOUR_API_KEY_HERE
Replace YOUR_API_KEY_HERE with your actual key. Never share your API key publicly.
Part 2: The Brain - Integrating the LLM (Gemini)
Let's build the backend of our agent using Flask and connect it to the Gemini LLM.
1. Create app.py:
Inside your my_ai_agent directory, create a file named app.py. This will be our main backend file.
# app.py
import os
import json
import requests
from flask import Flask, request, jsonify
from dotenv import load_dotenv
# Load environment variables from .env file
load_dotenv()
app = Flask(__name__)
# Get Gemini API Key from environment variable
GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")
if not GEMINI_API_KEY:
raise ValueError("GEMINI_API_KEY not found. Please set it in the .env file.")
GEMINI_API_URL = f"[https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-latest:generateContent?key=](https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-latest:generateContent?key=){GEMINI_API_KEY}"
# --- Placeholder for Tools and RAG data ---
# We will populate these later
KNOWLEDGE_BASE = {
"company_info.txt": "Our company, 'AI Innovators Inc.', was founded in 2023. We specialize in creating cutting-edge AI solutions for businesses. Our mission is to democratize AI technology.",
"product_specs.txt": "Our flagship product, 'AgentX', is an advanced AI agent platform. It features modular tool integration, dynamic RAG capabilities, and a user-friendly interface. It supports natural language processing for complex queries."
}
def simple_calculator(expression):
"""A simple calculator tool."""
try:
# A more robust calculator would use ast.literal_eval or a dedicated library
# For safety, this is a very restricted eval.
if not all(c in "0123456789+-*/(). " for c in expression):
return "Error: Invalid characters in expression."
return str(eval(expression)) # Be cautious with eval in real-world apps
except Exception as e:
return f"Error: {str(e)}"
# --- End Placeholder ---
def call_gemini_api(prompt_text, chat_history=[]):
"""
Calls the Gemini API with the given prompt text and chat history.
Includes logic for tool use and RAG.
"""
print(f"Original user prompt: {prompt_text}")
# 1. RAG: Retrieve relevant knowledge
relevant_knowledge = ""
# Simple keyword-based retrieval
for doc_name, content in KNOWLEDGE_BASE.items():
# Check if any word from the prompt (longer than 3 chars) is in the content
# This is a very basic retrieval, can be improved significantly
prompt_words = [word.lower() for word in prompt_text.split() if len(word) > 3]
if any(word in content.lower() for word in prompt_words):
relevant_knowledge += f"\n\n--- Relevant document: {doc_name} ---\n{content}\n--- End of document ---"
if relevant_knowledge:
print(f"Retrieved knowledge: {relevant_knowledge[:200]}...") # Print first 200 chars
# 2. Construct the full prompt with instructions, RAG context, and tool descriptions
# The system prompt needs to be carefully crafted.
system_prompt = f"""You are a helpful AI assistant.
You have access to the following tools:
1. Calculator:
- Description: Solves mathematical expressions.
- To use: Respond with "TOOL_REQUEST: CALCULATOR(expression)" where 'expression' is the math problem (e.g., "TOOL_REQUEST: CALCULATOR(2+2*8/4-1)").
- Only use the calculator for math questions. For other questions, answer directly.
{( "You also have access to the following information from our knowledge base:\n" + relevant_knowledge) if relevant_knowledge else ""}
Based on the user's query and the available tools/information, provide an answer.
If you use a tool, ONLY output the TOOL_REQUEST string. Do not add any other text.
If you don't need a tool, provide a direct answer.
"""
# Prepare chat history for Gemini API
# Gemini API expects a specific format for chat history.
# We'll simplify here and just prepend the system prompt to the user's current query.
# A more robust solution would manage a proper turn-by-turn history.
# For simplicity in this first pass, we'll just use the latest prompt_text
# and prepend the system instructions and RAG context.
# A real chat history would be a list of {"role": "user", "parts": [{"text": "..."}]} and {"role": "model", "parts": [{"text": "..."}]}
full_prompt_for_llm = f"{system_prompt}\n\nUser query: {prompt_text}"
payload = {
"contents": [
{
"role": "user", # The "system" instructions are part of the user turn for Flash model if not using system_instruction field
"parts": [{"text": full_prompt_for_llm}]
}
],
"generationConfig": {
"temperature": 0.7,
"topK": 1,
"topP": 1,
"maxOutputTokens": 2048,
}
}
# print(f"Payload to Gemini: {json.dumps(payload, indent=2)}") # For debugging
headers = {"Content-Type": "application/json"}
response = requests.post(GEMINI_API_URL, headers=headers, data=json.dumps(payload))
if response.status_code == 200:
response_data = response.json()
# print(f"Raw Gemini Response: {json.dumps(response_data, indent=2)}") # For debugging
# Extract text from response
try:
candidate = response_data.get("candidates", [])[0]
content_parts = candidate.get("content", {}).get("parts", [])
llm_response_text = content_parts[0].get("text", "").strip()
except (IndexError, KeyError, AttributeError) as e:
print(f"Error parsing Gemini response: {e}")
print(f"Problematic response data: {response_data}")
return "Error: Could not parse LLM response."
print(f"LLM Raw Output: {llm_response_text}")
# 3. Tool Handling
if llm_response_text.startswith("TOOL_REQUEST: CALCULATOR("):
expression = llm_response_text.replace("TOOL_REQUEST: CALCULATOR(", "").replace(")", "").strip()
print(f"Tool Request Detected: CALCULATOR, Expression: {expression}")
calc_result = simple_calculator(expression)
print(f"Calculator Result: {calc_result}")
# Now, call Gemini API again with the tool's result to get a natural language response
# This is a second LLM call.
prompt_with_tool_result = f"""You asked to use the calculator for the expression '{expression}'.
The calculator returned: '{calc_result}'.
Based on this result, please formulate a natural language response to the original user query: '{prompt_text}'."""
# Re-call Gemini, this time without tool instructions, just to formulate the answer.
# For simplicity, we are not passing the full chat history or original system prompt here,
# but in a more complex agent, you would.
payload_for_final_answer = {
"contents": [{"role": "user", "parts": [{"text": prompt_with_tool_result}]}],
"generationConfig": {
"temperature": 0.7,
"topK": 1,
"topP": 1,
"maxOutputTokens": 2048,
}
}
final_response = requests.post(GEMINI_API_URL, headers=headers, data=json.dumps(payload_for_final_answer))
if final_response.status_code == 200:
final_response_data = final_response.json()
try:
final_text = final_response_data.get("candidates", [])[0].get("content", {}).get("parts", [])[0].get("text", "").strip()
return final_text
except (IndexError, KeyError, AttributeError):
return f"Calculator result: {calc_result}. (Error formatting final LLM response)"
else:
return f"Calculator result: {calc_result}. (Error getting final LLM response: {final_response.text})"
else:
# No tool used, return LLM's direct response
return llm_response_text
else:
print(f"Error calling Gemini API: {response.status_code} - {response.text}")
return f"Error: Could not connect to LLM. Status: {response.status_code}"
@app.route('/chat', methods=['POST'])
def chat():
data = request.json
user_message = data.get('message')
# chat_history = data.get('history', []) # If you want to manage history client-side
if not user_message:
return jsonify({"error": "No message provided"}), 400
# For now, we are not passing chat_history to call_gemini_api in a structured way,
# but you would typically manage it.
agent_response = call_gemini_api(user_message)
return jsonify({"reply": agent_response})
@app.route('/')
def index():
# We'll serve the HTML UI from here later
return "AI Agent Backend is running. UI will be here."
if __name__ == '__main__':
# Make sure to run on 0.0.0.0 to be accessible if running in a container or VM
# For local development, 127.0.0.1 is fine.
app.run(debug=True, host='0.0.0.0', port=5000)
Explanation of app.py (Initial Version):
- Imports: Necessary libraries.
- load_dotenv(): Loads your GEMINI_API_KEY from the .env file.
- Flask App (app): Initializes the Flask application.
- API Key & URL: Sets up the Gemini API key and endpoint URL.
- call_gemini_api(prompt_text, chat_history=[]):
  - This is the core function that will eventually handle RAG, tools, and the LLM call.
  - It constructs a payload with the user's prompt.
  - It sends a POST request to the Gemini API.
  - It parses the response to extract the generated text.
  - Error Handling: Basic error checking for the API call.
  - Placeholders: Comments for where RAG and Tool logic will go.
- /chat Endpoint:
  - This is a POST endpoint that expects a JSON payload like {"message": "Hello, agent!"}.
  - It extracts the user's message.
  - Calls call_gemini_api to get the agent's response.
  - Returns the response as JSON: {"reply": "Agent's answer"}.
- / Endpoint: A placeholder for our future UI.
- if __name__ == '__main__': Runs the Flask development server. debug=True is useful for development as it automatically reloads the server on code changes.
2. Run your Flask App:
Save app.py. Make sure your virtual environment is active and you are in the my_ai_agent directory.
Run:
python app.py
You should see output like:
* Serving Flask app 'app'
* Debug mode: on
WARNING: This is a development server. Do not use it in a production deployment.
* Running on http://127.0.0.1:5000 (Press CTRL+C to quit)
(Or http://0.0.0.0:5000/ if you used host='0.0.0.0')
3. Test with an HTTP client (like Postman or curl):
You can send a POST request to http://127.0.0.1:5000/chat.
Using curl (if you have it installed):
curl -X POST -H "Content-Type: application/json" -d "{\"message\":\"Hello, who are you?\"}" http://127.0.0.1:5000/chat
You should get a JSON response from Gemini, like:
{
"reply": "I am a large language model, trained by Google."
}
(The exact reply will vary.)
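If you'd rather test from Python than curl, this small script (assuming the app is running locally on port 5000) sends the same request:
# test_chat.py - a quick manual test of the /chat endpoint.
import requests

resp = requests.post(
    "http://127.0.0.1:5000/chat",
    json={"message": "Hello, who are you?"},  # same payload the UI will send later
)
resp.raise_for_status()
print(resp.json()["reply"])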
At this point, you have the "brain" of your agent connected! It can receive messages and respond using the LLM.
Part 3: Giving Your Agent Tools - The Calculator
LLMs are great at language, but not always perfect at precise tasks like arithmetic. Tools help bridge this gap. We'll give our agent a simple calculator.
The process will be:
- User asks a question involving math (e.g., "What is 5 + 7 * 3?").
- The LLM, guided by our prompt, recognizes it needs the calculator.
- The LLM outputs a special string like "TOOL_REQUEST: CALCULATOR(5+7*3)".
- Our Python backend parses this and calls our simple_calculator function.
- The calculator's result is fed back to the LLM.
- The LLM uses this result to formulate a natural language answer (e.g., "5 + 7 * 3 is 26.").
1. Update app.py with Tool Logic:
We already added placeholders in app.py. Now let's flesh them out. The call_gemini_api function needs to be updated to:
* Include tool descriptions in the system prompt sent to Gemini.
* Check Gemini's response for TOOL_REQUEST.
* If a tool request is found, execute the tool.
* Send the tool's output back to Gemini to generate a final user-facing response.
The app.py code provided in Part 2 already includes the logic for the calculator tool. Let's review the key parts:
- simple_calculator(expression) function: This is our tool. It takes a string expression and tries to evaluate it.
def simple_calculator(expression):
    """A simple calculator tool."""
    try:
        # Basic validation to prevent arbitrary code execution with eval
        if not all(c in "0123456789+-*/(). " for c in expression):
            return "Error: Invalid characters in expression."
        return str(eval(expression))  # Use with caution
    except Exception as e:
        return f"Error: {str(e)}"
Security Note: eval() can be dangerous if used with untrusted input. For a real application, you'd use a safer math expression parser (e.g., Python's ast module to evaluate only whitelisted arithmetic operations, or a dedicated library like numexpr). Our basic character check is a minimal safeguard.
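For reference, here's a minimal sketch of that safer approach using the ast module: parse the expression, then evaluate only a whitelist of arithmetic operations. The name safe_calculator is ours; you could swap it in for simple_calculator:
import ast
import operator

# Whitelisted operators mapped to their implementations.
_ALLOWED_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,  # unary minus, e.g. -5
}

def safe_calculator(expression):
    """Evaluate a basic arithmetic expression without eval()."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _ALLOWED_OPS:
            return _ALLOWED_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _ALLOWED_OPS:
            return _ALLOWED_OPS[type(node.op)](_eval(node.operand))
        raise ValueError("Unsupported expression element")

    try:
        return str(_eval(ast.parse(expression, mode="eval")))
    except Exception as e:
        return f"Error: {str(e)}"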
- System Prompt in call_gemini_api:
The system prompt now instructs the LLM on how to use the calculator:
system_prompt = f"""You are a helpful AI assistant.
You have access to the following tools:
1. Calculator:
- Description: Solves mathematical expressions.
- To use: Respond with "TOOL_REQUEST: CALCULATOR(expression)" where 'expression' is the math problem (e.g., "TOOL_REQUEST: CALCULATOR(2+2*8/4-1)").
- Only use the calculator for math questions. For other questions, answer directly.
{( "You also have access to the following information from our knowledge base:\n" + relevant_knowledge) if relevant_knowledge else ""}
Based on the user's query and the available tools/information, provide an answer.
If you use a tool, ONLY output the TOOL_REQUEST string. Do not add any other text.
If you don't need a tool, provide a direct answer.
"""
This is crucial. The LLM needs clear instructions on *when* and *how* to request a tool.
- Tool Handling Logic in call_gemini_api:
After getting the initial response from Gemini:
if llm_response_text.startswith("TOOL_REQUEST: CALCULATOR("):
    expression = llm_response_text.replace("TOOL_REQUEST: CALCULATOR(", "").replace(")", "").strip()
    print(f"Tool Request Detected: CALCULATOR, Expression: {expression}")
    calc_result = simple_calculator(expression)
    print(f"Calculator Result: {calc_result}")
    # Now, call Gemini API again with the tool's result
    prompt_with_tool_result = f"""You asked to use the calculator for the expression '{expression}'.
The calculator returned: '{calc_result}'.
Based on this result, please formulate a natural language response to the original user query: '{prompt_text}'."""
    # ... (second call to Gemini to get final answer) ...
    # This part is already in the full app.py code block
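One caveat in the parsing above: .replace(")", "") strips every closing parenthesis, so a nested expression like (2+3)*4 gets mangled into (2+3*4. A small regex that captures everything up to the final closing parenthesis is more robust (a sketch; parse_calculator_request is our own helper name):
import re

def parse_calculator_request(llm_output):
    """Return the expression inside TOOL_REQUEST: CALCULATOR(...), or None."""
    match = re.match(r"TOOL_REQUEST:\s*CALCULATOR\((.*)\)\s*$", llm_output.strip())
    return match.group(1) if match else None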
This "two-step" LLM call (first to decide on tool use, second to formulate response with tool output) is common in agent architectures.
2. Test the Calculator Tool:
Restart your Flask app (python app.py).
Now, send a request that should trigger the calculator:
curl -X POST -H "Content-Type: application/json" -d "{\"message\":\"What is 100 / 4 + 7?\"}" http://127.0.0.1:5000/chat
Check your Flask console output. You should see something like:
Original user prompt: What is 100 / 4 + 7?
LLM Raw Output: TOOL_REQUEST: CALCULATOR(100 / 4 + 7)
Tool Request Detected: CALCULATOR, Expression: 100 / 4 + 7
Calculator Result: 32.0
... (Gemini API call with tool result) ...
And the JSON response should be something like:
{
"reply": "100 / 4 + 7 is 32.0."
}
If it works, congratulations! Your agent can now use tools.
Part 4: Giving Your Agent Knowledge - Retrieval Augmented Generation (RAG)
RAG allows your agent to pull in information from external sources before generating a response. This is great for:
- Answering questions about specific, up-to-date, or proprietary information.
- Reducing "hallucinations" (incorrect information) from the LLM.
We'll implement a very simple RAG:
- Create a small knowledge base (a few text snippets).
- When the user asks a question, we'll do a basic keyword search in our knowledge base.
- If relevant information is found, we'll add it to the prompt we send to Gemini.
1. Create Knowledge Base:
The app.py already has a KNOWLEDGE_BASE dictionary:
KNOWLEDGE_BASE = {
"company_info.txt": "Our company, 'AI Innovators Inc.', was founded in 2023. We specialize in creating cutting-edge AI solutions for businesses. Our mission is to democratize AI technology.",
"product_specs.txt": "Our flagship product, 'AgentX', is an advanced AI agent platform. It features modular tool integration, dynamic RAG capabilities, and a user-friendly interface. It supports natural language processing for complex queries."
}
In a real application, this could be a folder of text files, a database, or a vector store.
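As a concrete example of the first option, here's a minimal sketch that loads every .txt file from a local folder into the same {filename: content} shape (the knowledge/ folder name is an assumption, not part of the tutorial's project layout):
import os

def load_knowledge_base(folder="knowledge"):
    """Read every .txt file in `folder` into a {filename: content} dict."""
    kb = {}
    for filename in os.listdir(folder):
        if filename.endswith(".txt"):
            with open(os.path.join(folder, filename), "r", encoding="utf-8") as f:
                kb[filename] = f.read()
    return kb

# Usage: KNOWLEDGE_BASE = load_knowledge_base()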
2. Implement Retrieval Logic:
The call_gemini_api function in the provided app.py already includes a basic RAG implementation:
# 1. RAG: Retrieve relevant knowledge
relevant_knowledge = ""
# Simple keyword-based retrieval
for doc_name, content in KNOWLEDGE_BASE.items():
# Check if any word from the prompt (longer than 3 chars) is in the content
prompt_words = [word.lower() for word in prompt_text.split() if len(word) > 3]
if any(word in content.lower() for word in prompt_words):
relevant_knowledge += f"\n\n--- Relevant document: {doc_name} ---\n{content}\n--- End of document ---"
if relevant_knowledge:
print(f"Retrieved knowledge: {relevant_knowledge[:200]}...")
This is a very naive retrieval (simple keyword matching). More advanced RAG systems use:
- Text Embeddings: Convert text into numerical vectors.
- Vector Databases: Store these vectors and efficiently search for similar (semantically related) text. Libraries like FAISS, ChromaDB, or Pinecone are used for this. We're keeping it simple for now, but the sketch below shows the core ranking idea.
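As a middle ground between substring matching and a full vector database, here's a minimal sketch that ranks documents by cosine similarity over bag-of-words count vectors, using only the standard library (the function names are ours):
import math
import re
from collections import Counter

def _vectorize(text):
    # Lowercase and keep alphanumeric word tokens only.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def _cosine(a, b):
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve_top_k(query, knowledge_base, k=1):
    """Rank documents by similarity to the query; return the top k names."""
    q_vec = _vectorize(query)
    scored = sorted(
        ((name, _cosine(q_vec, _vectorize(content)))
         for name, content in knowledge_base.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [name for name, score in scored[:k] if score > 0]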
3. Integrate Retrieved Context into the Prompt:
The system_prompt in call_gemini_api is updated to include this retrieved knowledge:
system_prompt = f"""You are a helpful AI assistant.
# ... (tool instructions) ...
{( "You also have access to the following information from our knowledge base:\n" + relevant_knowledge) if relevant_knowledge else ""}
Based on the user's query and the available tools/information, provide an answer.
# ...
"""
If relevant_knowledge is found, it's included in the instructions sent to the LLM.
4. Test RAG:
Restart your Flask app.
Try asking questions related to your knowledge base:
curl -X POST -H "Content-Type: application/json" -d "{\"message\":\"Tell me about your company\"}" http://127.0.0.1:5000/chat
You should see in the Flask console that company_info.txt was retrieved, and the agent's reply should be based on that content.
Example response:
{
"reply": "Our company, 'AI Innovators Inc.', was founded in 2023 and specializes in creating cutting-edge AI solutions for businesses. Our mission is to democratize AI technology."
}
Try another one:
curl -X POST -H "Content-Type: application/json" -d "{\"message\":\"What are the features of AgentX?\"}" http://127.0.0.1:5000/chat
This should retrieve product_specs.txt.
Your agent now has a basic form of RAG!
Part 5: Building the User Interface (HTML, Tailwind CSS, JavaScript)
Let's create a simple web page to chat with our agent. We'll use HTML for structure, Tailwind CSS for styling (for a modern look with minimal custom CSS), and JavaScript to communicate with our Flask backend.
1. Create templates and static folders:
In your my_ai_agent project root, create two folders:
- templates: Flask looks for HTML files here.
- static: For CSS, JavaScript, images, etc.
2. Create index.html:
Inside the templates folder, create index.html:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>My AI Agent</title>
<script src="https://cdn.tailwindcss.com"></script>
<style>
/* Custom scrollbar for chat (optional) */
#chatbox::-webkit-scrollbar {
width: 8px;
}
#chatbox::-webkit-scrollbar-track {
background: #f1f1f1;
border-radius: 10px;
}
#chatbox::-webkit-scrollbar-thumb {
background: #888;
border-radius: 10px;
}
#chatbox::-webkit-scrollbar-thumb:hover {
background: #555;
}
.user-message {
background-color: #DCF8C6; /* Light green, WhatsApp style */
align-self: flex-end;
}
.agent-message {
background-color: #E5E7EB; /* Light gray */
align-self: flex-start;
}
.message-bubble {
padding: 8px 12px;
border-radius: 12px;
margin-bottom: 8px;
max-width: 70%;
word-wrap: break-word;
}
/* Loading spinner */
.loader {
border: 4px solid #f3f3f3; /* Light grey */
border-top: 4px solid #3498db; /* Blue */
border-radius: 50%;
width: 24px;
height: 24px;
animation: spin 1s linear infinite;
margin-right: 8px;
}
@keyframes spin {
0% { transform: rotate(0deg); }
100% { transform: rotate(360deg); }
}
</style>
</head>
<body class="bg-gray-100 font-sans">
<div class="container mx-auto max-w-2xl h-screen flex flex-col p-4">
<header class="mb-4 text-center">
<h1 class="text-3xl font-bold text-gray-700">AI Agent Chat</h1>
</header>
<div id="chatbox" class="flex-grow bg-white p-4 rounded-lg shadow-md overflow-y-auto mb-4 flex flex-col space-y-2">
<div class="agent-message message-bubble">
Hello! I'm your AI Agent. How can I help you today?
</div>
</div>
<div id="loadingIndicator" class="hidden items-center text-gray-600 mb-2">
<div class="loader"></div>
<span>Agent is typing...</span>
</div>
<footer class="flex">
<input type="text" id="userInput" class="flex-grow p-3 border border-gray-300 rounded-l-lg focus:outline-none focus:ring-2 focus:ring-blue-500" placeholder="Type your message...">
<button id="sendButton" class="bg-blue-500 hover:bg-blue-600 text-white p-3 rounded-r-lg font-semibold">Send</button>
</footer>
</div>
<script>
const chatbox = document.getElementById('chatbox');
const userInput = document.getElementById('userInput');
const sendButton = document.getElementById('sendButton');
const loadingIndicator = document.getElementById('loadingIndicator');
// let chatHistory = []; // Optional: to maintain history for more complex context
function addMessage(message, sender) {
const messageDiv = document.createElement('div');
messageDiv.classList.add('message-bubble');
if (sender === 'user') {
messageDiv.classList.add('user-message');
messageDiv.textContent = message;
} else { // agent
messageDiv.classList.add('agent-message');
// Sanitize HTML or use markdown parser if agent can return HTML/Markdown
messageDiv.innerHTML = message.replace(/\n/g, '<br>'); // Basic newline handling
}
chatbox.appendChild(messageDiv);
chatbox.scrollTop = chatbox.scrollHeight; // Auto-scroll to bottom
}
async function sendMessage() {
const messageText = userInput.value.trim();
if (messageText === '') return;
addMessage(messageText, 'user');
userInput.value = '';
loadingIndicator.classList.remove('hidden');
loadingIndicator.classList.add('flex');
sendButton.disabled = true;
userInput.disabled = true;
try {
const response = await fetch('/chat', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({ message: messageText /*, history: chatHistory */ }),
});
if (!response.ok) {
const errorData = await response.json();
addMessage(`Error: ${errorData.error || response.statusText}`, 'agent');
console.error("Error from server:", errorData);
} else {
const data = await response.json();
addMessage(data.reply, 'agent');
// chatHistory.push({role: "user", parts: [{text: messageText}]});
// chatHistory.push({role: "model", parts: [{text: data.reply}]});
}
} catch (error) {
addMessage('Failed to connect to the agent. Please try again.', 'agent');
console.error('Network error:', error);
} finally {
loadingIndicator.classList.add('hidden');
loadingIndicator.classList.remove('flex');
sendButton.disabled = false;
userInput.disabled = false;
userInput.focus();
}
}
sendButton.addEventListener('click', sendMessage);
userInput.addEventListener('keypress', function(event) {
if (event.key === 'Enter') {
sendMessage();
}
});
</script>
</body>
</html>
Explanation of index.html:
- Tailwind CSS: Included via CDN for quick styling.
- Layout: A simple chat interface with a header, message display area (chatbox), and an input area (userInput, sendButton).
- CSS: Some custom styles for message bubbles and scrollbar.
- JavaScript (<script> block):
  - addMessage(message, sender): Appends a new message to the chatbox, styled differently for 'user' and 'agent'.
  - sendMessage():
    - Gets text from userInput.
    - Calls addMessage to display the user's message.
    - Shows a loading indicator.
    - Sends a fetch POST request to your /chat Flask endpoint.
    - Handles the JSON response (or error).
    - Calls addMessage to display the agent's reply.
    - Hides the loading indicator.
  - Event Listeners: Attached to the send button and the input field (for the Enter key).
  - chatHistory (commented out): This is a placeholder if you wanted to implement more sophisticated context management by sending the history back to the server. Our current app.py doesn't use it in a structured way yet for simplicity.
3. Update app.py to Serve the HTML:
Modify your app.py to render index.html. You'll need render_template from Flask.
# app.py (add this import at the top)
from flask import Flask, request, jsonify, render_template
# ... (rest of your app.py code) ...
@app.route('/')
def index():
return render_template('index.html') # Serve the HTML UI
# ... (rest of your app.py code) ...
4. Test the UI:
- Make sure your Flask app (app.py) is running (restart it if you made changes).
- Open your web browser and go to http://127.0.0.1:5000/.
You should see your chat interface! Try sending messages:
- "Hello"
- "What is 15 * 3 - 5?"
- "Tell me about AI Innovators Inc."
- "What does AgentX do?"
You should see the agent respond, use the calculator, and pull information from its knowledge base, all within your new UI!
Part 6: Putting It All Together
Let's review the final project structure and the complete code.
Project Directory Structure:
my_ai_agent/
├── venv/ # Virtual environment
├── static/ # (Currently empty, for future CSS/JS files if separated)
├── templates/
│ └── index.html # Frontend UI
├── .env # API Key
├── app.py # Backend Flask App (Agent Logic)
└── (Other files like requirements.txt if you generate it)
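If you want that requirements.txt, you can generate it from inside the active virtual environment:
pip freeze > requirements.txt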
Complete app.py (Recap - this is the same as developed through the tutorial):
# app.py
import os
import json
import requests
from flask import Flask, request, jsonify, render_template
from dotenv import load_dotenv
load_dotenv()
app = Flask(__name__)
GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")
if not GEMINI_API_KEY:
raise ValueError("GEMINI_API_KEY not found. Please set it in the .env file.")
GEMINI_API_URL = f"[https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-latest:generateContent?key=](https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-latest:generateContent?key=){GEMINI_API_KEY}"
KNOWLEDGE_BASE = {
"company_info.txt": "Our company, 'AI Innovators Inc.', was founded in 2023. We specialize in creating cutting-edge AI solutions for businesses. Our mission is to democratize AI technology.",
"product_specs.txt": "Our flagship product, 'AgentX', is an advanced AI agent platform. It features modular tool integration, dynamic RAG capabilities, and a user-friendly interface. It supports natural language processing for complex queries."
}
def simple_calculator(expression):
try:
if not all(c in "0123456789+-*/(). " for c in expression):
return "Error: Invalid characters in expression."
return str(eval(expression))
except Exception as e:
return f"Error: {str(e)}"
def call_gemini_api(prompt_text):
print(f"Original user prompt: {prompt_text}")
relevant_knowledge = ""
prompt_words = [word.lower() for word in prompt_text.split() if len(word) > 3]
for doc_name, content in KNOWLEDGE_BASE.items():
if any(word in content.lower() for word in prompt_words):
relevant_knowledge += f"\n\n--- Relevant document: {doc_name} ---\n{content}\n--- End of document ---"
if relevant_knowledge:
print(f"Retrieved knowledge: {relevant_knowledge[:200]}...")
system_prompt = f"""You are a helpful AI assistant.
You have access to the following tools:
1. Calculator:
- Description: Solves mathematical expressions.
- To use: Respond with "TOOL_REQUEST: CALCULATOR(expression)" where 'expression' is the math problem (e.g., "TOOL_REQUEST: CALCULATOR(2+2*8/4-1)").
- Only use the calculator for math questions. For other questions, answer directly.
{( "You also have access to the following information from our knowledge base:\n" + relevant_knowledge) if relevant_knowledge else ""}
Based on the user's query and the available tools/information, provide an answer.
If you use a tool, ONLY output the TOOL_REQUEST string. Do not add any other text.
If you don't need a tool, provide a direct answer.
"""
full_prompt_for_llm = f"{system_prompt}\n\nUser query: {prompt_text}"
payload = {
"contents": [{"role": "user", "parts": [{"text": full_prompt_for_llm}]}],
"generationConfig": {"temperature": 0.7, "topK": 1, "topP": 1, "maxOutputTokens": 2048}
}
headers = {"Content-Type": "application/json"}
response = requests.post(GEMINI_API_URL, headers=headers, data=json.dumps(payload))
if response.status_code == 200:
response_data = response.json()
try:
llm_response_text = response_data.get("candidates", [])[0].get("content", {}).get("parts", [])[0].get("text", "").strip()
except (IndexError, KeyError, AttributeError) as e:
print(f"Error parsing Gemini response: {e}\nData: {response_data}")
return "Error: Could not parse LLM response."
print(f"LLM Raw Output: {llm_response_text}")
if llm_response_text.startswith("TOOL_REQUEST: CALCULATOR("):
expression = llm_response_text.replace("TOOL_REQUEST: CALCULATOR(", "").replace(")", "").strip()
print(f"Tool Request Detected: CALCULATOR, Expression: {expression}")
calc_result = simple_calculator(expression)
print(f"Calculator Result: {calc_result}")
prompt_with_tool_result = f"""You asked to use the calculator for the expression '{expression}'.
The calculator returned: '{calc_result}'.
Based on this result, please formulate a natural language response to the original user query: '{prompt_text}'."""
payload_for_final_answer = {
"contents": [{"role": "user", "parts": [{"text": prompt_with_tool_result}]}],
"generationConfig": {"temperature": 0.7, "topK": 1, "topP": 1, "maxOutputTokens": 2048}
}
final_response = requests.post(GEMINI_API_URL, headers=headers, data=json.dumps(payload_for_final_answer))
if final_response.status_code == 200:
final_response_data = final_response.json()
try:
return final_response_data.get("candidates", [])[0].get("content", {}).get("parts", [])[0].get("text", "").strip()
except (IndexError, KeyError, AttributeError):
return f"Calculator result: {calc_result}. (Error formatting final LLM response)"
else:
return f"Calculator result: {calc_result}. (Error getting final LLM response: {final_response.text})"
else:
return llm_response_text
else:
print(f"Error calling Gemini API: {response.status_code} - {response.text}")
return f"Error: Could not connect to LLM. Status: {response.status_code}"
@app.route('/chat', methods=['POST'])
def chat():
data = request.json
user_message = data.get('message')
if not user_message:
return jsonify({"error": "No message provided"}), 400
agent_response = call_gemini_api(user_message)
return jsonify({"reply": agent_response})
@app.route('/')
def index():
return render_template('index.html')
if __name__ == '__main__':
app.run(debug=True, host='0.0.0.0', port=5000)
Complete templates/index.html (Recap):
This is the same as provided in Part 5.
Running the Full Application:
- Ensure your .env file has your GEMINI_API_KEY.
- Activate your virtual environment: source venv/bin/activate (macOS/Linux) or venv\Scripts\activate (Windows).
- Run the Flask app: python app.py.
- Open http://127.0.0.1:5000/ in your browser.
You now have a working AI agent with a basic UI, tool usage (calculator), and RAG capabilities!
Part 7: Next Steps & Advanced Concepts
You've built a solid foundation. Here are some ways to enhance your agent:
- More Sophisticated Tools:
  - Web Search: Allow your agent to search the internet (e.g., using Google Search API or a library like googlesearch-python).
  - API Integrations: Connect to other APIs (weather, stocks, calendar, etc.).
  - Code Execution: (With extreme caution and sandboxing) Allow the agent to write and execute simple Python scripts.
- Advanced RAG:
  - Vector Embeddings: Use models like text-embedding-ada-002 (OpenAI) or Sentence Transformers to convert your knowledge base documents and user queries into vector embeddings.
  - Vector Databases: Store these embeddings in a vector database (FAISS, Chroma, Pinecone, Weaviate) for efficient similarity search. This finds semantically similar information, not just keyword matches.
  - Chunking: Break down large documents into smaller, manageable chunks for better retrieval.
- Agent Memory:
  - Short-term Memory: Pass the recent chat history to the LLM so it remembers the context of the current conversation. (Our index.html has a commented-out chatHistory variable; you'd need to send this to the backend and incorporate it into the call_gemini_api function's payload for Gemini — see the sketch after this list.)
  - Long-term Memory: Store summaries of past conversations or key facts in a database for the agent to recall later.
- Better Prompt Engineering:
  - Experiment with different system prompts to improve the agent's reasoning, tool usage, and response quality.
  - Look into techniques like "Chain of Thought" or "ReAct" (Reasoning and Acting) prompting.
- Error Handling & Robustness:
  - Add more comprehensive error handling for API calls, tool execution, etc.
  - Validate inputs and outputs more strictly.
- Asynchronous Operations:
  - For tools that might take time (like web searches), use asynchronous tasks (e.g., with Celery or asyncio) in your Flask app to prevent blocking.
- Agent Frameworks:
  - Explore frameworks like LangChain or LlamaIndex. They provide pre-built components and abstractions that can significantly speed up agent development, especially for complex agents with many tools and data sources.
- UI Enhancements:
  - Markdown rendering for agent responses.
  - Streaming responses (so text appears word by word).
  - User authentication.
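To make the short-term memory idea concrete, here's a minimal sketch (assuming the UI sends history as a list of {"role": "user"|"model", "text": "..."} objects) that converts it into the turn-by-turn contents format Gemini expects, as noted in the comments inside call_gemini_api:
def build_contents(history, new_user_message):
    """Convert a simple history list into Gemini's turn-by-turn format."""
    contents = []
    for turn in history[-10:]:  # cap the window to keep the prompt small
        contents.append({
            "role": turn["role"],  # "user" or "model"
            "parts": [{"text": turn["text"]}],
        })
    # The new user message is the final turn.
    contents.append({"role": "user", "parts": [{"text": new_user_message}]})
    return contents

# Usage inside call_gemini_api:
# payload = {"contents": build_contents(history, prompt_text), "generationConfig": {...}}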
Conclusion
Congratulations on making it through this tutorial! You've gone from zero to building a functional AI agent capable of using tools and accessing a knowledge base, all wrapped in a web UI.
The field of AI agents is rapidly evolving, and what you've learned here are the fundamental building blocks. Keep experimenting, learning, and building. The possibilities are vast!
Happy coding!