What is an LLM?
A Large Language Model (LLM) is an instance of a foundation model, applied specifically to text and text-like data (such as code). LLMs are trained on large datasets of text, including books, articles, and conversations. Training these models involves using petabytes of data.
Data Scale
- 178 million words ≈ 1 GB of data
- 1 million GB ≈ 1 PB of data
Parameters
A parameter is a value the model can change independently as it learns.
Core Components of LLM
LLM = Data + Transformer Architecture + Training
Business Applications (Use Cases)
- Customer Support
- Content Creation (e.g., articles)
- Software Development (e.g., Copilot)
Foundation models are pre-trained on large amounts of unlabeled data using self-supervised learning.
Self-Supervised Learning
Self-supervised learning means the model derives its own training signal from the data itself (for example, by predicting the next word in a sentence), which produces generalizable and adaptable capabilities.
Small vs. Large Language Models
- Small Language Models (SLMs) have a small number of parameters.
- Large Language Models (LLMs) have a large number of parameters.
Examples
- Mistral 7B: 7 billion parameters (SLM)
- Llama 3.1 405B: 405 billion parameters (LLM)
LLM Use Cases
- Code Generation
- Document Analysis
- Multilingual Translation
Small Model Use Cases
- On-Device AI
- Everyday Summarization
- Enterprise Chatbots
What is an AI Agent?
LLMs can be part of a compound AI system, leading to the creation of AI agents.
Traditional LLMs
If you ask a traditional LLM about your remaining vacation days, it might generate an incorrect answer because it does not have access to your specific data.
Compound AI Systems (RAGs)
In a compound AI system:
- Query → Search Query (LLM) → Your Database → Generate (LLM) → Answer (Correct)
This system fetches the response from your database, ensuring accuracy.
However, if the query changes (e.g., asking about the weather in Limassol), the system may fail, because its control logic follows a fixed path (search the vacation database) that does not fit the new question.
Control Logic in Compound AI Systems
Control logic in compound AI systems is defined programmatically by a human programmer. Another approach is to put the LLM in charge of the logic, leveraging its advanced capabilities to handle complex problems and generate plans.
Agentic Approach
When LLMs control the logic, we refer to this as an agentic approach. Let's break down the components of LLM agents:
LLM Agents
- Reasoning: the ability to think through a problem and plan
- Acting: the ability to use external tools (e.g., web search, database access, calculator, code execution)
- Memory: access to prior interactions and intermediate results
Configuring Agents
One popular method is the ReAct approach, combining reasoning and action components.
ReAct Agent Configuration
- User Query → Plan/Think (LLM) → Act (via Tool) → Observe the Answer (Iterate if Wrong) → Provide Correct Answer
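To make this loop concrete, here is a minimal, illustrative Python sketch. The call_llm stub and the tools dictionary are hypothetical placeholders, not a real framework API.

```python
# Minimal ReAct-style loop (illustrative sketch, not a real framework API).
# `call_llm` stands in for a real model call; here it is stubbed so the
# example runs end to end.

def call_llm(scratchpad: str) -> dict:
    # A real implementation would send `scratchpad` to an LLM and parse its
    # reply into either a tool call or a final answer.
    if "Observation:" not in scratchpad:
        return {"tool": "calculator", "tool_input": "21 * 2"}
    return {"final_answer": "The answer is 42."}

tools = {
    "web_search": lambda q: f"search results for {q!r}",
    "calculator": lambda expr: str(eval(expr)),  # demo only; never eval untrusted input
}

def react_agent(user_query: str, max_steps: int = 5) -> str:
    scratchpad = f"Question: {user_query}\n"
    for _ in range(max_steps):
        step = call_llm(scratchpad)            # Plan/Think: the model decides what to do next
        if "final_answer" in step:
            return step["final_answer"]        # Done: return the answer to the user
        observation = tools[step["tool"]](step["tool_input"])  # Act via a tool
        scratchpad += f"Action: {step['tool']}\nObservation: {observation}\n"  # Observe, then iterate
    return "No answer found within the step limit."

print(react_agent("What is 21 * 2?"))
```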
Retrieval-Augmented Generation (RAG)
LLM Challenges
- No Source
- Outdated Knowledge
Addressing the Challenges
When a user prompts an LLM, it may not have the latest information. Additionally, without sources, the truthfulness of the response cannot be ensured.
RAG Framework
The RAG framework addresses these issues:
- When a user writes a prompt, the system first retrieves relevant content from the provided documents, and the LLM generates the answer from that retrieved data rather than relying solely on its pre-trained knowledge.
- A well-crafted prompt is essential to avoid hallucinations.
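A minimal sketch of this retrieve-then-generate flow; embed, vector_search, and call_llm are hypothetical placeholders rather than any specific library's API:

```python
# RAG flow sketch: retrieve relevant chunks, then generate an answer from them.
# `embed`, `vector_search`, and `call_llm` are hypothetical placeholders.

def embed(text: str) -> list[float]: ...
def vector_search(query_vector: list[float], top_k: int = 3) -> list[str]: ...
def call_llm(prompt: str) -> str: ...

def rag_answer(question: str) -> str:
    query_vector = embed(question)                 # 1. Embed the user question
    chunks = vector_search(query_vector, top_k=3)  # 2. Retrieve relevant document chunks
    context = "\n\n".join(chunks)
    prompt = (
        "Answer the question using ONLY the context below. "
        "If the context is insufficient, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return call_llm(prompt)                        # 3. Generate a grounded answer
```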
CAG (Cache-Augmented Generation)
Rather than querying a knowledge base for each answer, the core idea of CAG is to preload the entire knowledge base into the model's context window.
Example use cases:
- RAG: legal research across a large corpus (e.g., 10,000 pages of documents)
- CAG: an IT help desk answering from a 100-page product manual that does not change frequently
- Clinical decision support: can combine both RAG and CAG
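A sketch of the contrast with RAG: instead of retrieving per query, the whole (small) knowledge base is placed in the prompt up front. The call_llm stub is again a placeholder.

```python
# CAG sketch: preload the entire (small) knowledge base into the context window.
def call_llm(prompt: str) -> str: ...   # placeholder LLM call

def cag_answer(question: str, knowledge_base: list[str]) -> str:
    context = "\n\n".join(knowledge_base)   # the whole manual, not retrieved chunks
    prompt = f"Use the manual below to answer.\n\nManual:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)
```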
How can we improve the model's answers?
There are three ways:
- RAG
- Fine-Tuning
- Prompt Engineering
Fine-Tuning
Fine-tuning requires:
- Supervised learning: further training the model on labeled, task-specific examples
- Additional GPUs for the training
Prompt Engineering
In prompt engineering, we can tell the model to "think step by step". Because of the transformer architecture, this engages self-attention over the intermediate reasoning steps, and we can often reach the desired result without RAG or fine-tuning.
Example of a weak prompt: "Validate code."
A better prompt: "Validate the code's security by identifying its inputs, checking memory management, and listing potential vulnerabilities with their CVSS scores."
We can also assign a role in the prompt (e.g., "You are a security expert").
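A small sketch showing how the role, the specific task, and the step-by-step instruction can be combined into one prompt (the code snippet and wording are illustrative):

```python
# Prompt-engineering sketch: role + specific task + reasoning instruction.
code_snippet = "strcpy(buf, user_input);"   # hypothetical code under review

prompt = (
    "You are a security expert.\n"                        # role
    "Validate the code's security: identify its inputs, "
    "check memory management, and list potential "
    "vulnerabilities with their CVSS scores.\n"           # specific task
    "Think step by step.\n\n"                             # reasoning instruction
    f"Code:\n{code_snippet}"
)
print(prompt)   # send this prompt to your LLM of choice
```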
LangChain
LangChain is an orchestration framework for the development of applications that use large language models (LLMs), including multiple LLMs.
Modules in LangChain
LLM Support
- Nearly all LLMs can be added to LangChain, such as Llama2 and GPT-4.
Prompts
- These are instructions given to LLM models. The PromptTemplate class in LangChain formalizes the composition of prompts without the need to manually hard-code context and queries.
- A prompt template can contain instructions like: "Don't use technical terms in your response." We can also specify the output format and few-shot examples.
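A minimal PromptTemplate sketch; the exact import path has moved between LangChain versions, so this assumes a recent release where it lives in langchain_core.prompts:

```python
# Minimal PromptTemplate sketch (adjust the import to your LangChain version).
from langchain_core.prompts import PromptTemplate

template = PromptTemplate.from_template(
    "You are a helpful assistant. Don't use technical terms in your response.\n"
    "Question: {question}"
)

# The template fills in variables without hard-coding the query into the prompt.
print(template.format(question="How does a vector database work?"))
```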
Chains
- Chains are the core of LangChain's workflow. They combine LLMs with other components, creating applications by executing a sequence of functions.
- For example, an application might need to first retrieve data from a website, then summarize the text it gets back, and finally use that summary to answer user-submitted questions. This is a sequential chain where the output of one function acts as the input for the next function. Each function in the chain could use different prompts, parameters, and even different models (LLMs).
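A sketch of such a sequential chain using LangChain's pipe (LCEL) syntax. It assumes the langchain-openai package and an OPENAI_API_KEY; any chat model supported by LangChain could be swapped in.

```python
# Sequential chain sketch: summarize first, then answer using the summary.
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")   # assumed model; any supported LLM works

summarize = PromptTemplate.from_template("Summarize this text:\n{text}") | llm | StrOutputParser()
answer = (
    PromptTemplate.from_template("Using this summary:\n{summary}\n\nAnswer the question: {question}")
    | llm
    | StrOutputParser()
)

# The output of the first function acts as the input of the next.
summary = summarize.invoke({"text": "LangChain is an orchestration framework for LLM applications..."})
print(answer.invoke({"summary": summary, "question": "What is LangChain used for?"}))
```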
Indexes
- To achieve certain tasks, LLMs might need to access specific external data sources not included in the training dataset, such as internal documents or emails. LangChain collectively refers to these documents as indexes.
Indexes:
- Document Loader: Works with third-party applications for importing data from sources like file storage services, Dropbox, Google Drive, MongoDB, and Pandas.
- Vector Database: Represents data points by converting them into vector embeddings, which are numerical representations in the form of vectors with a fixed number of dimensions. This format is very efficient for retrieval.
- Text Splitters: Split text into small, semantically meaningful chunks that can then be combined using methods and parameters of your choosing.
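A sketch tying these pieces together: load text, split it into chunks, and index the chunks in a vector store. It assumes the langchain-text-splitters, langchain-community, langchain-openai, and faiss-cpu packages; the file name is hypothetical.

```python
# Index sketch: split a document into chunks and store them in a vector database.
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

text = open("internal_docs.txt").read()   # hypothetical internal document

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(text)        # small, semantically meaningful chunks

vector_store = FAISS.from_texts(chunks, OpenAIEmbeddings())   # embed and index the chunks
for doc in vector_store.similarity_search("What is our refund policy?", k=3):
    print(doc.page_content)
```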
Memory
- LLMs by default don't have long-term memory of prior conversations unless you pass the chat history as an input to your query. LangChain solves this problem with simple utilities for adding memory to your application.
- You have options for retaining entire conversations or just a summarization of the conversation so far.
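A small sketch using ConversationBufferMemory, one of LangChain's memory utilities (newer releases also offer message-history helpers that work similarly):

```python
# Memory sketch: retain prior turns so they can be passed back into the next prompt.
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
memory.save_context({"input": "My name is Maria."}, {"output": "Nice to meet you, Maria!"})
memory.save_context({"input": "I work on the billing team."}, {"output": "Got it."})

# The stored history can be injected into the next query so the model "remembers".
print(memory.load_memory_variables({})["history"])
```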
Agents
- Agents can use a given language model as a reasoning engine to determine which actions to take. When building a chain for agents, you want to include inputs like a list of available tools, user input, prompts, and queries.
LangChain Use Cases
- Chatbots
- Summarizations
- Question Answering (e.g., Jira)
- Data Augmentation
LangGraph
LangGraph is designed for building stateful multi-agent systems that can handle complex, nonlinear workflows (asynchronous).
LangSmith
LangSmith is used for monitoring purposes in applications built with LangChain or LangGraph.
LangFlow
LangFlow lets you build a Minimum Viable Product (MVP) of your ideas using drag-and-drop components. However, it is not suitable for production use.
What is a Vector Database?
A vector database represents data as mathematical vector embeddings. Vector embeddings are simple arrays of numbers that capture the semantic essence of the data. Similar items are positioned close together in vector space, while dissimilar items are positioned far apart. With a vector database, we can perform similarity searches using mathematical operations. We can store image files, text, and audio files by converting these complex files into vector embeddings, which can then be stored in the vector database.
Example of Vector Embeddings
[0.91, 0.15, 0.83, ..., N]
How are Vector Embeddings Created?
Vector embeddings are created using embedding models that have been trained on massive datasets. Each type of data has its own specialized embedding model.
Examples of Embedding Models
- CLIP (images)
- GloVe (text)
- Wav2Vec (audio)
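A short sketch of creating text embeddings and comparing them, assuming the sentence-transformers package and its all-MiniLM-L6-v2 model (any text embedding model would do):

```python
# Embedding sketch: similar sentences end up close together in vector space.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")   # small general-purpose text embedding model

sentences = ["A cat sits on the mat.", "A kitten rests on a rug.", "Stock prices fell sharply."]
embeddings = model.encode(sentences)              # each sentence becomes a fixed-length vector

print(util.cos_sim(embeddings[0], embeddings[1]))  # high similarity (related sentences)
print(util.cos_sim(embeddings[0], embeddings[2]))  # low similarity (unrelated sentences)
```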
Vector Indexing
When we have millions of records in our vector database, each containing thousands of pieces of information (vector embeddings), comparing a query vector to every single vector in the database would be very slow. Vector indexing uses approximate nearest neighbor (ANN) algorithms to quickly find vectors that are very likely to be among the closest matches.
Examples of Vector Indexing Methods
- HNSW (Hierarchical Navigable Small World): Creates multi-layer graphs connecting similar vectors.
- IVF (Inverted File Index): Divides the vector space into clusters and only searches the most relevant clusters.
These indexing methods trade a small amount of accuracy for significant improvements in search speed. Vector databases are a core feature of Retrieval-Augmented Generation (RAG).
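A sketch of both index types using the FAISS library (faiss-cpu package assumed); the random vectors stand in for real embeddings:

```python
# ANN index sketch with FAISS: HNSW graph index and IVF cluster index.
import faiss
import numpy as np

dim = 128
vectors = np.random.random((10_000, dim)).astype("float32")   # stand-in for real embeddings

# HNSW: multi-layer graph connecting similar vectors (32 neighbors per node).
hnsw = faiss.IndexHNSWFlat(dim, 32)
hnsw.add(vectors)

# IVF: divide the space into 100 clusters and search only the most relevant ones.
quantizer = faiss.IndexFlatL2(dim)
ivf = faiss.IndexIVFFlat(quantizer, dim, 100)
ivf.train(vectors)
ivf.add(vectors)
ivf.nprobe = 10   # number of clusters to scan per query

query = np.random.random((1, dim)).astype("float32")
distances, ids = hnsw.search(query, 5)   # approximate top-5 nearest neighbors
print(ids)
```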
MCP vs. API
Introduction
For LLMs to be truly useful, they often need to interact with external data sources, services, and tools, which was typically done with APIs. In late 2024, Anthropic introduced a new open standard protocol called MCP (Model Context Protocol).
MCP Components
- MCP Host: Runs a number of MCP clients.
- MCP Clients: Each client opens a JSON RPC 2.0 session using the MCP protocol, connecting to external MCP servers.
- MCP Servers: Provide capabilities like access to databases, code repositories, or email servers.
Capabilities of MCP
MCP addresses two main needs of LLM applications (AI agents):
- Context: Provides contextual data.
- Tools: Enables the usage of tools by AI agents.
MCP provides a standard way for AI agents to retrieve external context, such as documents, knowledge bases, and database records. It can also execute actions or tools, like running a web search, calling an external service, or performing calculations.
Dynamic Self-Discovery
MCP servers expose tools, resources, and prompt templates. An LLM application can query the MCP server to learn which capabilities are available, enabling dynamic self-discovery.
Standardization of Interfaces
Every MCP server, regardless of the service or data it connects to, speaks the same protocol and follows the same patterns.
Note
Many MCP servers use traditional APIs to do their work. In many cases, an MCP server is essentially a wrapper around an existing API, translating between the MCP format and the underlying service's native interface.
Example
The GitHub MCP server exposes high-level tools (such as listing repositories) as MCP primitives and internally translates each tool call into the corresponding GitHub REST API call.
MCP services are available for file systems, Google Maps, Docker, Spotify, and many more, allowing better integration into AI agents in a standardized way.
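A small server sketch using FastMCP from the official MCP Python SDK (installed as the mcp package); the tool, resource, and data here are made up for illustration:

```python
# Minimal MCP server sketch with FastMCP; tool/resource names and data are hypothetical.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("vacation-server")

@mcp.tool()
def remaining_vacation_days(employee_id: str) -> int:
    """Return the remaining vacation days for an employee (stub data)."""
    return {"emp-001": 12}.get(employee_id, 0)

@mcp.resource("policy://vacation")
def vacation_policy() -> str:
    """Contextual data an agent can pull into its prompt."""
    return "Employees receive 25 vacation days per year."

if __name__ == "__main__":
    mcp.run()   # clients connect, discover the tool and resource, and call them over MCP
```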
Agentic RAG
Agentic RAG is an upgraded version of RAG, where the LLM decides the data source based on the user query, combining AI agents with RAG.
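A sketch of the idea: the LLM first routes the query to a data source, then the usual retrieve-and-generate step runs against that source. The call_llm stub and the retrievers are hypothetical placeholders.

```python
# Agentic RAG sketch: the LLM picks the data source, then RAG runs against it.
def call_llm(prompt: str) -> str: ...   # placeholder LLM call

retrievers = {
    "hr_database": lambda q: ["Vacation policy chunk..."],
    "product_docs": lambda q: ["Installation guide chunk..."],
    "web_search": lambda q: ["Latest news snippet..."],
}

def agentic_rag(question: str) -> str:
    # Step 1: the LLM decides which data source fits the question.
    source = call_llm(f"Pick one of {list(retrievers)} for this question: {question}")
    # Step 2: retrieve from the chosen source and generate a grounded answer.
    chunks = retrievers.get(source, retrievers["web_search"])(question)
    return call_llm("Context:\n" + "\n".join(chunks) + f"\n\nQuestion: {question}")
```

In practice, frameworks such as LangGraph are often used to build this kind of routing agent as a stateful, multi-step workflow.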