Dharmendra Singh

Posted on • Originally published at Medium

Building RAG Applications with LangChain: Part 1

Welcome to a brand new series where we dive deep into building RAG (Retrieval-Augmented Generation) applications using LangChain, LLMs (like ChatGPT/Gemini), and modern vector databases.

In the earlier articles, we explored how to build foundational LLM applications using tools like chains, structured output parsers, prompt engineering, and more.

👉 If you’re not yet familiar with concepts like LangChain basics, prompt templates, output parsers, LCEL (LangChain Expression Language), and chains, I recommend checking out the earlier articles in this series for a solid foundation.

  • LangChain basics & LCEL
  • Part 2: Document Loader

Now, we’re taking it a step further: infusing LLMs with factual, external knowledge using RAG — one of the most important design patterns in LLM-powered systems today.

What is RAG (Retrieval-Augmented Generation)?

Retrieval-Augmented Generation (RAG) is a technique that combines information retrieval and text generation. Instead of asking an LLM to generate answers from its internal knowledge alone, we first retrieve relevant documents from a data source and feed them into the prompt. The name spells out the three steps:

  • Retrieval: Retrieve relevant documents or passages based on the user query.

  • Augmentation: Use the retrieved documents as additional context in the prompt.

  • Generation: Generate a response based on the retrieved content plus the user query.

This allows LLMs to:

  • Generate responses grounded in external data
  • Work with up-to-date and domain-specific knowledge
  • Reduce hallucination
  • Enable enterprise and private data use

Think of RAG as “search + summarize” powered by an LLM.
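
To make those three steps concrete, here's a minimal sketch of the loop; search_index and llm are hypothetical stand-ins for a vector store and a chat model:

# Minimal sketch of the RAG loop. `search_index` and `llm` are
# hypothetical stand-ins for a vector store and a chat model.
def rag_answer(query, search_index, llm):
    # 1. Retrieval: fetch the passages most similar to the query
    docs = search_index.similarity_search(query, k=3)

    # 2. Augmentation: splice the passages into the prompt as context
    context = "\n\n".join(doc.page_content for doc in docs)
    prompt = (
        "Use the context below to answer the question:\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )

    # 3. Generation: the LLM answers from the augmented prompt
    return llm.invoke(prompt)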

Why Use RAG?

Retrieval-Augmented Generation (RAG) offers key advantages over using traditional LLMs alone. Here's how they compare:

  • Both traditional LLMs and RAG-enabled LLMs are trained on large datasets.

  • Traditional LLMs cannot access real-time or private data; RAG-enabled LLMs can, via external sources or databases.

  • Traditional LLMs are prone to hallucinations; RAG-enabled LLMs are more reliable because their answers are grounded in real data.

  • Traditional LLMs often give generic or unverified answers; RAG-enabled LLMs provide grounded, source-backed responses.

  • Traditional LLMs alone may not be ideal for production use; RAG-enabled LLMs are well suited to real-world production apps.

If you’re building apps like:

  • AI search assistants
  • Chat with PDFs or websites
  • Domain-specific Q&A
  • Legal/medical document readers

…you’ll want RAG.

Core Components of a RAG Pipeline

Here’s a breakdown of each core building block in a LangChain-based RAG app:

1. Document Loader

LangChain offers a wide variety of document loaders to help you ingest and process data from various sources and formats. These loaders are essential for preparing unstructured data for use in LLM-powered applications.

Supported Sources

  • Local files (PDFs, text, markdown, etc.)
  • URLs and web pages
  • APIs and JSON endpoints
  • Databases (e.g., SQL, MongoDB)

Common Formats

  • PDF, CSV, Markdown, HTML, DOCX
  • Web pages and plain text
  • Notion, Airtable, and more

Under-the-Hood Tools

  • unstructured
  • BeautifulSoup
  • PyMuPDF
  • pdfminer.six
  • pypdf
  • html2text

Popular LangChain Loaders

  • PyPDFLoader – For reading PDF files
  • WebBaseLoader – For scraping and parsing content from web pages
  • UnstructuredFileLoader – For general-purpose file parsing using the unstructured library
  • BSHTMLLoader – Parses raw HTML using BeautifulSoup
  • CSVLoader – Ingests CSV files into document chunks
  • NotionDBLoader – Loads structured content directly from Notion databases
  • DirectoryLoader – Loads multiple documents from a folder in bulk

These loaders make it easy to turn raw content into structured Document objects ready for chunking, embedding, or retrieval.

from langchain_community.document_loaders import PyPDFLoader  # "from langchain.document_loaders import ..." in older versions

# Load a PDF into a list of Document objects (one per page)
loader = PyPDFLoader("sample.pdf")
documents = loader.load()
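
The snippet above covers local PDFs; as a quick sketch of two more loaders from the list (the folder path and URL below are placeholders), you might write:

from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader, WebBaseLoader

# Bulk-load every PDF under a folder; the path is a placeholder
pdf_docs = DirectoryLoader("docs/", glob="**/*.pdf", loader_cls=PyPDFLoader).load()

# Fetch and parse a web page (needs beautifulsoup4); the URL is a placeholder
web_docs = WebBaseLoader("https://example.com/post").load()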

2. Text Splitter

  • Splits large texts into manageable chunks.
  • Improves vector relevance and performance.
  • Tools: RecursiveCharacterTextSplitter, TokenTextSplitter.
from langchain.text_splitter import RecursiveCharacterTextSplitter

# ~500-character chunks with 100 characters of overlap so context isn't lost at boundaries
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
chunks = splitter.split_documents(documents)
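
A quick sanity check on those settings: with chunk_overlap=100, the tail of one chunk should roughly reappear at the head of the next. Assuming the chunks variable from the block above:

print(f"{len(chunks)} chunks produced")

# The overlap means the end of chunk 0 roughly repeats at the start of chunk 1
print(chunks[0].page_content[-100:])
print(chunks[1].page_content[:100])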

3. Embeddings & Vector Store

  • Converts text chunks into numerical vectors.
  • Stores them in a vector database for similarity search.
  • Tools: OpenAIEmbeddings, GooglePalmEmbeddings, FAISS, Chroma, Pinecone.
from langchain_community.vectorstores import FAISS  # "from langchain.vectorstores import ..." in older versions
from langchain_openai import OpenAIEmbeddings      # "from langchain.embeddings import ..." in older versions

# Embed every chunk and index the vectors for similarity search
db = FAISS.from_documents(chunks, OpenAIEmbeddings())
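
You can query the store directly before wiring it into a chain; FAISS's similarity_search embeds the query and returns the nearest chunks (the query string here is illustrative):

# Embed the query and return the 3 most similar chunks
results = db.similarity_search("What is LangChain?", k=3)
for doc in results:
    print(doc.page_content[:100])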

4. Retriever

  • Interfaces with the vector store to fetch similar documents based on a query.
  • Returns top k relevant chunks.
# k belongs in search_kwargs, not as a top-level argument
retriever = db.as_retriever(search_type="similarity", search_kwargs={"k": 3})
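
Because a retriever is itself a Runnable, you can call it standalone to inspect what the chain will see (on older LangChain versions, use get_relevant_documents instead of invoke):

docs = retriever.invoke("What did the author say about LangChain?")
for doc in docs:
    print(doc.metadata, doc.page_content[:80])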

5. Prompt Template

  • Formats the retrieved chunks and the user’s question into a single prompt.
  • May include instructions for the LLM.

template = """Use the context below to answer the question:  
{context}  
Question: {question}  
Answer:  
"""
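Wrapped in a PromptTemplate, the placeholders are filled like this (the context and question values here are illustrative):

from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate.from_template(template)
# Fill the placeholders; both values are illustrative
print(prompt.format(
    context="LangChain is a framework for building LLM apps.",
    question="What is LangChain?",
))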

6. LLM / ChatModel

  • The large language model (ChatGPT, Gemini, Claude) that processes the prompt.
  • Can be tuned for summarization, Q&A, or reasoning.
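
The llm variable used in the snippets below can be any LangChain chat model; here's a minimal sketch with OpenAI, where the model name is an assumption you should swap for whatever you use:

from langchain_openai import ChatOpenAI

# Model name is an assumption; any LangChain chat model works here.
# temperature=0 keeps answers deterministic, which suits grounded Q&A.
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)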

7. RAG Chain

LangChain lets you connect all of these components with RetrievalQA or a custom LCEL chain.

from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    chain_type="stuff"  # "stuff" packs all retrieved docs into one prompt; "refine" and "map_reduce" are alternatives
)
qa_chain.run("What did the author say about LangChain?")

LCEL chain:

from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# Define a simple prompt
prompt = PromptTemplate.from_template(
    "Answer the following question based on the context:\n\n{context}\n\nQuestion: {question}"
)

# Join the retrieved Document objects into one context string ('stuff' style)
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Build the full LCEL chain: retrieve docs, fill the prompt, call the LLM, parse to text
qa_chain = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough(),
    }
    | prompt
    | llm
    | StrOutputParser()
)

# Invoke the chain
response = qa_chain.invoke("What did the author say about LangChain?")
print(response)

RAG Flow Diagram

Indexing:
[Document Loader] → [Text Splitter] → [Embeddings] → [Vector Store]

Query time:
[User Query] → [Retriever (top-k chunks from Vector Store)]
[User Query] + [Retrieved Docs] → [Prompt Template] → [LLM] → [Answer]

Why RAG Matters

RAG bridges the gap between static LLMs and dynamic, real-world applications. Instead of retraining models, we teach them via retrieval — making them faster, safer, and more context-aware.

Whether you’re building internal tools, smart search engines, or AI copilots — RAG is a must-have skill.

This article outlines the complete technology stack we use to build Retrieval-Augmented Generation (RAG) applications, along with the reasoning behind their growing importance in the GenAI landscape. Beginning with this introduction, we’ll explore each component of the RAG architecture in detail. Once we’ve covered all the essential building blocks, we’ll move on to developing several real-world, end-to-end RAG applications.

Let’s get started — the RAG journey begins here.
