Dharmendra Singh

Posted on • Originally published at Medium

Building RAG Applications with LangChain: Part 1

Welcome to a brand new series where we dive deep into building RAG (Retrieval-Augmented Generation) applications using LangChain, LLMs (like ChatGPT/Gemini), and modern vector databases.

In the earlier articles, we explored how to build foundational LLM applications using tools like chains, structured output parsers, prompt engineering, and more.

👉 If you’re not yet familiar with concepts like LangChain basics, prompt templates, output parsers, LCEL (LangChain Expression Language), and chains, I recommend checking out the earlier articles in this series for a solid foundation.

  • LangChain basics & LCEL
  • Part 2: Document Loader

Now, we’re taking it a step further: infusing LLMs with factual, external knowledge using RAG — one of the most important design patterns in LLM-powered systems today.

What is RAG (Retrieval-Augmented Generation)?

Retrieval-Augmented Generation (RAG) is a technique that combines information retrieval and text generation. Instead of asking an LLM to generate answers from its internal knowledge alone, we first retrieve relevant documents from a data source and feed them into the prompt. The name spells out the three steps:

  • Retrieval: Retrieve relevant documents or passages based on the user query.

  • Augmentation: Use the retrieved documents as additional context in the prompt.

  • Generation: Generate a response based on the retrieved content plus the user query.

This allows LLMs to:

  • Generate responses grounded in external data
  • Work with up-to-date and domain-specific knowledge
  • Reduce hallucination
  • Enable enterprise and private data use

Think of RAG as “search + summarize” powered by an LLM.
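
To make those three steps concrete, here's a minimal sketch of the loop; search_index and llm are hypothetical stand-ins for a vector store and a chat model:

# Minimal sketch of the RAG loop. `search_index` and `llm` are
# hypothetical stand-ins for a vector store and a chat model.
def rag_answer(query, search_index, llm):
    # 1. Retrieval: fetch the passages most similar to the query
    docs = search_index.similarity_search(query, k=3)

    # 2. Augmentation: splice the passages into the prompt as context
    context = "\n\n".join(doc.page_content for doc in docs)
    prompt = (
        "Use the context below to answer the question:\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )

    # 3. Generation: the LLM answers from the augmented prompt
    return llm.invoke(prompt)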

Why Use RAG?

Retrieval-Augmented Generation (RAG) offers key advantages over using traditional LLMs alone. Here's how they compare:

  • Both traditional LLMs and RAG-enabled LLMs are trained on large datasets.

  • Traditional LLMs cannot access real-time or private data; RAG-enabled LLMs can, via external sources or databases.

  • Traditional LLMs are prone to hallucinations; RAG-enabled LLMs are more reliable because their answers are grounded in real data.

  • Traditional LLMs often give generic or unverified answers; RAG-enabled LLMs provide grounded, source-backed responses.

  • Traditional LLMs alone may not be ideal for production use; RAG-enabled LLMs are well suited to real-world production apps.

If you’re building apps like:

  • AI search assistants
  • Chat with PDFs or websites
  • Domain-specific Q&A
  • Legal/medical document readers

…you’ll want RAG.

Core Components of a RAG Pipeline

Here’s a breakdown of each core building block in a LangChain-based RAG app:

1. Document Loader

LangChain offers a wide variety of document loaders to help you ingest and process data from various sources and formats. These loaders are essential for preparing unstructured data for use in LLM-powered applications.

Supported Sources

  • Local files (PDFs, text, markdown, etc.)
  • URLs and web pages
  • APIs and JSON endpoints
  • Databases (e.g., SQL, MongoDB)

Common Formats

  • PDF, CSV, Markdown, HTML, DOCX
  • Web pages and plain text
  • Notion, Airtable, and more

Under-the-Hood Tools

  • unstructured
  • BeautifulSoup
  • PyMuPDF
  • pdfminer.six
  • pypdf
  • html2text

Popular LangChain Loaders

  • PyPDFLoader – For reading PDF files
  • WebBaseLoader – For scraping and parsing content from web pages
  • UnstructuredFileLoader – For general-purpose file parsing using the unstructured library
  • BSHTMLLoader – Parses raw HTML using BeautifulSoup
  • CSVLoader – Ingests CSV files into document chunks
  • NotionDBLoader – Loads structured content directly from Notion databases
  • DirectoryLoader – Loads multiple documents from a folder in bulk

These loaders make it easy to turn raw content into structured Document objects ready for chunking, embedding, or retrieval.

from langchain_community.document_loaders import PyPDFLoader  # "from langchain.document_loaders import ..." in older versions

# Load a PDF into a list of Document objects (one per page)
loader = PyPDFLoader("sample.pdf")
documents = loader.load()
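
The snippet above covers local PDFs; as a quick sketch of two more loaders from the list (the folder path and URL below are placeholders), you might write:

from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader, WebBaseLoader

# Bulk-load every PDF under a folder; the path is a placeholder
pdf_docs = DirectoryLoader("docs/", glob="**/*.pdf", loader_cls=PyPDFLoader).load()

# Fetch and parse a web page (needs beautifulsoup4); the URL is a placeholder
web_docs = WebBaseLoader("https://example.com/post").load()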

2. Text Splitter

  • Splits large texts into manageable chunks.
  • Improves vector relevance and performance.
  • Tools: RecursiveCharacterTextSplitter, TokenTextSplitter.
from langchain.text_splitter import RecursiveCharacterTextSplitter

# ~500-character chunks with 100 characters of overlap so context isn't lost at boundaries
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
chunks = splitter.split_documents(documents)
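
A quick sanity check on those settings: with chunk_overlap=100, the tail of one chunk should roughly reappear at the head of the next. Assuming the chunks variable from the block above:

print(f"{len(chunks)} chunks produced")

# The overlap means the end of chunk 0 roughly repeats at the start of chunk 1
print(chunks[0].page_content[-100:])
print(chunks[1].page_content[:100])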

3. Embeddings & Vector Store

  • Converts text chunks into numerical vectors.
  • Stores them in a vector database for similarity search.
  • Tools: OpenAIEmbeddings, GooglePalmEmbeddings, FAISS, Chroma, Pinecone.
from langchain_community.vectorstores import FAISS  # "from langchain.vectorstores import ..." in older versions
from langchain_openai import OpenAIEmbeddings      # "from langchain.embeddings import ..." in older versions

# Embed every chunk and index the vectors for similarity search
db = FAISS.from_documents(chunks, OpenAIEmbeddings())
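
You can query the store directly before wiring it into a chain; FAISS's similarity_search embeds the query and returns the nearest chunks (the query string here is illustrative):

# Embed the query and return the 3 most similar chunks
results = db.similarity_search("What is LangChain?", k=3)
for doc in results:
    print(doc.page_content[:100])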

4. Retriever

  • Interfaces with the vector store to fetch similar documents based on a query.
  • Returns top k relevant chunks.
# k belongs in search_kwargs, not as a top-level argument
retriever = db.as_retriever(search_type="similarity", search_kwargs={"k": 3})
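
Because a retriever is itself a Runnable, you can call it standalone to inspect what the chain will see (on older LangChain versions, use get_relevant_documents instead of invoke):

docs = retriever.invoke("What did the author say about LangChain?")
for doc in docs:
    print(doc.metadata, doc.page_content[:80])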

5. Prompt Template

  • Formats the retrieved chunks and the user’s question into a single prompt.
  • May include instructions for the LLM.

template = """Use the context below to answer the question:  
{context}  
Question: {question}  
Answer:  
"""
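Wrapped in a PromptTemplate, the placeholders are filled like this (the context and question values here are illustrative):

from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate.from_template(template)
# Fill the placeholders; both values are illustrative
print(prompt.format(
    context="LangChain is a framework for building LLM apps.",
    question="What is LangChain?",
))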

6. LLM / ChatModel

  • The large language model (ChatGPT, Gemini, Claude) that processes the prompt.
  • Can be tuned for summarization, Q&A, or reasoning.
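
The llm variable used in the snippets below can be any LangChain chat model; here's a minimal sketch with OpenAI, where the model name is an assumption you should swap for whatever you use:

from langchain_openai import ChatOpenAI

# Model name is an assumption; any LangChain chat model works here.
# temperature=0 keeps answers deterministic, which suits grounded Q&A.
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)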

7. RAG Chain

LangChain lets you connect all of these components with RetrievalQA or a custom LCEL chain.

from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    chain_type="stuff"  # "stuff" packs all retrieved docs into one prompt; "refine" and "map_reduce" are alternatives
)
qa_chain.run("What did the author say about LangChain?")

LCEL chain:

from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# Define a simple prompt
prompt = PromptTemplate.from_template(
    "Answer the following question based on the context:\n\n{context}\n\nQuestion: {question}"
)

# Join the retrieved Document objects into one context string ('stuff' style)
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Build the full LCEL chain: retrieve docs, fill the prompt, call the LLM, parse to text
qa_chain = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough(),
    }
    | prompt
    | llm
    | StrOutputParser()
)

# Invoke the chain
response = qa_chain.invoke("What did the author say about LangChain?")
print(response)

RAG Flow Diagram

Indexing:
[Document Loader] → [Text Splitter] → [Embeddings] → [Vector Store]

Query time:
[User Query] → [Retriever (top-k chunks from Vector Store)]
[User Query] + [Retrieved Docs] → [Prompt Template] → [LLM] → [Answer]

Why RAG Matters

RAG bridges the gap between static LLMs and dynamic, real-world applications. Instead of retraining models, we teach them via retrieval — making them faster, safer, and more context-aware.

Whether you’re building internal tools, smart search engines, or AI copilots — RAG is a must-have skill.

This article outlines the complete technology stack we use to build Retrieval-Augmented Generation (RAG) applications, along with the reasoning behind their growing importance in the GenAI landscape. Beginning with this introduction, we’ll explore each component of the RAG architecture in detail. Once we’ve covered all the essential building blocks, we’ll move on to developing several real-world, end-to-end RAG applications.

Let’s get started — the RAG journey begins here.
