Embeddings & Vector Stores: Turning Text into Searchable Intelligence
Welcome to Part 4 of our hands-on RAG series with LangChain.
So far, we’ve covered:
In this part, we explore how your split documents are converted into vectors (embeddings) that can be searched, ranked, and retrieved for use with LLMs.
What Are Embeddings?
Embeddings are numerical vector representations of text that capture semantic meaning.
When a model generates an embedding:
- It transforms “meaning” into a dense numerical vector
- Similar meanings = closer vectors in multi-dimensional space
Example:
king - man + woman ≈ queen
This analogy works because embeddings preserve relational meaning.
Let's understand this with some examples.
- Let’s say you have a model that creates embeddings for words.
- king <-------Embeddings---------> [0.9, 0.8, 0.7]
- queen <-------Embeddings---------> [0.88, 0.82, 0.68]
- man <-------Embeddings---------> [0.5, 0.4, 0.3]
- woman <-------Embeddings---------> [0.48, 0.42, 0.28]
- apple <-------Embeddings---------> [0.1, 0.3, 0.4]
- banana <-------Embeddings---------> [0.09, 0.29, 0.41]
Here we can see:
- king is very close to queen
- man is close to woman
- apple is close to banana, but both fruits sit in a different semantic cluster from the people-related words
Embedding Table: Sentences
- "How to cook pasta?" <--Embeddings-->. [0.65, 0.88, 0.34, ..., 0.72]
- "Steps for making spaghetti" <--Embeddings--> [0.63, 0.90, 0.33, ..., 0.71]
- "What is quantum physics?" <--Embeddings--> [0.11, 0.23, 0.56, ..., 0.19]
Now calculate the cosine similarity between the embeddings (a short code sketch follows this list):
- "How to cook pasta?" and "Steps for making spaghetti" are very similar
- "How to cook pasta?" and "What is quantum physics?" are not similar
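Here is a minimal sketch of that calculation in plain Python, using the toy word vectors from above (the numbers are illustrative, not from a real model):

import math

def cosine_similarity(a, b):
    # (A · B) / (||A|| * ||B||)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

king = [0.9, 0.8, 0.7]
queen = [0.88, 0.82, 0.68]
apple = [0.1, 0.3, 0.4]

print(cosine_similarity(king, queen))  # ~1.0 -> very similar
print(cosine_similarity(king, apple))  # noticeably lower -> different cluster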
Semantic Search Works on Meaning, Not Just Words
In both the word and sentence embedding examples, you’ll notice a key takeaway:
Semantic search operates on vector representations — numerical values that capture meaning — not just literal word matching.
This means even if the exact words don’t appear in the query or document, the model can still understand the context and retrieve relevant results based on meaning. This is what makes LLM-powered search far more powerful than traditional keyword-based methods.
How Embeddings Power RAG
In RAG, embeddings allow us to:
- Convert chunks of documents into vectors
- Store these vectors in a vector database
- Embed the user query at runtime
- Use similarity search to fetch relevant chunks
Result: LLMs generate answers with user query + context (relevant documents).
Common Embedding Models
- OpenAIEmbeddings
- HuggingFaceEmbeddings
- GoogleGenerativeAIEmbeddings
- OllamaEmbeddings
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain.schema import Document
from langchain.text_splitter import CharacterTextSplitter
# Embedding model (Gemini embeddings from the langchain-google-genai package)
embedding = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001",
    google_api_key="YOUR_API_KEY",
)
# Document
faq_text = """
Q: What is your return policy?
A: You can return items within 30 days for a full refund.
Q: How long does shipping take?
A: Shipping typically takes 3-5 business days.
Q: Do you offer international shipping?
A: Yes, we ship to over 50 countries.
Q: How can I track my order?
A: You will receive a tracking link via email once your order ships.
"""
# Split the document into chunks
text_splitter = CharacterTextSplitter(chunk_size=200, chunk_overlap=0)
documents = text_splitter.create_documents([faq_text])
# embed_documents() expects raw strings, so pass each chunk's text
doc_embeddings = embedding.embed_documents([doc.page_content for doc in documents])
query_embedding = embedding.embed_query("What is the return policy?")
- embed_documents() creates a vector for each chunk
- embed_query() creates a vector for the query so it can be compared against the chunk embeddings, as sketched below
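To find the chunk that best matches the query, compare the query embedding against each chunk embedding. A minimal sketch using numpy (an extra dependency, not required by the code above) and the cosine similarity idea from earlier:

import numpy as np

q = np.array(query_embedding)
chunk_matrix = np.array(doc_embeddings)

# Cosine similarity between the query and every chunk, then pick the best one
scores = chunk_matrix @ q / (np.linalg.norm(chunk_matrix, axis=1) * np.linalg.norm(q))
best_chunk = documents[int(np.argmax(scores))]
print(best_chunk.page_content)  # should be the return-policy chunk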
What Are Vector Databases?
A vector database is a special kind of database designed to store and search through embeddings (vectors), which represent the semantic meaning of things like:
- Text (words, sentences, documents)
- Images
- Code
- Audio
These databases are optimized for fast similarity search — like answering:
“Find me the most similar documents to this question.”
The key idea:
Traditional databases search by exact values, for example:
SELECT * FROM users WHERE email = '[email protected]';
Vector databases, on the other hand, perform semantic search based on the context and meaning of words and sentences (as discussed above). To do this, they compare vectors using metrics such as cosine similarity or Euclidean distance.
Use Case Flow Example
Your PDF → Split into chunks → Embed each chunk → Store in Vector DB
User query → Embed query → Search DB → Get top chunks → Answer
Popular Vector Databases
- FAISS: Open-source by Facebook, fast, local
- Pinecone: Cloud-native, scalable, real-time updates
- Weaviate: Semantic graph + vector search
- Milvus: High-performance, GPU acceleration
- Qdrant: Rust-based, fast, open-source
- Chroma: Developer-friendly, works well with LangChain
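Most of these expose a very similar interface in LangChain, so swapping stores is usually a one or two line change. As a rough sketch, assuming the faiss-cpu package is installed and reusing documents and embedding from the FAQ example above:

from langchain_community.vectorstores import FAISS

# Build an in-memory FAISS index from the same chunks and embedding model
faiss_store = FAISS.from_documents(documents, embedding)
hits = faiss_store.similarity_search("What is the return policy?")
print(hits[0].page_content)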
Vector Database use cases:
- Similarity Search: Finds meaning, not just keywords
- Memory for LLMs: Used in Retrieval-Augmented Generation (RAG)
- Fast Search on Big Data: Search millions of vectors quickly
- Scalable + Flexible: Easily update, delete, filter, tag data
Code Example with Chroma
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.schema import Document
from langchain.text_splitter import CharacterTextSplitter
# Document
faq_text = """
Q: What is your return policy?
A: You can return items within 30 days for a full refund.
Q: How long does shipping take?
A: Shipping typically takes 3-5 business days.
Q: Do you offer international shipping?
A: Yes, we ship to over 50 countries.
Q: How can I track my order?
A: You will receive a tracking link via email once your order ships.
"""
# Split the document into chunks
text_splitter = CharacterTextSplitter(chunk_size=200, chunk_overlap=0)
documents = text_splitter.create_documents([faq_text])
# Embedding model
embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001",
    google_api_key="YOUR_API_KEY",
)
# Create the vector database and persist it to disk
vectorstore = Chroma.from_documents(documents, embeddings, persist_directory='./faq.db')
query = "What is the return policy?"
results = vectorstore.similarity_search(query)
print(results[0].page_content)
You just built a semantic search engine.
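To close the loop described in "How Embeddings Power RAG", the last step is to hand the retrieved chunks plus the user query to an LLM. A minimal sketch, assuming a Gemini chat model from langchain_google_genai (the model name and prompt wording here are illustrative):

from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", google_api_key="YOUR_API_KEY")

# Stuff the retrieved chunks into the prompt as context
context = "\n".join(doc.page_content for doc in results)
answer = llm.invoke(f"Answer the question using only this context:\n{context}\n\nQuestion: {query}")
print(answer.content)

Part 5 does this more cleanly with chains and output parsers.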
Summary
A vector database stores and retrieves embeddings, enabling machines to search by meaning rather than exact matches.
They’re essential for:
- Chatbots with memory
- Semantic search
- AI-powered search engines
- RAG pipelines
What is Cosine Similarity?
Similarity between embeddings is usually calculated using cosine similarity:
Similarity(A, B) = (A · B) / (||A|| ||B||)
- Ranges from -1 to 1
- 1 = Identical direction (most similar)
- 0 = Orthogonal (unrelated)
- -1 = Opposite direction (least similar)
LangChain handles this internally when you call similarity_search().
Best Practices
- Use the same embedding model for documents and queries; otherwise the vectors are not comparable
- Normalize and clean content before embedding
- Store metadata with your chunks so results can be filtered and traced back to their source (see the sketch below)
- Choose a vector store that fits your scale and deployment needs
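On the metadata point, here is a minimal sketch with Chroma (the metadata keys and filter syntax are illustrative; exact filtering support varies by vector store):

docs_with_meta = [
    Document(page_content="You can return items within 30 days.", metadata={"topic": "returns"}),
    Document(page_content="Shipping typically takes 3-5 business days.", metadata={"topic": "shipping"}),
]
meta_store = Chroma.from_documents(docs_with_meta, embeddings)

# Restrict the search to chunks tagged with a given topic
hits = meta_store.similarity_search("How fast is delivery?", filter={"topic": "shipping"})
print(hits[0].page_content)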
What’s Next?
In Part 5, we’ll bring it all together using:
LangChain Chains + Output Parsers
So that the LLM doesn't just retrieve context, but generates structured, actionable answers!
Missed the earlier parts?