Building RAG Applications with LangChain(Part-4)

Embeddings & Vector Stores: Turning Text into Searchable Intelligence

Welcome to Part 4 of our hands-on RAG series with LangChain.

So far, we've covered loading documents and splitting them into chunks.

In this part, we explore how your split documents are converted into vectors (embeddings) that can be stored, searched, and ranked to give your LLM relevant context.

What Are Embeddings?

Embeddings are numerical vector representations of text that capture semantic meaning.

When a model generates an embedding:

  • It transforms “meaning” into a dense numerical vector

  • Similar meanings = closer vectors in multi-dimensional space

Example:

king - man + woman ≈ queen

This analogy works because embeddings preserve relational meaning.

Let's understand this with some examples.

  1. Word embeddings. Let's say you have a model that creates embeddings for words (toy 3-dimensional vectors; we'll check the king/queen analogy with them in the sketch after these tables):
  • king → [0.9, 0.8, 0.7]
  • queen → [0.88, 0.82, 0.68]
  • man → [0.5, 0.4, 0.3]
  • woman → [0.48, 0.42, 0.28]
  • apple → [0.1, 0.3, 0.4]
  • banana → [0.09, 0.29, 0.41]

Here we can see:

  • king is very close to queen
  • man is close to woman
  • apple is close to banana, but in a different semantic cluster
  2. Sentence embeddings:
  • "How to cook pasta?" → [0.65, 0.88, 0.34, ..., 0.72]
  • "Steps for making spaghetti" → [0.63, 0.90, 0.33, ..., 0.71]
  • "What is quantum physics?" → [0.11, 0.23, 0.56, ..., 0.19]

Now calculate cosine similarity between the embeddings:

  • "cook pasta" and "making spaghetti" are very similar
  • "cook pasta" and "quantum physics" are not similar
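
Here is a minimal sketch of that comparison, applying the cosine formula to the toy word vectors (real sentence embeddings have hundreds of dimensions, but the idea is identical):

import numpy as np

def cosine_similarity(a, b):
    # (A . B) / (||A|| * ||B||)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

king = np.array([0.9, 0.8, 0.7])
queen = np.array([0.88, 0.82, 0.68])
apple = np.array([0.1, 0.3, 0.4])

print(cosine_similarity(king, queen))  # ~1.0 -> same semantic cluster
print(cosine_similarity(king, apple))  # noticeably lower -> different cluster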

Semantic Search Works on Meaning, Not Just Words

In both the word and sentence embedding examples, you’ll notice a key takeaway:

Semantic search operates on vector representations — numerical values that capture meaning — not just literal word matching.

This means even if the exact words don’t appear in the query or document, the model can still understand the context and retrieve relevant results based on meaning. This is what makes LLM-powered search far more powerful than traditional keyword-based methods.

How Embeddings Power RAG

In RAG, embeddings allow us to:

  1. Convert chunks of documents into vectors

  2. Store these vectors in a vector database

  3. Embed the user query at runtime

  4. Use similarity search to fetch relevant chunks

Result: the LLM generates an answer from the user query plus the retrieved context (relevant chunks).

Common Embedding Models

  • OpenAIEmbeddings
  • HuggingFaceEmbeddings
  • GoogleGenerativeAIEmbeddings
  • OllamaEmbeddings

For example, with GoogleGenerativeAIEmbeddings:

from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter

embedding = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001",
    google_api_key="YOUR_API_KEY")

# Document to index
faq_text = """
Q: What is your return policy?
A: You can return items within 30 days for a full refund.
Q: How long does shipping take?
A: Shipping typically takes 3-5 business days.
Q: Do you offer international shipping?
A: Yes, we ship to over 50 countries.
Q: How can I track my order?
A: You will receive a tracking link via email once your order ships.
"""
# Split the document into chunks
text_splitter = CharacterTextSplitter(chunk_size=200, chunk_overlap=0)
documents = text_splitter.create_documents([faq_text])

# embed_documents() expects plain strings, so pass each chunk's text
doc_embeddings = embedding.embed_documents([doc.page_content for doc in documents])
query_embedding = embedding.embed_query("What is the return policy?")

  • embed_documents() creates a vector for each chunk
  • embed_query() lets you compare a query to your document embeddings
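
With both in hand, you can already rank chunks yourself before reaching for a vector database. A minimal sketch, assuming NumPy and the variables from the snippet above:

import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Score every chunk against the query and surface the best match
scores = [cosine_similarity(query_embedding, d) for d in doc_embeddings]
best = int(np.argmax(scores))
print(documents[best].page_content)  # should be the return-policy chunk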

What Are Vector Databases?

A vector database is a special kind of database designed to store and search through embeddings (vectors), which represent the semantic meaning of things like:

  • Text (words, sentences, documents)
  • Images
  • Code
  • Audio

These databases are optimized for fast similarity search — like answering:

“Find me the most similar documents to this question.”

The key idea:

Traditional databases search by exact values, such as:

SELECT * FROM users WHERE email = '[email protected]';

Vector databases, by contrast, perform semantic search based on the context or meaning of words and sentences, as discussed above. To do this, they use distance metrics such as cosine similarity or Euclidean distance.

Use Case Flow Example

    Your PDF → Split into chunks → Embed each chunk → Store in Vector DB

    User query → Embed query → Search DB → Get top chunks → Answer

Popular Vector Databases

  • FAISS: Open-source by Facebook, fast, local
  • Pinecone: Cloud-native, scalable, real-time updates
  • Weaviate: Semantic graph + vector search
  • Milvus: High-performance, GPU acceleration
  • Qdrant: Rust-based, fast, open-source
  • Chroma: Developer-friendly, works well with LangChain

Vector Database Use Cases

  • Similarity Search: Finds meaning, not just keywords
  • Memory for LLMs: Used in Retrieval-Augmented Generation (RAG)
  • Fast Search on Big Data: Search millions of vectors quickly
  • Scalable + Flexible: Easily update, delete, filter, tag data

Code Example with Chroma

from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.text_splitter import CharacterTextSplitter

# Document to index
faq_text = """
Q: What is your return policy?
A: You can return items within 30 days for a full refund.
Q: How long does shipping take?
A: Shipping typically takes 3-5 business days.
Q: Do you offer international shipping?
A: Yes, we ship to over 50 countries.
Q: How can I track my order?
A: You will receive a tracking link via email once your order ships.
"""

# Split the document into chunks
text_splitter = CharacterTextSplitter(chunk_size=200, chunk_overlap=0)
documents = text_splitter.create_documents([faq_text])

# Embedding model
embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001",
    google_api_key="YOUR_API_KEY")

# Create the vector database from the chunks
vectorstore = Chroma.from_documents(documents, embeddings, persist_directory='./faq.db')

query = "What is the return policy?"
results = vectorstore.similarity_search(query)
print(results[0].page_content)

You just built a semantic search engine.
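
Because we passed persist_directory, the index can be reloaded later without re-embedding, and similarity_search_with_score exposes the underlying distances. A minimal sketch, reusing the embeddings object from above:

# Reload the persisted index (no re-embedding needed)
vectorstore = Chroma(persist_directory='./faq.db', embedding_function=embeddings)

# Each result comes back with a distance score (lower = closer in Chroma)
for doc, score in vectorstore.similarity_search_with_score("How do I track my package?", k=2):
    print(round(score, 3), doc.page_content)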

Summary

A vector database stores and retrieves embeddings, enabling machines to search by meaning rather than exact matches.

They’re essential for:

  • Chatbots with memory
  • Semantic search
  • AI-powered search engines
  • RAG pipelines

What is Cosine Similarity?

Similarity between embeddings is usually calculated using cosine similarity:

Similarity(A, B) = (A · B) / (||A|| ||B||)

  • Ranges from -1 to 1
  • 1 = identical direction (most similar)
  • 0 = orthogonal (unrelated)
  • -1 = opposite direction (most dissimilar)

LangChain handles this internally when using similarity_search().

Best Practices

  • Use the same embedding model for documents and queries to keep vectors comparable
  • Normalize and clean content before embedding
  • Store metadata with your chunks (source, section, tags) so results can be filtered, as in the sketch below
  • Choose the right vector store for your scale and deployment needs
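
For instance, metadata attached at indexing time can be used to filter at query time. A minimal sketch, reusing the embeddings object from the Chroma example (the topic values here are made up for illustration):

from langchain.schema import Document
from langchain_community.vectorstores import Chroma

docs = [
    Document(page_content="You can return items within 30 days for a full refund.",
             metadata={"topic": "returns"}),
    Document(page_content="Shipping typically takes 3-5 business days.",
             metadata={"topic": "shipping"}),
]
vectorstore = Chroma.from_documents(docs, embeddings)

# Restrict the search to chunks tagged with a given topic
results = vectorstore.similarity_search("refund window", filter={"topic": "returns"})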

What’s Next?

In Part 5, we’ll bring it all together using:

LangChain Chains + Output Parsers

So that the LLM doesn't just retrieve context, but generates structured, actionable answers!

Missed the earlier parts?
