Embeddings & Vector Stores: Turning Text into Searchable Intelligence
Welcome to Part 4 of our hands-on RAG series with LangChain.
So far, we’ve covered:
In this part, we explore how your split documents are converted into vectors (embeddings) that can be searched, ranked, and retrieved for use with LLMs.
What Are Embeddings?
Embeddings are numerical vector representations of text that capture semantic meaning.
When a model generates an embedding:
- It transforms “meaning” into a dense numerical vector
- Similar meanings = closer vectors in multi-dimensional space
Example:
king - man + woman ≈ queen
This analogy works because embeddings preserve relational meaning.
Let's understand this with some examples.
- Let’s say you have a model that creates embeddings for words.
- king <-------Embeddings---------> [0.9, 0.8, 0.7]
- queen <-------Embeddings---------> [0.88, 0.82, 0.68]
- man <-------Embeddings---------> [0.5, 0.4, 0.3]
- woman <-------Embeddings---------> [0.48, 0.42, 0.28]
- apple <-------Embeddings---------> [0.1, 0.3, 0.4]
- banana <-------Embeddings---------> [0.09, 0.29, 0.41]
Here we can see:
- king is very close to queen
- man is close to woman
- apple is close to banana, but both fruits sit in a different semantic cluster from the people-related words
Embedding Table: Sentences
- "How to cook pasta?" <--Embeddings-->. [0.65, 0.88, 0.34, ..., 0.72]
- "Steps for making spaghetti" <--Embeddings--> [0.63, 0.90, 0.33, ..., 0.71]
- "What is quantum physics?" <--Embeddings--> [0.11, 0.23, 0.56, ..., 0.19]
Now calculate the cosine similarity between the embeddings (a short code sketch follows this list):
- "How to cook pasta?" and "Steps for making spaghetti" are very similar
- "How to cook pasta?" and "What is quantum physics?" are not similar
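Here is a minimal sketch of that calculation in plain Python, using the toy word vectors from above (the numbers are illustrative, not from a real model):

import math

def cosine_similarity(a, b):
    # (A · B) / (||A|| * ||B||)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

king = [0.9, 0.8, 0.7]
queen = [0.88, 0.82, 0.68]
apple = [0.1, 0.3, 0.4]

print(cosine_similarity(king, queen))  # ~1.0 -> very similar
print(cosine_similarity(king, apple))  # noticeably lower -> different cluster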
Semantic Search Works on Meaning, Not Just Words
In both the word and sentence embedding examples, you’ll notice a key takeaway:
Semantic search operates on vector representations — numerical values that capture meaning — not just literal word matching.
This means even if the exact words don’t appear in the query or document, the model can still understand the context and retrieve relevant results based on meaning. This is what makes LLM-powered search far more powerful than traditional keyword-based methods.
How Embeddings Power RAG
In RAG, embeddings allow us to:
- Convert chunks of documents into vectors
- Store these vectors in a vector database
- Embed the user query at runtime
- Use similarity search to fetch relevant chunks
Result: LLMs generate answers with user query + context (relevant documents).
Common Embedding Models
- OpenAIEmbeddings
- HuggingFaceEmbeddings
- GoogleGenerativeAIEmbeddings
- OllamaEmbeddings
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain.schema import Document
from langchain.text_splitter import CharacterTextSplitter
# Embedding model (Gemini embeddings from the langchain-google-genai package)
embedding = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001",
    google_api_key="YOUR_API_KEY",
)
# Document
faq_text = """
Q: What is your return policy?
A: You can return items within 30 days for a full refund.
Q: How long does shipping take?
A: Shipping typically takes 3-5 business days.
Q: Do you offer international shipping?
A: Yes, we ship to over 50 countries.
Q: How can I track my order?
A: You will receive a tracking link via email once your order ships.
"""
# Split the document into chunks
text_splitter = CharacterTextSplitter(chunk_size=200, chunk_overlap=0)
documents = text_splitter.create_documents([faq_text])
# embed_documents() expects raw strings, so pass each chunk's text
doc_embeddings = embedding.embed_documents([doc.page_content for doc in documents])
query_embedding = embedding.embed_query("What is the return policy?")
- embed_documents() creates a vector for each chunk
- embed_query() creates a vector for the query so it can be compared against the chunk embeddings, as sketched below
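To find the chunk that best matches the query, compare the query embedding against each chunk embedding. A minimal sketch using numpy (an extra dependency, not required by the code above) and the cosine similarity idea from earlier:

import numpy as np

q = np.array(query_embedding)
chunk_matrix = np.array(doc_embeddings)

# Cosine similarity between the query and every chunk, then pick the best one
scores = chunk_matrix @ q / (np.linalg.norm(chunk_matrix, axis=1) * np.linalg.norm(q))
best_chunk = documents[int(np.argmax(scores))]
print(best_chunk.page_content)  # should be the return-policy chunk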
What Are Vector Databases?
A vector database is a special kind of database designed to store and search through embeddings (vectors), which represent the semantic meaning of things like:
- Text (words, sentences, documents)
- Images
- Code
- Audio
These databases are optimized for fast similarity search — like answering:
“Find me the most similar documents to this question.”
The key idea:
Traditional databases search by exact values, for example:
SELECT * FROM users WHERE email = '[email protected]';
Vector databases, on the other hand, perform semantic search based on the context and meaning of words and sentences (as discussed above). To do this, they compare vectors using metrics such as cosine similarity or Euclidean distance.
Use Case Flow Example
Your PDF → Split into chunks → Embed each chunk → Store in Vector DB
User query → Embed query → Search DB → Get top chunks → Answer
Popular Vector Databases
- FAISS: Open-source by Facebook, fast, local
- Pinecone: Cloud-native, scalable, real-time updates
- Weaviate: Semantic graph + vector search
- Milvus: High-performance, GPU acceleration
- Qdrant: Rust-based, fast, open-source
- Chroma: Developer-friendly, works well with LangChain
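Most of these expose a very similar interface in LangChain, so swapping stores is usually a one or two line change. As a rough sketch, assuming the faiss-cpu package is installed and reusing documents and embedding from the FAQ example above:

from langchain_community.vectorstores import FAISS

# Build an in-memory FAISS index from the same chunks and embedding model
faiss_store = FAISS.from_documents(documents, embedding)
hits = faiss_store.similarity_search("What is the return policy?")
print(hits[0].page_content)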
Vector Database use cases:
- Similarity Search: Finds meaning, not just keywords
- Memory for LLMs: Used in Retrieval-Augmented Generation (RAG)
- Fast Search on Big Data: Search millions of vectors quickly
- Scalable + Flexible: Easily update, delete, filter, tag data
Code Example with Chroma
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.schema import Document
from langchain.text_splitter import CharacterTextSplitter
# Document
faq_text = """
Q: What is your return policy?
A: You can return items within 30 days for a full refund.
Q: How long does shipping take?
A: Shipping typically takes 3-5 business days.
Q: Do you offer international shipping?
A: Yes, we ship to over 50 countries.
Q: How can I track my order?
A: You will receive a tracking link via email once your order ships.
"""
# Split the document into chunks
text_splitter = CharacterTextSplitter(chunk_size=200, chunk_overlap=0)
documents = text_splitter.create_documents([faq_text])
# Embedding model
embeddings = GoogleGenerativeAIEmbeddings(
    model="models/embedding-001",
    google_api_key="YOUR_API_KEY",
)
# Create the vector database and persist it to disk
vectorstore = Chroma.from_documents(documents, embeddings, persist_directory='./faq.db')
query = "What is the return policy?"
results = vectorstore.similarity_search(query)
print(results[0].page_content)
You just built a semantic search engine.
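To close the loop described in "How Embeddings Power RAG", the last step is to hand the retrieved chunks plus the user query to an LLM. A minimal sketch, assuming a Gemini chat model from langchain_google_genai (the model name and prompt wording here are illustrative):

from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", google_api_key="YOUR_API_KEY")

# Stuff the retrieved chunks into the prompt as context
context = "\n".join(doc.page_content for doc in results)
answer = llm.invoke(f"Answer the question using only this context:\n{context}\n\nQuestion: {query}")
print(answer.content)

Part 5 does this more cleanly with chains and output parsers.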
Summary
A vector database stores and retrieves embeddings, enabling machines to search by meaning rather than exact matches.
They’re essential for:
- Chatbots with memory
- Semantic search
- AI-powered search engines
- RAG pipelines
What is Cosine Similarity?
Similarity between embeddings is usually calculated using cosine similarity:
Similarity(A, B) = (A · B) / (||A|| ||B||)
- Ranges from -1 to 1
- 1 = Identical direction (most similar)
- 0 = Orthogonal (unrelated)
- -1 = Opposite direction (least similar)
LangChain handles this internally when you call similarity_search().
Best Practices
- Use the same embedding model for documents and queries; otherwise the vectors are not comparable
- Normalize and clean content before embedding
- Store metadata with your chunks so results can be filtered and traced back to their source (see the sketch below)
- Choose a vector store that fits your scale and deployment needs
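On the metadata point, here is a minimal sketch with Chroma (the metadata keys and filter syntax are illustrative; exact filtering support varies by vector store):

docs_with_meta = [
    Document(page_content="You can return items within 30 days.", metadata={"topic": "returns"}),
    Document(page_content="Shipping typically takes 3-5 business days.", metadata={"topic": "shipping"}),
]
meta_store = Chroma.from_documents(docs_with_meta, embeddings)

# Restrict the search to chunks tagged with a given topic
hits = meta_store.similarity_search("How fast is delivery?", filter={"topic": "shipping"})
print(hits[0].page_content)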
What’s Next?
In Part 5, we’ll bring it all together using:
LangChain Chains + Output Parsers
So that the LLM doesn't just retrieve context, but generates structured, actionable answers!
Missed the earlier parts?