Presented at DevFest 2024, Johannesburg
Artificial Intelligence is no longer the exclusive domain of massive corporations or high-end research labs. With the rise of open-source models like Gemma and modular frameworks like LangChain, it is now possible to build robust, domain-specific AI systems using publicly available tools.
At DevFest Johannesburg 2024, I walked through how to build a Retrieval-Augmented Generation (RAG) system using Gemma, LangChain, and Vertex AI — with a working chatbot demo and live deployment.
What is Retrieval-Augmented Generation (RAG)?
RAG is an AI architecture that combines:
- Information retrieval (search, databases, document indexes)
- Generative language models (like Gemma or Gemini)
This enables systems to access external knowledge, retrieve relevant content, and use that content to ground the generation of responses, improving both factual accuracy and relevance.
Why Use RAG?
Traditional language models are limited to what they were trained on. RAG addresses this limitation by adding real-time context from external sources. Some of the key advantages of RAG systems include:
- Access to the latest information
- Factual grounding and domain control
- Semantic search with vector databases and relevance-based re-ranking
- Enhanced response quality and trustworthiness
How It Works
RAG systems typically follow this pipeline:
1. Retrieval and Preprocessing
- External documents (e.g., knowledge bases, PDFs, databases) are searched using vector similarity or keyword search.
- Retrieved results are cleaned, tokenized, and filtered to prepare for use in generation.
2. Grounded Generation
- Retrieved and preprocessed content is injected into the input context of the LLM.
- The model then uses this grounded context to generate responses that are both accurate and contextually rich.
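Stripped of any framework, the pipeline is just two steps. In this sketch, `search_index` and `llm_generate` are hypothetical placeholders for whatever retriever and model you use; concrete LangChain versions follow in the next section.

```python
# Minimal shape of a RAG pipeline. `search_index` and `llm_generate`
# are hypothetical placeholders for your retriever and language model.
def answer(query: str) -> str:
    # 1. Retrieval: fetch the chunks most similar to the query.
    chunks = search_index(query, top_k=4)

    # 2. Grounded generation: inject the chunks into the model's context.
    context = "\n\n".join(chunk.text for chunk in chunks)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return llm_generate(prompt)
```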
Core Concepts in RAG
When building a RAG system, several key processes are involved:
Chunking
Splitting large documents into smaller chunks for indexing and retrieval.
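As a minimal example, LangChain's recursive splitter keeps chunks near a target size while preferring natural boundaries (the sizes below are illustrative, not tuned values):

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Prefer paragraph and sentence boundaries; keep some overlap so
# context isn't lost at chunk edges. Sizes here are illustrative.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_text(document_text)  # document_text: your raw document string
```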
Embedding
Transforming text chunks into high-dimensional vectors using an embedding model, making them searchable by meaning.
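A sketch using the Vertex AI embeddings integration; the model name is an example, so use whichever embedding model your project has enabled:

```python
from langchain_google_vertexai import VertexAIEmbeddings

# Model name is an example; pick the embedding model enabled in your project.
embeddings = VertexAIEmbeddings(model_name="text-embedding-004")
vectors = embeddings.embed_documents(chunks)  # one vector per chunk
```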
Indexing
Storing embeddings in a vector database like PostgreSQL with pgvector for fast similarity search.
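A sketch with the `langchain_postgres` integration; the connection string and collection name are placeholders for your own instance:

```python
from langchain_postgres import PGVector

# Connection string and collection name are placeholders.
store = PGVector(
    embeddings=embeddings,
    collection_name="devfest_docs",
    connection="postgresql+psycopg://user:pass@localhost:5432/ragdb",
)
store.add_texts(chunks)  # embeds and indexes each chunk
```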
Retrieval
Searching for relevant chunks based on user input.
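Continuing the sketch, retrieval is a similarity query against the store (the question string is just an example):

```python
# Return the k chunks whose embeddings are closest to the query's.
docs = store.similarity_search("How do I deploy Gemma on Cloud Run?", k=4)

# Or wrap the store as a retriever for use inside chains.
retriever = store.as_retriever(search_kwargs={"k": 4})
```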
Grounding
Combining retrieved context with the user query to provide better background for the language model.
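In practice, grounding is just prompt construction, for example:

```python
from langchain_core.prompts import ChatPromptTemplate

# Retrieved chunks are stitched into the prompt alongside the question.
prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}"
)
```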
Generation
Using the LLM (in our case, Gemma) to generate an answer based on the grounded context.
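One way to run Gemma is locally through Ollama's LangChain integration (the model tag below is an assumption); if you serve Gemma from Vertex AI instead, swap in that model wrapper. This reuses `prompt` and `docs` from the snippets above:

```python
from langchain_ollama import ChatOllama

# Assumes Gemma is served locally via Ollama; the model tag is an example.
llm = ChatOllama(model="gemma2", temperature=0)

context = "\n\n".join(d.page_content for d in docs)
answer = (prompt | llm).invoke(
    {"context": context, "question": "How do I deploy Gemma on Cloud Run?"}
)
print(answer.content)
```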
Building with LangChain and Gemma
We built the system using the following stack:
| Component | Technology |
| --- | --- |
| LLM | Gemma (open-source) |
| Framework | LangChain + LangGraph |
| Storage | PostgreSQL + pgvector |
| UI / API | FastAPI + LangServe |
| Deployment | Cloud Run (serverless) |
LangChain
LangChain provides modular tools for handling each step of the RAG pipeline — from data ingestion to generation chaining.
LangGraph
LangGraph allows building orchestrated workflows and agentic behaviors, essential for more complex RAG pipelines.
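As an illustration, here is a linear two-node RAG graph, reusing the `store`, `prompt`, and `llm` objects from the snippets above:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

# Shared state passed between graph nodes.
class RAGState(TypedDict):
    question: str
    context: str
    answer: str

def retrieve(state: RAGState) -> dict:
    docs = store.similarity_search(state["question"], k=4)
    return {"context": "\n\n".join(d.page_content for d in docs)}

def generate(state: RAGState) -> dict:
    msg = (prompt | llm).invoke(
        {"context": state["context"], "question": state["question"]}
    )
    return {"answer": msg.content}

builder = StateGraph(RAGState)
builder.add_node("retrieve", retrieve)
builder.add_node("generate", generate)
builder.add_edge(START, "retrieve")
builder.add_edge("retrieve", "generate")
builder.add_edge("generate", END)
graph = builder.compile()

# result = graph.invoke({"question": "What is RAG?"})
```

A graph buys you little over a plain chain at this size, but it becomes valuable once you add branches such as query rewriting or retrieval retries.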
LangSmith
We used LangSmith to debug, monitor, and test our RAG applications throughout development.
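Tracing needs no code changes beyond two environment variables (the API key below is a placeholder):

```python
import os

# Placeholder key; set these before your app creates any chains.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "ls-..."  # your LangSmith API key
```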
Live Demo: Local RAG Chatbot with Gemma and Vertex AI
During the presentation, we built a fully functional RAG chatbot (assembled in the sketch after this list) that:
- Accepts user queries
- Searches documents using vector search
- Grounds the query with retrieved chunks
- Generates an answer using Gemma
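Putting the pieces together, a minimal LCEL sketch of that chain, reusing `retriever`, `prompt`, and `llm` from the earlier snippets:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs):
    # Concatenate retrieved chunks into a single context string.
    return "\n\n".join(d.page_content for d in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print(rag_chain.invoke("How does pgvector store embeddings?"))
```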
Services Used
- LangChain – Framework for building the RAG app
- Cloud Run – Deployment of both the indexing and inference pipelines
- PostgreSQL on Cloud SQL – Vector database using pgvector
- FastAPI + LangServe – Simple and scalable REST API interface (see the snippet below)
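Serving the chain over REST takes a few lines with LangServe (the route path is arbitrary):

```python
from fastapi import FastAPI
from langserve import add_routes

app = FastAPI(title="Gemma RAG demo")

# Mounts /rag/invoke, /rag/stream, and a /rag/playground UI for the chain.
add_routes(app, rag_chain, path="/rag")

# Run locally with: uvicorn main:app --reload
```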
Conclusion
With open models like Gemma and flexible frameworks like LangChain, building a RAG system has become far more accessible. You can now create AI applications that are not only intelligent but also grounded in real, trustworthy data — whether you're working on a chatbot, documentation assistant, or enterprise search tool.
AI is no longer locked behind proprietary walls. With the right tools and frameworks, you can build, test, and deploy contextual LLM-powered apps in a weekend.
Additional Resources
Feel free to reach out if you're building with LangChain or deploying RAG systems — always happy to connect and share ideas!