
Samuel Oladejo

Democratizing AI: Building a RAG System with Gemma and LangChain

Presented at DevFest 2024, Johannesburg

Artificial Intelligence is no longer the exclusive domain of massive corporations or high-end research labs. With the rise of open-source models like Gemma and modular frameworks like LangChain, it is now possible to build robust, domain-specific AI systems using publicly available tools.

At DevFest Johannesburg 2024, I walked through how to build a Retrieval-Augmented Generation (RAG) system using Gemma, LangChain, and Vertex AI — with a working chatbot demo and live deployment.


What is Retrieval-Augmented Generation (RAG)?

RAG is an AI architecture that combines:

  • Information retrieval (search, databases, document indexes)
  • Generative language models (like Gemma or Gemini)

This enables systems to access external knowledge, retrieve relevant content, and use that content to ground the generation of responses, improving both factual accuracy and relevance.

Diagram: RAG Architecture


Why Use RAG?

Traditional language models are limited to what they were trained on. RAG addresses this limitation by adding real-time context from external sources. Some of the key advantages of RAG systems include:

  • Access to the latest information
  • Factual grounding and domain control
  • Semantic search with vector databases and relevance-based re-ranking
  • Enhanced response quality and trustworthiness

How It Works

RAG systems typically follow this pipeline:

1. Retrieval and Preprocessing

  • External documents (e.g., knowledge bases, PDFs, databases) are searched using vector similarity or keyword search.
  • Retrieved results are cleaned, tokenized, and filtered to prepare for use in generation.

Diagram: Retrieval and Preprocessing


2. Grounded Generation

  • Retrieved and preprocessed content is injected into the input context of the LLM.
  • The model then uses this grounded context to generate responses that are both accurate and contextually rich.

Diagram: Grounded Generation


Core Concepts in RAG

When building a RAG system, several key processes are involved:

Chunking

Splitting large documents into smaller chunks for indexing and retrieval.
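
A minimal sketch of this step using LangChain's text splitters. The chunk size, overlap, and source file here are illustrative, not the exact values from the talk:

```python
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Hypothetical source file; in practice this comes from PDF/web/database loaders.
docs = [Document(page_content=open("knowledge_base.txt").read())]

# Split into overlapping chunks so retrieval can return focused passages.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(docs)
```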

Slide: Chunking


Embedding

Transforming text chunks into high-dimensional vectors using an embedding model, making them searchable by meaning.
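
A sketch using Vertex AI embeddings through LangChain; the model name is an assumption, and any embedding model with a LangChain integration would slot in the same way:

```python
from langchain_google_vertexai import VertexAIEmbeddings

# Model name is an assumption; any Vertex AI embedding model works here.
embeddings = VertexAIEmbeddings(model_name="text-embedding-004")

vector = embeddings.embed_query("How do I deploy Gemma on Cloud Run?")
print(len(vector))  # dimensionality of the embedding space
```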

Slide: Embedding


Indexing

Storing embeddings in a vector database like PostgreSQL with pgvector for fast similarity search.
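
Continuing the sketch, here is how those embeddings might be stored with the langchain-postgres integration; the connection string and collection name are placeholders for your own Cloud SQL instance:

```python
from langchain_postgres import PGVector

store = PGVector(
    embeddings=embeddings,  # the embedding model from the previous step
    collection_name="devfest_docs",  # placeholder name
    connection="postgresql+psycopg://user:pass@localhost:5432/ragdb",  # placeholder
)
store.add_documents(chunks)  # index the chunks produced during chunking
```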

Slide: Indexing


Retrieval

Searching for relevant chunks based on user input.
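
Retrieval then becomes a similarity search against that store (continuing the names from the sketches above; k=4 is an arbitrary choice):

```python
# Top-k similarity search over the indexed chunks.
results = store.similarity_search("How does RAG improve factual accuracy?", k=4)
for doc in results:
    print(doc.page_content[:100])
```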

Slide: Retrieval


Grounding

Combining retrieved context with the user query to provide better background for the language model.
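
One simple way to do this is a plain "stuffing" prompt; the template wording below is illustrative:

```python
from langchain_core.prompts import ChatPromptTemplate

# Retrieved chunks are stitched into the context slot of the prompt.
prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)
grounded = prompt.invoke({
    "context": "\n\n".join(doc.page_content for doc in results),
    "question": "How does RAG improve factual accuracy?",
})
```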

Slide: Grounding


Generation

Using the LLM (in our case, Gemma) to generate an answer based on the grounded context.
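
A sketch of the final step, assuming Gemma is served locally through Ollama; in a cloud setup the model could equally be served from Vertex AI:

```python
from langchain_ollama import ChatOllama

# Assumes a local Ollama server with a Gemma model pulled; swap in your own endpoint.
llm = ChatOllama(model="gemma2", temperature=0)
answer = llm.invoke(grounded)  # the grounded prompt built in the previous step
print(answer.content)
```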

Slide: Generation


Building with LangChain and Gemma

We built the system using the following stack:

Component    Technology
LLM          Gemma (open source)
Framework    LangChain + LangGraph
Storage      PostgreSQL + pgvector
UI / API     FastAPI + LangServe
Deployment   Cloud Run (serverless)

Diagram: LangChain Cloud Deployment

LangChain

LangChain provides modular tools for every step of the RAG pipeline, from data ingestion and retrieval to chaining the final generation step.
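
As an illustration of that modularity, the pieces sketched earlier compose into one runnable pipeline with LangChain's expression language (the names reuse the earlier sketches):

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

retriever = store.as_retriever(search_kwargs={"k": 4})

def format_docs(docs):
    # Join retrieved chunks into a single context string.
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
print(rag_chain.invoke("What is grounded generation?"))
```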

LangGraph

LangGraph allows building orchestrated workflows and agentic behaviors, essential for more complex RAG pipelines.
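
A minimal sketch of the same pipeline as an explicit two-node LangGraph workflow (again reusing `store`, `prompt`, and `llm` from above); real agentic pipelines would add branching, tool calls, and retries:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class RAGState(TypedDict):
    question: str
    context: str
    answer: str

def retrieve(state: RAGState) -> dict:
    docs = store.similarity_search(state["question"], k=4)
    return {"context": "\n\n".join(d.page_content for d in docs)}

def generate(state: RAGState) -> dict:
    msg = llm.invoke(prompt.invoke({"context": state["context"],
                                    "question": state["question"]}))
    return {"answer": msg.content}

graph = StateGraph(RAGState)
graph.add_node("retrieve", retrieve)
graph.add_node("generate", generate)
graph.add_edge(START, "retrieve")
graph.add_edge("retrieve", "generate")
graph.add_edge("generate", END)
rag_app = graph.compile()
```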

LangSmith

We used LangSmith to debug, monitor, and test our RAG applications throughout development.
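
Enabling LangSmith tracing only takes a few environment variables; the project name below is a hypothetical one:

```python
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-key>"
os.environ["LANGCHAIN_PROJECT"] = "gemma-rag-devfest"  # hypothetical project name
# Any LangChain/LangGraph invocation after this point is traced automatically.
```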


Live Demo: Local RAG Chatbot with Gemma and Vertex AI

During the presentation, we built a fully functional RAG chatbot that:

  • Accepts user queries
  • Searches documents using vector search
  • Grounds the query with retrieved chunks
  • Generates an answer using Gemma

Screenshot: RAG Chatbot UI


Services Used

  • LangChain – Framework for building the RAG app
  • Cloud Run – Serverless deployment of both the indexing and inference pipelines
  • PostgreSQL on Cloud SQL – Vector database using pgvector
  • FastAPI + LangServe – Simple and scalable REST API interface (see the serving sketch below)
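
A sketch of how the chain built earlier could be exposed over REST with FastAPI and LangServe; the route path and port are placeholders:

```python
from fastapi import FastAPI
from langserve import add_routes

# Expose the rag_chain from the LangChain section as a REST endpoint.
app = FastAPI(title="Gemma RAG API")
add_routes(app, rag_chain, path="/rag")

if __name__ == "__main__":
    import uvicorn
    # Cloud Run routes traffic to the port your container listens on; 8080 is its default.
    uvicorn.run(app, host="0.0.0.0", port=8080)
```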

Conclusion

With open models like Gemma and flexible frameworks like LangChain, building a RAG system has become far more accessible. You can now create AI applications that are not only intelligent but also grounded in real, trustworthy data — whether you're working on a chatbot, documentation assistant, or enterprise search tool.

AI is no longer locked behind proprietary walls. With the right tools and frameworks, you can build, test, and deploy contextual LLM-powered apps in a weekend.


Feel free to reach out if you're building with LangChain or deploying RAG systems — always happy to connect and share ideas!
