Have you ever struggled to find the latest circular or regulation from the IRDAI (Insurance Regulatory and Development Authority of India) website?
So did I, and that's why I decided to build a full-stack, AI-powered chatbot that can:
- Scrape circulars and press notes from the IRDAI website
- Extract and embed content from PDFs/HTML
- Answer user questions with accurate regulatory information
- Suggest follow-up questions like a real assistant
- Provide a beautiful and modern UI using Angular + Bootstrap
In this blog, I'll walk through the architecture, the tech stack, and some of the agentic automation behind the scenes.
Overview: What I Built
The IRDAI Chatbot is a fully automated system that:
- Scrapes and downloads IRDAI circulars across all paginated pages
- Parses and embeds PDFs using LangChain + OpenAI embeddings
- Stores the embeddings in a local Chroma vectorstore
- Answers questions with a LangChain QA agent using retrieval-augmented generation (RAG)
- Offers an interactive, smooth Angular frontend with live chat, typing effects, suggestion bubbles, and animated scroll
- Suggests relevant follow-up questions based on each answer!
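To make the retrieval step concrete, here is a minimal, library-free sketch of what "vector similarity" lookup does. It is not the project's code: the three-dimensional vectors, the document IDs, and the helper names `cosine`, `top_k`, and `build_prompt` are all made up for illustration, and in the real system the embeddings would come from OpenAI and live in Chroma.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: List[float], store: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k document IDs whose vectors are closest to the query."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def build_prompt(question: str, passages: List[str]) -> str:
    """Assemble a RAG-style prompt: retrieved context first, then the question."""
    context = "\n\n".join(passages)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"


# Toy "vector store": hypothetical IDs mapped to tiny made-up embeddings.
store = {
    "circular_a": [1.0, 0.0, 0.0],
    "circular_b": [0.7, 0.7, 0.0],
    "press_note_c": [0.0, 0.0, 1.0],
}
hits = top_k([1.0, 0.1, 0.0], store, k=2)
print([doc_id for doc_id, _ in hits])
```

The retrieved passages would then be passed to `build_prompt` and on to the LLM; a vector database does the same ranking, only over many more vectors and with an index instead of a full scan.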
Tech Stack
Layer | Tech |
---|---|
LLM | OpenAI GPT-4 via LangChain |
Vectorstore | ChromaDB |
Embedding | OpenAIEmbeddings |
RAG | LangChain QA chains |
Agent Framework | LangGraph |
Backend | FastAPI |
PDF Parsing | PyMuPDF |
Frontend | Angular 17 + Bootstrap 5 |
Scraping | Selenium + BeautifulSoup |
Async | Python asyncio + batching |
Backend: AI Agent Architecture
I built a smart multi-step LangGraph agent with these nodes:
- Scrape Node: uses Selenium to crawl all IRDAI circular pages, follows paginated "Next" links, and downloads PDFs
- Parse Node: uses PyMuPDF to read PDF content
- Embed Node: splits content into chunks and stores embeddings in Chroma
- QA Node: answers questions by retrieving relevant docs using vector similarity
- Suggestion Node: uses a second agent to suggest follow-up questions based on the bot's answer
All nodes are reusable and callable as standalone FastAPI routes too.
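As an illustration of the Scrape Node's pagination logic, here is a rough sketch that follows "Next" links and collects PDF URLs. It assumes a hypothetical `fetch(url)` callable that returns a page's HTML (in a Selenium setup this could wrap the driver's `page_source`), and it uses the standard-library `html.parser` instead of BeautifulSoup so the example runs with no dependencies. The URLs and page markup are invented; the real IRDAI pages will look different.

```python
from html.parser import HTMLParser
from typing import Callable, List, Optional
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects hrefs ending in .pdf and the href of the anchor labelled 'Next'."""

    def __init__(self) -> None:
        super().__init__()
        self.pdf_links: List[str] = []
        self.next_link: Optional[str] = None
        self._current_href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            if self._current_href and self._current_href.lower().endswith(".pdf"):
                self.pdf_links.append(self._current_href)

    def handle_data(self, data):
        # Text inside the currently open <a>: remember it if it is the "Next" link.
        if self._current_href and data.strip().lower() == "next":
            self.next_link = self._current_href

    def handle_endtag(self, tag):
        if tag == "a":
            self._current_href = None


def crawl(start_url: str, fetch: Callable[[str], str], max_pages: int = 100) -> List[str]:
    """Walk paginated listings from start_url and return absolute PDF URLs."""
    pdfs: List[str] = []
    seen = set()
    url: Optional[str] = start_url
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)  # guards against pagination loops
        parser = LinkCollector()
        parser.feed(fetch(url))
        pdfs.extend(urljoin(url, href) for href in parser.pdf_links)
        url = urljoin(url, parser.next_link) if parser.next_link else None
    return pdfs


# Fake two-page site standing in for the real listing (made-up URLs).
PAGES = {
    "https://example.test/circulars?page=1": '<a href="/docs/a.pdf">A</a><a href="?page=2">Next</a>',
    "https://example.test/circulars?page=2": '<a href="/docs/b.pdf">B</a>',
}
found = crawl("https://example.test/circulars?page=1", PAGES.__getitem__)
print(found)
```

Passing `fetch` in as a parameter keeps the pagination logic testable without a browser; the `seen` set and `max_pages` cap are defensive choices for sites whose last page links back to itself.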
Chatbot Flow: How Everything Connects
Here's how the chatbot works, from user input to AI agents performing RAG-based document search and response formatting:
- Text Input: User submits a question.
- ChatGPT Core: Formats the query, routes it to agents.
- AI Agents:
  - Scraper: Collects PDFs & press notes
  - ETL: Parses and embeds documents
  - QA: Handles similarity search + answer generation
- Database: Stores and retrieves document embeddings
- Chat Interface: Formats HTML responses and suggestions
This modular design ensures scalability and clarity.
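The flow above can be sketched as a chain of small functions that each read and extend a shared state dictionary, which is roughly the pattern LangGraph nodes follow. This is a simplified stand-in, not the project's graph: the `ChatState` fields, the node bodies, and `run_pipeline` are hypothetical, and the placeholder logic (sentence splitting, word overlap) replaces the real scraping, embedding, and LLM calls.

```python
from typing import Callable, List, TypedDict


class ChatState(TypedDict, total=False):
    question: str
    documents: List[str]
    chunks: List[str]
    answer: str
    suggestions: List[str]


Node = Callable[[ChatState], ChatState]


def scraper_node(state: ChatState) -> ChatState:
    # Placeholder: a real scraper would download circulars here.
    docs = ["Saral Jeevan Bima is a standard term life insurance policy. "
            "Placeholder sentence about another circular."]
    return {**state, "documents": docs}


def etl_node(state: ChatState) -> ChatState:
    # Placeholder "parse and embed": split documents into sentence chunks.
    chunks = [s.strip() for doc in state["documents"] for s in doc.split(".") if s.strip()]
    return {**state, "chunks": chunks}


def qa_node(state: ChatState) -> ChatState:
    # Placeholder retrieval: pick the chunk sharing the most words with the question.
    q_words = set(state["question"].lower().replace("?", "").split())
    best = max(state["chunks"], key=lambda c: len(q_words & set(c.lower().split())))
    return {**state, "answer": best}


def suggestion_node(state: ChatState) -> ChatState:
    # Placeholder: a real implementation would ask an LLM for follow-ups.
    return {**state, "suggestions": [f"Can you explain more about: {state['answer'][:40]}?"]}


def run_pipeline(nodes: List[Node], state: ChatState) -> ChatState:
    """Run each node in order, threading the state through."""
    for node in nodes:
        state = node(state)
    return state


result = run_pipeline(
    [scraper_node, etl_node, qa_node, suggestion_node],
    {"question": "What is Saral Jeevan Bima?"},
)
print(result["answer"])
```

Because every node has the same signature, any one of them can be called on its own or swapped out, which is what makes it easy to expose the same steps as separate API routes.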
Smart Features
- Async batching for large document QA (splits input across token-safe chunks)
- Automatic spell correction + similar question detection
- Answer caching to improve performance with a time-aware LRU-like strategy
- Suggestions engine that generates related follow-up questions using a second LLM chain
- A common `llm_provider.py` to centralize LLM configuration across the app
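One way to read "time-aware LRU-like" caching is a bounded cache that evicts the least recently used answer and also expires entries after a fixed age. The sketch below shows that idea under those assumptions; the class name `AnswerCache`, its parameters, and the key normalisation are illustrative and may differ from what the project actually does.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional, Tuple


class AnswerCache:
    """Bounded cache: evicts least recently used entries and expires old ones."""

    def __init__(self, max_size: int = 128, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_size = max_size
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._data: "OrderedDict[str, Tuple[float, str]]" = OrderedDict()

    @staticmethod
    def _key(question: str) -> str:
        # Normalise so trivially different phrasings share one entry.
        return " ".join(question.lower().split())

    def get(self, question: str) -> Optional[str]:
        key = self._key(question)
        entry = self._data.get(key)
        if entry is None:
            return None
        stored_at, answer = entry
        if self._clock() - stored_at > self._ttl:
            del self._data[key]  # too old: treat as a miss
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return answer

    def put(self, question: str, answer: str) -> None:
        key = self._key(question)
        self._data[key] = (self._clock(), answer)
        self._data.move_to_end(key)
        while len(self._data) > self._max_size:
            self._data.popitem(last=False)  # drop the least recently used entry


cache = AnswerCache(max_size=2, ttl_seconds=600)
cache.put("What is Saral Jeevan Bima?", "cached answer")
print(cache.get("  what is saral jeevan bima? "))
```

Expiry matters here because regulatory content changes: an answer cached before a new circular is scraped should not be served indefinitely.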
Frontend: Angular Chat UI
The frontend is built with Angular 17 standalone components, styled with Bootstrap 5, and includes:
- Suggested questions before and after answers
- Typing animation (blinking dots)
- Smart session tracking using UUIDs
- Smooth scroll-to-bottom on every update
- Graceful error handling
- Responsive, mobile-friendly layout
FastAPI Backend
The backend exposes:
- `/ask`: main QA endpoint
- `/suggest`: generates follow-up questions
- `/scrape`: runs the scraper
- `/embed`: re-embeds new content
You can trigger scraping and embedding via the LangGraph agent, the CLI, or the API, whichever fits your workflow.
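To give a feel for the `/ask` contract, here is a framework-free sketch of a possible request, response, and handler. Everything in it is an assumption: the field names (`question`, `session_id`, `answer`, `suggestions`), the `handle_ask` function, and the stub callables are hypothetical. In a FastAPI app the dataclasses would typically be Pydantic models and the handler would be registered with `@app.post("/ask")`.

```python
import uuid
from dataclasses import asdict, dataclass, field
from typing import Callable, List


@dataclass
class AskRequest:
    question: str
    # Hypothetical field: lets the frontend keep one UUID per chat session.
    session_id: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass
class AskResponse:
    session_id: str
    answer: str
    suggestions: List[str]


def handle_ask(req: AskRequest,
               answer_fn: Callable[[str], str],
               suggest_fn: Callable[[str], List[str]]) -> AskResponse:
    """Validate the question, answer it, then derive follow-ups from the answer."""
    question = req.question.strip()
    if not question:
        raise ValueError("question must not be empty")
    answer = answer_fn(question)
    return AskResponse(req.session_id, answer, suggest_fn(answer))


# Stubs standing in for the QA chain and the suggestions chain.
def fake_answer(question: str) -> str:
    return f"(stub answer to: {question})"


def fake_suggest(answer: str) -> List[str]:
    return ["(stub follow-up 1)", "(stub follow-up 2)"]


response = handle_ask(AskRequest("What is Saral Jeevan Bima?"), fake_answer, fake_suggest)
print(asdict(response))
```

Returning the suggestions alongside the answer saves the frontend a second round trip, while a separate `/suggest` route remains useful for regenerating follow-ups on demand.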
Example Q&A
Q: What is Saral Jeevan Bima?
A: Saral Jeevan Bima is a standard term life insurance policy mandated by IRDAI...
Suggested follow-ups:
- "Who is eligible for Saral Jeevan?"
- "Is it mandatory for insurers?"
- "What are the premium limits?"
Lessons Learned
- LangGraph is amazing for building modular multi-step agent flows.
- Be cautious of OpenAI token limits; I had to chunk documents carefully.
- Building a good frontend experience is just as important as the backend logic.
- Don't forget caching when dealing with repeated queries or expensive operations.
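The token-limit lesson can be illustrated with a small sketch: split text into overlapping chunks, then process the chunks concurrently with a cap on parallel calls. Word count is used here as a crude stand-in for token count (a real tokenizer would be more accurate), and `chunk_words`, `process_in_batches`, and the fake worker are hypothetical names, not the project's functions.

```python
import asyncio
from typing import Awaitable, Callable, List


def chunk_words(text: str, max_words: int = 200, overlap: int = 20) -> List[str]:
    """Split text into chunks of at most max_words, sharing `overlap` words between neighbours."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


async def process_in_batches(chunks: List[str],
                             worker: Callable[[str], Awaitable[str]],
                             concurrency: int = 4) -> List[str]:
    """Run worker on every chunk, at most `concurrency` at a time, preserving order."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(chunk: str) -> str:
        async with semaphore:
            return await worker(chunk)

    return await asyncio.gather(*(guarded(c) for c in chunks))


async def fake_worker(chunk: str) -> str:
    # Stand-in for an LLM or embedding call.
    await asyncio.sleep(0)
    return chunk.upper()


text = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(text, max_words=4, overlap=1)
results = asyncio.run(process_in_batches(chunks, fake_worker, concurrency=2))
print(chunks)
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, and the semaphore keeps the number of in-flight API calls under a rate limit.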
What's Next
- Add user authentication for session history
- Push updates to a Firebase or Netlify-hosted frontend
- Enable upload of user PDFs for comparison
- Train a custom model on domain-specific terms
Repo Coming Soon
I'm planning to open-source this soon. Let me know if you'd like early access!
Let's Connect!
If you found this useful or have feedback:
- Comment below
- Follow me on LinkedIn
- Have a chatbot idea? Let's collaborate!