Tim Kelly
Secure Local RAG with Role-Based Access: Spring AI, Ollama & MongoDB

Retrieval-augmented generation (RAG) gives our language models the power to fetch more relevant, timely, and informed responses. But here's the catch: What if you're not thrilled about shipping your highly sensitive company information off to some remote server farm in a distant country, just relying on a "promise" it won’t be used to train someone else's LLM?

Or maybe your organization is a bit more complex. Your legal team shouldn’t have access to the sales team's confidential data, and that overly curious intern definitely shouldn't be poking around executive-level information. In cases like these, you need RAG—but you need it secure, precise, and completely under your control.

In this tutorial, we’ll build a secure, locally managed RAG solution that incorporates role-based access control. If you just want the code, check out the GitHub repo.

Prerequisites

Before we get started, there are a few things we'll need:

  • MongoDB Atlas account with a cluster set up (M0 free tier is perfect)
  • Docker installed
  • Java 17+ (I use 21)
  • Maven 3.9.6+

What is Ollama?

Ollama is an open-source tool that allows you to run LLMs locally on your machine. By running models locally, we maintain full ownership of our data and avoid the security risks that come with handing it to a cloud provider. Local inference also means no network round trips to a remote API, no rate limits, and no per-token or usage fees; the only cost is the hardware you already own.

Setting up Ollama in Docker

Before we can actually run anything, we need to get Ollama up and running locally. This is what will power both our embedding model and chat model. We’ll keep things fully local so none of our documents ever leave our machine.

First, create a simple docker-compose.yml file:

version: '3.8'  

services:  
  ollama:  
    image: ollama/ollama  
    container_name: ollama  
    ports:  
      - "11434:11434"  
    volumes:  
      - ollama_data:/root/.ollama  
    restart: unless-stopped  
    environment:  
      - OLLAMA_HOST=0.0.0.0  

volumes:  
  ollama_data:

This will spin up Ollama inside Docker and map port 11434 to your local machine, which is what Spring AI will connect to later.

Now, let’s start everything up and pull the models we’ll be using:

docker compose up -d

With Ollama running, pull down the models:

# Pull the chat model
docker exec -it ollama ollama pull llama3.2

# Pull the embedding model
docker exec -it ollama ollama pull nomic-embed-text

# Verify models are installed
docker exec -it ollama ollama list
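Before moving on, it’s worth sanity-checking that the container is actually reachable. Ollama exposes a small HTTP API on the port we just mapped, and listing the installed models is a quick test:

# Should list llama3.2 and nomic-embed-text
curl http://localhost:11434/api/tags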

That’s it. We’re fully set up and ready to start wiring Ollama into Spring AI.

Getting our app

Before we start actually coding, we need to set up our dependencies and configure our application.properties. We'll build a simple little API to interact with our application, and while plenty of embedding and language models are out there, we'll stick with nomic-embed-text for embeddings and llama3.2 as our chat model.

Dependencies

To set up our project, the easiest approach is to use Spring Initializr. We will be selecting Spring Boot version 3.5.0 or later, Maven as our build tool, Java as the language, and setting the Java version to 21 with JAR packaging.

For dependencies, we'll need Spring Web to handle our REST API, MongoDB Atlas Vector Search for efficient retrieval, and Spring AI Ollama for integration with our embedding and language models.

[Image: Spring Initializr project setup with the dependencies selected]

After all that, we can generate our application and open it in the IDE of our choice. Double-check that you have the necessary dependencies; you should see something like this in your pom.xml once Spring Initializr generates your project:

<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-advisors-vector-store</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-model-ollama</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-starter-vector-store-mongodb-atlas</artifactId>
    </dependency>

    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
    </dependency>
</dependencies>
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>${spring-ai.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

application.properties

Now, let’s configure our application.properties. This is where we wire everything together—MongoDB, Ollama, and a bit of Spring AI logging so we can peek under the hood while we build.

First, the basic Spring and MongoDB config:

spring.application.name=securerag

spring.data.mongodb.uri=${MONGODB_URI}
spring.data.mongodb.database=rag
server.port=8000

logging.level.org.springframework.ai=DEBUG
logging.level.org.springframework.data.mongodb=DEBUG

Next, we configure MongoDB Vector Search. This is what allows Spring AI to store and query our document embeddings inside MongoDB Atlas:

spring.ai.vectorstore.mongodb.collection-name=vector_store
spring.ai.vectorstore.mongodb.initialize-schema=true
spring.ai.vectorstore.mongodb.metadata-fields-to-filter=department,access_level,roles

Here, we’re telling Spring AI:

  • Which collection to use for storing our vector embeddings (vector_store).
  • Which metadata fields we’ll later want to use for filtering search results (department, access_level, and roles).
  • Whether or not to initialize the schema (initialize-schema=true). This simply means Spring AI will create the collection and index automatically if it doesn’t already exist, so we’re ready to run semantic search right out of the gate.
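For reference, the search index that gets created should look roughly like the following Atlas Vector Search definition. This is a sketch based on Spring AI’s defaults: embeddings stored in an embedding field, metadata fields nested under a metadata prefix, and 768 dimensions to match the vectors nomic-embed-text produces:

{
  "fields": [
    { "type": "vector", "path": "embedding", "numDimensions": 768, "similarity": "cosine" },
    { "type": "filter", "path": "metadata.department" },
    { "type": "filter", "path": "metadata.access_level" },
    { "type": "filter", "path": "metadata.roles" }
  ]
}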

And finally, we wire up Ollama. This points Spring AI to the local Ollama instance we’ll be running, and tells it which models to use for embedding and chat:

spring.ai.ollama.base-url=http://localhost:11434
spring.ai.ollama.embedding.options.model=nomic-embed-text
spring.ai.ollama.chat.options.model=llama3.2

Since different models excel at different tasks, we’re using nomic-embed-text for generating our embeddings (this helps turn documents into vector representations), and llama3.2 for the actual chat responses. Using separate models here gives us better results on both sides—retrieval and generation.

The business logic

Now that we’ve got everything wired up, let’s look at how the actual chat flow works behind the scenes. This is where we combine retrieval, filtering, and our chat model into one secure conversation pipeline.

The ChatService

We’ll start by building a simple Spring @Service to contain our main logic:

package com.mongodb.securerag;  

import org.springframework.ai.chat.client.ChatClient;  
import org.springframework.ai.chat.client.advisor.vectorstore.QuestionAnswerAdvisor;  
import org.springframework.ai.chat.model.ChatModel;  
import org.springframework.ai.vectorstore.SearchRequest;  
import org.springframework.ai.vectorstore.VectorStore;  
import org.springframework.stereotype.Service;  

@Service  
public class ChatService {  

    private final ChatModel chatModel;  
    private final VectorStore vectorStore;  

    ChatService(ChatModel chatModel, VectorStore vectorStore) {  
        this.chatModel = chatModel;  
        this.vectorStore = vectorStore;  
    }

    // We'll add our business methods here next
}

Spring will automatically inject the ChatModel (which talks to Ollama) and the VectorStore (which talks to MongoDB Atlas Vector Search). This allows us to easily wire together search and chat in a clean service layer.

Sending secure messages

Now, let’s build the core method that handles chat requests, performs semantic search, and applies access control at query time. We’ll add the following to the ChatService class:

public String sendSecureMessage(String message, String userRole, String department) {  
    // Build the base QuestionAnswerAdvisor — this wires up vector search for retrieval
    QuestionAnswerAdvisor advisor = QuestionAnswerAdvisor.builder(vectorStore)  
            .searchRequest(SearchRequest.builder()  
                    .topK(5)  // Fetch top 5 most relevant documents
                    .build())  
            .build();  

    // Build the ChatClient using the advisor for retrieval-augmented context
    ChatClient filteredChatClient = ChatClient.builder(chatModel)  
            .defaultAdvisors(advisor)  
            .build();  

    // Dynamically build the filtering expression based on user role & department
    String filterExpression = createAccessFilterExpression(userRole, department);  

    // Attach the filter expression to this specific chat request
    return filteredChatClient.prompt()  
            .user(message)  
            .advisors(a -> a.param(QuestionAnswerAdvisor.FILTER_EXPRESSION, filterExpression))  
            .call()  
            .content();  
}

A few key things happen here:

  • We build a QuestionAnswerAdvisor that handles the vector search piece.
  • We dynamically generate our security filter per request, based on who’s asking.
  • Spring AI lets us inject this filter expression into each call to enforce access control at retrieval time.
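To see what that advisor is doing under the hood, you can run the same retrieval step directly against the vector store. Here’s a minimal sketch, assuming the same SearchRequest API we used above (with java.util.List and org.springframework.ai.document.Document imported):

// Roughly the search the QuestionAnswerAdvisor performs before prompting the model
List<Document> docs = vectorStore.similaritySearch(
        SearchRequest.builder()
                .query("vacation policy")                      // the user's message
                .topK(5)                                       // same top-K as the advisor
                .filterExpression("access_level == 'public'")  // our security filter
                .build());
docs.forEach(doc -> System.out.println(doc.getText()));

The documents that come back are what the advisor stuffs into the model’s context, which makes this a handy way to debug your filters in isolation.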

Building the filter expression

Let’s generate that dynamic filter that decides which documents each user can see (note that this method needs java.util.List and java.util.ArrayList imported in ChatService):

private String createAccessFilterExpression(String userRole, String department) {  
    List<String> conditions = new ArrayList<>();  

    // Public documents are always accessible
    conditions.add("access_level == 'public'");  

    // Add role-based conditions if the user has a specific role
    if (userRole != null && !userRole.equalsIgnoreCase("public")) {  
        StringBuilder roleCondition = new StringBuilder();  
        roleCondition.append("(roles in ['").append(userRole).append("', 'all_employees'])");  

        // Further restrict by department if provided
        if (department != null && !department.trim().isEmpty()) {  
            roleCondition.append(" && (department == '").append(department).append("')");  
        }  

        conditions.add(roleCondition.toString());  
    }  

    // Join everything together into one filter string
    return String.join(" || ", conditions);  
}

Quick summary:

  • If we’re public: We only get public docs.
  • If we have a role: We get both public docs and documents matching our role (and optionally, department).
  • This gives us fine-grained access control entirely at query time, no additional layers needed.
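One caveat before moving on: Because userRole and department are concatenated straight into the filter string, a crafted value could rewrite the expression (the filter-string equivalent of injection). In production, you’d want to validate both against a known set before building the filter. Here’s a minimal sketch, with a hypothetical allow-list drawn from our sample data:

// Hypothetical allow-list of roles; needs java.util.Set imported
private static final Set<String> KNOWN_ROLES = Set.of(
        "public", "hr_manager", "hr_coordinator", "finance_manager",
        "sales_rep", "sales_manager", "ceo", "cto", "cfo", "executive");

private void validateAccessInputs(String userRole, String department) {
    if (userRole != null && !KNOWN_ROLES.contains(userRole)) {
        throw new IllegalArgumentException("Unknown role: " + userRole);
    }
    // Our sample departments are single words, so reject anything else
    if (department != null && !department.matches("[A-Za-z]+")) {
        throw new IllegalArgumentException("Invalid department: " + department);
    }
}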

Under the hood, Spring AI takes this filter string and parses it using its built-in filter expression language. This syntax isn’t raw MongoDB query language, but rather a simple boolean expression format supported by Spring AI. Operators like ==, in, &&, ||, and parentheses are translated by Spring AI into proper MongoDB $vectorSearch filters when the query is executed. For example, our filter string:

access_level == 'public' || (roles in ['hr_manager', 'all_employees']) && (department == 'HR')

Would be automatically parsed into the filter expression:

{
  "$or": [
    { "access_level": { "$eq": "public" } },
    {
      "$and": [
        { "roles": { "$in": ["hr_manager", "all_employees"] } },
        { "department": { "$eq": "HR" } }
      ]
    }
  ]
}

The key is that Spring AI expects the filter string to follow this limited expression grammar. You cannot send full MongoDB queries or arbitrary text. As long as your string follows this format, Spring AI handles parsing and conversion automatically.
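If you’d rather not assemble these strings by hand, Spring AI also ships a programmatic FilterExpressionBuilder that produces the same expressions in a type-safe way, which sidesteps the string-concatenation concerns mentioned earlier. A sketch of the HR-manager filter from above:

import org.springframework.ai.vectorstore.filter.Filter;
import org.springframework.ai.vectorstore.filter.FilterExpressionBuilder;

private Filter.Expression createAccessFilter() {
    FilterExpressionBuilder b = new FilterExpressionBuilder();
    // access_level == 'public' || (roles in [...] && department == 'HR')
    return b.or(
            b.eq("access_level", "public"),
            b.and(
                    b.in("roles", "hr_manager", "all_employees"),
                    b.eq("department", "HR")))
            .build();
}

SearchRequest.builder().filterExpression(...) accepts a Filter.Expression directly, so you could swap this in for the string version in sendSecureMessage.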

Creating an API

Alright, now that the brains of the system are built, we need a way to actually talk to it. Let’s wire up a super simple REST API so we can hit it with requests and see our secure RAG flow in action.

The request payload

We’ll create a plain old Java object to handle chat requests coming into our app. Nothing fancy, just a little DTO that carries the message, role, and department info along for the ride:

package com.mongodb.securerag;  

public class ChatRequest {  
    private String message;  
    private String userRole;  
    private String department;  

    public ChatRequest() {}  // default constructor needed for Spring to deserialize  

    public ChatRequest(String message, String userRole, String department) {  
        this.message = message;  
        this.userRole = userRole;  
        this.department = department;  
    }  

    public String getMessage() { return message; }  
    public void setMessage(String message) { this.message = message; }  

    public String getUserRole() { return userRole; }  
    public void setUserRole(String userRole) { this.userRole = userRole; }  

    public String getDepartment() { return department; }  
    public void setDepartment(String department) { this.department = department; }  
}

Basically:

  • message: what the user is asking
  • userRole: who’s asking
  • department: which department they're in, if relevant

Spring will automatically turn incoming JSON into this object for us. Easy.
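For example, a request body like this maps straight onto the DTO:

{
  "message": "What is our vacation policy?",
  "userRole": "hr_manager",
  "department": "HR"
}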

The controller

This is where we expose our endpoints, just a small API surface so we can send chat requests to the back end:

package com.mongodb.securerag;  

import org.springframework.web.bind.annotation.*;  

@RestController  
@RequestMapping("/chat")  
public class ChatController {  

    private final ChatService chatService;  

    public ChatController(ChatService chatService) {  
        this.chatService = chatService;  
    }  

    @GetMapping("/{message}")  
    public String sendMessage(@PathVariable String message,  
                              @RequestParam(defaultValue = "public") String userRole,  
                              @RequestParam(required = false) String department) {  
        return chatService.sendSecureMessage(message, userRole, department);  
    }

    @PostMapping("/secure")  
    public String sendSecureMessage(@RequestBody ChatRequest request) {  
        return chatService.sendSecureMessage(request.getMessage(), request.getUserRole(), request.getDepartment());  
    }  
}

Here’s what’s going on:

  • The GET endpoint makes it easy to test quickly—just drop a URL in your browser or hit it with curl.
  • The POST endpoint gives us a bit more flexibility—we can send in the full request body with role and department data.
  • Both routes hand off to our ChatService, which handles the filtering logic and secure retrieval behind the scenes.


Quick note on GET vs POST

You might be wondering: Why do we have both a GET and a POST endpoint for this?

  • The GET endpoint is mainly here for convenience. It makes testing easier when you're still building. You can fire off quick curl commands or hit it directly from your browser to sanity check things.
  • The POST endpoint is better suited for real usage. It allows us to send the full chat request as JSON, which is much cleaner if you're dealing with longer queries, special characters, or calling the API from an actual front end or another system. It also gives us room to grow later if we want to pass in more fields.
  • Technically speaking, since this isn't a simple "read" operation (we're doing retrieval, filtering, embedding comparisons, and inference), many would argue this should be POST-only once you go into production.

For now, we’ll keep both while we’re building, but if you were turning this into a real product, you’d likely just expose the POST endpoint.
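For example, hitting the POST endpoint with curl looks like this:

curl -X POST "http://localhost:8000/chat/secure" \
  -H "Content-Type: application/json" \
  -d '{"message": "vacation policy", "userRole": "hr_manager", "department": "HR"}'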

And that’s it. The API layer is done. You’ve now got a secure RAG system you can hit with real HTTP requests.

Load some sample data

Before we can test anything, we need some documents loaded into our vector store—otherwise, our model will just sit there politely answering nothing.

We’ll load up some simple HR, Finance, Sales, and Executive documents, each tagged with different access levels, roles, and departments. This will give us enough data to see our filtering in action.

Here’s our data loader:

package com.mongodb.securerag;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;

import java.util.*;

@Component
public class DocumentLoader implements CommandLineRunner {

    private final VectorStore vectorStore;

    public DocumentLoader(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    @Override
    public void run(String... args) throws Exception {
        loadSampleDocuments();
    }

    public void loadSampleDocuments() {
        List<Document> documents = new ArrayList<>();

        // HR Documents
        documents.add(new Document(
                "Employee Handbook - Vacation Policy: All full-time employees are entitled to 15 days of paid vacation per year.",
                Map.of(
                        "department", "HR",
                        "access_level", "confidential",
                        "roles", Arrays.asList("hr_manager", "hr_coordinator"),
                        "title", "Vacation Policy"
                )
        ));

        documents.add(new Document(
                "Salary Review Process: Annual salary reviews are conducted in Q4 of each year.",
                Map.of(
                        "department", "HR",
                        "access_level", "restricted",
                        "roles", Arrays.asList("hr_manager", "executive"),
                        "title", "Salary Review Process"
                )
        ));

        // Finance Documents
        documents.add(new Document(
                "Q3 Budget Report: Total revenue for Q3 was $2.4M.",
                Map.of(
                        "department", "Finance",
                        "access_level", "confidential",
                        "roles", Arrays.asList("finance_manager", "cfo", "executive"),
                        "title", "Q3 Budget Report"
                )
        ));

        documents.add(new Document(
                "Expense Policy: All business expenses over $100 require manager approval.",
                Map.of(
                        "department", "Finance",
                        "access_level", "public",
                        "roles", Arrays.asList("all_employees"),
                        "title", "Expense Policy"
                )
        ));

        // Public Company Information
        documents.add(new Document(
                "Company Mission Statement: Our mission is to democratize access to artificial intelligence.",
                Map.of(
                        "department", "General",
                        "access_level", "public",
                        "roles", Arrays.asList("all_employees", "public"),
                        "title", "Company Mission Statement"
                )
        ));

        documents.add(new Document(
                "Office Hours and Location: Our main office is located at 123 Tech Street, San Francisco, CA.",
                Map.of(
                        "department", "General",
                        "access_level", "public",
                        "roles", Arrays.asList("all_employees", "public"),
                        "title", "Office Information"
                )
        ));

        // Sales Documents
        documents.add(new Document(
                "Q4 Sales Targets: Individual sales targets for Q4 are set at $500K per sales representative.",
                Map.of(
                        "department", "Sales",
                        "access_level", "confidential",
                        "roles", Arrays.asList("sales_rep", "sales_manager"),
                        "title", "Q4 Sales Targets"
                )
        ));

        // Executive Documents
        documents.add(new Document(
                "Strategic Planning 2025: Key initiatives include international expansion and AI model optimization.",
                Map.of(
                        "department", "Executive",
                        "access_level", "restricted",
                        "roles", Arrays.asList("ceo", "cto", "cfo", "executive"),
                        "title", "Strategic Planning 2025"
                )
        ));

        System.out.println("Loading " + documents.size() + " sample documents...");
        vectorStore.add(documents);
        System.out.println("Sample documents loaded successfully!");
    }
}

We inject the VectorStore directly into our loader, and on application startup (using CommandLineRunner), it loads all the documents automatically. Each document contains the main text plus metadata fields: access_level, roles, department, and title. This metadata structure maps directly to the filters we wrote earlier in createAccessFilterExpression().

This metadata is what we’ll later use to apply filtering logic during search. Public documents will always be visible to any user, while the restricted documents will only appear if the user has the appropriate role and department permissions.

That’s it! Our app will automatically load these into MongoDB Atlas when it starts up. One caveat: Because the loader runs on every startup, restarting the app will insert duplicate copies of the documents, so clear out the vector_store collection between runs (or add a guard to the loader) if you restart often.

Testing it out

Now, let’s actually give this thing a spin. We can run it with:

export MONGODB_URI="<YOUR_CONNECTION_STRING>"

mvn spring-boot:run

With our app running locally, we can hit the API directly using curl and see our secure retrieval logic in action. First, make sure to give it a moment to insert our sample data and create our embeddings.

Let’s try a simple public query where no role is provided. Anyone should be able to access this:

curl "http://localhost:8000/chat/What%20is%20our%20mission?userRole=public"

And voila!

Based on the provided context, the company's mission statement is:

"Our mission is to democratize access to artificial intelligence by building tools that make AI accessible to businesses of all sizes. We believe in ethical AI development and transparent business practices."

This appears to be a repeated statement, but it provides the core of the company's mission and values.

We get back our mission statement, as expected, pulled from the public data. The model also surfaces a bit of related context (office hours, expense policy, etc.), since it's retrieving semantically similar documents.

Next, let’s simulate an HR manager querying for vacation policy. This time, we pass both the role and department to unlock access to more restricted content:

curl "http://localhost:8000/chat/vacation%20policy?userRole=hr_manager&department=HR"

And we get back a fairly well-informed response:

Based on the provided context, I see that there are multiple policies mentioned:

1. Vacation Policy: 15 days of paid vacation per year, with a maximum carryover of 5 unused days to the next year.

2. Expense Policy: All expenses over $100 require manager approval and must be submitted within 30 days of the expense date.

3. Office Hours and Location: The main office is located at 123 Tech Street, San Francisco, CA, with office hours from Monday to Friday, 9 AM to 6 PM Pacific Time.

If you could provide more context or clarify your question, I'd be happy to try and assist further!

Now, we’re seeing both the vacation policy and other relevant internal documents that the HR manager is allowed to see. The access filter is working exactly as intended.

Finally, let’s check what happens if a public user asks the same vacation policy question:

curl "http://localhost:8000/chat/vacation%20policy?userRole=public"

As expected, the system correctly returns no private HR data, and limits the response to public information only.

I don't see any user comments or questions regarding a "vacation policy". The provided text only includes information about an expense policy, office hours and location, and company mission statement. Could you please provide more context or clarify what you would like to know about a vacation policy? I'll do my best to help.

Since this user isn’t authorized for confidential HR documents, the system correctly avoids leaking private data. Instead, the model only references public information and doesn’t return any HR policy details.

Conclusion

And with that, we’ve built a fully functional, secure, locally managed RAG system using Spring AI, Ollama, and MongoDB Atlas. The key here wasn’t just getting RAG working (plenty of tutorials stop there), but layering access control directly into the retrieval step itself. This way, we’re never handing the language model private data it was never supposed to see in the first place. And because everything runs locally, there are no per-token costs accruing with each request.

With this foundation, you can easily extend things further: add more roles, more metadata, more complex filters, richer documents, or swap in different models depending on your use case. Most importantly, you're in full control. No third-party model provider ever sees your sensitive documents, but you still get the power of retrieval-augmented LLM responses for your users.

Check out some of my other tutorials, like Building a Real-Time AI Fraud Detection System with Spring Kafka and MongoDB or Your Guide to Optimizing Slow Queries, and learn more about what you can do with MongoDB and Java.
