In Part 2, I ran an LLM locally using Docker Model Runner and connected to it through a Python script. Now in Part 3, we are wrapping that logic inside a FastAPI REST API - giving us a real, local GenAI backend we can use from Postman, web apps, or CLI tools.
Let’s dive in.
Goal
- Build a FastAPI server that sends prompts to a locally running LLM (ai/mistral)
- Expose a /generate endpoint
- Run the API container and Docker Model Runner side-by-side
What We Built
A REST API (running in Docker) that talks to Docker Model Runner via an OpenAI-compatible endpoint. You send a prompt like:
{
"prompt": ""Explain what is docker model runner in 3 points"
}
…and it responds with an AI-generated answer from a model running 100% on your machine.
Project Structure
docker-llm-fastapi-app/
├── app/
│ └── main.py ← FastAPI logic
├── Dockerfile ← API container
├── docker-compose.yml ← Orchestration
└── README.md
Check out the code here: part3-code
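For reference, here is a minimal sketch of what app/main.py could look like. The endpoint URL and payload shape follow Docker Model Runner's OpenAI-compatible API, but treat this as an assumption, not the repo's exact code:

# app/main.py - a minimal sketch, not necessarily the exact code from the repo
import requests
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Docker Model Runner's OpenAI-compatible endpoint as seen from inside a container.
# From the host (with TCP enabled) use http://localhost:12434/engines/v1/chat/completions instead.
MODEL_RUNNER_URL = "http://model-runner.docker.internal/engines/v1/chat/completions"

class PromptRequest(BaseModel):
    prompt: str

@app.post("/generate")
def generate(req: PromptRequest):
    # Forward the prompt as an OpenAI-style chat completion request
    payload = {
        "model": "ai/mistral",
        "messages": [{"role": "user", "content": req.prompt}],
    }
    resp = requests.post(MODEL_RUNNER_URL, json=payload, timeout=300)
    resp.raise_for_status()
    data = resp.json()
    # Return only the generated text under a "response" key
    return {"response": data["choices"][0]["message"]["content"]}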
How to Run It
1. Pull and Start the Model
docker model pull ai/mistral
docker model run ai/mistral
If you already pulled the model in the previous tutorial, running pull again is not necessary.
You may see:
Interactive chat mode started. Type '/bye' to exit.
That’s okay — the API is still active behind the scenes if TCP access is enabled.
2. Start the FastAPI Server
docker compose up --build
You’ll see:
Uvicorn running on http://0.0.0.0:8000
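For context, a docker-compose.yml along these lines is enough to build and expose the API (a minimal sketch, not necessarily the exact file from the repo):

# docker-compose.yml - minimal sketch
services:
  api:
    build: .          # build the API image from the Dockerfile in this folder
    ports:
      - "8000:8000"   # expose FastAPI/Uvicorn on localhost:8000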
3. Call the Endpoint
Send a request from Postman or curl:
POST http://localhost:8000/generate
Content-Type: application/json
{
"prompt": "What is MLOps in simple terms?"
}
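The same request as a curl one-liner:

curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is MLOps in simple terms?"}'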
Output:
{
"response": "MLOps, short for Machine Learning Operations, is a practice for collaboration and..."
}
Things I Learned
1. Interactive Mode Still Enables API
Even though Docker says:
Interactive chat mode started. Type '/bye' to exit.
…the HTTP API is still available on localhost:12434. As long as TCP support is enabled in Docker Desktop, it works fine.
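A quick way to confirm this from the host is to call the OpenAI-compatible endpoint directly. A minimal check, assuming TCP access on port 12434 and the standard /engines/v1/chat/completions path (adjust if your Model Runner version differs):

# verify_model_runner.py - quick host-side sanity check
import requests

resp = requests.post(
    "http://localhost:12434/engines/v1/chat/completions",
    json={
        "model": "ai/mistral",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    },
    timeout=300,
)
# Print just the model's reply text
print(resp.json()["choices"][0]["message"]["content"])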
2. First Call Is Slow
The first request took ~2 minutes. Why?
- The model is loaded into memory
- Runtime warmup takes time
But after that, subsequent prompts come back noticeably faster.
What’s Next
In Part 4, I plan to build Prompt Templates + Role Options, adding a practical layer of prompt engineering to your GenAI app.
Stay tuned!