Preparing Your Application for Production
Production is not about code. It’s about resilience.
In the previous article of AI Lab serie, we explored how to secure API keys and sensitive secrets — a crucial step to avoid exposing your infrastructure to unnecessary risks.
But protecting secrets is only the beginning. When your application reaches production, the stakes change: what was once a developer’s concern becomes an operational and strategic challenge.
🙃 Someone recently told me how proud he was to roll out a Gmail auto-responder using OpenAI and n8n — in “just 10 minutes.” It scanned inbound emails, used GPT-4 to generate a reply, and dropped it neatly into Gmail drafts. No code, no sweat. But here’s what he didn’t consider: every message — client queries, legal threads, supplier updates — was being sent as-is to a public LLM endpoint, without any filtering, anonymization, or logging safeguards. Ten minutes to deploy. Zero minutes to think about DLP, GDPR, or reputational damage. The automation worked perfectly... until his legal team saw the audit trail.
For C-Level leaders, the message is clear:
👉 Going live is not just a technical milestone. It’s a governance milestone.
Efficient logs, strict access controls, and proactive data-leak mitigation are not just engineering best practices — they are risk management strategies that protect the company’s assets, reputation, and clients.
In this article, we’ll address three key aspects that every production-ready application — AI or otherwise — should consider:
1️⃣ Structured logs — making logs actionable, not just verbose.
2️⃣ CORS — securing cross-origin requests without breaking legitimate clients.
3️⃣ Mitigating data breach risks — because real-world security is never just about prevention — it’s about resilience. And as we’ll see, AI introduces a whole new set of risks.
Logs Are Not Just Logs
Naturally, your dev team is using a logger everywhere… because debugging without logs is like flying a plane at night with no instruments. But logging isn’t just about writing lines to a file — the real question is whether it’s being used efficiently in production.
Unstructured vs Structured Logs
Not all logs are created equal.
Why Structured Logs Matter
By adopting structured logging, your logs become more than just traces of execution—they turn into a rich source of intelligence across teams:
Best Practice
Structured logging isn’t just a nice-to-have. It’s part of the essential toolkit for any team aiming to manage production systems effectively, reduce downtime, and make logs a strategic asset rather than a liability.
AI Lab Python Backbone: Observability Starts Here
As we’ve seen, our AI Lab core engine is built with FastAPI / Python. This module — the heart of the stack — must be fully logged, since it’s responsible for handling requests, managing data, and driving the overall business logic. For this, we use Loguru, a modern logging library for Python that combines simplicity with powerful features. Unlike the standard logging module, Loguru requires almost no boilerplate, supports rotation and retention out of the box, and can export logs in both human-friendly and machine-readable formats.
Alternatives exist — for example, the native Python logging library (with JSON formatters), or specialized structured-logging frameworks like structlog. However, Loguru strikes a good balance between ease of use and production readiness.
In our setup, we use two exporters (called sinks in Loguru):
Here’s what happens:
First sink → sends logs to the console (stdout) in a simple, human-readable format like:
✔️ Anything you write to stdout (or stderr) inside a Docker container is captured by the Docker logging driver.
✔️ You can then view it with: docker logs <container_name>
✔️ Most orchestrators (Kubernetes, Docker Swarm, Nomad) rely on this convention — they scrape logs from container stdout/stderr, not from files.
So this is useful for developers… here’s an example of the kind of raw log output they usually work with.
Second sink → writes logs to /app/logs/ai-stack.log with two key features:
✔️ Rotation: the file is split once it reaches 10 MB, preventing unbounded growth.
✔️ Retention: rotated files are kept for 7 days, then deleted automatically.
✔️ Serialization: logs are stored in structured JSON format, making them machine-readable and easily parsable by log aggregators like Promtail → Loki → Grafana. (We will see that later with observability)
Thanks to the structure log, it is possible to parse it easily and to present the same sequence like this:
This dual-exporter configuration gives us the best of both worlds:
🎯 Readable logs for humans, structured logs for observability tools.
Log Smarter: Best Practices for Production-Ready Observability
We now have the right tool and the appropriate sinks in place — but without discipline in every method, ensuring that all useful events are consistently logged, even the best setup will fall short.
What to log (and why)
Levels & signal-to-noise
Privacy & compliance
Content quality
CORS Explained: Controlling Who Talks to Your API
As we covered before, you’ll typically have a Docker container — like the one illustrated next.
Since we are using FastAPI (FastAPI automatically integrates with Prometheus through libraries such as prometheus-fastapi-instrumentator), the system will expose a metrics endpoint. This endpoint is valuable for monitoring, but it should never be exposed to the Internet. That’s why in our Traefik configuration, requests to "/ai-metrics" are rejected from the public router. At the same time, if a request originates from the dedicated observability network, we explicitly allow it, so our monitoring stack can collect the data securely.
When this module acts as the primary backend, your frontend is responsible for invoking its endpoints. These endpoints must be properly protected, for instance with solutions such as Keycloak with RBAC (We’ve explored these topics in earlier articles.). As we can see, the service already runs under HTTPS, which is the baseline for securing communications.
But HTTPS alone is not enough. Anyone can still see requests through their browser’s developer tools. This means an attacker could attempt to replay or forge those requests from their own application — something you absolutely want to prevent. This is where CORS (Cross-Origin Resource Sharing) comes into play. CORS is a mechanism implemented by browsers to enforce that only authorized domains can call your API. By defining which frontends are allowed, you prevent malicious sites from invoking your core service, even if they manage to reproduce the API calls.
When your frontend runs in the user’s browser, every API call to the backend is an HTTP request: for example, fetch ("https://ai-api.yourdomain.com/ai/v1/chat"). The browser, however, doesn’t just send the request blindly. Before it allows the response to be passed back to the application, it checks the CORS policy defined by the backend. If the backend explicitly allows the calling domain (e.g., https://ui.yourdomain.com), the browser delivers the response to the frontend. If not, the browser silently blocks the response, even if the backend replied with valid data.
This is important: CORS doesn’t stop someone from making a request — anyone can try with curl or Postman — but it prevents a malicious website loaded in the user’s browser from making hidden requests to your API and accessing sensitive data. In other words, the browser itself becomes the enforcement agent, ensuring that only trusted frontends can talk to your backend on behalf of a user.
How We Configure CORS in AI Lab
To enable CORS, you need to add the appropriate configuration directly to your FastAPI instance using the built-in CORSMiddleware.
What this does
Data Breaches Don’t Ask for Permission. Prepare Now.
Mitigating data breach risks is critical, because real-world security is never just about prevention — it’s about resilience. Even with strong authentication, encrypted storage, and access controls, breaches may still occur through misconfiguration, human error, or novel attack vectors. What matters is how quickly your system can detect, contain, and recover. This is especially true with AI applications, which often process large amounts of sensitive data: user queries, business records, intellectual property, or even personal identifiers. A leak in this context doesn’t only threaten compliance (LPD, GDPR, HIPAA, etc.), it can also erode customer trust and damage your company’s reputation.
For AI systems, the risks are amplified: models can be exploited to infer training data, prompts can leak information, and integrations with external APIs can expand the attack surface. Mitigation therefore requires layered defenses: monitoring unusual access patterns, anonymizing or pseudonymizing sensitive inputs, encrypting data in transit and at rest, and having an incident response plan that treats a breach as a matter of “when,” not “if.”
In the previous article, we secured the OpenAI API key, but the call itself to OpenAI remained unprotected :
No Input Validation or Guardrails
No Output Control
Control the Input, Control the Risk
In production, every user input is a potential attack vector. Whether it’s prompt injection, oversized payloads, or malicious content, leaving inputs unchecked can compromise your entire AI pipeline. Controlling the input isn’t just about validation — it’s about defining strict boundaries, filtering intent, and preventing misuse before it reaches the model.
Summary checklist
✅ Validate & sanitize inputs (size, content).
✅ Fixed system prompt + strict templates.
✅ Structured outputs.
✅ Token/time bounds.
Recommended by LinkedIn
✅ Observability without leaking prompts/PII.
Input Sanitization: Your First Shield Against Attacks
In a web context, attackers often try to inject JavaScript inside text fields to launch Cross-Site Scripting (XSS) attacks. (<script> ... malicious content ... </script>).
Here we add a simple sanitization step. It strips <script> tags from user input before sending it to the LLM. Why? Because attackers might try to sneak JavaScript into your flow — and if you ever reuse that content in a frontend, you don’t want an XSS bomb waiting for you.
Sanitization isn't one-size-fits-all. Its depth should directly reflect the sensitivity of the endpoint and the potential impact of misuse. A public search field may only require basic filtering, while an AI prompt handler — especially one interfacing with external APIs — demands strict sanitization to block injection attempts, offensive content, or data leakage vectors. The more powerful or exposed the endpoint, the more rigorous your input hygiene should be. Tailor your sanitization strategy to the intent and risk profile of each interface — not just the technical type of input.
Input Validation: The Gatekeeper of Your System
Here we enforce a simple allowlist / denylist policy. If a user query contains dangerous snippets like api key or password=, we block it. Why? Because this protects against prompt injection attempts designed to trick the LLM into revealing secrets. This is only a first step and your policies and rules must be enforced.
Input validation is the stage where sensitive information must be caught — not downstream. This is your opportunity to prevent accidental exposure of client names, phone numbers, contract terms, or other regulated data before it reaches a third-party API or AI model. Once the data is sent, it’s too late to redact it. Effective validation acts as a final checkpoint, ensuring that only safe, compliant, and intentional input moves forward in your pipeline.
Input control isn’t always about blocking — sometimes, it’s about transforming. In many cases, outright rejection may not be necessary or desirable. Instead, pseudonymization can offer a middle ground: replacing sensitive fields (names, emails, IDs) with neutral tokens or hashes that preserve functionality while protecting privacy. This approach allows you to maintain traceability, support analytics, or run model inference — without exposing real data. It’s a practical compromise between usability and compliance.
No Boundaries, No Control: Why Your LLM Needs Guardrails
When working with LLMs, a fixed system prompt combined with strict templates is critical to maintain control over the model’s behavior. A fixed system prompt acts like a guardrail — it sets the role, tone, and non-negotiable boundaries for the model (e.g. “You are a safe assistant that never reveals credentials or internal policies”). Without it, attackers could attempt prompt injection to override the instructions. On top of that, strict templates ensure that inputs and outputs follow a predictable structure: instead of letting the LLM answer freely in prose, you constrain it to a defined JSON schema or sentence pattern. This reduces ambiguity, makes parsing reliable, and lowers the risk of the model “hallucinating” unexpected content. Together, these two measures transform an LLM from a free-form chatbot into a controlled component in your architecture — one you can trust to behave consistently under both normal use and adversarial pressure.
No Surprises, No Failures: The Power of Defined Responses
Structured outputs (via Pydantic models or LangChain’s with_structured_output) are about controlling what comes back from the LLM, not just what goes in.
Here’s why they matter:
LLMs are free-form by default Without constraints, a model can return anything — prose, JSON-ish text, or unexpected hallucinations. That’s a nightmare for downstream code that expects clean data.
Pydantic models enforce a schema When you wrap outputs in a BaseModel (e.g. AnswerModel), you’re telling LangChain: “I want a JSON object.” “It must have these fields: answer: str, risk_level: str (or whatever you define).” If the model deviates (extra keys, missing fields, invalid types), it fails fast and you catch the error instead of corrupting your pipeline.
Consistency for downstream systems APIs, databases, dashboards (Grafana, Prometheus, etc.) rely on structured data. If the LLM output is guaranteed to fit a schema, you can log it, export it, or query it without ad-hoc parsing.
Security + policy Structured outputs prevent “prompt injection” attempts that try to smuggle instructions or secrets back through the response. The model can only produce the whitelisted fields.
Better developer experience With with_structured_output(AnswerModel) you essentially get a typed contract between the AI and your backend, just like any other API. It feels like working with strongly typed code, not arbitrary strings.
So in short: Structured outputs turn the LLM into a predictable function call that always returns a known shape, instead of a free-text black box.
Token and Time Limits: Production-Safe by Design
max_tokens → caps output size ⇒ controls cost and prevents runaway replies.
timeout → caps call duration ⇒ protects latency SLOs (Service Level Objectives) and avoids stuck requests.
Bonus — Taming Temperature: Precision vs. Creativity
Temperature is one of those knobs that is simple but has big implications in your context (production API + LangChain + structured output).
It’s the randomness parameter in the LLM’s sampling.
Range: 0.0 (deterministic) → 1.0 (creative / diverse).
Higher = more variety, lower = more consistency.
You want predictable JSON / schema outputs.
For structured APIs, temperature=0 is best practice.
Privacy-First Logging: Keep Insights, Drop the Risks
We’ve already seen how to generate efficient logs. But what happens inside LangChain?
In production, you need full traceability (to debug, measure cost, alert) without turning logs into a data-leak. The rule: log metadata, not payloads.
What to capture
What not to capture
Custom Logging in LangChain: Your LoguruCallback Blueprint
Be cautious not to log everything from these callbacks — especially the prompts. While full logging can be helpful during development, in production you must carefully control what data is captured to avoid exposing sensitive information.
Here is the interface:
on_llm_start – called when an LLM starts, gives you the serialized config + prompts.
on_llm_new_token – called when a new token is streamed from the LLM.
on_llm_end – called when an LLM call finishes, gives you the LLMResult.
on_llm_error – called if the LLM call raises an exception.
on_chain_start – called when a chain starts, gives serialized config + inputs.
on_chain_end – called when a chain finishes, gives chain outputs.
on_chain_error – called if the chain raises an exception.
on_tool_start – called when a tool starts execution, with its input string.
on_tool_end – called when a tool finishes, gives tool output.
on_tool_error – called if a tool raises an exception.
on_retriever_start – called when a retriever starts, with the query string.
on_retriever_end – called when a retriever finishes, with retrieved documents.
on_retriever_error – called if a retriever raises an exception.
on_text – called when a text chunk is produced (e.g., streaming).
on_error – global error hook for any unexpected exception.
Your AI Stack Is Getting Safer — Here’s What’s in Place
The groundwork for production readiness by combining classic web practices with AI-specific safeguards. It covers structured logging for observability, strict CORS rules for controlled access, and input/output sanitization to prevent injection or data leaks. On the AI side, it introduces fixed system prompts, strict templates, and structured outputs to keep model behavior predictable, while token/time limits and a zero-temperature setup ensure stable, cost-efficient responses. Finally, it stresses observability without exposing sensitive data, showing how to log only safe metadata through LangChain callbacks.
Production is where good code goes to die unless you prepare it. If ‘secure & scalable AI’ still feels abstract, let’s make it concrete. Message me, I’ll show you how.
Read Part 5 — Add Rate Limiting
Now that your AI stack is structured, secured, and observable… it’s time to unlock the real power of AI models. But before going all-in, one critical layer remains: rate limiting. Because without it, even the best system can crash under pressure — or get abused in seconds.
In the next article, we’ll cover how to control access, prevent overloads, and protect your resources from misuse — all without degrading the user experience.
As a Fractional CTO, I help teams design efficient, scalable systems —without over-engineering.
📩 Let’s talk If you want to rethink your architecture without overengineering it, my DMs are open.
The fifth article of the AI Lab series is ready. 👉 This chapter uncovers a silent risk that could silently crash your AI system: rate limiting. Why it matters, how we handle it, and how you can too — all explained. 👇 Read the full article: https://www.linkedin.com/pulse/rate-limiting-protecting-your-ai-stack-from-overload-chatton-n2ttf
The AI Lab Series teaser: https://www.linkedin.com/posts/alexandrechatton_most-ai-talks-sound-smart-but-produce-activity-7360614211015577600-Bx8H
This chapter nails it: Bringing AI apps to production isn’t just about clever prompts, it’s about safety and observability. Structured logs, validation, guardrails, token caps — not “extras” but foundations. And the real-world practices with Python + FastAPI + LangChain + Loguru show how cloud-agnostic can also mean production-ready.