guzmanojero

Posted on May 25

How I Made My Django Blog Safe After Adding Markdown

#django #python #nh3 #markdown

Using Markdown in your blog is awesome — until it opens up a security hole you didn’t expect.

In this post, I’ll show how I fixed a potential XSS vulnerability in my Django app after adding Markdown support and using the safe filter in my templates.

I’ll cover:

How to add Markdown in your templates
Why using the safe template filter can be risky
How I sanitize Markdown-rendered HTML with nh3
Where to apply sanitization: forms, models, filter?
A reusable sanitize_html() helper for clean and safe input

Let's first see the relevant parts of this project.

We have a Post model.

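# models.py
from django.contrib.auth.models import User
from django.db import models


class Post(models.Model):
    # max_length=255 is an assumed value; pick what fits your data
    title = models.CharField(max_length=255)
    title_tag = models.CharField(max_length=255)
    body = models.TextField()
    user = models.ForeignKey(User, on_delete=models.CASCADE)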

We have a ModelForm:

# forms.py
from django.forms import ModelForm

from .models import Post

class PostForm(ModelForm):
    class Meta:
        model = Post
        fields = "__all__"

And this is the (simplified) template:

<div> {{ post.body }} </div>

What is the idea?

To let users write the post body in Markdown when they fill in the form.

First step: add markdown

We will use the markdown library.

Install it:

pip install markdown

Inside the app folder, create a new folder called templatetags with an empty __init__.py file, so that Python treats it as a package.

In that folder, create the file where we will register a custom filter that renders Markdown as HTML. I'll use markdown_extras.py.

your_app/
├── templatetags/
│   ├── __init__.py
│   └── markdown_extras.py
├── ...

# markdown_extras.py

import markdown

from django import template
from django.template.defaultfilters import stringfilter

register = template.Library()


@register.filter
@stringfilter
def convert_markdown(value):
    md = markdown.markdown(
        value,
        extensions=["markdown.extensions.fenced_code", 
                    "markdown.extensions.tables"],
    )

    return md

This code defines a custom Django template filter called convert_markdown, which takes a string written in Markdown and converts it into HTML. The markdown.markdown() function does the actual conversion.

stringfilter converts the value to a string before it is passed to your function. Use this decorator whenever a filter only expects a string as its first argument.

It guarantees the filter always receives a string: a value such as None arrives as the text "None" instead of raising an error inside markdown.markdown().

The fenced_code extension adds a second way to define code blocks, which overcomes a few limitations of indented code blocks. It lets users write code blocks with triple backticks and have them rendered as <pre><code> blocks in HTML.

The tables extension adds the ability to create tables in Markdown documents.
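To see what the two extensions produce, the conversion can be tried in a Python shell. This is a small sketch: the sample strings are made up for illustration, and the exact attributes on the generated tags may vary between versions of the markdown library.

```python
import markdown

EXTENSIONS = ["markdown.extensions.fenced_code", "markdown.extensions.tables"]

# Build the triple-backtick fence programmatically to keep this snippet readable.
FENCE = "`" * 3

# A fenced code block, as a user would type it in the form.
fenced_source = f"{FENCE}python\nprint(1 + 1)\n{FENCE}"
fenced_html = markdown.markdown(fenced_source, extensions=EXTENSIONS)

# A two-column table in pipe syntax.
table_source = "| Name | Role |\n| ---- | ---- |\n| Ana | Editor |"
table_html = markdown.markdown(table_source, extensions=EXTENSIONS)

print(fenced_html)  # a <pre><code ...> block containing the code
print(table_html)   # a <table> with <thead> and <tbody>
```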

Import your custom template filter.

At the top of the template where you want to render Markdown, write {% load markdown_extras %}.

Django does not load custom filters automatically: when you create a filter in a file like markdown_extras.py, you must explicitly load that file in every template that uses its filters.

Apply the filter in the HTML element

We want post.body to render Markdown:

<div> {{ post.body|convert_markdown }} </div>

We are almost there... we need one more step for rendering Markdown.

Apply the safe filter.

<div> {{ post.body|convert_markdown|safe }} </div>

We are telling Django not to escape the HTML that was generated by the convert_markdown filter.

By default, Django escapes all variables in templates to protect against XSS (Cross-Site Scripting) attacks. This means the HTML produced by convert_markdown, such as <strong>bold</strong> or <code>, would be shown as literal text instead of being rendered. With safe, Django doesn't escape that HTML.
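As a rough illustration of what autoescaping does, the standard library's html.escape performs the same kind of substitution on the characters that matter in HTML. This is a sketch of the idea, not Django's template engine itself.

```python
import html

rendered = "<strong>bold</strong>"

# Escaped: the browser displays the tags as literal text.
escaped = html.escape(rendered)
print(escaped)  # &lt;strong&gt;bold&lt;/strong&gt;

# Not escaped (what the safe filter allows): the browser interprets the tags.
print(rendered)  # <strong>bold</strong>
```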

Voilà... you can render Markdown now.

Test it with:

### This is a Header
_This is in italic_

But because the safe filter stops Django from escaping HTML, a user can also submit this:

<script> alert("Oops... XSS") </script>

Ohhh... you now have a website that is open to XSS attacks.
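The markdown library does not remove raw HTML or check link targets on its own, which is why the script survives the conversion. A quick check, assuming a current version of the library (the payloads are illustrative):

```python
import markdown

# Raw HTML in the source is passed through to the output.
script_html = markdown.markdown('<script>alert("XSS")</script>')
print(script_html)

# A Markdown link can also carry a javascript: URL.
link_html = markdown.markdown("[click me](javascript:alert(1))")
print(link_html)
```

Both outputs would be executed or clickable in the browser once the safe filter is applied.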

Second step: sanitization

Sanitizing HTML means cleaning user-generated HTML content to remove any potentially dangerous code — especially scripts or malicious tags — before displaying it on a website.

When users submit content (like blog posts, comments, or Markdown), they could try to inject harmful HTML or JavaScript like this:

<script src="https://example.com/evil.js"></script>

I'll use the nh3 library for sanitization.

Install it:

pip install nh3

Its clean() function sanitizes HTML fragments.

We have one question to answer now: where are we going to perform the sanitization?

There are three common places:

In the forms
In the models
In the filter

In the forms.

Since the safe filter is applied to body, body is the field we need to sanitize. We have to add a clean_body method to PostForm.


# forms.py
import nh3
from django.forms import ModelForm

from .models import Post


class PostForm(ModelForm):
    class Meta:
        model = Post
        fields = "__all__"

    def clean_body(self):
        # Django calls clean_<fieldname>() during validation;
        # the returned value replaces cleaned_data["body"].
        return nh3.clean(
            html=self.cleaned_data["body"],
            tags={"p", "b", "i", "u", "em", "strong",
                "ul", "ol", "li", "a", "table", "thead",
                "tbody", "tr", "th", "td", "blockquote",
            },
        )

This method returns the sanitized value.

nh3.clean() takes the HTML to sanitize as its first argument, plus an allowlist of HTML tags; every tag not on the list is removed. The nh3 documentation lists all the parameters.

Why would you sanitize in the form?

✅ Pros:

Early validation: The data is sanitized before it’s saved to the database. The same method could also raise a ValidationError to report disallowed content to the user instead of silently stripping it.
Keeps the database safe: Prevents malicious HTML from ever being stored.
Works well with user input: Forms are the primary place where users submit HTML.

🚫 Cons:

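Can be bypassed if data is inserted directly into the model, for example through the Django admin, an API endpoint, or scripts run via the terminal or in a view (e.g. Post.objects.create(...)).
Runs on the Markdown source, not on the rendered HTML: anything dangerous that only appears after conversion, such as a link with a javascript: URL, is not caught.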

🔥 When to use it:
Ideal for user-facing forms, when you control 100% of your write paths and they all go through Django forms.

It’s a great balance between security and usability.

In the model.

If you want to be sure there is no bypassing you can sanitize directly in the model.


# models.py
import nh3

from django.contrib.auth.models import User
from django.db import models


class Post(models.Model):
    # max_length=255 is an assumed value; pick what fits your data
    title = models.CharField(max_length=255)
    title_tag = models.CharField(max_length=255)
    body = models.TextField()
    user = models.ForeignKey(User, on_delete=models.CASCADE)

    def save(self, *args, **kwargs):
        # Clean body every time save() is called, whoever the caller is.
        self.body = nh3.clean(
            self.body,
            tags={"p", "b", "i", "u", "em", "strong",
                "ul", "ol", "li", "a", "table", "thead",
                "tbody", "tr", "th", "td", "blockquote",
            },
        )
        super().save(*args, **kwargs)

By overriding the save() method, you make sure the HTML is cleaned every time an instance is saved through save(): from forms, views, the admin, an API, the shell, etc.

Keep in mind that bulk operations such as QuerySet.update() and bulk_create() don't call save(), so they skip this step.

Why would you sanitize in the model?

✅ Pros:

Catches almost everything: All data saved through save(), no matter the source, is sanitized.
Centralized protection: The cleaning logic lives in one place instead of in every form or view.

🚫 Cons:

Not as readable or obvious.
Adds logic to your model that might be better kept elsewhere.
Harder to unit test in isolation.
Like the form approach, it cleans the Markdown source, not the rendered HTML.

🔥 When to use it:
If data comes from multiple sources (not just forms), or if you're building a security-critical app.

In the filter.

Remember the convert_markdown template filter we created earlier?
When you're converting Markdown to HTML, you can clean the resulting HTML in the same step.


# templatetags/markdown_extras.py
import markdown
import nh3

from django import template
from django.template.defaultfilters import stringfilter

register = template.Library()


@register.filter
@stringfilter
def convert_markdown(value):
    html = markdown.markdown(
        value,
        extensions=["markdown.extensions.fenced_code",
                    "markdown.extensions.tables"],
    )
    # Sanitize the rendered HTML. The heading and code tags are included
    # so that Markdown headers and code blocks are not stripped.
    clean_html = nh3.clean(
        html,
        tags={"p", "b", "i", "u", "em", "strong",
            "ul", "ol", "li", "a", "table", "thead",
            "tbody", "tr", "th", "td", "blockquote",
            "h1", "h2", "h3", "pre", "code",
        },
    )
    return clean_html

By now you probably see what this function does.
No matter what was entered in the form or saved in the database, the HTML is sanitized every time it is rendered.

Why would you sanitize in the template filter?

✅ Pros:

Gives full control over rendering: You only sanitize right before displaying.
Allows raw HTML in the DB: You can reprocess it differently in the future (e.g. export to PDF).

🚫 Cons:

Risky if someone forgets to apply the filter: Dangerous content may be displayed.
Leaves dangerous HTML in your database, which may leak if you export it.

🔥 When to use it:
Only when you need to store rich content exactly as submitted, but want to control how it’s rendered in views/templates.

As a summary:

Where	Pro	Con	Use When...
Forms	Keeps DB safe early	May miss Admin/shell inserts	Users submit HTML through forms
Models	Catches all data sources	Can clutter model logic	Data comes from multiple sources
Filters	Flexible rendering	Dangerous if someone forgets to filter	You want to store raw HTML + control output

And, of course, you can combine approaches if needed.

If you're going to sanitize in several places, it pays to follow the DRY principle and factor the call out.

Instead of repeating nh3.clean() everywhere, we will create a utility function that takes care of the sanitization.

At the project root, create a folder named utils and, inside it, a file named sanitization.py.


# utils/sanitization.py

import nh3

SAFE_TAGS = {"p", "strong", "em", "ul", "ol", "li", "a",
             "u", "h1", "h2", "h3", "table", "thead", "tbody",
             "tr", "th", "td", "pre", "code", "blockquote", "span",
}

SAFE_ATTRIBUTES = {
    "a": {"href", "title"},
    "span": {"style"},
}


def sanitize_html(html: str) -> str:
    return nh3.clean(
        html,
        tags=SAFE_TAGS,
        attributes=SAFE_ATTRIBUTES,
    )

And now we change all the places where we did sanitization:


# <app_name>/forms.py

import utils.sanitization as sanitization

class PostForm(ModelForm):
    # class Meta stays the same

    def clean_body(self):
        html = self.cleaned_data["body"]
        return sanitization.sanitize_html(html)


# <app_name>/models.py

import utils.sanitization as sanitization

class Post(models.Model):
    # fields stay the same

    def save(self, *args, **kwargs):
        self.body = sanitization.sanitize_html(self.body)
        super().save(*args, **kwargs)


# templatetags/markdown_extras.py

import markdown
import utils.sanitization as sanitization

from django import template
from django.template.defaultfilters import stringfilter

register = template.Library()


@register.filter
@stringfilter
def convert_markdown(value):
    html = markdown.markdown(
        value,
        extensions=["markdown.extensions.fenced_code",
                    "markdown.extensions.tables"],
    )
    clean_html = sanitization.sanitize_html(html)
    return clean_html

Wrapping Up

Allowing users to write Markdown in your Django app makes content creation a breeze — but it also opens the door to potentially dangerous HTML if you're not careful. The safe filter tells Django to trust the output, so it's your job to ensure that what you're marking as "safe" is actually sanitized.

By using a library like nh3 to clean the rendered HTML, you can keep your app both functional and secure.

Markdown + Sanitization = Clean formatting without the XSS headaches.

Happy coding — and stay safe!

Further reading: https://adamj.eu/tech/2023/12/13/django-sanitize-incoming-html-nh3/