The Wayback Machine - https://web.archive.org/web/20221127025903/https://github.com/guardian/html-janitor
This repository has been archived by the owner before Nov 9, 2022. It is now read-only.

guardian/html-janitor

# html-janitor

Cleans up your markup and allows you to take control of your HTML.

HTMLJanitor uses a whitelist to limit the HTML it is given to a defined subset.

## XSS Note

This library has not been extensively tested. In particular, versions prior to 2.0.3 are vulnerable to XSS attacks. See here and here.

Please upgrade to 2.0.4 or above and consider building your own additional checks on user input.
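As one hedged example of such an additional check, the sketch below uses the attribute-function option described under "Using logic" to accept only absolute `http:` or `https:` URLs in `href`, so values such as `javascript:alert(1)` are rejected. `isSafeHref` is a hypothetical name, and this is a single illustrative rule, not a complete XSS defence; note that it also rejects relative URLs.

```javascript
// Hypothetical extra check: accept only absolute http(s) URLs.
// Leading whitespace is trimmed first; the scheme match is case-insensitive.
function isSafeHref(value) {
  return /^https?:\/\//i.test(value.trim());
}

// Wired in as an attribute rule for <a> (see "Using logic").
var options = {
  tags: {
    a: { href: isSafeHref }
  }
};

console.log(isSafeHref('https://example.com/')); // true
console.log(isSafeHref(' JavaScript:alert(1)')); // false
console.log(isSafeHref('/relative/path'));       // false
```

A whitelist of schemes is used here rather than a blacklist of dangerous ones, since anything unexpected is rejected by default.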

## Usage

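    // `options` is the configuration object described under Options.
    var janitor = new HTMLJanitor(options);
    // `html` is a string of markup; clean() returns the sanitised markup.
    var sanitisedHtml = janitor.clean(html);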

## Options

`options` is a configuration object.

`tags` defines a whitelist of elements that are allowed in the sanitised output. Each entry in the map should be the name of an element, mapped to the attributes that are valid for that element.

For example, `{ tags: { p: {}, a: { href: true } } }` limits the valid HTML subset to paragraphs and anchor tags. Paragraphs have all of their attributes stripped, and anchors keep only the `href` attribute.

### Blacklisting and whitelisting all attributes

You can set an element's value to `true` to allow all attributes on that element, or to `false` to remove all of its attributes.
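The static rules above amount to a lookup per element and attribute. The sketch below is a hypothetical model written only to illustrate that lookup as this README describes it; it is not html-janitor's own code, `isAttributeKept` is an invented name, and function rules (covered in the next section) are deliberately left out.

```javascript
// Hypothetical model of the static whitelist rules (not the library's code).
// Answers: would `attrName` survive on a `tagName` element, given `tags`?
function isAttributeKept(tags, tagName, attrName) {
  var rule = Object.prototype.hasOwnProperty.call(tags, tagName)
    ? tags[tagName]
    : undefined;
  if (rule === undefined) {
    return false; // the element itself is not whitelisted
  }
  if (rule === true) {
    return true;  // `true`: every attribute is allowed
  }
  if (rule === false) {
    return false; // `false`: every attribute is removed
  }
  return rule[attrName] === true; // object: only attributes listed as true
}

var tags = {
  p: {},              // keep <p>, strip all attributes
  a: { href: true },  // keep <a>, keep only href
  span: true,         // keep <span> with all attributes
  div: false          // remove all attributes on <div>
};

console.log(isAttributeKept(tags, 'a', 'href'));    // true
console.log(isAttributeKept(tags, 'a', 'onclick')); // false
console.log(isAttributeKept(tags, 'p', 'class'));   // false
console.log(isAttributeKept(tags, 'span', 'id'));   // true
console.log(isAttributeKept(tags, 'div', 'id'));    // false
```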

### Using logic

If you need to apply logic when deciding whether to whitelist an element or an attribute, you can pass a function instead of a static value.

Here's an example that removes all empty `<u>` elements:

    var janitor = new HTMLJanitor({
      tags: {
        u: function(el){
          // Remove empty underline tags.
          var shouldKeep = el.textContent !== '';
          return shouldKeep;
        }
      }
    });
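Because a rule is a plain function of the element, one rough way to check it outside the browser is to call it with a stub object that has only the properties it reads. In this sketch `keepUnderline` and the stub objects are hypothetical; a real element would come from the library's DOM traversal.

```javascript
// Hypothetical name for the <u> rule shown above.
function keepUnderline(el) {
  // Keep the element only if it has some text content.
  return el.textContent !== '';
}

// Stand-ins for DOM elements: only the property the rule reads is provided.
var emptyU = { textContent: '' };
var filledU = { textContent: 'underlined' };

console.log(keepUnderline(emptyU));  // false (the <u> would be removed)
console.log(keepUnderline(filledU)); // true

// The same named function can then be passed in the configuration.
var options = { tags: { u: keepUnderline } };
```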

A function can also be used for an attribute; it is called with the attribute's value and the element as its arguments:

    var janitor = new HTMLJanitor({
      tags: {
        img: {
          height: function(value){
            // Only allow if height is less than 10.
            return parseInt(value, 10) < 10;
          },
          width: function(value, el){
            // Only allow if height is also specified.
            return el.hasAttribute('height');
          }
        }
      }
    });
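To see the `(value, element)` arguments in action without a browser, the sketch below evaluates the two `img` rules against a stub element. `stubElement` and `keptAttributes` are hypothetical helpers; `keptAttributes` lets every rule see the original attributes, so it does not model any removal order the library may use, and attributes without a function rule are treated as not whitelisted.

```javascript
// The two attribute rules from the img example above.
var imgRules = {
  height: function(value) {
    return parseInt(value, 10) < 10;
  },
  width: function(value, el) {
    return el.hasAttribute('height');
  }
};

// Hypothetical stand-in for a DOM element, built from a plain map of
// attribute names to values. It implements only `hasAttribute`.
function stubElement(attrs) {
  return {
    attrs: attrs,
    hasAttribute: function(name) {
      return Object.prototype.hasOwnProperty.call(attrs, name);
    }
  };
}

// Hypothetical helper: calls each attribute's rule with (value, element)
// and returns the names of the attributes whose rule accepts them.
function keptAttributes(rules, el) {
  return Object.keys(el.attrs).filter(function(name) {
    var rule = rules[name];
    return typeof rule === 'function' && Boolean(rule(el.attrs[name], el));
  });
}

console.log(keptAttributes(imgRules, stubElement({ height: '5', width: '20' })));
// [ 'height', 'width' ]
console.log(keptAttributes(imgRules, stubElement({ width: '20', src: 'a.png' })));
// []
```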

Functions may return any value that would be accepted as a static value, including an object:

    var janitor = new HTMLJanitor({
      tags: {
        blockquote: function(el){
          if (el.classList.contains('indent')){
            // If blockquote has class 'indent', allow class and style.
            return { 'class': true, 'style': true };
          } else {
            return {}; // Otherwise strip all attributes.
          }
        }
      }
    });
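The object returned by such a function plays the same role as a static attribute map. As a sketch, the rule can be exercised with a stub that implements only `classList.contains`; `blockquoteRule` and `stubWithClasses` are hypothetical names used for illustration.

```javascript
// The blockquote rule from the example above, under a hypothetical name.
function blockquoteRule(el) {
  if (el.classList.contains('indent')) {
    return { 'class': true, 'style': true };
  }
  return {};
}

// Hypothetical stand-in for an element: only `classList.contains` exists.
function stubWithClasses(classNames) {
  return {
    classList: {
      contains: function(name) {
        return classNames.indexOf(name) !== -1;
      }
    }
  };
}

console.log(JSON.stringify(blockquoteRule(stubWithClasses(['indent']))));
// {"class":true,"style":true}
console.log(JSON.stringify(blockquoteRule(stubWithClasses(['pullquote']))));
// {}
```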

## Distribution

The library uses UMD, so it can be loaded in AMD and CommonJS environments.

### Not suitable for Node

This library is designed for use in a browser and requires access to `document` and `createTreeWalker` to work.
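Code shared between browser and Node can guard construction with a feature check. The helper below is hypothetical and not part of html-janitor; it is a rough heuristic based on the two requirements named above, not a guarantee that the library will work.

```javascript
// Hypothetical helper (not part of html-janitor): checks for the DOM APIs
// this README says the library needs.
function canUseHTMLJanitor() {
  return typeof document !== 'undefined' &&
    typeof document.createTreeWalker === 'function';
}

if (canUseHTMLJanitor()) {
  // Browser: safe to construct HTMLJanitor here.
} else {
  // Node or another DOM-less environment: do not construct HTMLJanitor.
  console.log('No DOM available; skipping HTMLJanitor.');
}
```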

## Installation

    npm install html-janitor

## Development

To run unit tests:

    npm install
    npm run test