Format certain JSON objects on one line

Question

Consider the following code:

>>> import json
>>> data = {
...     'x': [1, {'$special': 'a'}, 2],
...     'y': {'$special': 'b'},
...     'z': {'p': True, 'q': False}
... }
>>> print(json.dumps(data, indent=2))
{
  "y": {
    "$special": "b"
  },
  "z": {
    "q": false,
    "p": true
  },
  "x": [
    1,
    {
      "$special": "a"
    },
    2
  ]
}

What I want is to format the JSON so that JSON objects that have only a single property '$special' are rendered on a single line, as follows.

{
  "y": {"$special": "b"},
  "z": {
    "q": false,
    "p": true
  },
  "x": [
    1,
    {"$special": "a"},
    2
  ]
}

I have experimented with implementing a custom JSONEncoder and passing it to json.dumps as the cls argument, but the two obvious JSONEncoder methods to override each have a problem:

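The JSONEncoder default method is only called for objects that json can't serialize on its own (not for every part of data), and its return value is then encoded as ordinary data rather than inserted as raw JSON text, so there doesn't appear to be any way to adjust formatting through it.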
The JSONEncoder encode method does return a raw JSON string, but it is only called once for the data as a whole.
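Both limitations of default can be seen in a minimal sketch (the names `Special` and `RawAttemptEncoder` are hypothetical, chosen only for illustration). default runs only for an object json can't handle itself, and a string it returns is escaped and quoted like any other string instead of being spliced in as JSON:

```python
import json

seen = []  # type names passed to default(), to show when it is called


class Special:
    """Hypothetical marker type; json can't serialize it, so default() runs."""
    def __init__(self, key):
        self.key = key


class RawAttemptEncoder(json.JSONEncoder):
    def default(self, o):
        seen.append(type(o).__name__)
        if isinstance(o, Special):
            # Attempt to hand back "raw" JSON text; json treats it as a plain str.
            return '{"$special": "%s"}' % o.key
        return super().default(o)


out = json.dumps({'y': Special('b'), 'z': {'p': True, 'q': False}},
                 cls=RawAttemptEncoder, indent=2)
print(out)
print(seen)
```

The "y" value comes out as the escaped string "{\"$special\": \"b\"}", and seen is ['Special']: the plain dict and the booleans never reach default.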

Is there any way I can get JSONEncoder to do what I want?

Why do you need this in the first place? The json module is not really set up to let you control the output format to that extent. – Martijn Pieters, commented Oct 13, 2016 at 18:59
Also, when "$special" is present, is it guaranteed to be the only key? – Martijn Pieters, commented Oct 13, 2016 at 19:04
@MartijnPieters I want to display JSON data in a developer-oriented UI. JSON objects of the form {'$special': 'some key'} appear abundantly throughout this JSON data, so I was exploring the possibility of visually compacting it a bit. It can be assumed that '$special' is the only key if it is present, although I suppose that is orthogonal to what I am really asking: how to locally modify JSON formatting. It might be that the answer is simply "you can't with the json module." – Timothy Shields, commented Oct 13, 2016 at 19:09
I've tried to do something very similar myself and had no luck with JSONEncoder. I ended up giving up and going with a standard prettify. – Taylor D. Edmiston, commented Oct 13, 2016 at 19:21
I was really hoping to find something like yapf but for formatting JSON, ideally as a Python lib. I haven't found one yet, though. – Taylor D. Edmiston, commented Oct 13, 2016 at 19:22

Martijn Pieters · Accepted Answer · 2016-10-13 21:01:33Z

The json module is not really designed to give you that much control over the output; indentation is mostly meant to aid readability while debugging.

Instead of making json produce the output, you could transform the output using the standard library tokenize module:

import tokenize
from io import BytesIO


def inline_special(json_data):
    def adjust(t, ld):
        """Adjust token line number by offset"""
        (sl, sc), (el, ec) = t.start, t.end
        return t._replace(start=(sl + ld, sc), end=(el + ld, ec))

    def transform():
        with BytesIO(json_data.encode('utf8')) as b:
            held = []  # to defer newline tokens
            lastend = None  # to track the end pos of the prev token
            loffset = 0     # line offset to adjust tokens by
            tokens = tokenize.tokenize(b.readline)
            for tok in tokens:
                if tok.type == tokenize.NL:
                    # hold newlines until we know there's no special key coming
                    held.append(adjust(tok, loffset))
                elif (tok.type == tokenize.STRING and
                        tok.string == '"$special"'):
                    # special string, collate tokens until the next rbrace
                    # held newlines are discarded, adjust the line offset
                    loffset -= len(held)
                    held = []
                    text = [tok.string]
                    while tok.exact_type != tokenize.RBRACE:
                        tok = next(tokens)
                        if tok.type != tokenize.NL:
                            text.append(tok.string)
                            if tok.string in ':,':
                                text.append(' ')
                        else:
                            loffset -= 1  # following lines all shift
                    line, col = lastend
                    text = ''.join(text)
                    endcol = col + len(text)
                    yield tokenize.TokenInfo(
                        tokenize.STRING, text, (line, col), (line, endcol),
                        '')
                    # adjust any remaining tokens on this line
                    while tok.type != tokenize.NL:
                        tok = next(tokens)
                        yield tok._replace(
                            start=(line, endcol),
                            end=(line, endcol + len(tok.string)))
                        endcol += len(tok.string)
                else:
                    # uninteresting token, yield any held newlines
                    if held:
                        yield from held
                        held = []
                    # adjust and remember last position
                    tok = adjust(tok, loffset)
                    lastend = tok.end
                    yield tok

    return tokenize.untokenize(transform()).decode('utf8')

This reformats your sample successfully:

import json

data = {
    'x': [1, {'$special': 'a'}, 2],
    'y': {'$special': 'b'},
    'z': {'p': True, 'q': False}
}

>>> print(inline_special(json.dumps(data, indent=2)))
{
  "x": [
    1,
    {"$special": "a"},
    2
  ],
  "y": {"$special": "b"},
  "z": {
    "p": true,
    "q": false
  }
}
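inline_special relies on the fact that typical json.dumps output also tokenizes as Python source. A quick way to look at the token stream it works with is sketched below; it uses tokenize.generate_tokens on a str, whereas the answer uses tokenize.tokenize on bytes, but both yield TokenInfo tuples:

```python
import io
import json
import tokenize

text = json.dumps({'y': {'$special': 'b'}, 'p': True}, indent=2)

# Braces and colons are OP tokens (exact types LBRACE/RBRACE/COLON), JSON
# strings are STRING tokens, true/false/null are NAME tokens, and line breaks
# inside the outer braces are NL (not NEWLINE) tokens.
toks = list(tokenize.generate_tokens(io.StringIO(text).readline))
for tok in toks:
    print(tokenize.tok_name[tok.exact_type], repr(tok.string))
```

This is why the transform can key on a STRING token equal to '"$special"', collect tokens up to the next RBRACE, and drop the NL tokens in between.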

Timothy Shields · Accepted Answer · 2016-10-18 17:17:40Z

I found the following regex-based solution to be simplest, albeit … regex-based.

import json
import re
data = {
    'x': [1, {'$special': 'a'}, 2],
    'y': {'$special': 'b'},
    'z': {'p': True, 'q': False}
}
text = json.dumps(data, indent=2)
pattern = re.compile(r"""
{
\s*
"\$special"
\s*
:
\s*
"
((?:[^"\\]|\\.)*)  # Captures zero or more non-quote/non-backslash chars or escape sequences
"
\s*
}
""", re.VERBOSE)
print(pattern.sub(r'{"$special": "\1"}', text))

The output follows.

{
  "x": [
    1,
    {"$special": "a"},
    2
  ],
  "y": {"$special": "b"},
  "z": {
    "q": false,
    "p": true
  }
}
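In Python's re, a capturing group that is itself repeated, as in `(...)*`, keeps only its last repetition, so the value group should wrap the whole repetition. Also note that a plain `[^"]` would consume the backslash of an escaped quote, so excluding the backslash and matching `\\.` separately is safer. A compact sketch under those assumptions (`SPECIAL_RE` and `inline_special_re` are hypothetical names) that captures the complete quoted value:

```python
import json
import re

# The value group wraps the whole repetition, and [^"\\]|\\. lets escape
# sequences such as \" through without ending the string early.
SPECIAL_RE = re.compile(r'\{\s*"\$special"\s*:\s*("(?:[^"\\]|\\.)*")\s*\}')


def inline_special_re(text):
    # Re-emit the captured JSON string literal (quotes included) verbatim.
    return SPECIAL_RE.sub(lambda m: '{"$special": ' + m.group(1) + '}', text)


data = {'x': [1, {'$special': 'abc'}, 2], 'y': {'$special': 'say "hi"'}}
out = inline_special_re(json.dumps(data, indent=2))
print(out)
```

Like the answer's pattern, this only handles string values for "$special".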

Brendan Abel · Accepted Answer · 2016-10-13 19:50:35Z

You can do it, but you'd basically have to copy/modify a lot of the code out of json.encoder because the encoding functions aren't really designed to be partially overridden.

Basically, copy the entirety of _make_iterencode from json.encoder and make the changes so that your special dictionary gets printed without newline indents. Then monkeypatch the json package to use your modified version, run the json dump, then undo the monkeypatch (if you want).
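The swap-and-restore step can be done with unittest.mock.patch.object, which restores the attribute even if dumps raises. The sketch below patches in a wrapper around the stock function rather than a modified copy, only to show that the patched name is actually used. Setting json.encoder.c_make_encoder to None as well is a precaution on my part (an assumption about CPython internals): _make_iterencode is only called on the pure-Python encoder path, and when indent is None the C accelerator is normally used instead.

```python
import json
import json.encoder
from unittest import mock

calls = []
original = json.encoder._make_iterencode  # private CPython helper


def counting_make_iterencode(*args, **kwargs):
    # Stand-in for a modified copy: record the call, then defer to the original.
    calls.append(1)
    return original(*args, **kwargs)


data = {'y': {'$special': 'b'}}

# Both patches are undone when the with-block exits, even on error.
with mock.patch.object(json.encoder, 'c_make_encoder', None), \
        mock.patch.object(json.encoder, '_make_iterencode',
                          counting_make_iterencode):
    pretty = json.dumps(data, indent=2)
    compact = json.dumps(data)

print(len(calls), json.encoder._make_iterencode is original)
```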

The _make_iterencode function is pretty long, so I've only posted the portions that need to be changed.

import json
import json.encoder

def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,
    ...
    def _iterencode_dict(dct, _current_indent_level):
        ...
        if _indent is not None:
            _current_indent_level += 1
            if '$special' in dct:
                newline_indent = ''
                item_separator = _item_separator
            else:
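                # In Python 3, _make_iterencode has already turned an integer
                # indent into a string (e.g. '  '), so it is repeated directly;
                # the ' ' * (_indent * level) form matches the Python 2.7 source.
                newline_indent = '\n' + _indent * _current_indent_level
                item_separator = _item_separator + newline_indent
            yield newline_indent
        ...
        if newline_indent is not None:
            _current_indent_level -= 1
            if '$special' not in dct:
                yield '\n' + _indent * _current_indent_level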

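def main():
    data = {
        'x': [1, {'$special': 'a'}, 2],
        'y': {'$special': 'b'},
        'z': {'p': True, 'q': False},
    }

    orig_make_iterencode = json.encoder._make_iterencode
    orig_c_make_encoder = json.encoder.c_make_encoder
    json.encoder._make_iterencode = _make_iterencode
    # Precaution: disable the C encoder so the pure-Python path (which calls
    # the patched _make_iterencode) is always taken.
    json.encoder.c_make_encoder = None
    try:
        print(json.dumps(data, indent=2))
    finally:
        # Undo the monkeypatch even if encoding fails.
        json.encoder._make_iterencode = orig_make_iterencode
        json.encoder.c_make_encoder = orig_c_make_encoder


if __name__ == '__main__':
    main()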
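If copying private json internals feels brittle, a different technique (not taken from any of the answers) is to write the small indenting walker yourself and delegate scalars to json.dumps. A minimal sketch, assuming string keys and no need for options such as sort_keys; `dumps_compact_special` is a hypothetical name:

```python
import json


def dumps_compact_special(obj, indent=2, _level=0):
    """Hypothetical helper: mimic json.dumps(obj, indent=indent), except that
    objects whose only key is "$special" are written on a single line."""
    inner = ' ' * (indent * (_level + 1))  # indentation for child items
    outer = ' ' * (indent * _level)        # indentation for the closing bracket
    if isinstance(obj, dict):
        if not obj:
            return '{}'
        if list(obj) == ['$special']:
            # Compact form, using the separators shown in the desired output.
            return json.dumps(obj, separators=(', ', ': '))
        # Keys are assumed to be strings, as in the question's data.
        items = [inner + json.dumps(k) + ': ' +
                 dumps_compact_special(v, indent, _level + 1)
                 for k, v in obj.items()]
        return '{\n' + ',\n'.join(items) + '\n' + outer + '}'
    if isinstance(obj, (list, tuple)):
        if not obj:
            return '[]'
        items = [inner + dumps_compact_special(v, indent, _level + 1)
                 for v in obj]
        return '[\n' + ',\n'.join(items) + '\n' + outer + ']'
    # Scalars (str, int, float, bool, None) are delegated to json.dumps.
    return json.dumps(obj)


data = {
    'x': [1, {'$special': 'a'}, 2],
    'y': {'$special': 'b'},
    'z': {'p': True, 'q': False},
}
print(dumps_compact_special(data))
```

For data without "$special" objects this reproduces json.dumps(data, indent=2); the trade-off is that it re-implements the indentation logic instead of reusing the json module's.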
