Update default json library #1479

ijl · 2019-01-21T20:05:42Z

I propose replacing the dependency on ujson with orjson.

In an example benchmark, sanic using orjson serves 2.9x the requests per second that it does when using ujson.

It also does not have the correctness issues ujson has. Its README has details: https://github.com/ijl/orjson.

I think the implementation details of json() leak by handling types differently (e.g., ujson serializing datetimes to epoch timestamps, supporting subclasses) and exposing kwargs, so switching would be a breaking change.
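To illustrate why datetime handling alone makes the switch breaking, here is a minimal sketch using only the standard-library `json` module. The two `default` hooks are hypothetical stand-ins that imitate the two behaviours mentioned in this thread (epoch timestamps versus ISO 8601 strings); they are not the actual ujson or orjson implementations.

```python
import json
from datetime import datetime, timezone


def epoch_default(obj):
    """Hypothetical hook: encode datetimes as integer epoch seconds."""
    if isinstance(obj, datetime):
        return int(obj.timestamp())
    raise TypeError(f"not JSON serializable: {type(obj).__name__}")


def iso_default(obj):
    """Hypothetical hook: encode datetimes as ISO 8601 strings."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    raise TypeError(f"not JSON serializable: {type(obj).__name__}")


payload = {"created": datetime(2019, 1, 21, 20, 5, 42, tzinfo=timezone.utc)}

# Same input, two different wire formats: clients parsing one would
# break if the server silently switched to the other.
as_epoch = json.dumps(payload, default=epoch_default)
as_iso = json.dumps(payload, default=iso_default)
```

A client that expects a number in `created` would fail on the ISO string, and vice versa, which is the kind of leak described above.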

orjson:

Running 30s test @ http://127.0.0.1:8000/
  2 threads and 2 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   122.59us   12.41us 525.00us   90.59%
    Req/Sec     8.13k   260.69     8.55k    76.74%
  486869 requests in 30.10s, 24.23GB read
Requests/sec:  16175.08
Transfer/sec:    824.38MB

ujson:

Running 30s test @ http://127.0.0.1:8000/
  2 threads and 2 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   356.78us   40.74us   6.17ms   99.09%
    Req/Sec     2.81k    94.51     3.02k    73.54%
  168044 requests in 30.10s, 8.37GB read
Requests/sec:   5582.88
Transfer/sec:    284.58MB

The output is a 56KiB JSON document of a GitHub activity feed. This was measured using:

PYTHONPATH=$PWD gunicorn \
    --preload --reuse-port --log-level error --bind localhost:8000 \
    --workers 2 --worker-class sanic.worker.GunicornWorker \
    wsgi:app

import os
from json import loads

from sanic import Sanic
from sanic.response import json
from sanic.views import HTTPMethodView

filename = os.path.join(os.path.dirname(__file__), 'github.json')

with open(filename, 'r') as fileh:
    DATA = loads(fileh.read())

app = Sanic(__name__)
app.config.ACCESS_LOG = False

class View(HTTPMethodView):

    def get(self, request):
        return json(DATA)

app.add_route(View.as_view(), '/')

The minimal patch to benchmark:

diff --git a/sanic/response.py b/sanic/response.py
index 7b245a8..536b27c 100644
--- a/sanic/response.py
+++ b/sanic/response.py
@@ -11,7 +11,7 @@ from sanic.helpers import STATUS_CODES, has_message_body, remove_entity_headers


 try:
-    from ujson import dumps as json_dumps
+    from orjson import dumps as json_dumps
 except BaseException:
     from json import dumps

@@ -216,8 +216,11 @@ def json(
     :param headers: Custom Headers.
     :param kwargs: Remaining arguments that are passed to the json encoder.
     """
+    body_bytes = dumps(body, **kwargs)
+    if not isinstance(body_bytes, bytes):
+        body_bytes = body_bytes.encode('utf-8')
     return HTTPResponse(
-        dumps(body, **kwargs),
+        body_bytes=body_bytes,
         headers=headers,
         status=status,
         content_type=content_type,

With ujson fully replaced, the test suite is fine. I'll open a pull request if it's ok to go ahead.
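The str-versus-bytes normalization in the patch above can be sketched in isolation. This is a simplified illustration, not Sanic's code: `encode_body` and `bytes_dumps` are hypothetical names, and `bytes_dumps` merely stands in for an encoder whose `dumps` returns `bytes`, as the `isinstance` check in the patch anticipates.

```python
import json


def encode_body(body, dumps=json.dumps, **kwargs):
    """Serialize `body` and always return bytes, whatever `dumps` returns."""
    encoded = dumps(body, **kwargs)
    if not isinstance(encoded, bytes):
        # Encoders such as the standard library's return str.
        encoded = encoded.encode("utf-8")
    return encoded


def bytes_dumps(obj, **kwargs):
    """Hypothetical stand-in for an encoder that returns bytes directly."""
    return json.dumps(obj, **kwargs).encode("utf-8")


from_str_encoder = encode_body({"a": 1})
from_bytes_encoder = encode_body({"a": 1}, dumps=bytes_dumps)
```

Either encoder yields the same bytes, so the response layer does not need to know which kind was plugged in.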

ahopkins · 2019-01-22T10:01:38Z

I must admit, I am not familiar with orjson, but the project does look promising.

However, it is still a young project (not that ujson or sanic for that matter are that old either), and I am not sure if it is battle tested out in the wild or not.

I think a better approach is just allowing the developer to pass in their own dumps method. Indeed, the developer already has that option:

import my_favorite_json
from sanic.response import json

async def handler(request):
    return json(..., dumps=my_favorite_json.dumps)

The developer is free to use an alternative. And, indeed, perhaps we can add some documentation on this.

At this time I would be more in favor of a PR that fixes the documentation that would explain how to do this rather than universally making the change for the entire project (especially since there would be a breaking change as pointed out).
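As a sketch of what such documentation might show, a custom encoder can be prepared once and reused. The example below uses only the standard library; `compact_dumps` is a hypothetical name standing in for any third-party `dumps`, and the commented-out call mirrors the `dumps=` keyword shown in the handler above.

```python
import json
from functools import partial

# Pre-configure an encoder once; any callable taking the body and
# returning the serialized JSON could be used in its place.
compact_dumps = partial(json.dumps, separators=(",", ":"), sort_keys=True)

# In a handler it would then be passed per response, as in the comment above:
#     return json(body, dumps=compact_dumps)

encoded = compact_dumps({"b": 1, "a": [1, 2]})
```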

yunstanford · 2019-01-23T05:44:14Z

Yeah, I agree with @ahopkins. Also, it looks like orjson only has a manylinux wheel published (https://pypi.org/simple/orjson/). There is currently no source dist and no wheel for macOS.

szepnapot · 2019-01-23T10:37:37Z

Can I take this one? :)

ahopkins · 2019-01-23T10:38:57Z

@szepnapot Please do. 🍻

(So I know we are on the same page, you are talking about the documentation part, right?)

ijl · 2019-01-23T13:50:33Z

Ok. I think it would be more appropriate to change the default to rapidjson or, failing that, to json, and let users choose another library if they need more performance. ujson has been unmaintained for years, and that will only get worse; neither rapidjson nor json has its issues. rapidjson is close in performance for the string- and dict-heavy payloads typical of web apps.

I don't think a documentation change on using a different JSON library is likely to have much effect, though: people rarely change defaults, and it doesn't help the library's goal of being fast out of the box.

orjson can publish macOS, Windows, and source distributions, but that hasn't been done yet.

szepnapot · 2019-01-23T14:35:16Z

@ahopkins Yes :)
Can you give me some ideas for what other examples we could show in the docs?
Or is it enough to show how to use a handler with this json example?

chenjr0719 · 2019-01-28T03:46:31Z

@szepnapot Since you are working on it, maybe also mention how to pass a custom loads when accessing JSON data from the request:

import my_favorite_json
from sanic.response import json

async def handler(request):
    json_data = request.load_json(loads=my_favorite_json.loads)
    ...
    return json(..., dumps=my_favorite_json.dumps)
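For a concrete idea of what a custom `loads` could change, here is a minimal standard-library sketch. `decimal_loads` is a hypothetical name; it decodes JSON floats as `Decimal` instead of `float`, and any callable with the same shape could be passed as `loads=` in the handler above.

```python
import json
from decimal import Decimal
from functools import partial

# A decoder that preserves the exact decimal text of JSON numbers.
decimal_loads = partial(json.loads, parse_float=Decimal)

data = decimal_loads('{"price": 1.10}')
```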

ijl · 2019-02-05T15:24:28Z

@yunstanford orjson now publishes Linux, macOS, and Windows wheels. Can we either go forward with this or split the documentation to another ticket so I can close?

yunstanford · 2019-02-07T03:54:33Z

@ijl I tried once, but it looks like some unit tests are failing. I didn't dig deep, though.

robd003 · 2019-09-01T09:22:29Z

Any progress on this?

stalkerg · 2019-10-02T03:40:39Z

Hmm, for small JSON documents ujson is faster.

VMAtm · 2020-05-08T13:36:54Z

ultrajson 2.0.0 has been released and supports Python 3.8: https://github.com/ultrajson/ultrajson/releases/tag/2.0.0

The new ujson release contains a breaking change: it no longer serializes arbitrary iterables, so a set, for example, can't be serialized anymore. It is probably time to finally switch away from ujson.
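For code that relied on sets being serialized, one workaround is an explicit conversion hook. The sketch below uses the standard-library `json` module, which also rejects sets; `iterable_default` is a hypothetical helper, and it assumes the set's elements are mutually comparable so that the output order is deterministic.

```python
import json


def iterable_default(obj):
    """Hypothetical hook: serialize sets as sorted lists."""
    if isinstance(obj, (set, frozenset)):
        return sorted(obj)  # assumes elements can be compared with each other
    raise TypeError(f"not JSON serializable: {type(obj).__name__}")


payload = {"tags": {"b", "a"}}

try:
    json.dumps(payload)  # without a hook, a set is rejected
    rejected = False
except TypeError:
    rejected = True

encoded = json.dumps(payload, default=iterable_default)
```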

ahopkins · 2020-05-10T17:06:10Z

I'll reopen for anyone that wants to add discussion. Now that it is being supported again, I am less inclined to switch, but I am open to convincing arguments. Thanks for sharing @VMAtm, I'll take a read through the release notes.

stalkerg · 2020-05-13T15:05:22Z

Looks like the new ultrajson has now reduced its feature set to that of orjson.

ahopkins · 2020-05-13T15:53:28Z

What do you mean?

stalkerg · 2020-05-14T01:15:19Z

@ahopkins

Remove serialization of date/datetime objects (50181f0) @Jahaja
Remove double_precision encoding option and precise_float decoding option (eb7d894) @Jahaja
Remove generic serialization of objects/iterables (53f85b1) @Jahaja
Remove support for __json__ method on str (5f98f01) @Jahaja
Remove blist tests (3a6ba52) @Jahaja

For me, the most important feature was datetime conversion. Now ultrajson has the same feature set as orjson but is slower.

ashleysommer · 2020-05-14T01:19:46Z

Personally on a couple of performance-critical applications in my organisation I have replaced ujson with orjson, specifically for the built-in iso-format on datetime objects (and orjson is super fast).

stalkerg · 2020-05-14T02:52:52Z

@ashleysommer I see. For me, the most important part was deserialization, which is not supported by orjson.

ahopkins · 2020-05-14T07:31:54Z

Yeah. That was one of the biggest reasons I was hesitant to make the switch because of the changes that would cause for Sanic users.

If it is a problem we will have to endure either way, I am back to the beginning on this one.

Anyone willing to put together another round of performance testing?
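As a starting point, a micro-benchmark of just the encoders could look like the sketch below. It is an assumption-laden illustration: `time_dumps` is a hypothetical helper, the payload is synthetic rather than the GitHub activity feed used earlier, and it measures serialization alone, not end-to-end requests per second. Other libraries' `dumps` functions could be added to the dictionary if they are installed.

```python
import json
import timeit


def time_dumps(encoders, payload, number=1000):
    """Return the seconds each encoder takes to serialize `payload` `number` times."""
    return {
        name: timeit.timeit(lambda d=dumps: d(payload), number=number)
        for name, dumps in encoders.items()
    }


# Synthetic payload; a real comparison should use representative data.
payload = {"events": [{"id": i, "type": "PushEvent"} for i in range(100)]}

results = time_dumps({"json": json.dumps}, payload, number=200)
```

Since an earlier comment reports different results for small documents, it would be worth running this with several payload sizes.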

klausmyrseth · 2020-10-16T11:24:57Z

I have been in the JSON library hellhole for a while now, and I'm an avid Sanic user. Rather than choosing one library in particular, what I would love to get access to is something like this:

https://github.com/mattgiles/mujson

Have a "proxy" so that you can load the compatible libraries and it decides which to use based on developer preferences and so on. This would make Sanic a bit more "configurable" for the specific issue you are having :)

I need to speed up my JSON handling, since I am running our entire backend microservice installation on Sanic with quite a bit of throughput. Being able to use whatever is strongest for the use case is a strong feature and gives a very flexible solution to JSON serialization and deserialization in Sanic. (In my eyes there is no unicorn in the JSON library family.)
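The "proxy" idea can be sketched as a preference-ordered import with a standard-library fallback. This is only an illustration of the concept, not mujson's API: `pick_json_backend` is a hypothetical helper, and the default preference order is an assumption.

```python
import importlib


def pick_json_backend(preferences=("orjson", "ujson", "json")):
    """Return the first importable module from `preferences`."""
    for name in preferences:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # not installed; try the next preference
    raise ImportError(f"none of {preferences!r} could be imported")


# A missing library is skipped and the next preference is used.
backend = pick_json_backend(("no_such_json_lib", "json"))
encoded = backend.dumps({"ok": True})
```

Backends are not drop-in equivalents (as the patch earlier in this thread shows, `dumps` may return `bytes` rather than `str`), so a real proxy would also need to normalize return types and supported keyword arguments.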

yunstanford added the idea discussion label Jan 23, 2019

ahopkins added documentation help wanted labels Jan 23, 2019

ahopkins mentioned this issue Jan 23, 2019

Update README.md ijl/orjson#6

Closed

yunstanford mentioned this issue Jan 25, 2019

Publish wheel bdist for MacOS ijl/orjson#7

Closed

yunstanford assigned yunstanford and unassigned yunstanford Feb 11, 2019

sjsadowski changed the title ~~Use orjson instead of ujson~~ Update default json library Mar 3, 2019

harshanarayana mentioned this issue Mar 4, 2019

WIP - Replace ujson with orjson as default json library #1509

Closed

yunstanford added the on hold label May 14, 2019

ijl closed this Oct 2, 2019

ahopkins mentioned this issue Feb 7, 2020

ujson can not be used with python 3.8.1 #1781

Closed

ahopkins reopened this May 10, 2020
