The Wayback Machine - https://web.archive.org/web/20201203145445/https://github.com/huge-success/sanic/issues/1479
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update default json library #1479

Open
ijl opened this issue Jan 21, 2019 · 20 comments
Open

Update default json library #1479

ijl opened this issue Jan 21, 2019 · 20 comments

Comments

@ijl
Copy link

@ijl ijl commented Jan 21, 2019

I propose replacing the dependency on ujson with orjson.

sanic using orjson does 2.9x the requests per second on an example benchmark compared to when using ujson.

It also does not have the correctness issues ujson has. Its README has details: https://github.com/ijl/orjson.

I think the implementation details of json() leak by handling types differently (e.g., ujson serializing datetimes to epoch timestamps, supporting subclasses) and exposing kwargs, so switching would be a breaking change.

orjson:

Running 30s test @ http://127.0.0.1:8000/
  2 threads and 2 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   122.59us   12.41us 525.00us   90.59%
    Req/Sec     8.13k   260.69     8.55k    76.74%
  486869 requests in 30.10s, 24.23GB read
Requests/sec:  16175.08
Transfer/sec:    824.38MB

ujson:

Running 30s test @ http://127.0.0.1:8000/
  2 threads and 2 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   356.78us   40.74us   6.17ms   99.09%
    Req/Sec     2.81k    94.51     3.02k    73.54%
  168044 requests in 30.10s, 8.37GB read
Requests/sec:   5582.88
Transfer/sec:    284.58MB

The output is a 56KiB JSON document of a GitHub activity feed. This was measured using:

PYTHONPATH=$PWD gunicorn \
    --preload --reuse-port --log-level error --bind localhost:8000 \
    --workers 2 --worker-class sanic.worker.GunicornWorker \
    wsgi:app
import os
from json import loads

from sanic import Sanic
from sanic.response import json
from sanic.views import HTTPMethodView

filename = os.path.join(os.path.dirname(__file__), 'github.json')

with open(filename, 'r') as fileh:
    DATA = loads(fileh.read())

app = Sanic(__name__)
app.config.ACCESS_LOG = False

class View(HTTPMethodView):

    def get(self, request):
        return json(DATA)

app.add_route(View.as_view(), '/')

The minimal patch to benchmark:

diff --git a/sanic/response.py b/sanic/response.py
index 7b245a8..536b27c 100644
--- a/sanic/response.py
+++ b/sanic/response.py
@@ -11,7 +11,7 @@ from sanic.helpers import STATUS_CODES, has_message_body, remove_entity_headers


 try:
-    from ujson import dumps as json_dumps
+    from orjson import dumps as json_dumps
 except BaseException:
     from json import dumps

@@ -216,8 +216,11 @@ def json(
     :param headers: Custom Headers.
     :param kwargs: Remaining arguments that are passed to the json encoder.
     """
+    body_bytes = dumps(body, **kwargs)
+    if not isinstance(body_bytes, bytes):
+        body_bytes = body_bytes.encode('utf-8')
     return HTTPResponse(
-        dumps(body, **kwargs),
+        body_bytes=body_bytes,
         headers=headers,
         status=status,
         content_type=content_type,

With ujson fully replaced, the test suite is fine. I'll open a pull request if it's ok to go ahead.

@ahopkins
Copy link
Member

@ahopkins ahopkins commented Jan 22, 2019

I must admit, I am not familiar with orjson, but the project does look promising.

However, it is still a young project (not that ujson or sanic for that matter are that old either), and I am not sure if it is battle tested out in the wild or not.

I think a better approach is just allowing the developer to pass in their own dumps method. Indeed, the developer already has that option:

import my_favorite_json
from sanic.response import json

async def handler(request):
    return json(..., dumps=my_favorite_json.dumps)

The developer is free to use an alternative. And, indeed, perhaps we can add some documentation on this.

At this time I would be more in favor of a PR that fixes the documentation that would explain how to do this rather than universally making the change for the entire project (especially since there would be a breaking change as pointed out).

@yunstanford
Copy link
Member

@yunstanford yunstanford commented Jan 23, 2019

Yeah, i agree with @ahopkins . Also looks like orjson only has manylinux wheel published https://pypi.org/simple/orjson/. there is no source dist, and no wheel for macos currently.

@szepnapot
Copy link

@szepnapot szepnapot commented Jan 23, 2019

Can I take this one? :)

@ahopkins
Copy link
Member

@ahopkins ahopkins commented Jan 23, 2019

@szepnapot Please do. 🍻

(So I know we are on the same page, you are talking about the documentation part, right?)

@ijl
Copy link
Author

@ijl ijl commented Jan 23, 2019

Ok. I think it would be more appropriate to change the default to rapidjson or worse to change it to json and make the user choose another library for performance reasons. ujson has been unmaintained for years, that'll only get worse, and neither rapidjson nor json have its issues. rapidjson is close in performance for strings and dicts type web app payloads.

I think a documentation change on using a different JSON library isn't likely to have much effect though in that people also don't really change defaults and it doesn't help the library's goal of being fast out of the box.

orjson can distribute macOS, Windows, and source distributions, but it hasn't been done.

@szepnapot
Copy link

@szepnapot szepnapot commented Jan 23, 2019

@ahopkins Yes :)
Can you give me some ideas what other examples can we show on the doc?
Or it's enough to show how to use a handler via this json example?

@chenjr0719
Copy link
Member

@chenjr0719 chenjr0719 commented Jan 28, 2019

@szepnapot Since you are working on it, maybe also mention how to pass a loads when access JSON data from request:

import my_favorite_json
from sanic.response import json

async def handler(request):
    json_data = request.load_json(loads=my_favorite_json.loads)
    ...
    return json(..., dumps=my_favorite_json.dumps)
@ijl
Copy link
Author

@ijl ijl commented Feb 5, 2019

@yunstanford orjson now publishes Linux, macOS, and Windows wheels. Can we either go forward with this or split the documentation to another ticket so I can close?

@yunstanford
Copy link
Member

@yunstanford yunstanford commented Feb 7, 2019

@ijl i've tried once, but looks like some unit tests are failing. but didn't dig deep.

@sjsadowski sjsadowski changed the title Use orjson instead of ujson Update default json library Mar 3, 2019
@yunstanford yunstanford added the on hold label May 14, 2019
@robd003
Copy link

@robd003 robd003 commented Sep 1, 2019

Any progress on this?

@stalkerg
Copy link

@stalkerg stalkerg commented Oct 2, 2019

hmm for small JSONs ujson faster.

@VMAtm
Copy link

@VMAtm VMAtm commented May 8, 2020

ultrajson 2.0.0 is released, supports Python 3.8 https://github.com/ultrajson/ultrajson/releases/tag/2.0.0

New release of the ujson contains breaking change: they do not serialize iterables anymore. So, for example, set can't be serialized anymore. Probably you should switch from the ujson finally

@ahopkins
Copy link
Member

@ahopkins ahopkins commented May 10, 2020

I'll reopen for anyone that wants to add discussion. Now that it is being supported again, I am less inclined to switch but only en to convincing arguments. Thanks for sharing @VMAtm , I'll take a read thru the release notes.

@ahopkins ahopkins reopened this May 10, 2020
@stalkerg
Copy link

@stalkerg stalkerg commented May 13, 2020

Looks like new ultrajson now degradate feature set to orjson.

@ahopkins
Copy link
Member

@ahopkins ahopkins commented May 13, 2020

What do you mean?

@stalkerg
Copy link

@stalkerg stalkerg commented May 14, 2020

@ahopkins

Remove serialization of date/datetime objects (50181f0) @Jahaja
Remove double_precision encoding option and precise_float decoding option (eb7d894) @Jahaja
Remove generic serialization of objects/iterables (53f85b1) @Jahaja
Remove support for __json__ method on str (5f98f01) @Jahaja
Remove blist tests (3a6ba52) @Jahaja

for me, the most important feature was datetime conversion. Now ultrajson have the same feature set as orjson but slower.

@ashleysommer
Copy link
Member

@ashleysommer ashleysommer commented May 14, 2020

Personally on a couple of performance-critical applications in my organisation I have replaced ujson with orjson, specifically for the built-in iso-format on datetime objects (and orjson is super fast).

@stalkerg
Copy link

@stalkerg stalkerg commented May 14, 2020

@ashleysommer I see, for me, was the most important deserialization what not supported by orjson.

@ahopkins
Copy link
Member

@ahopkins ahopkins commented May 14, 2020

Yeah. That was one of the biggest reasons I was hesitant to make the switch because of the changes that would cause for Sanic users.

If it is a problem we will have to endure either way, I am back to the beginning on this one.

Anyone willing to put together another round of performance testing?

@klausmyrseth
Copy link

@klausmyrseth klausmyrseth commented Oct 16, 2020

I been in the json lib hellhole for a while now, and im a vivid Sanic user and what I would love to get access to instead of choosing one in particular is to do something like this:

https://github.com/mattgiles/mujson

Have a "proxy" so you can load the compatible ones and it deside which to use based on dev prefs and so on. This would make Sanic a bit more "configurable" to the specific issue you are having :)

I need to speed up my json handling since I am running our entire backend microservice installation on Sanic with quite a bit of throughput, being able to use whats strong for the usecase is a strong feat and gives a very flexible solution to the json serialization and deserialization for Sanic. (In my eyes there are no unicorn in the json lib family)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.