The Wayback Machine - https://web.archive.org/web/20210603005649/https://github.com/nodejs/node/issues/38708
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wanted: Reimplement WHATWG URL Parser in WASM #38708

Open
jasnell opened this issue May 17, 2021 · 12 comments
Open

Wanted: Reimplement WHATWG URL Parser in WASM #38708

jasnell opened this issue May 17, 2021 · 12 comments

Comments

@jasnell
Copy link
Member

@jasnell jasnell commented May 17, 2021

The current WHATWG URL parser implementation is written in C/C++ and incurs a fairly significant cost crossing the JS/C++ boundary. It should be possible to realize a significant performance improvement by porting the implementation to WASM (similar to how the https://github.com/nodejs/undici project has seen a massive performance boost out of moving llhttp parser to wasm).

If someone wanted to pick up this effort, I am available to help mentor through it.

It will require c/c++ experience. What I would imagine would be best is setting it up as a separate project that we would vendor in as a dependency. Done right it would have no breaking API changes.

@devsnek
Copy link
Member

@devsnek devsnek commented May 17, 2021

Can we use rust?

@jasnell
Copy link
Member Author

@jasnell jasnell commented May 17, 2021

As a vendored in dependency I personally wouldn't care. But... need to consider the impact on the prerequisites that are necessary to build node.js. If vendoring the dependency requires someone to set up a whole rust development environment to be able to build, then that may be problematic. If, alternatively, what we vendor in is just the wasm files and some glue code such that we only need the wasm compiler, then it really doesn't matter what the source language is.

@akshay1992kalbhor
Copy link

@akshay1992kalbhor akshay1992kalbhor commented May 17, 2021

How much C/C++ experience would be needed? I have submitted small patches to Gecko before

@voxpelli
Copy link

@voxpelli voxpelli commented May 17, 2021

The Servo WHATWG URL parser in Rust has CI set up for WASM. Compiling that and publishing the resulting WASM to eg. npm could maybe even be something they themselves would be interested in? https://github.com/servo/rust-url/blob/d673c4d5e22b3a8ac91b7f52faa45dc32a275f75/.github/workflows/main.yml

@legendecas
Copy link
Member

@legendecas legendecas commented May 17, 2021

Since the practice already exists in undici, can we first try to work this on an npm module?

@jasnell
Copy link
Member Author

@jasnell jasnell commented May 17, 2021

I'll be happy with whatever works and keeps us spec compliant. It would be good to ensure that we can still quickly respond to changes in the base spec, so we'll need to make sure that whatever the implementation is it stays maintained.

@arsalanhub
Copy link

@arsalanhub arsalanhub commented May 18, 2021

@jasnell Has anyone taken the issue? If not then I would like to work on this.

@puzpuzpuz
Copy link
Member

@puzpuzpuz puzpuzpuz commented May 18, 2021

t should be possible to realize a significant performance improvement by porting the implementation to WASM (similar to how the https://github.com/nodejs/undici project has seen a massive performance boost out of moving llhttp parser to wasm).

It tried to find any benchmark results for the mentioned undici migration to wasm and found only the following message: nodejs/undici#575

Probably that's the not right place to look into since I don't see a significant performance improvement in the above message. Most likely that's a wrong place to look at, so could someone tell me where to find more information on existing experiments?

@voxpelli
Copy link

@voxpelli voxpelli commented May 18, 2021

So, simplified, in essence what it would take is to swap the internalBinding('url') in internal/url.js for a WASM-import of the same stuff:

node/lib/internal/url.js

Lines 79 to 102 in 910efc2

const {
domainToASCII: _domainToASCII,
domainToUnicode: _domainToUnicode,
encodeAuth,
toUSVString: _toUSVString,
parse,
setURLConstructor,
URL_FLAGS_CANNOT_BE_BASE,
URL_FLAGS_HAS_FRAGMENT,
URL_FLAGS_HAS_HOST,
URL_FLAGS_HAS_PASSWORD,
URL_FLAGS_HAS_PATH,
URL_FLAGS_HAS_QUERY,
URL_FLAGS_HAS_USERNAME,
URL_FLAGS_IS_DEFAULT_SCHEME_PORT,
URL_FLAGS_SPECIAL,
kFragment,
kHost,
kHostname,
kPathStart,
kPort,
kQuery,
kSchemeStart
} = internalBinding('url');

Similar to how the import of llhttp in undici works.

Considering that, I guess it can easier to compile the existing c++ code to WebAssembly than to set up some wasm-pack / wasm-bindgen flow that compiles JS-bindings for servo/rust-url using something like wasm-pack build --target=nodejs

I did however open servo/rust-url#712 to see if they would be interested in publishing such bindings themselves.

@Flarna
Copy link
Member

@Flarna Flarna commented May 19, 2021

Hope my question is not too stupid. But may I ask how WASM is debugged? JS and C++ are quite easy to debug but I never debugged WASM till now.

@devsnek
Copy link
Member

@devsnek devsnek commented May 19, 2021

In this context, you can use chrome devtools. it supports dwarf sections and whatnot so it should "just work"

@jasnell
Copy link
Member Author

@jasnell jasnell commented May 19, 2021

@voxpelli ... yes, that's essentially it. The one complicating bit there are the domainTo... bits that rely on functionality in ICU, which is not exposed to WASM yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment