lua entry thread aborted: runtime error: /etc/nginx/lua/domainproxy.lua:32: bad request
Closed, Resolved · Public · Bug Report

Description

Steps to replicate the issue (include links if applicable):

What happens?:

  • For some reason, the web browser blocks scripts and stylesheets loaded from tools-static.wmflabs.org.

What should have happened instead?:

  • The scripts and stylesheets should load correctly.

Software version (on Special:Version page; skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):

  • The browser console in Firefox (137.0.2 on Ubuntu) shows that JavaScript and CSS from tools-static.wmflabs.org are blocked with the error message NS_BINDING_ABORTED. The same happens with Firefox 138.0 on macOS.
  • Chromium (135.0.7049.114 on Ubuntu) shows the same failure, with the error message ERR_BLOCKED_BY_ORB.
  • Safari does not seem to be affected.

Event Timeline

The Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wikimedia.org/project/profile/832/ and replace it with a more specific project tag to this task. Thanks!

taavi renamed this task from "All tools using tools-static.wmflabs.org broken" to "lua entry thread aborted: runtime error: /etc/nginx/lua/domainproxy.lua:32: bad request". Apr 30 2025, 3:50 PM
taavi edited projects, added Cloud-VPS; removed Toolforge.
taavi triaged this task as High priority. Apr 30 2025, 3:54 PM
taavi subscribed.

Thanks for the report! This seems to be a problem with the new proxies being introduced by T379175: Enable IPv6 for the Cloud VPS web proxy. The specific error being logged is P75678, which I thought https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/0e4620fe503a5cc46b118492d27528dbc8b8f5ce%5E%21/ had fixed, but apparently it did not.

For now I've dropped the AAAA record from tools-static.wmflabs.org, which means you should be routed to the old, working proxies as soon as DNS TTLs expire.

Mentioned in SAL (#wikimedia-cloud) [2025-04-30T15:54:46Z] <taavi> project-proxy drop AAAA record from tools-static T393024

One thing to note is that this only happens with HTTP/2 requests, and seemingly only for requests that are not the first request on a given connection.

Change #1142572 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] dynamicproxy: Add missing redis_shutdown() call

https://gerrit.wikimedia.org/r/1142572

Change #1142572 merged by Majavah:

[operations/puppet@production] dynamicproxy: Add missing redis_shutdown() call

https://gerrit.wikimedia.org/r/1142572

I tried to:

  • Handle possible Lua errors in the code myself, and set lua_socket_log_errors to off instead (as recommended in the lua-nginx-redis upstream README)
  • Use :close() instead of :set_keepalive()

Neither of those had any effect. I've now set loglevel verbose in the Redis configuration in the hope that it will be more helpful.


Nothing useful in there either. I've also tried pointing the code at an old Redis server (proxy-03), with no effect, which suggests the issue lies within nginx and not in the Redis server.

Change #1142598 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] dynamicproxy: Declare functions as local

https://gerrit.wikimedia.org/r/1142598

Change #1142598 merged by Majavah:

[operations/puppet@production] dynamicproxy: Declare functions as local

https://gerrit.wikimedia.org/r/1142598
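The thread does not show the diff for change #1142598, so the following is only a generic, hypothetical illustration of what "declare functions as local" means in Lua, not the actual patch. A function defined without "local" is stored in the global table _G, so depending on how the file is loaded it may be visible to (and replaceable by) other code sharing the same Lua VM; a local function stays scoped to its own chunk. The lua-nginx-module documentation recommends keeping definitions local and discourages Lua globals. The strict.lua-style guard on _G at the end is an extra of this example, not something the thread says was deployed; all function names are made up for illustration.

```lua
-- Plain Lua 5.1+/LuaJIT sketch; no OpenResty or Redis required.
-- All function names here are hypothetical, not taken from domainproxy.lua.

-- Declared without "local": the function is stored in the global table _G.
function get_backend_global(host)
  return "backend-for-" .. host
end

-- Declared with "local": the function is only visible inside this chunk.
local function get_backend_local(host)
  return "backend-for-" .. host
end

print(type(_G.get_backend_global)) -- function
print(type(_G.get_backend_local))  -- nil
print(get_backend_local("tools-static.wmflabs.org"))

-- Optional guard in the style of strict.lua: from here on, any attempt to
-- create a new global raises an error instead of silently succeeding.
setmetatable(_G, {
  __newindex = function(_, name)
    error("attempt to create global variable '" .. tostring(name) .. "'", 2)
  end,
})

-- An accidental global definition inside a function is now caught.
local ok, err = pcall(function()
  function accidentally_global() end -- missing "local"
end)
print(ok)  -- false
print(err) -- error message naming accidentally_global
```

Installing such a guard at the end of a module's top-level code turns a forgotten "local" into an immediate, visible error rather than silently creating a global.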

The latest patch seems to have done the trick. I'm confident enough to move some tools-static traffic back to the new proxy to validate the fix under a higher amount of traffic. I'll check back tomorrow and close the task if there are no further reports of issues or logged errors.