When a "Black Box" Plugin Consumes 45% CPU: How We Pinpointed Lua Line 93 Without Source Code
This article reviews a real customer case. In this scenario, an API gateway cluster built on open-source OpenResty experienced a severe CPU bottleneck, and traditional perf tools proved ineffective against “black box” plugins. This article will detail how the OpenResty XRay team utilized dynamic tracing technology to non-invasively probe the Lua VM, accurately identifying a pkey_rsa_decrypt function in the OpenSSL C library that accounted for 44.8% of CPU usage, and revealing its true impact on the system’s concurrency performance.
API gateways are a critical chokepoint in modern microservice architectures, making their performance and stability paramount.
Recently, a customer from the financial industry contacted us, reporting a tricky production issue: their core business’s gateway cluster experienced sustained CPU saturation (100% utilization) during peak hours, leading to significant spikes in P99 (99th percentile) latency.
When perf Meets the Black Box
The customer’s SRE team quickly intervened and attempted the following diagnostic methods:
- top/htop: The results only indicated that the openresty process was the primary CPU consumer, offering no further details.
- perf record: The customer tried using perf record and flame graphs. In the best-case scenario, perf’s flame graph might reveal C functions like pkey_rsa_decrypt consuming CPU, although these are often obscured by VM symbols such as luajit_vm_dispatch in a JIT environment.
However, this immediately presented a second, more critical problem: attribution was impossible. Perf could not associate this C function call with any upstream Lua code. It merely identified an “isolated hotspot.” We couldn’t determine which plugin, API route, or line of Lua code triggered this expensive C call.
This highlights the core limitation of perf in a mixed Lua+C language stack: it loses context. On a complex gateway running dozens of plugins, an isolated C function hotspot provides insufficient guidance for optimization.
- APM Monitoring: The customer’s APM system (based on ngx.now and ngx.log) showed that most of the execution time occurred during the access_by_lua phase. While this narrowed the scope, it was still too broad, as the access phase could have a dozen or more plugins attached.
Through a systematic investigation, the customer team pinpointed a custom plugin, cb-session-validation, used for session validation. The challenge, however, was that this plugin was a “black box provided by a third-party vendor,” and the customer team did not possess its complete source code.
This represented a typical “blind spot”:
- At a macro level, they knew the CPU was saturated.
- At a meso level, they knew it was the access_by_lua phase.
- At a micro level, they were unable to pinpoint the specific function causing the high consumption.
From Sampling to Full-Stack Dynamic Tracing
According to the OpenResty XRay technical expert team, since both Lua-level profilers and system-level perf tools failed to provide answers, the bottleneck was almost certainly occurring at the boundary between the Lua VM and Native C libraries.
This represents a “blind spot” for traditional sampling tools. While perf excels at analyzing C/C++ code or kernel operations, it lacks the ability to capture language-level call stacks inside runtimes such as LuaJIT, V8, and the JVM. Consequently, the information it provides is very limited. Open-source APM tools, on the other hand, have a steep learning curve, with only a few developers proficient enough to use them effectively, making them unsuitable for deployment in production environments.
We recommended the customer perform a “CT scan” directly in the problematic production environment using OpenResty XRay.
The core distinction of OpenResty XRay is its reliance on dynamic tracing rather than sampling. It can non-invasively and in real-time reconstruct the complete call stack from the NGINX event loop, through the LuaJIT VM, down to C libraries, and even into the kernel, all without requiring any restarts or code modifications.
Once the analysis task was initiated, the evidence clearly pointed to a surprising fact: a C function named pkey_rsa_decrypt was consuming a staggering 44.8% of the CPU time.
Uncovering the Evidence Trail
This pkey_rsa_decrypt function clearly originates from the OpenSSL library, used for RSA private key decryption. But how is it being called?
The complete call stack provided by OpenResty XRay reveals the following evidence:
@access_by_lua(nginx.conf:310):2
http_access_phase@/usr/local/openresty/app/init.lua:721
run_plugin@/usr/local/openresty/app/plugin.lua:1154
phase_func@/opt/openresty/cb_plugins/openresty/plugins/cb-session-validation.lua:93 <-- Customer's "black box" plugin
validate_session@/opt/openresty/cblualib/cbmodules/cb-session-validator.lua:283
verify_session@/opt/openresty/cblualib/cbmodules/cb-session-validator.lua:105
load_jwt@/opt/openresty/cblualib/resty/cb_jwt.lua:624
pcall
[builtin#pcall]
@/opt/openresty/cblualib/resty/cb_jwt.lua:250
...
C:pkey_rsa_decrypt [/usr/lib64/libcrypto.so.1.1.1k] <-- 44.8% CPU bottleneck
@/usr/src/debug/openssl-1.1.1k-12.el8_9.x86_64/crypto/rsa/rsa_pmeth.c:337
The value of this chain of evidence lies in:
- Information transparency: It provides the attribution context that perf lacks, clearly tracing the C-level hotspot pkey_rsa_decrypt back to its Lua origin: line 93 of the cb-session-validation.lua plugin.
- Connecting Lua and C: It draws the complete path from the black-box plugin, through the JWT library (Lua code), and finally into libcrypto.so (C library).
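To make the attribution idea concrete, here is a minimal Python sketch. It does not show how OpenResty XRay works internally; it only illustrates why a hybrid Lua+C stack answers a question that a C-only view cannot. The "folded stack" text format, the helper names (parse_folded, leaf_totals, attribute_leaf), all sample counts, and every stack other than the first are assumptions invented for illustration.

```python
from collections import Counter

# Hypothetical samples in "folded stack" form: frames from root to leaf joined
# by ';', then a space and a sample count. The first stack abbreviates the
# trace above; every count and every other stack is invented for illustration.
FOLDED = """\
access_by_lua;run_plugin;cb-session-validation.lua:93;cb_jwt.lua:624;C:pkey_rsa_decrypt 448
access_by_lua;run_plugin;cb-session-validation.lua:93;cb_jwt.lua:624;C:memcpy 52
access_by_lua;run_plugin;rate-limit.lua:40;C:ngx_shmtx_lock 100
content_by_lua;proxy.lua:12;C:writev 400
"""


def parse_folded(text):
    """Return a list of (frames, count) tuples from folded-stack text."""
    samples = []
    for line in text.splitlines():
        if not line.strip():
            continue
        stack, count = line.rsplit(" ", 1)
        samples.append((stack.split(";"), int(count)))
    return samples


def leaf_totals(samples):
    """C-only view: total samples per leaf function, with no Lua context."""
    totals = Counter()
    for frames, count in samples:
        totals[frames[-1]] += count
    return totals


def attribute_leaf(samples, leaf):
    """Hybrid view: charge a leaf's samples to the outermost Lua source frame."""
    owners = Counter()
    for frames, count in samples:
        if frames[-1] != leaf:
            continue
        lua_frames = [f for f in frames if ".lua:" in f]
        owners[lua_frames[0] if lua_frames else "<no Lua frame>"] += count
    return owners


if __name__ == "__main__":
    samples = parse_folded(FOLDED)
    total = sum(count for _, count in samples)
    hot = "C:pkey_rsa_decrypt"
    print(f"{hot}: {leaf_totals(samples)[hot] / total:.1%} of samples")
    for owner, count in attribute_leaf(samples, hot).items():
        print(f"  attributed to {owner}: {count / total:.1%}")
```

leaf_totals corresponds to the "isolated hotspot" view: it can rank C functions but says nothing about who called them. attribute_leaf only works because each sample carries the Lua frames as well, which is the piece of context the hybrid call stack supplies.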
Without source code, without GDB, and without restarting the service, we knew exactly which file and which line of code triggered the problem. This is the kind of insight engineers truly need: one they can act on immediately.
What are the implications for system concurrency?
This 44.8% CPU utilization fundamentally represents a severe bottleneck for the system’s throughput capacity.
It means nearly half of the gateway cluster’s CPU cycles are being wasted on repetitive cryptographic operations that could easily be cached and optimized. This directly leads to premature CPU saturation, preventing the system from handling more concurrent requests, and inevitably causing P99 latency to skyrocket.
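As a rough illustration of what "cached" could mean here, the following Python sketch memoizes the result of an expensive token verification for a short time, so that repeated requests carrying the same token skip the cryptographic work. It is not the vendor plugin's code and not a recommendation from the case itself: the class name VerifiedTokenCache, the verify callable, and the TTL value are all hypothetical, and whether such caching is safe depends on the token's expiry and revocation requirements.

```python
import hashlib
import time


class VerifiedTokenCache:
    """Cache verified claims per token for a bounded time (illustrative only)."""

    def __init__(self, verify, ttl_seconds=60.0, clock=time.monotonic):
        # `verify` is a stand-in for the expensive call (for example, the RSA
        # operation behind JWT handling); it must raise on an invalid token.
        self._verify = verify
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # token digest -> (expires_at, claims)

    def get_claims(self, token):
        # Key by digest so raw tokens are not kept as dictionary keys.
        key = hashlib.sha256(token.encode("utf-8")).hexdigest()
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: no cryptographic work
        claims = self._verify(token)  # cache miss: pay the full cost once
        self._entries[key] = (now + self._ttl, claims)
        return claims


if __name__ == "__main__":
    calls = []

    def slow_verify(token):
        # Hypothetical verifier that only records how often it is invoked.
        calls.append(token)
        return {"sub": "user-1"}

    cache = VerifiedTokenCache(slow_verify, ttl_seconds=30.0)
    for _ in range(1000):
        cache.get_claims("example-token")
    print(f"verify() ran {len(calls)} time(s) for 1000 requests")
```

Failed verifications are not cached because an exception from verify propagates before anything is stored. The sketch has no size limit or eviction; a real gateway would bound the cache (in OpenResty, something like lua-resty-lrucache or a shared dictionary would typically play this role) and keep the TTL no longer than the token's own remaining lifetime.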
Summary
This case perfectly highlighted the limitations of conventional observability and APM tools. Traditional monitoring methods, whether perf or APM, proved ineffective because the bottleneck occurred at the boundary between the VM and Native C libraries. perf could only inspect the C call stack, failing to gain deep insight into the Lua call stack, thus lacking crucial last-mile visibility.
As mentioned earlier, perf might identify pkey_rsa_decrypt as a hotspot. However, in a high-concurrency, event-driven JIT environment like an API gateway based on open-source OpenResty, this information is almost useless.
The core issue lies in the attribution gap. perf cannot cross the boundary of the LuaJIT VM. It doesn’t know which specific piece of Lua code within the VM triggered this C function. In a system with dozens of plugins processing tens of thousands of requests per second, it’s impossible to determine whether the issue was caused by plugin A or plugin B.
In essence:
- perf observed the C function’s high utilization but could not attribute it to a specific Lua request.
- APM detected the slowness of Lua requests but could not “penetrate” the black box of C language.
The core advantage of OpenResty XRay lies in its full-stack dynamic tracing capabilities. It can reconstruct a complete, hybrid call stack spanning both Lua and C environments, precisely telling you: “Line 93 of the cb-session-validation.lua file, through a series of calls, ultimately executed pkey_rsa_decrypt and consumed 44.8% of the CPU.”
For high-performance middleware like Kong, which is based on OpenResty, the most challenging areas of performance optimization often reside at the C language level, FFI boundaries, or system calls. To master such complex systems, senior engineers require a new tool: one that offers full-stack dynamic tracing, seamlessly connecting the worlds of Lua and C.
In this diagnosis, OpenResty XRay acted as a “CT scanner,” enabling the client’s team to move beyond blind guessing and perform precise optimization based on data and evidence. If your team also maintains complex OpenResty systems and is tired of searching for a needle in a haystack within perf’s results, we invite you to apply for a trial of OpenResty XRay to uncover the true “performance black holes” in your system.
What is OpenResty XRay
OpenResty XRay is a dynamic-tracing product that automatically analyzes your running applications to troubleshoot performance problems, behavioral issues, and security vulnerabilities with actionable suggestions. Under the hood, OpenResty XRay is powered by our Y language targeting various runtimes like Stap+, eBPF+, GDB, and ODB, depending on the contexts.
About The Author
Yichun Zhang (GitHub handle: agentzh) is the original creator of the OpenResty® open-source project and the CEO of OpenResty Inc.
Yichun is one of the earliest advocates and leaders of “open-source technology”. He has worked at many internationally renowned tech companies, such as Cloudflare and Yahoo!. He is a pioneer of “edge computing”, “dynamic tracing” and “machine coding”, with over 22 years of programming and 16 years of open source experience. Yichun is well-known in the open-source space as the project leader of OpenResty®, adopted by more than 40 million global website domains.
OpenResty Inc., the enterprise software start-up founded by Yichun in 2017, has customers from some of the biggest companies in the world. Its flagship product, OpenResty XRay, is a non-invasive profiling and troubleshooting tool that significantly enhances and utilizes dynamic tracing technology. And its OpenResty Edge product is a powerful distributed traffic management and private CDN software product.
As an avid open-source contributor, Yichun has contributed more than a million lines of code to numerous open-source projects, including Linux kernel, Nginx, LuaJIT, GDB, SystemTap, LLVM, Perl, etc. He has also authored more than 60 open-source software libraries.


















