When streaming AI-generated text to a browser, delays of even a few hundred milliseconds can break user immersion. At Axrisi, we faced this challenge while building our browser extension for real-time AI text processing. Traditional polling wasn't cutting it—enter Server-Sent Events (SSE).
## Introduction
Our browser extension processes text through AI models in real-time, requiring an efficient streaming solution that could handle thousands of concurrent users while maintaining low latency. This article shares our journey implementing SSE and the lessons we learned along the way.
## Why Server-Sent Events?
Figure 1: Technology Comparison
| Feature | WebSockets | Long Polling | Server-Sent Events |
|---|---|---|---|
| Protocol Overhead | High (WebSocket handshake + frames) | Very High (repeated HTTP requests) | Low (single HTTP connection) |
| Reconnection | Manual implementation | Manual implementation | Automatic |
| Bi-directional | Yes | No | No |
| Browser Support | Excellent | Universal | Modern browsers |
| Memory Usage | ~50KB per connection | ~30KB per request | ~40KB per connection |
| Best For | Chat, gaming | Legacy systems | One-way streaming |
We chose SSE for several compelling reasons:
- Lightweight Protocol: Single HTTP connection with minimal overhead
- Auto-Reconnection: Built-in retry mechanism saved us development time (see the wire-format sketch after this list)
- Native Browser Support: No additional libraries needed
- Perfect Match: One-way streaming aligned perfectly with our AI text generation flow
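To make the auto-reconnection point concrete, here is what the relevant fields look like on the wire (the field names come from the SSE spec; the values are illustrative):

```text
retry: 3000
id: 1749186088941
data: {"content":" Overview","commandId":1749186088941}

: lines beginning with a colon are SSE comments. If the connection drops,
: the browser reconnects after ~3000ms and sends the header
: "Last-Event-ID: 1749186088941" so the server can resume the stream.
```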
## Real Implementation Deep Dive
Figure 2: SSE Architecture Flow
Here's our actual server-side implementation in NestJS:
In the server controller, we define a POST route (`/api/run`) that responds with the necessary SSE headers to keep the connection open. We then attach a handler function (`streamHandler`) that formats each incoming AI-generated chunk into a JSON payload with a timestamp-based identifier. Inside a `try/catch`, we invoke the OpenAI streaming API to receive chunks; for each chunk, we pass it to the handler so it is written to the response stream. When the stream ends, we send a final "done" event before closing the connection. If an error occurs, we emit an SSE "error" event with a sanitized error message and close the response.
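A minimal sketch of that flow, assuming the OpenAI Node SDK and an Express `Response` (the model name, request body shape, and service wiring are illustrative, not our exact code):

```typescript
import { Body, Controller, Post, Res } from '@nestjs/common';
import { Response } from 'express';
import OpenAI from 'openai';

@Controller('api')
export class RunController {
  private readonly openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

  @Post('run')
  async run(@Body() body: { prompt: string }, @Res() res: Response) {
    // SSE headers keep the connection open and disable intermediary buffering.
    res.setHeader('Content-Type', 'text/event-stream');
    res.setHeader('Cache-Control', 'no-cache');
    res.setHeader('Connection', 'keep-alive');
    res.flushHeaders();

    const commandId = Date.now(); // timestamp-based identifier

    // Formats each AI-generated chunk into a JSON payload on a `data:` line.
    const streamHandler = (content: string) => {
      res.write(`data: ${JSON.stringify({ content, commandId })}\n\n`);
    };

    try {
      const stream = await this.openai.chat.completions.create({
        model: 'gpt-4o-mini', // illustrative model choice
        messages: [{ role: 'user', content: body.prompt }],
        stream: true,
      });

      for await (const chunk of stream) {
        const content = chunk.choices[0]?.delta?.content;
        if (content) streamHandler(content);
      }

      // Final "done" event before closing the connection.
      res.write(`data: ${JSON.stringify({ done: true, commandId })}\n\n`);
    } catch {
      // Emit an SSE "error" event with a sanitized message.
      res.write(`event: error\ndata: ${JSON.stringify({ error: 'Stream failed', commandId })}\n\n`);
    } finally {
      res.end();
    }
  }
}
```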
Figure 3: Stream Processing Pipeline
## Real-World Example
Here's an actual stream output from our production system:
Each line represents one SSE event. The first event carries a `title` field along with a `commandId`, followed by several "content" events that append text fragments. Finally, a `done: true` event signals the end of streaming.
data: {"title":"Summary of the Minisforum UM870 Mini PC","commandId":1749186088941}
data: {"content":"##","commandId":1749186088941}
data: {"content":" Overview","commandId":1749186088941}
data: {"content":" of","commandId":1749186088941}
data: {"content":" the","commandId":1749186088941}
// ... streaming continues ...
data: {"done":true,"commandId":1749186088941}
## Client-Side Implementation
Our browser extension processes these streams efficiently:
On the client side, we create a `StreamProcessor` class that opens an `EventSource` to the SSE endpoint. We listen for `onmessage` events, parse each incoming JSON payload, and append content fragments to an internal buffer. When the payload indicates `done`, we close the connection. We also attach an `onerror` listener to log minimal error information and clean up resources. A helper method (`processChunk`) concatenates content and triggers UI updates, while `cleanup` simply closes the `EventSource`.
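A sketch of that class, with one caveat: the native `EventSource` API only issues GET requests, so this assumes a GET variant of the endpoint. The class and method names follow the description above; the `onUpdate` hook is a placeholder for the extension's UI update:

```typescript
class StreamProcessor {
  private source: EventSource | null = null;
  private buffer = '';

  constructor(private readonly onUpdate: (text: string) => void) {}

  start(url: string): void {
    this.source = new EventSource(url);

    this.source.onmessage = (event: MessageEvent<string>) => {
      const payload = JSON.parse(event.data);
      if (payload.done) {
        this.cleanup(); // `done` payload: close the connection
        return;
      }
      if (payload.content) this.processChunk(payload.content);
    };

    this.source.onerror = () => {
      console.error('SSE stream error'); // minimal logging, no payload details
      this.cleanup();
    };
  }

  // Concatenates content fragments and triggers a UI update.
  private processChunk(content: string): void {
    this.buffer += content;
    this.onUpdate(this.buffer);
  }

  private cleanup(): void {
    this.source?.close();
    this.source = null;
  }
}

// Usage: new StreamProcessor(text => renderInPopup(text)).start('/api/run');
```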
Performance & Resource Management
Figure 4: Connection Lifecycle
Our production measurements show:
- Memory Usage: ~40KB per active connection
- Concurrent Connections: Safely handles 1000-2000 connections on a 1GB RAM server
- Latency: Average 100ms from AI token generation to browser render
- CPU Usage: 40% lower at peak than with our previous polling implementation
### Resource Management Implementation
**Connection Tracking:** We define a `safeWrite` helper that checks two internal flags (whether the response has already ended and whether the stream identifier is valid) before attempting to write data. This prevents writes after the stream is closed or flagged as ended.

**Cleanup on Disconnect:** We listen for the response's `close` event. When the client disconnects, we set an internal flag (`isResponseEnded`) and call a method on our AI service client to abort the active streaming request tied to that user, ensuring no orphaned processes remain.
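A sketch of those two guards, with the flag and helper names taken from the description above (`aiService.abortStream` is an assumed interface for the AI service client, not a published API):

```typescript
import { Response } from 'express';

interface AiServiceClient {
  abortStream(userId: string): void; // assumed interface
}

class StreamGuard {
  private isResponseEnded = false;

  constructor(
    private readonly res: Response,
    private readonly streamId: string | null,
    aiService: AiServiceClient,
    userId: string,
  ) {
    // Cleanup on disconnect: flag the response and abort the upstream
    // streaming request so no orphaned processes remain.
    res.on('close', () => {
      this.isResponseEnded = true;
      aiService.abortStream(userId);
    });
  }

  // Writes only while the response is live and the stream identifier is valid.
  safeWrite(data: string): void {
    if (this.isResponseEnded || !this.streamId) return;
    this.res.write(data);
  }
}
```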
## Security Considerations
Our production security measures include:
**Authentication:** We extract the bearer token from the `Authorization` header and pass it to our authentication service, which verifies the JWT. If verification fails, we return a 401 Unauthorized response with a generic error.

**Rate Limiting:** We invoke our rate-limiter service with the user ID. If the user has exceeded their allotted quota, we immediately return a 429 Too Many Requests response with a simple "Rate limit exceeded" message, omitting any sensitive timing details.
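A sketch of both checks as a single helper, assuming NestJS's built-in HTTP exceptions (`authService` and `rateLimiter` stand in for the real services; their interfaces here are assumptions):

```typescript
import { HttpException, HttpStatus, UnauthorizedException } from '@nestjs/common';

interface AuthService {
  verifyJwt(token: string): Promise<{ userId: string }>; // assumed interface
}

interface RateLimiter {
  isAllowed(userId: string): Promise<boolean>; // assumed interface
}

export async function guardRequest(
  authHeader: string | undefined,
  authService: AuthService,
  rateLimiter: RateLimiter,
): Promise<string> {
  // Authentication: extract and verify the bearer token.
  const token = authHeader?.startsWith('Bearer ') ? authHeader.slice(7) : null;
  if (!token) throw new UnauthorizedException(); // 401 with a generic body

  let userId: string;
  try {
    ({ userId } = await authService.verifyJwt(token));
  } catch {
    throw new UnauthorizedException(); // never leak why verification failed
  }

  // Rate limiting: reject over-quota users with a simple message,
  // omitting any sensitive timing details.
  if (!(await rateLimiter.isAllowed(userId))) {
    throw new HttpException('Rate limit exceeded', HttpStatus.TOO_MANY_REQUESTS);
  }

  return userId;
}
```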
## When Not to Use SSE
Despite our success with SSE, it's not always the best choice:
- Binary Data Streaming: WebSockets are better for binary data
- Bi-directional Communication: Chat applications need WebSockets
- Legacy Browser Support: IE11 requires polyfills or fallbacks
## Conclusion
SSE proved to be the perfect choice for our AI text streaming needs at Axrisi. The combination of simplicity, efficiency, and built-in features allowed us to focus on delivering a great user experience rather than wrestling with complex protocols.
## Next Steps
- Try our browser extension to see SSE in action
- Follow us @AxrisiAI for updates
Questions or want to learn more about our implementation? Drop a comment below or reach out on Twitter!