Building Real-Time Text Streaming with SSE in Node.js

When streaming AI-generated text to a browser, delays of even a few hundred milliseconds can break user immersion. At Axrisi, we faced this challenge while building our browser extension for real-time AI text processing. Traditional polling wasn't cutting it—enter Server-Sent Events (SSE).

Introduction

Our browser extension processes text through AI models in real-time, requiring an efficient streaming solution that could handle thousands of concurrent users while maintaining low latency. This article shares our journey implementing SSE and the lessons we learned along the way.

Why Server-Sent Events?

Figure 1: Technology Comparison

| Feature | WebSockets | Long Polling | Server-Sent Events |
| --- | --- | --- | --- |
| Protocol Overhead | High (WebSocket handshake + frames) | Very High (Repeated HTTP) | Low (Single HTTP) |
| Reconnection | Manual Implementation | Manual Implementation | Automatic |
| Bi-directional | Yes | No | No |
| Browser Support | Excellent | Universal | Modern Browsers |
| Memory Usage | ~50KB per connection | ~30KB per request | ~40KB per connection |
| Best For | Chat, Gaming | Legacy Systems | One-way Streaming |

We chose SSE for several compelling reasons:

  1. Lightweight Protocol: Single HTTP connection with minimal overhead
  2. Auto-Reconnection: Built-in retry mechanism saved us development time
  3. Native Browser Support: No additional libraries needed
  4. Perfect Match: One-way streaming aligned perfectly with our AI text generation flow

Real Implementation Deep Dive

Figure 2: SSE Architecture Flow

Here's how our server-side implementation in NestJS works:

In the server controller, we define a POST route (/api/run) that responds with the necessary SSE headers to keep the connection open. We then attach a handler function (streamHandler) that formats each incoming AI-generated chunk into a JSON payload with a timestamp-based identifier. Inside a try/catch, we invoke the OpenAI streaming API to receive chunks; for each chunk, we pass it to the handler so it is written to the response stream. When the stream ends, we send a final “done” event before closing the connection. If an error occurs, we emit an SSE “error” event with a sanitized error message and close the response.
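
A minimal sketch of that flow follows. It is illustrative rather than our exact production code: the prompt body field, the model name, and the shape of streamHandler are assumptions, and the sketch uses the official openai Node SDK with an Express response object.

```typescript
import { Body, Controller, Post, Res } from '@nestjs/common';
import { Response } from 'express';
import OpenAI from 'openai';

@Controller('api')
export class RunController {
  // Reads OPENAI_API_KEY from the environment
  private readonly openai = new OpenAI();

  @Post('run')
  async run(@Body('prompt') prompt: string, @Res() res: Response) {
    // SSE headers keep the HTTP connection open and disable caching
    res.set({
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      Connection: 'keep-alive',
    });
    res.flushHeaders();

    // Timestamp-based identifier attached to every event of this stream
    const commandId = Date.now();

    // Format each chunk as a JSON payload and write it as one SSE event
    const streamHandler = (payload: Record<string, unknown>) => {
      res.write(`data: ${JSON.stringify({ ...payload, commandId })}\n\n`);
    };

    try {
      const stream = await this.openai.chat.completions.create({
        model: 'gpt-4o-mini',
        messages: [{ role: 'user', content: prompt }],
        stream: true,
      });

      for await (const chunk of stream) {
        const content = chunk.choices[0]?.delta?.content;
        if (content) streamHandler({ content });
      }

      // Final "done" event before closing the connection
      streamHandler({ done: true });
    } catch {
      // SSE "error" event with a sanitized message; no internal details leak out
      res.write(`event: error\ndata: ${JSON.stringify({ error: 'Stream failed', commandId })}\n\n`);
    } finally {
      res.end();
    }
  }
}
```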

Figure 3: Stream Processing Pipeline

Real-World Example

Here's an actual stream output from our production system:

Each line represents one SSE event. The first event carries a title field along with a commandId, followed by several “content” events that append text fragments. Finally, a done: true event signals the end of streaming.

data: {"title":"Summary of the Minisforum UM870 Mini PC","commandId":1749186088941}
data: {"content":"##","commandId":1749186088941}
data: {"content":" Overview","commandId":1749186088941}
data: {"content":" of","commandId":1749186088941}
data: {"content":" the","commandId":1749186088941}
// ... streaming continues ...
data: {"done":true,"commandId":1749186088941}

Client-Side Implementation

Our browser extension processes these streams efficiently:

On the client side, we create a StreamProcessor class that opens an EventSource to the SSE endpoint. We listen for onmessage events, parse each incoming JSON payload, and append content fragments to an internal buffer. When the payload indicates done, we close the connection. We also attach an onerror listener to log minimal error information and clean up resources. A helper method (processChunk) concatenates content and triggers UI updates, while cleanup simply closes the EventSource.
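
A simplified sketch of that class is below. Note that EventSource always issues GET requests, so this version assumes a GET-compatible variant of the endpoint; the onUpdate callback stands in for whatever renders the buffer in the extension UI.

```typescript
type UpdateCallback = (text: string) => void;

class StreamProcessor {
  private source: EventSource | null = null;
  private buffer = '';

  constructor(private readonly onUpdate: UpdateCallback) {}

  start(url: string): void {
    this.source = new EventSource(url);

    // Every SSE event arrives as a JSON payload on the data field
    this.source.onmessage = (event: MessageEvent) => {
      const payload = JSON.parse(event.data);

      if (payload.done) {
        this.cleanup(); // server signalled the end of the stream
        return;
      }
      if (payload.content) {
        this.processChunk(payload.content);
      }
    };

    // Log minimal error information and release resources
    this.source.onerror = () => {
      console.error('SSE connection error');
      this.cleanup();
    };
  }

  // Concatenate content fragments and trigger a UI update
  private processChunk(content: string): void {
    this.buffer += content;
    this.onUpdate(this.buffer);
  }

  cleanup(): void {
    this.source?.close();
    this.source = null;
  }
}
```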

Performance & Resource Management

Figure 4: Connection Lifecycle

Our production measurements show:

  • Memory Usage: ~40KB per active connection
  • Concurrent Connections: Safely handles 1000-2000 connections on a 1GB RAM server
  • Latency: Average 100ms from AI token generation to browser render
  • CPU Usage: Up to 40% lower at peak than with our previous polling implementation

Resource Management Implementation

  1. Connection Tracking:

    We define a safeWrite helper that checks two internal flags (whether the response has already ended and whether the stream identifier is valid) before attempting to write data. This prevents writes after the stream is closed or flagged as ended; see the sketch after this list.

  2. Cleanup on Disconnect:

    We listen for the response’s close event. When the client disconnects, we set an internal flag (isResponseEnded) and call a method on our AI service client to abort the active streaming request tied to that user, ensuring no orphaned processes remain.
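
Here's a combined sketch of both helpers; the StreamState flags and the aiClient.abortStream method are illustrative names rather than our exact internals.

```typescript
import { Response } from 'express';

interface StreamState {
  isResponseEnded: boolean;
  streamId: string | null;
}

// 1. Connection tracking: refuse to write once the response has ended
//    or the stream identifier is no longer valid.
function safeWrite(res: Response, state: StreamState, payload: Record<string, unknown>): boolean {
  if (state.isResponseEnded || !state.streamId) {
    return false; // drop the write instead of throwing on a closed stream
  }
  res.write(`data: ${JSON.stringify(payload)}\n\n`);
  return true;
}

// 2. Cleanup on disconnect: abort the in-flight AI request for this user
//    as soon as the client goes away, so no orphaned process remains.
function registerCleanup(
  res: Response,
  state: StreamState,
  aiClient: { abortStream(userId: string): void },
  userId: string,
): void {
  res.on('close', () => {
    state.isResponseEnded = true; // later safeWrite calls become no-ops
    aiClient.abortStream(userId);
  });
}
```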

Security Considerations

Our production security measures include:

  1. Authentication:

    We extract the bearer token from the Authorization header and pass it to our authentication service, which verifies the JWT. If verification fails, we return a 401 Unauthorized response with a generic error; see the sketch after this list.

  2. Rate Limiting:

    We invoke our rate-limiter service by passing the user ID. If the user has exceeded their allotted quota, we immediately return a 429 Too Many Requests response with a simple “Rate limit exceeded” message, omitting any sensitive timing details.
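
Here's a combined sketch of both checks; the AuthService and RateLimiterService interfaces are illustrative stand-ins for our real auth and rate-limiter modules, not their actual APIs.

```typescript
import { Request, Response } from 'express';

// Illustrative service shapes; the real implementations live elsewhere.
interface AuthService {
  verifyToken(token: string): Promise<{ userId: string }>;
}
interface RateLimiterService {
  isAllowed(userId: string): Promise<boolean>;
}

async function authorizeStream(
  req: Request,
  res: Response,
  auth: AuthService,
  limiter: RateLimiterService,
): Promise<string | null> {
  // 1. Authentication: extract and verify the bearer token (JWT)
  const token = req.headers.authorization?.replace('Bearer ', '');
  if (!token) {
    res.status(401).json({ error: 'Unauthorized' });
    return null;
  }

  let userId: string;
  try {
    ({ userId } = await auth.verifyToken(token));
  } catch {
    // Generic error only; never expose why verification failed
    res.status(401).json({ error: 'Unauthorized' });
    return null;
  }

  // 2. Rate limiting: reject over-quota users before the SSE stream is opened
  if (!(await limiter.isAllowed(userId))) {
    res.status(429).json({ error: 'Rate limit exceeded' }); // no timing details
    return null;
  }

  return userId; // caller proceeds to open the SSE stream for this user
}
```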

When Not to Use SSE

Despite our success with SSE, it's not always the best choice:

  1. Binary Data Streaming: WebSockets are better for binary data
  2. Bi-directional Communication: Chat applications need WebSockets
  3. Legacy Browser Support: IE11 requires polyfills or fallbacks

Conclusion

SSE proved to be the perfect choice for our AI text streaming needs at Axrisi. The combination of simplicity, efficiency, and built-in features allowed us to focus on delivering a great user experience rather than wrestling with complex protocols.

Next Steps

  • Try our browser extension to see SSE in action
  • Follow us @AxrisiAI for updates

Questions or want to learn more about our implementation? Drop a comment below or reach out on Twitter!
