When streaming AI-generated text to a browser, delays of even a few hundred milliseconds can break user immersion. At Axrisi, we faced this challenge while building our browser extension for real-time AI text processing. Traditional polling wasn't cutting it—enter Server-Sent Events (SSE).
## Introduction
Our browser extension processes text through AI models in real-time, requiring an efficient streaming solution that could handle thousands of concurrent users while maintaining low latency. This article shares our journey implementing SSE and the lessons we learned along the way.
## Why Server-Sent Events?
Figure 1: Technology Comparison
| Feature | WebSockets | Long Polling | Server-Sent Events |
|---|---|---|---|
| Protocol Overhead | High (WebSocket handshake + frames) | Very High (repeated HTTP requests) | Low (single HTTP connection) |
| Reconnection | Manual implementation | Manual implementation | Automatic |
| Bi-directional | Yes | No | No |
| Browser Support | Excellent | Universal | Modern browsers |
| Memory Usage | ~50KB per connection | ~30KB per request | ~40KB per connection |
| Best For | Chat, gaming | Legacy systems | One-way streaming |
We chose SSE for several compelling reasons:
- Lightweight Protocol: Single HTTP connection with minimal overhead
- Auto-Reconnection: Built-in retry mechanism saved us development time (see the wire-format sketch after this list)
- Native Browser Support: No additional libraries needed
- Perfect Match: One-way streaming aligned perfectly with our AI text generation flow
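To make the auto-reconnection point concrete, here is what the relevant fields look like on the wire (the field names come from the SSE spec; the values are illustrative):

```text
retry: 3000
id: 1749186088941
data: {"content":" Overview","commandId":1749186088941}

: lines beginning with a colon are SSE comments. If the connection drops,
: the browser reconnects after ~3000ms and sends the header
: "Last-Event-ID: 1749186088941" so the server can resume the stream.
```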
## Real Implementation Deep Dive
Figure 2: SSE Architecture Flow
Here's our actual server-side implementation in NestJS:
In the server controller, we define a POST route (`/api/run`) that responds with the necessary SSE headers to keep the connection open. We then attach a handler function (`streamHandler`) that formats each incoming AI-generated chunk into a JSON payload with a timestamp-based identifier. Inside a `try/catch`, we invoke the OpenAI streaming API to receive chunks; for each chunk, we pass it to the handler so it is written to the response stream. When the stream ends, we send a final "done" event before closing the connection. If an error occurs, we emit an SSE "error" event with a sanitized error message and close the response.
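A minimal sketch of that flow, assuming the OpenAI Node SDK and an Express `Response` (the model name, request body shape, and service wiring are illustrative, not our exact code):

```typescript
import { Body, Controller, Post, Res } from '@nestjs/common';
import { Response } from 'express';
import OpenAI from 'openai';

@Controller('api')
export class RunController {
  private readonly openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

  @Post('run')
  async run(@Body() body: { prompt: string }, @Res() res: Response) {
    // SSE headers keep the connection open and disable intermediary buffering.
    res.setHeader('Content-Type', 'text/event-stream');
    res.setHeader('Cache-Control', 'no-cache');
    res.setHeader('Connection', 'keep-alive');
    res.flushHeaders();

    const commandId = Date.now(); // timestamp-based identifier

    // Formats each AI-generated chunk into a JSON payload on a `data:` line.
    const streamHandler = (content: string) => {
      res.write(`data: ${JSON.stringify({ content, commandId })}\n\n`);
    };

    try {
      const stream = await this.openai.chat.completions.create({
        model: 'gpt-4o-mini', // illustrative model choice
        messages: [{ role: 'user', content: body.prompt }],
        stream: true,
      });

      for await (const chunk of stream) {
        const content = chunk.choices[0]?.delta?.content;
        if (content) streamHandler(content);
      }

      // Final "done" event before closing the connection.
      res.write(`data: ${JSON.stringify({ done: true, commandId })}\n\n`);
    } catch {
      // Emit an SSE "error" event with a sanitized message.
      res.write(`event: error\ndata: ${JSON.stringify({ error: 'Stream failed', commandId })}\n\n`);
    } finally {
      res.end();
    }
  }
}
```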
Figure 3: Stream Processing Pipeline
## Real-World Example
Here's an actual stream output from our production system:
Each line represents one SSE event. The first event carries a `title` field along with a `commandId`, followed by several "content" events that append text fragments. Finally, a `done: true` event signals the end of streaming.
data: {"title":"Summary of the Minisforum UM870 Mini PC","commandId":1749186088941}
data: {"content":"##","commandId":1749186088941}
data: {"content":" Overview","commandId":1749186088941}
data: {"content":" of","commandId":1749186088941}
data: {"content":" the","commandId":1749186088941}
// ... streaming continues ...
data: {"done":true,"commandId":1749186088941}
## Client-Side Implementation
Our browser extension processes these streams efficiently:
On the client side, we create a `StreamProcessor` class that opens an `EventSource` to the SSE endpoint. We listen for `onmessage` events, parse each incoming JSON payload, and append content fragments to an internal buffer. When the payload indicates `done`, we close the connection. We also attach an `onerror` listener to log minimal error information and clean up resources. A helper method (`processChunk`) concatenates content and triggers UI updates, while `cleanup` simply closes the `EventSource`.
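A sketch of that class, with one caveat: the native `EventSource` API only issues GET requests, so this assumes a GET variant of the endpoint. The class and method names follow the description above; the `onUpdate` hook is a placeholder for the extension's UI update:

```typescript
class StreamProcessor {
  private source: EventSource | null = null;
  private buffer = '';

  constructor(private readonly onUpdate: (text: string) => void) {}

  start(url: string): void {
    this.source = new EventSource(url);

    this.source.onmessage = (event: MessageEvent<string>) => {
      const payload = JSON.parse(event.data);
      if (payload.done) {
        this.cleanup(); // `done` payload: close the connection
        return;
      }
      if (payload.content) this.processChunk(payload.content);
    };

    this.source.onerror = () => {
      console.error('SSE stream error'); // minimal logging, no payload details
      this.cleanup();
    };
  }

  // Concatenates content fragments and triggers a UI update.
  private processChunk(content: string): void {
    this.buffer += content;
    this.onUpdate(this.buffer);
  }

  private cleanup(): void {
    this.source?.close();
    this.source = null;
  }
}

// Usage: new StreamProcessor(text => renderInPopup(text)).start('/api/run');
```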
Performance & Resource Management
Figure 4: Connection Lifecycle
Our production measurements show:
- Memory Usage: ~40KB per active connection
- Concurrent Connections: Safely handles 1000-2000 connections on a 1GB RAM server
- Latency: Average 100ms from AI token generation to browser render
- CPU Usage: 40% lower at peak than with our previous polling implementation
### Resource Management Implementation
**Connection Tracking:** We define a `safeWrite` helper that checks two internal flags (whether the response has already ended and whether the stream identifier is valid) before attempting to write data. This prevents writes after the stream is closed or flagged as ended.

**Cleanup on Disconnect:** We listen for the response's `close` event. When the client disconnects, we set an internal flag (`isResponseEnded`) and call a method on our AI service client to abort the active streaming request tied to that user, ensuring no orphaned processes remain.
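A sketch of those two guards, with the flag and helper names taken from the description above (`aiService.abortStream` is an assumed interface for the AI service client, not a published API):

```typescript
import { Response } from 'express';

interface AiServiceClient {
  abortStream(userId: string): void; // assumed interface
}

class StreamGuard {
  private isResponseEnded = false;

  constructor(
    private readonly res: Response,
    private readonly streamId: string | null,
    aiService: AiServiceClient,
    userId: string,
  ) {
    // Cleanup on disconnect: flag the response and abort the upstream
    // streaming request so no orphaned processes remain.
    res.on('close', () => {
      this.isResponseEnded = true;
      aiService.abortStream(userId);
    });
  }

  // Writes only while the response is live and the stream identifier is valid.
  safeWrite(data: string): void {
    if (this.isResponseEnded || !this.streamId) return;
    this.res.write(data);
  }
}
```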
## Security Considerations
Our production security measures include:
**Authentication:** We extract the bearer token from the `Authorization` header and pass it to our authentication service, which verifies the JWT. If verification fails, we return a 401 Unauthorized response with a generic error.

**Rate Limiting:** We invoke our rate-limiter service with the user ID. If the user has exceeded their allotted quota, we immediately return a 429 Too Many Requests response with a simple "Rate limit exceeded" message, omitting any sensitive timing details.
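A sketch of both checks as a single helper, assuming NestJS's built-in HTTP exceptions (`authService` and `rateLimiter` stand in for the real services; their interfaces here are assumptions):

```typescript
import { HttpException, HttpStatus, UnauthorizedException } from '@nestjs/common';

interface AuthService {
  verifyJwt(token: string): Promise<{ userId: string }>; // assumed interface
}

interface RateLimiter {
  isAllowed(userId: string): Promise<boolean>; // assumed interface
}

export async function guardRequest(
  authHeader: string | undefined,
  authService: AuthService,
  rateLimiter: RateLimiter,
): Promise<string> {
  // Authentication: extract and verify the bearer token.
  const token = authHeader?.startsWith('Bearer ') ? authHeader.slice(7) : null;
  if (!token) throw new UnauthorizedException(); // 401 with a generic body

  let userId: string;
  try {
    ({ userId } = await authService.verifyJwt(token));
  } catch {
    throw new UnauthorizedException(); // never leak why verification failed
  }

  // Rate limiting: reject over-quota users with a simple message,
  // omitting any sensitive timing details.
  if (!(await rateLimiter.isAllowed(userId))) {
    throw new HttpException('Rate limit exceeded', HttpStatus.TOO_MANY_REQUESTS);
  }

  return userId;
}
```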
## When Not to Use SSE
Despite our success with SSE, it's not always the best choice:
- Binary Data Streaming: WebSockets are better for binary data
- Bi-directional Communication: Chat applications need WebSockets
- Legacy Browser Support: IE11 requires polyfills or fallbacks
## Conclusion
SSE proved to be the perfect choice for our AI text streaming needs at Axrisi. The combination of simplicity, efficiency, and built-in features allowed us to focus on delivering a great user experience rather than wrestling with complex protocols.
## Next Steps
- Try our browser extension to see SSE in action
- Follow us @AxrisiAI for updates
Questions or want to learn more about our implementation? Drop a comment below or reach out on Twitter!