That typing effect in ChatGPT isn't animation. Each character arrives from the server the moment it's generated — one token at a time, pushed over a persistent connection. Behind that smooth rendering sits a stack of decisions about transport protocols, buffer management, partial JSON parsing, and proxy configuration that can each break the experience in subtle ways. This tutorial builds streaming from scratch three ways (SSE, WebSocket, Chanl SDK) and covers every layer between the model generating a token and a character appearing on screen.
| What you'll build | Why it matters |
|---|---|
| SSE streaming server (Express) | The default protocol for 90% of AI streaming — simple, reliable, browser-native |
| WebSocket streaming server | Bidirectional streaming for voice AI, cancellation, and multi-user scenarios |
| Tool call accumulator | Parse partial JSON fragments as they stream — the hardest part of AI streaming |
| React streaming chat | Production-ready UI with TTFT metrics, cancellation, and tool call rendering |
| Backpressure handling | What happens when the client can't keep up — and how to prevent buffer bloat |
| Production proxy config | Nginx/CDN settings that break streaming if you get them wrong |
What you'll need
Runtime:
- Node.js 20+ and a Chanl account (free tier includes streaming)
Install dependencies:
# TypeScript — SDK + server framework
npm install @chanl-ai/sdk express
# For WebSocket examples:
npm install ws
Set your API key:
export CHANL_API_KEY="your-api-key-here"
Create a test agent we'll stream responses from:
import { ChanlClient } from "@chanl-ai/sdk";
const chanl = new ChanlClient({ apiKey: process.env.CHANL_API_KEY });
const agent = await chanl.agents.create({
name: "Streaming Demo Agent",
instructions: "You are a helpful assistant. Give detailed, thoughtful responses.",
model: "claude-sonnet-4-20250514",
});
console.log("Agent ID:", agent.id); // Save this — we'll use it throughout
All code in this tutorial is complete and runnable.
Why streaming matters for AI
Streaming cuts perceived latency from 5-15 seconds to under 500ms. That's not a UX nicety — it's the difference between a product people use and one they abandon.
Without streaming, users stare at a blank screen while the entire response generates:
// Non-streaming: user waits for the ENTIRE response
import { ChanlClient } from "@chanl-ai/sdk";
const chanl = new ChanlClient({
apiKey: process.env.CHANL_API_KEY,
model: "claude-sonnet-4-20250514",
});
const start = Date.now();
const response = await chanl.chat.send(agentId, [
{ role: "user", content: "Explain quantum computing" },
]);
const elapsed = Date.now() - start;
console.log(`User waited ${elapsed}ms for first character`);
// User waited 8,432ms for first character
// The full response appears all at once
console.log(response.content);
With streaming, the first token arrives in a few hundred milliseconds:
// Streaming: first token arrives in ~200-400ms
import { ChanlClient } from "@chanl-ai/sdk";
const chanl = new ChanlClient({
apiKey: process.env.CHANL_API_KEY,
model: "claude-sonnet-4-20250514",
});
const start = Date.now();
let ttft: number | null = null;
const stream = chanl.chat.stream(agentId, [
{ role: "user", content: "Explain quantum computing" },
]);
for await (const chunk of stream) {
if (chunk.type === "token") {
if (!ttft) {
ttft = Date.now() - start;
console.log(`First token in ${ttft}ms`); // First token in 287ms
}
process.stdout.write(chunk.content);
}
}
Total generation time is identical — the model produces tokens at the same speed either way. But the user starts reading immediately instead of staring at nothing for 8 seconds.
Streaming also unlocks three capabilities batch responses can't provide:
Early cancellation. If the model goes off-track, the user (or your code) can abort mid-stream. Without streaming, you pay for the full generation whether you use it or not.
Progressive rendering. Markdown, code blocks, and lists render incrementally. The UI feels alive.
Real-time tool calls. When an agent invokes a tool during generation — looking up a database, calling an API, searching a knowledge base — streaming shows that execution live rather than hiding it behind a spinner.
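Early cancellation is worth seeing concretely. Below is a self-contained sketch: a simulated async-generator stream stands in for the SDK (the names simulateStream and readWithCutoff are illustrative, not SDK APIs), but the AbortController pattern is the same one the real stream accepts.

```typescript
// Sketch: cancelling a token stream mid-generation with AbortController.
// simulateStream stands in for an SDK stream; it is illustrative only.
async function* simulateStream(signal: AbortSignal): AsyncGenerator<string> {
  const tokens = ["Quantum ", "computing ", "uses ", "qubits ", "to..."];
  for (const token of tokens) {
    if (signal.aborted) throw new DOMException("Aborted", "AbortError");
    yield token;
  }
}

async function readWithCutoff(maxChars: number): Promise<string> {
  const controller = new AbortController();
  let text = "";
  try {
    for await (const token of simulateStream(controller.signal)) {
      text += token;
      // Stop paying for tokens we no longer want
      if (text.length >= maxChars) controller.abort();
    }
  } catch (err) {
    if ((err as Error).name !== "AbortError") throw err; // abort is expected
  }
  return text;
}
```

Once aborted, the generator never produces the remaining tokens, which is exactly the cost-saving property described above.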
Server-Sent Events: the default streaming protocol
SSE is a browser-native protocol for one-way server-to-client event streams over standard HTTP. OpenAI, Anthropic, and Google all use it. No WebSocket upgrade, no special handshake — just a persistent HTTP connection with Content-Type: text/event-stream.
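On the wire, an SSE stream is plain newline-delimited text: each event is a data: line followed by a blank line. Using the payload shape this tutorial's server emits, a token stream looks like this:

```text
data: {"type":"token","content":"Str"}

data: {"type":"token","content":"eaming"}

data: {"type":"done","reason":"stop"}
```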
The data flow is simple: the client opens a standard HTTP connection, the server proxies each token from the LLM as an SSE event, and the client appends it to the UI.
Building the SSE endpoint
Three headers establish the connection. Then each token from the Chanl SDK pipes directly to the client as an SSE event:
import express from "express";
import { ChanlClient } from "@chanl-ai/sdk";
const app = express();
const chanl = new ChanlClient({
apiKey: process.env.CHANL_API_KEY,
model: "claude-sonnet-4-20250514",
});
app.use(express.json());
app.post("/api/chat/stream", async (req, res) => {
const { agentId, messages } = req.body;
// SSE headers — these three are non-negotiable
res.setHeader("Content-Type", "text/event-stream");
res.setHeader("Cache-Control", "no-cache");
res.setHeader("Connection", "keep-alive");
res.flushHeaders();
try {
const stream = chanl.chat.stream(agentId, messages);
for await (const chunk of stream) {
if (chunk.type === "token") {
// SSE format: "data: <payload>\n\n"
res.write(
`data: ${JSON.stringify({ type: "token", content: chunk.content })}\n\n`
);
}
// Check for tool calls in the stream
if (chunk.type === "tool_call") {
res.write(
`data: ${JSON.stringify({
type: "tool_call",
id: chunk.id,
name: chunk.name,
arguments: chunk.arguments,
})}\n\n`
);
}
// Stream finished
if (chunk.type === "done") {
res.write(
`data: ${JSON.stringify({
type: "done",
reason: chunk.reason,
})}\n\n`
);
}
}
} catch (error) {
res.write(
`data: ${JSON.stringify({
type: "error",
message: error instanceof Error ? error.message : "Unknown error",
})}\n\n`
);
}
res.end();
});
app.listen(3000);
Consuming SSE from the browser
The browser has a built-in SSE client. For GET endpoints, it's one line:
const source = new EventSource("/api/chat/stream");
source.onmessage = (event) => {
const data = JSON.parse(event.data);
if (data.type === "token") {
document.getElementById("response")!.textContent += data.content;
}
};
But EventSource only supports GET. For POST (which you need for chat), use fetch with a streaming body reader:
async function streamChat(
agentId: string,
messages: Array<{ role: string; content: string }>
) {
const response = await fetch("/api/chat/stream", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ agentId, messages }),
});
const reader = response.body!.getReader();
const decoder = new TextDecoder();
let buffer = "";
const start = Date.now();
let ttft: number | null = null;
while (true) {
const { done, value } = await reader.read();
if (done) break;
buffer += decoder.decode(value, { stream: true });
// Parse SSE lines from buffer
const lines = buffer.split("\n");
buffer = lines.pop()!; // Keep incomplete line in buffer
for (const line of lines) {
if (!line.startsWith("data: ")) continue;
const data = JSON.parse(line.slice(6));
if (data.type === "token") {
if (!ttft) {
ttft = Date.now() - start;
console.log(`TTFT: ${ttft}ms`);
}
document.getElementById("response")!.textContent += data.content;
}
if (data.type === "done") {
console.log(`Total time: ${Date.now() - start}ms`);
}
}
}
}
Simplifying with the Chanl SDK
That's a lot of boilerplate. The SDK handles SSE connection management, reconnection, and event parsing for you:
import { ChanlClient } from "@chanl-ai/sdk";
const chanl = new ChanlClient({
apiKey: process.env.CHANL_API_KEY,
model: "claude-sonnet-4-20250514",
});
// Stream a message — tokens arrive in real-time
const stream = chanl.chat.stream(agentId, [
{ role: "user", content: "Explain how our pricing works" },
]);
for await (const chunk of stream) {
if (chunk.type === "token") process.stdout.write(chunk.content);
if (chunk.type === "tool_call") console.log("Calling:", chunk.name);
if (chunk.type === "done") console.log("\nComplete");
}
Your agent's tools and MCP servers execute transparently — tool call events surface automatically during the stream.
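Under the hood, the trickiest part of the fetch-based client shown earlier is chunk-boundary handling: a network chunk can end mid-line, so the incomplete tail must stay buffered for the next read. That logic, isolated as a pure helper (a sketch for illustration, not an SDK API):

```typescript
// Sketch: extract complete SSE data payloads from a buffer, returning
// whatever trailing partial line must be carried into the next chunk.
function drainSSEBuffer(buffer: string): { events: string[]; rest: string } {
  const lines = buffer.split("\n");
  const rest = lines.pop()!; // last element may be an incomplete line
  const events = lines
    .filter((line) => line.startsWith("data: "))
    .map((line) => line.slice(6));
  return { events, rest };
}
```

Feed it a chunk that ends mid-JSON and it emits nothing; prepend the remainder to the next chunk and the full event comes out intact.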
WebSocket streaming: bidirectional and persistent
SSE is one-directional: server pushes to client. That covers "user sends message, model responds." But some scenarios need the client to push events during a stream — cancellation, voice interruption, real-time audio input. That's where WebSockets come in.
WebSockets open a persistent, full-duplex TCP connection. Either side can send messages at any time. More complexity, but you get bidirectional communication.
The defining capability: a cancel message travels from client to server over the same connection that is actively streaming tokens.
Server implementation
The server manages an AbortController per connection. When the client sends cancel, the server aborts the stream immediately:
import { WebSocketServer } from "ws";
import { ChanlClient } from "@chanl-ai/sdk";
const wss = new WebSocketServer({ port: 8080 });
const chanl = new ChanlClient({
apiKey: process.env.CHANL_API_KEY,
model: "claude-sonnet-4-20250514",
});
wss.on("connection", (ws) => {
let activeController: AbortController | null = null;
ws.on("message", async (raw) => {
const message = JSON.parse(raw.toString());
if (message.type === "cancel") {
// Client can cancel mid-stream — this is the WebSocket advantage
activeController?.abort();
ws.send(JSON.stringify({ type: "cancelled" }));
return;
}
if (message.type === "chat") {
activeController = new AbortController();
const start = Date.now();
try {
const stream = chanl.chat.stream(
message.agentId,
message.messages,
{ signal: activeController.signal }
);
let tokenCount = 0;
for await (const chunk of stream) {
if (chunk.type === "token") {
tokenCount++;
ws.send(
JSON.stringify({
type: "token",
content: chunk.content,
metrics: {
ttft: tokenCount === 1 ? Date.now() - start : undefined,
tokenCount,
},
})
);
}
if (chunk.type === "done") {
const elapsed = Date.now() - start;
ws.send(
JSON.stringify({
type: "done",
metrics: {
totalMs: elapsed,
tokenCount,
tokensPerSecond: Math.round((tokenCount / elapsed) * 1000),
},
})
);
}
}
} catch (err: any) {
if (err.name === "AbortError") return; // Expected on cancel
ws.send(JSON.stringify({ type: "error", message: err.message }));
} finally {
activeController = null;
}
}
});
});
Client-side WebSocket with cancellation
This wrapper gives you a clean API for sending messages and cancelling mid-stream:
class StreamingChatClient {
private ws: WebSocket;
private handlers = new Map<string, (data: any) => void>();
constructor(url: string) {
this.ws = new WebSocket(url);
this.ws.onmessage = (event) => {
const data = JSON.parse(event.data);
this.handlers.get(data.type)?.(data);
};
}
send(agentId: string, messages: Array<{ role: string; content: string }>) {
this.ws.send(JSON.stringify({ type: "chat", agentId, messages }));
}
cancel() {
this.ws.send(JSON.stringify({ type: "cancel" }));
}
on(event: string, handler: (data: any) => void) {
this.handlers.set(event, handler);
return this;
}
}
// Usage
const chat = new StreamingChatClient("ws://localhost:8080");
chat
.on("token", (data) => appendToUI(data.content))
.on("done", (data) => showMetrics(data.metrics))
.on("cancelled", () => showCancelledMessage());
chat.send(agentId, [{ role: "user", content: "Explain streaming" }]);
// User clicks "Stop" — cancels mid-stream
stopButton.onclick = () => chat.cancel();
The key difference from SSE: cancel() travels over the same connection receiving tokens. With SSE, cancellation requires closing the connection entirely or sending a separate HTTP request.
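One thing EventSource gives you for free that this WebSocket client lacks is reconnection: with WebSockets you implement retry yourself. A minimal exponential-backoff schedule, sketched as a pure function (the parameter defaults are assumptions, not a standard):

```typescript
// Sketch: exponential backoff with a cap, for manual WebSocket reconnection.
function reconnectDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// In the client: on "close", schedule a retry instead of giving up.
// this.ws.onclose = () =>
//   setTimeout(() => this.connect(), reconnectDelayMs(this.attempts++));
```

Production retry logic usually adds random jitter so many clients don't reconnect in lockstep after an outage.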
Streaming tool calls and structured outputs
Plain text streaming is straightforward — append each token to a string. Tool calls are where it gets hard. When an LLM invokes a function, it streams the call as partial JSON fragments:
// These arrive as separate chunks:
{"type": "tool_call_delta", "arguments": "{\""}
{"type": "tool_call_delta", "arguments": "query"}
{"type": "tool_call_delta", "arguments": "\": \""}
{"type": "tool_call_delta", "arguments": "weather"}
{"type": "tool_call_delta", "arguments": " in"}
{"type": "tool_call_delta", "arguments": " NYC"}
{"type": "tool_call_delta", "arguments": "\"}"}
You can't JSON.parse() any individual chunk. You need to accumulate fragments and detect when you have a complete JSON object. Here's an accumulator that buffers each fragment by index, trying to parse after every addition:
interface ToolCallBuffer {
id: string;
name: string;
argumentsBuffer: string;
}
interface CompleteToolCall {
id: string;
name: string;
arguments: Record<string, unknown>;
}
function createToolCallAccumulator() {
const buffers = new Map<number, ToolCallBuffer>();
return {
feed(delta: {
index: number;
id?: string;
name?: string;
arguments?: string;
}): { complete: boolean; toolCall?: CompleteToolCall } {
// Initialize buffer for new tool call
if (!buffers.has(delta.index)) {
buffers.set(delta.index, {
id: delta.id || "",
name: delta.name || "",
argumentsBuffer: "",
});
}
const buffer = buffers.get(delta.index)!;
// Accumulate pieces
if (delta.id) buffer.id = delta.id;
if (delta.name) buffer.name = delta.name;
if (delta.arguments) buffer.argumentsBuffer += delta.arguments;
// Try to parse — if it succeeds, the tool call is complete
try {
const parsed = JSON.parse(buffer.argumentsBuffer);
buffers.delete(delta.index);
return {
complete: true,
toolCall: {
id: buffer.id,
name: buffer.name,
arguments: parsed,
},
};
} catch {
// JSON not complete yet — keep accumulating
return { complete: false };
}
},
reset() {
buffers.clear();
},
};
}
Wire the accumulator into your stream — text tokens go to the UI, tool call fragments feed the accumulator until complete JSON emerges:
import { ChanlClient } from "@chanl-ai/sdk";
const chanl = new ChanlClient({
apiKey: process.env.CHANL_API_KEY,
model: "claude-sonnet-4-20250514",
});
const accumulator = createToolCallAccumulator();
const stream = chanl.chat.stream(agentId, messages, { tools: true });
for await (const chunk of stream) {
// Regular text tokens
if (chunk.type === "token") {
appendToUI(chunk.content);
}
// Tool call fragments
if (chunk.type === "tool_call_delta") {
const result = accumulator.feed({
index: chunk.index,
id: chunk.id,
name: chunk.name,
arguments: chunk.arguments,
});
if (result.complete) {
console.log(
`Tool call complete: ${result.toolCall!.name}`,
result.toolCall!.arguments
);
// Execute the tool, send results back to the model
}
}
}
Letting the SDK handle accumulation
The Chanl SDK handles this automatically. The tool_call event fires only when a complete, parsed tool call is ready:
import { ChanlClient } from "@chanl-ai/sdk";
const chanl = new ChanlClient({
apiKey: process.env.CHANL_API_KEY,
model: "claude-sonnet-4-20250514",
});
const stream = chanl.chat.stream(agentId, messages, { tools: true });
for await (const chunk of stream) {
if (chunk.type === "token") {
document.getElementById("response")!.textContent += chunk.content;
}
if (chunk.type === "tool_call") {
// Tool call JSON fully assembled from streaming chunks
showToolResult(chunk);
console.log(`Tool: ${chunk.name}`, chunk.arguments);
}
if (chunk.type === "done") {
console.log(`TTFT: ${chunk.metrics.ttft}ms`);
console.log(`Tokens/sec: ${chunk.metrics.tokensPerSecond}`);
}
}
The SDK normalizes the event format regardless of model provider — whether it's OpenAI's choices[0].delta structure or Anthropic's content_block_delta. You write one handler; it works with both.
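For illustration, here is roughly what that normalization involves. The provider shapes below match the publicly documented OpenAI and Anthropic streaming formats; the unified token shape is this tutorial's convention, and normalizeDelta is a sketch, not a published SDK function:

```typescript
// Sketch: map provider-specific stream deltas to one event shape.
type TokenEvent = { type: "token"; content: string };

function normalizeDelta(raw: any): TokenEvent | null {
  // OpenAI chat completions: { choices: [{ delta: { content } }] }
  const openai = raw?.choices?.[0]?.delta?.content;
  if (typeof openai === "string") return { type: "token", content: openai };
  // Anthropic messages: { type: "content_block_delta", delta: { type: "text_delta", text } }
  if (raw?.type === "content_block_delta" && raw.delta?.type === "text_delta") {
    return { type: "token", content: raw.delta.text };
  }
  return null; // non-text event (tool call delta, stop, etc.)
}
```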
React streaming with Chanl SDK
A streaming chat UI in React introduces state management challenges: appending tokens without re-rendering the entire list, tracking metrics, supporting cancellation, handling tool calls — all while staying responsive.
The SDK's React hook wraps all of this into a single call:
import { useStreamingChat } from "@chanl-ai/sdk/react";
function ChatInterface({ agentId }: { agentId: string }) {
const { messages, isStreaming, ttft, tokensPerSecond, send, cancel } =
useStreamingChat(agentId, {
onToolCall: (call) => {
// Show real-time tool execution in UI
addToolCallIndicator(call.name, call.arguments);
},
});
return (
<div className="flex flex-col h-full">
<div className="flex-1 overflow-y-auto p-4 space-y-4">
{messages.map((msg) => (
<MessageBubble key={msg.id} {...msg} />
))}
</div>
{isStreaming && (
<div className="flex items-center gap-3 px-4 py-2 text-sm text-muted-foreground">
{ttft && <span>TTFT: {ttft}ms</span>}
{tokensPerSecond && <span>{tokensPerSecond} tok/s</span>}
<button
onClick={cancel}
className="ml-auto text-destructive hover:underline"
>
Stop generating
</button>
</div>
)}
<ChatInput onSend={send} disabled={isStreaming} />
</div>
);
}
Under the hood, useStreamingChat handles four things you'd otherwise build yourself:
- Optimistic message insertion. The user's message appears before the server acknowledges it.
- Token batching. Tokens batch on requestAnimationFrame boundaries — one React render per frame instead of per token.
- Abort controller. cancel calls AbortController.abort(), tearing down the SSE connection cleanly.
- TTFT tracking. Measures time from send() to the first token event automatically.
Manual implementation (without SDK)
If you're not using React or need full control, here's what the manual version looks like — about 80 lines of state management covering token buffering, animation-frame batching, and abort control:
function useManualStreamingChat(endpoint: string) {
const [messages, setMessages] = useState<Message[]>([]);
const [isStreaming, setIsStreaming] = useState(false);
const [ttft, setTtft] = useState<number | null>(null);
const controllerRef = useRef<AbortController | null>(null);
const tokenBufferRef = useRef("");
const rafRef = useRef<number | null>(null);
// Batch DOM updates to animation frames
const flushTokens = useCallback(() => {
if (!tokenBufferRef.current) return;
const tokens = tokenBufferRef.current;
tokenBufferRef.current = "";
setMessages((prev) => {
const updated = [...prev];
const last = updated[updated.length - 1];
updated[updated.length - 1] = {
...last,
content: last.content + tokens,
};
return updated;
});
rafRef.current = null;
}, []);
const appendToken = useCallback(
(token: string) => {
tokenBufferRef.current += token;
if (!rafRef.current) {
rafRef.current = requestAnimationFrame(flushTokens);
}
},
[flushTokens]
);
const send = useCallback(
async (content: string) => {
const controller = new AbortController();
controllerRef.current = controller;
setIsStreaming(true);
setTtft(null);
// Add user message + empty assistant message
const userMsg: Message = {
id: crypto.randomUUID(),
role: "user",
content,
};
const assistantMsg: Message = {
id: crypto.randomUUID(),
role: "assistant",
content: "",
};
setMessages((prev) => [...prev, userMsg, assistantMsg]);
const start = Date.now();
let sawFirstToken = false; // local flag avoids reading stale ttft state inside this closure
try {
const response = await fetch(endpoint, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ messages: [{ role: "user", content }] }),
signal: controller.signal,
});
const reader = response.body!.getReader();
const decoder = new TextDecoder();
let buffer = "";
while (true) {
const { done, value } = await reader.read();
if (done) break;
buffer += decoder.decode(value, { stream: true });
const lines = buffer.split("\n");
buffer = lines.pop()!;
for (const line of lines) {
if (!line.startsWith("data: ")) continue;
const data = JSON.parse(line.slice(6));
if (data.type === "token") {
if (!sawFirstToken) {
sawFirstToken = true;
setTtft(Date.now() - start);
}
appendToken(data.content);
}
}
}
} catch (err: any) {
if (err.name !== "AbortError") console.error(err);
} finally {
flushTokens(); // Flush remaining tokens
setIsStreaming(false);
controllerRef.current = null;
}
},
[endpoint, appendToken, flushTokens]
);
const cancel = useCallback(() => {
controllerRef.current?.abort();
}, []);
return { messages, isStreaming, ttft, send, cancel };
}
The requestAnimationFrame batching is the critical piece. Without it, you'd trigger a React render per token — 80-100 renders per second on a fast model — causing janky, sluggish UI.
Backpressure: when the client can't keep up
GPT-4o generates 80-100 tokens per second. Each token triggers a DOM update. On a fast laptop, fine. On a budget Android phone rendering Markdown with syntax highlighting, the browser falls behind. Tokens arrive faster than the UI can paint them.
Without handling, you get one of two failures: unbounded buffer growth (eventually crashing the tab) or an unresponsive UI choking on a backlog of queued updates.
Server-side drain handling
When res.write() returns false, the kernel's TCP send buffer is full. This pattern pauses the stream until the client catches up:
import express from "express";
import { ChanlClient } from "@chanl-ai/sdk";
const app = express();
const chanl = new ChanlClient({
apiKey: process.env.CHANL_API_KEY,
model: "claude-sonnet-4-20250514",
});
app.post("/api/chat/stream", async (req, res) => {
res.setHeader("Content-Type", "text/event-stream");
res.flushHeaders();
const stream = chanl.chat.stream(req.body.agentId, req.body.messages);
for await (const chunk of stream) {
if (chunk.type !== "token") continue;
const payload = `data: ${JSON.stringify({ type: "token", content: chunk.content })}\n\n`;
const canContinue = res.write(payload);
if (!canContinue) {
// Buffer is full — wait for the client to catch up
await new Promise<void>((resolve) => res.once("drain", resolve));
}
}
res.end();
});
Client-side batch rendering
Buffer tokens and flush once per frame. This caps DOM writes at 60/second regardless of token rate:
class TokenRenderer {
private buffer = "";
private frameId: number | null = null;
private element: HTMLElement;
constructor(element: HTMLElement) {
this.element = element;
}
append(token: string) {
this.buffer += token;
if (!this.frameId) {
this.frameId = requestAnimationFrame(() => this.flush());
}
}
private flush() {
if (this.buffer) {
// Single DOM write per frame — 60fps max
this.element.textContent += this.buffer;
this.buffer = "";
}
this.frameId = null;
}
}
On a fast connection receiving 100 tokens/second, you batch roughly 1-2 tokens per frame. On a slow device, the buffer absorbs bursts without dropping frames.
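The arithmetic behind that claim, assuming a 60Hz display:

```typescript
// Tokens that accumulate between two animation frames at a given token rate.
function tokensPerFrame(tokensPerSecond: number, fps = 60): number {
  return tokensPerSecond / fps;
}
// 100 tok/s at 60 fps is about 1.7 tokens per flush
```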
Production: load balancers, proxies, and edge cases
Streaming works perfectly on localhost. Then you deploy behind Nginx, Cloudflare, or an AWS ALB, and tokens arrive in batches. The culprit is almost always response buffering.
Nginx configuration for SSE
Nginx buffers upstream responses by default. Every directive here matters — miss one and tokens batch up:
location /api/chat/stream {
proxy_pass http://backend:3000;
# Required for SSE streaming
proxy_buffering off;
proxy_cache off;
# X-Accel-Buffering is a response header your app sends, not something
# set with proxy_set_header — see the note below the config
# HTTP/1.1 keepalive
proxy_http_version 1.1;
proxy_set_header Connection '';
# Don't timeout long-running streams
proxy_read_timeout 300s;
proxy_send_timeout 300s;
# Disable gzip — it buffers until it has enough data to compress
gzip off;
}
Why each one matters:
- proxy_buffering off — Stops Nginx from waiting for a "full" response before forwarding.
- X-Accel-Buffering: no — A response header your app sends to disable Nginx buffering for that specific response. In Express: res.setHeader("X-Accel-Buffering", "no").
- Connection '' — Enables keepalive. Without it, Nginx may close the connection early.
- gzip off — Gzip buffers data until it has enough to compress. For tiny SSE events, it just adds latency.
Recovering from mid-stream disconnects
When the connection drops at token 47 of 200, the browser's EventSource reconnects automatically — but restarts from scratch, which is wrong for chat. Tag each event with a sequence number so the client can resume:
// Server: tag each event with a sequence number
let seq = 0;
for await (const chunk of stream) {
if (chunk.type === "token") {
res.write(
`id: ${++seq}\ndata: ${JSON.stringify({
type: "token",
content: chunk.content,
seq,
})}\n\n`
);
}
}
// Client: track last received sequence
let lastSeq = 0;
const source = new EventSource(`/api/chat/stream?lastSeq=${lastSeq}`);
source.onmessage = (event) => {
const data = JSON.parse(event.data);
lastSeq = data.seq;
// ... render token
};
// On reconnect, EventSource sends Last-Event-ID header automatically
// Server checks this and skips already-sent events
Timeout gotchas across your stack
Different layers have different defaults. Any one of them being too short kills your stream mid-response — especially during long tool calls when no tokens flow:
| Layer | Default timeout | Fix |
|---|---|---|
| Nginx | 60s proxy_read_timeout | Set to 300s for long streams |
| AWS ALB | 60s idle timeout | Increase to 300s in target group settings |
| Cloudflare | 100s (Free), 600s (Enterprise) | Send periodic : keepalive\n\n comments |
| Browser | No timeout for SSE | N/A — but HTTP/2 connections may timeout |
| Node.js | 300s requestTimeout (Node 18+; server.timeout defaults to 0) | server.requestTimeout = 0 to disable |
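For the Node.js row, it's safest to set the server-level timeouts explicitly rather than rely on defaults, which have changed across Node versions. A sketch (the 60s headersTimeout value is an assumption, not a requirement):

```typescript
import { createServer } from "node:http";

// Sketch: lift Node's server-level timeouts for long-lived streams.
const server = createServer((req, res) => {
  /* streaming handler */
});
server.timeout = 0;             // per-socket inactivity timeout: disabled
server.requestTimeout = 0;      // total request window: disabled
server.headersTimeout = 60_000; // still bound header parsing (assumed 60s)
```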
Cloudflare deserves special attention. It terminates connections that go silent for 100 seconds. If your model is running a tool call that takes 90 seconds, no tokens flow during that window. Send SSE comments as heartbeats:
// Keep Cloudflare alive during tool execution
const heartbeat = setInterval(() => {
res.write(": keepalive\n\n");
}, 30000); // Every 30 seconds
// Clean up when the stream ends. The stream is consumed with
// for await elsewhere, so clear the interval in a finally block:
try {
for await (const chunk of stream) { /* forward chunk to client */ }
} finally {
clearInterval(heartbeat);
}
Pick SSE unless you need WebSockets
After building both, here's the decision framework:
| Factor | SSE | WebSocket |
|---|---|---|
| Direction | Server to client only | Bidirectional |
| Protocol | Standard HTTP | Upgrade to ws:// protocol |
| Reconnection | Automatic (built into EventSource) | Manual — you implement retry logic |
| Proxy/CDN support | Works everywhere (standard HTTP) | Needs explicit proxy support |
| Browser API | EventSource (built-in, 3 lines) | WebSocket (built-in, more setup) |
| HTTP/2 multiplexing | Yes — multiple SSE streams over one TCP connection | No — each WebSocket is a separate TCP connection |
| Auth | Standard HTTP headers (cookies, Bearer tokens) | Auth in query params or first message (no headers on upgrade) |
| Use case | Chat streaming, notifications, real-time updates | Voice AI, collaborative editing, gaming |
Use SSE when: the user sends a message and the server streams a response. This covers chatbots, AI assistants, code completion, search-as-you-type — roughly 90% of AI streaming use cases. If you've built a RAG pipeline or an agent with tools, SSE is your transport.
Use WebSockets when: you need the client to send events during an active stream. Voice AI with interruption detection is the canonical example — the client streams audio while simultaneously receiving the agent's response. If your architecture involves scoring live conversations and feeding results back during the call, WebSockets give you that bidirectional channel.
Skip HTTP/2 Server Push. It was designed for preloading assets, not event streaming, and most browsers have removed support. HTTP/2's multiplexed streams work great with SSE (multiple SSE connections share one TCP connection), but the streaming protocol on top is still SSE.
For most teams building AI chat products: SSE for streaming, a separate REST endpoint for cancellation if you're using EventSource, and HTTP/2 at the transport layer for multiplexing. That combination handles production traffic with minimal complexity.
Streaming touches every layer of your stack: token generation, SSE/WebSocket protocols, backpressure, proxy buffering, partial JSON parsing, and frontend rendering. Each has its own failure mode, and they compound.
Once you've built it right, though, it's a stable foundation. The SSE server here handles ChatGPT-scale token rates. Backpressure patterns prevent buffer bloat on slow clients. The tool call accumulator handles streaming JSON from any provider. And the React patterns — SDK or manual — keep the UI responsive at any speed.
If you're building agents with tools and MCP servers, streaming is how those tool invocations become visible to users in real time. If you're evaluating agent quality, TTFT and tokens-per-second give you operational visibility batch responses can't match.
Start with SSE. Add WebSockets only when you have a bidirectional use case. Handle backpressure from day one.
Sources
- MDN Web Docs — Server-Sent Events (EventSource API) — Browser API reference for SSE, including reconnection behavior and event format.
- MDN Web Docs — Using Readable Streams — Web Streams API reference for reading streaming fetch responses.
- OpenAI API Reference — Streaming — OpenAI's streaming response format, including tool call deltas and finish reasons.
- Anthropic API Reference — Streaming Messages — Anthropic's streaming event types, including content_block_delta and message_stop.
- Node.js Documentation — Stream Backpressure — Official guide to backpressure in Node.js writable streams, including drain event handling.
- WHATWG — HTML Living Standard: Server-Sent Events — The specification for SSE protocol, event format, and reconnection rules.
- RFC 6455 — The WebSocket Protocol — Full WebSocket protocol specification covering the upgrade handshake and frame format.
- Nginx Documentation — Module ngx_http_proxy_module — Proxy buffering configuration that's critical for SSE passthrough.
Engineering Lead
Building the platform for AI agents at Chanl — tools, testing, and observability for customer experience.