LangGraph from Scratch, Part 5: Streaming Responses

In Part 4 your bot made you wait. You hit Send, an animated waiting state held the assistant's place, and a few seconds later the whole reply dropped in at once, like a vending machine. Real chat apps don't do that. The words appear as the model writes them, and you start reading before it has finished its sentence.

By the end of this page, yours does too. Same backend graph from Part 3, same chat UI from Part 4. The only thing that changes is when the words show up.

Three panels of the same chat card, left to right, labeled FIRST TOKENS, MID-STREAM and COMPLETE. In the first the assistant bubble holds one word, 'Recursion', followed by a thin violet caret bar. In the second the bubble has grown to 'Recursion is when a function calls' with the caret still trailing the text. In the third it reads the finished sentence 'Recursion is when a function calls itself.' and the caret is gone. Each panel carries the Lattice header and a STREAMING pill. — Today's destination. One reply, landing a word at a time. The Part 3 graph is untouched; it just narrates itself now instead of making you wait for the end.

Nothing new to install today. StreamingResponse has shipped with FastAPI since your Part 1 setup, and astream_events is already inside the LangGraph you installed. This part is all wiring, split evenly between the backend and the browser.

Why three seconds feels like forever

Here's the uncomfortable truth about the Part 4 version: the model's speed was never the real problem. A small model writes that recursion answer in about three seconds whether you stream it or not. Streaming doesn't make the model faster. It makes the wait feel different.

When the whole reply lands at once, you stare at three bouncing dots for three seconds and read for one. When it streams, the first word shows up in a blink and you read along while the rest arrives. Same three seconds of work. Completely different experience.

Two timelines over the same three seconds. The top one, labeled WITHOUT STREAMING, is one long bar reading 'a blank screen, a spinner, nothing to read' with the whole reply arriving in a single block at 3.0 seconds. The bottom one, labeled WITH STREAMING, has the first word arriving at 0.3 seconds and then a steady run of words across the whole three seconds, with the sentence already readable partway through. — Same model, same total time. Streaming just stops hiding the first 2.7 seconds behind a spinner.

It's the difference between waiting at a restaurant table for a finished plate and watching the chef build it through the kitchen pass. The food takes the same time either way. One of them feels like service; the other feels like a closed door. We're moving your bot to the pass.

Two pipes, and why we pick the simpler one

There are two common ways for a server to push data to a browser as it happens. A WebSocket is a two-way phone call: both sides can talk at any time. Server-Sent Events (SSE) is a one-way radio broadcast: the server talks, the browser listens.

For an LLM reply you only need one direction. The model talks; you don't need to whisper back to it mid-word. (Stopping it, which we'll add later, is a separate cancel, not a message sent up the same pipe.) SSE also rides plain HTTP, the exact protocol your fetch already speaks, so there's no new handshake and no new library. When two ways exist and the simpler one is enough, take the simpler one.

The shape of one streamed word

Before any code, one design decision worth thirty seconds. On the wire, an SSE stream is plain text: each message is a line that starts with data: and is followed by a blank line. Each data: message could just carry a bare token, like data: Re. It would work today. But in Part 6 your bot grows tools, and the stream will need to carry other things too: "started searching the web," "calculator returned 4." If tokens are bare strings, every new kind of event means a new parser on the frontend.

So we wrap each token in a tiny envelope with a type field:

TEXT

data: {"type": "token", "content": "Re"}

data: {"type": "token", "content": "cursion"}

data: {"type": "done"}

Now the frontend reads one shape forever: parse the JSON, look at type, react. A token grows the bubble. A done says the reply is complete. When Part 6 adds {"type": "tool_start", ...}, the parser you're about to write doesn't change by a line; it just learns one more type. Designing the envelope before you need it is the cheapest insurance in the series.
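
If it helps to see the "one shape forever" idea as code, here is a minimal sketch of a reader that switches on type. Envelope and applyEnvelope are made-up names for illustration, and the tool_start payload is only a stand-in for whatever Part 6 ends up sending.

```typescript
// Envelope and applyEnvelope are illustrative names, not part of page.tsx.
interface Envelope {
  type: string;
  content?: string;
}

// Fold one envelope into the reply text built so far.
function applyEnvelope(text: string, envelope: Envelope): string {
  switch (envelope.type) {
    case "token":
      return text + (envelope.content ?? "");
    case "done":
      return text; // nothing to add; the reply is complete
    default:
      return text; // a kind this reader doesn't know yet: ignore it
  }
}

// The payloads from the example stream, plus a stand-in for a future event kind.
const wirePayloads = [
  '{"type": "token", "content": "Re"}',
  '{"type": "token", "content": "cursion"}',
  '{"type": "tool_start"}',
  '{"type": "done"}',
];

let replyText = "";
for (const payload of wirePayloads) {
  replyText = applyEnvelope(replyText, JSON.parse(payload) as Envelope);
}

console.log(replyText); // Recursion
```

The unknown tool_start envelope passes through without breaking anything, which is the whole point of tagging every message with a type.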

Teach the backend to hand over each word

Open app/main.py. Your /chat endpoint currently calls graph.invoke(...), which waits for the entire reply and returns it in one piece. You're going to swap that for a generator that yields one envelope per token as the model produces it.

First, a one-line helper that formats an envelope, and the generator itself. Add these above your /chat endpoint:

backend/app/main.py

import json
from fastapi.responses import StreamingResponse


def sse(payload: dict) -> str:
    return f"data: {json.dumps(payload)}\n\n"


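async def token_stream(message: str):
    # graph is the same compiled graph your old /chat endpoint invoked
    inputs = {"messages": [HumanMessage(content=message)]}
    # astream_events narrates the run as it happens; v2 selects the event schema
    async for event in graph.astream_events(inputs, version="v2"):
        # fires for each chunk the chat model streams
        if event["event"] == "on_chat_model_stream":
            token = event["data"]["chunk"].content
            if token:  # skip chunks that carry no text
                yield sse({"type": "token", "content": token})
    # the loop is over: tell the client the reply is complete
    yield sse({"type": "done"})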

sse does exactly what the wire format demands: JSON, prefixed with data: , terminated by the blank line. The real work is graph.astream_events. Instead of running the graph and handing you the final tray, it narrates the run as a series of events while it happens. You loop over them with async for and watch for one kind: on_chat_model_stream, which fires once per token the model emits. You pull chunk.content, the token's text, and yield it as an envelope. When the loop ends, one last done envelope closes the stream.

Now the endpoint. It stops returning a single object and starts returning a stream, so the -> ChatResponse annotation comes off and StreamingResponse takes over. That leaves class ChatResponse with nothing to describe; it's harmless sitting there, so leave it where it is. Replace your old /chat:

backend/app/main.py

@app.post("/chat")
async def chat(request: ChatRequest):
    return StreamingResponse(
        token_stream(request.message),
        media_type="text/event-stream",
    )

StreamingResponse takes your generator and feeds whatever it yields straight down the HTTP connection, one chunk at a time, without waiting for it to finish. The media_type="text/event-stream" is the official "this is SSE" label; it tells the browser to expect a stream of events rather than one finished body. Most proxies take the same hint and let the bytes flow as they arrive, though some buffer responses by default and need their own setting once you deploy behind one.

Save it, and let's look at the raw stream before a browser ever touches it. From any terminal that isn't running the server, curl it with the -N flag:

BASH

curl -N -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "explain recursion in one sentence"}'

A dark terminal running curl with the -N flag against localhost:8000/chat. After a comment noting that -N turns off curl's buffering, the output is a run of lines: data: {'type': 'token', 'content': 'Recursion'}, then 'is', 'when', 'a', 'function', 'calls', 'itself.', each on its own data line, and finally data: {'type': 'done'}. — The raw stream, before any browser touches it. The -N flag turns off curl's own buffering so each line prints the instant it lands. Each line is one envelope; the blank line the server sends between them is the part that matters next.

The words appear one by one in your terminal, with real pauses between them. That's the model thinking out loud over an open HTTP connection. The backend is done. Now the harder half: teaching the browser to read this.

Teach the browser to read a firehose

Back in frontend/app/page.tsx. In Part 4, the relevant lines of sendMessage waited for the whole reply and appended it as one finished bubble:

frontend/app/page.tsx

const data = await res.json();
setMessages((prev) => [...prev, { role: "assistant", content: data.reply }]);

res.json() is the problem now. It waits for the entire response body before it gives you anything, which is the exact pause we're deleting. The new plan has two moves: add an empty assistant bubble up front, then grow its text as tokens arrive.

Start with a small helper that grows the last message. Add it inside the component, next to sendMessage:

frontend/app/page.tsx

function appendToken(token: string) {
  setMessages((prev) => {
    const next = [...prev];
    const last = next[next.length - 1];
    next[next.length - 1] = { ...last, content: last.content + token };
    return next;
  });
}

It's the same immutability rule from Part 4: build a new array, with a new object for the last message whose content is the old content plus the new token. React sees fresh references and repaints, so the bubble visibly grows with every call.
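
To see the rule in isolation, here is a sketch of the same update as a plain function you can run outside React. appendTokenTo is a hypothetical name, and the empty-list guard is an extra that the component version above doesn't have.

```typescript
interface Message {
  role: "user" | "assistant";
  content: string;
}

// Hypothetical pure version of the updater that appendToken hands to setMessages.
function appendTokenTo(prev: Message[], token: string): Message[] {
  const last = prev[prev.length - 1];
  if (!last) return prev; // empty list: nothing to grow
  // New array, new last object; every earlier message keeps its old reference.
  return [...prev.slice(0, -1), { ...last, content: last.content + token }];
}

const beforeAppend: Message[] = [
  { role: "user", content: "Explain recursion" },
  { role: "assistant", content: "Re" },
];
const afterAppend = appendTokenTo(beforeAppend, "cursion");

console.log(afterAppend[1]?.content);  // Recursion
console.log(beforeAppend[1]?.content); // Re (the original is untouched)
```

The old array still says "Re" after the call. That untouched original is what lets React compare references and decide a repaint is due.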

Next, when the user sends, add both the user's message and an empty assistant bubble in one shot. That empty bubble is the thing appendToken will fill:

frontend/app/page.tsx

setMessages((prev) => [
  ...prev,
  { role: "user", content: text },
  { role: "assistant", content: "" },
]);

Now the part that reads the stream. The instinct is to read each chunk and parse it. Let's write that instinct out in full, run it, and watch it fail, because the way it fails teaches the fix:

frontend/app/page.tsx

const res = await fetch(`${API_BASE}/chat`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ message: text }),
});
if (!res.ok || !res.body) throw new Error();

const reader = res.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  const chunk = decoder.decode(value);
  const envelope = JSON.parse(chunk.replace("data: ", "")); // hopeful
  appendToken(envelope.content);
}

Save, send a message, and open your browser's dev tools console. Instead of a smooth reply, you get this:

A dark browser dev tools Console tab. The first line is a console.log(chunk) call, and under it the chunk it printed: a data: line whose JSON breaks off at 'content: fun' with no closing brace. Below that, a red error: Uncaught (in promise) SyntaxError: Unexpected end of JSON input, with a stack trace pointing at sendMessage in page.tsx line 46. A comment notes that JSON.parse got half a parcel because the chunk ended mid-envelope. — The error you will hit, met on purpose. A read handed you half a parcel, the JSON ended mid-object, and parse choked on the brace that never came.

Read it like the errors from Part 3 and Part 4: SyntaxError: Unexpected end of JSON input. The chunk was data: {"type": "token", "content": " fun, so once the prefix came off, JSON.parse was left holding an object with no closing brace. The network split one envelope across two reads. The other failure mode is just as common: one read arrives holding two envelopes glued together, and parse trips on the leftover text after the first closing brace. The bytes are fine. Your assumption that one read equals one message is what's wrong.

This is why you put a blank line after every envelope. The \n\n is a fence between messages, and the fix is to respect the fence: pile every chunk into a buffer, cut it on \n\n, handle the complete pieces, and keep the unfinished tail for next time.

frontend/app/page.tsx

const reader = res.body.getReader();
const decoder = new TextDecoder();
let buffer = "";

while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  const parts = buffer.split("\n\n");
  buffer = parts.pop() ?? "";          // the unfinished tail waits here
  for (const part of parts) {
    if (!part.startsWith("data: ")) continue;
    const envelope = JSON.parse(part.slice(6));
    if (envelope.type === "token") appendToken(envelope.content);
  }
}

Two lines carry the whole idea. buffer.split("\n\n") cuts on the fences, giving you an array of pieces. parts.pop() lifts off the last piece and parks it back in buffer, because the last piece after a split is whatever came after the final fence, which might be a half-finished envelope still arriving. Everything before it is guaranteed complete, so you parse those with confidence. The next read appends to the tail, and a partial envelope quietly completes itself. The { stream: true } passed to decode does the same job one level down: if a read ends in the middle of a multi-byte character, the decoder holds those bytes until the rest arrives instead of emitting a garbled one. The if (envelope.type === "token") is your forward-compat seatbelt: a done envelope sails through untouched, and so will Part 6's tool events.
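
Here is the same buffering idea pulled out into a small class so you can feed it awkward reads by hand. Treat it as a sketch, not a replacement for the loop above: SseFrameReader is an invented name, and the input is a made-up two-envelope stream cut in the worst possible place, between the two bytes of an é.

```typescript
// SseFrameReader is an illustrative name; page.tsx keeps this logic inline.
class SseFrameReader {
  private buffer = "";
  private readonly decoder = new TextDecoder();

  // Feed the bytes from one reader.read(); get back every envelope they completed.
  push(bytes: Uint8Array): unknown[] {
    this.buffer += this.decoder.decode(bytes, { stream: true });
    const parts = this.buffer.split("\n\n");
    this.buffer = parts.pop() ?? ""; // the unfinished tail waits for the next read
    const envelopes: unknown[] = [];
    for (const part of parts) {
      if (!part.startsWith("data: ")) continue;
      envelopes.push(JSON.parse(part.slice(6)));
    }
    return envelopes;
  }
}

// Two envelopes on the wire; the first contains "é", which is two bytes in UTF-8.
const wireBytes = new TextEncoder().encode(
  'data: {"type": "token", "content": "caf\u00e9"}\n\ndata: {"type": "done"}\n\n'
);
// Cut between the two bytes of "é": mid-character and mid-envelope at once.
const cut = wireBytes.indexOf(0xc3) + 1;

const frameReader = new SseFrameReader();
const fromFirstRead = frameReader.push(wireBytes.subarray(0, cut));
const fromSecondRead = frameReader.push(wireBytes.subarray(cut));

console.log(fromFirstRead.length);  // 0
console.log(fromSecondRead.length); // 2
```

The first push returns nothing, because neither the character nor the envelope is complete yet. The second read finishes the first envelope and carries the whole done envelope glued behind it, and both come out intact.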

Save and send. The reply crawls across the screen, word by word, exactly like the terminal but inside a real chat bubble. It streams.

A pipeline diagram, left to right. On the left, a box labeled THE MODEL runs astream_events and emits an on_chat_model_stream event per word. In the middle, ONE BELT labeled text/event-stream carries four labeled envelopes reading 'Re', 'cursion', 'is', 'when', with a backslash-n-backslash-n marker between each. On the right, a box labeled THE BROWSER runs body.getReader, cuts on the blank line, and a chat bubble reads 'Recursion is when' with a caret. A caption says the model plates each word, the belt carries it, the bubble grows a word at a time. — The whole pipeline. The model plates a word, the belt carries one labeled envelope, the reader cuts on the blank line and grows the bubble. Hold this picture; Part 6 reuses every piece of it.

There's a particular relief the first time the words start crawling across the screen on their own. The app stops feeling like a form you submit and starts feeling like something thinking back at you. You built that with about forty lines and one blank-line convention.

Comic in two panels. Panel one: a cheerful mailman hands Yad, a bearded developer with headphones, a single large card with the letter H on it over a garden fence; Yad throws his arms up, delighted, and says 'IT BEGINS!'. Panel two: a moment later Yad happily holds up five cards spelling HELLO while the mailman walks off waving; Yad, beaming with a happy tear, says 'I READ IT AS IT CAME!'. — Streaming, taken literally. You don't wait for the whole word HELLO in a sealed box; you get the H, then the E, and you're already reading. Value arrives with the first token, not the last.

A button that says enough

Streaming opens a new problem. A long answer might run for fifteen seconds, and sometimes you can tell from the first line that the bot misread you. You want a way out. Right now there isn't one: the only button is Send, and it's disabled while a reply streams.

The browser's tool for canceling an in-flight fetch is an AbortController. You make one per request, hand its signal to fetch, and calling .abort() tears the connection down. Add a piece of state to hold the current controller, and a stop function:

frontend/app/page.tsx

const [controller, setController] = useState<AbortController | null>(null);

function stop() {
  controller?.abort();
}

In sendMessage, create a controller before the fetch, pass its signal, and remember it so stop can reach it:

frontend/app/page.tsx

const controller = new AbortController();
setController(controller);

const res = await fetch(`${API_BASE}/chat`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ message: text }),
  signal: controller.signal,        // the cancel wire
});

When you abort, whichever call is in flight, the fetch itself or the reader.read() you're awaiting mid-stream, rejects with an AbortError. That's not a real failure, so your catch should ignore it instead of flashing the red "could not reach the backend" banner from Part 4. Adjust the catch and finally:

frontend/app/page.tsx

} catch (err) {
  if ((err as Error).name !== "AbortError") {
    setError("Could not reach the backend. Is it running on :8000?");
  }
} finally {
  setLoading(false);
  setController(null);
}
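
The (err as Error).name check works, but it trusts that whatever was thrown is an object with a name. If you'd rather not cast, a slightly more defensive version might look like the sketch below. isAbortError is a hypothetical helper, not something from Part 4, and the controller at the bottom only shows the flag that .abort() flips.

```typescript
// Hypothetical helper: true only for error-like objects named "AbortError".
function isAbortError(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { name?: unknown }).name === "AbortError"
  );
}

// What Stop does to the signal that fetch is holding.
const demoController = new AbortController();
const abortedBefore = demoController.signal.aborted; // false: nothing cancelled yet
demoController.abort();
const abortedAfter = demoController.signal.aborted;  // true: pending reads reject now

// A stand-in for the error an aborted fetch rejects with.
const abortLike = Object.assign(new Error("The operation was aborted."), {
  name: "AbortError",
});

console.log(isAbortError(abortLike));                        // true
console.log(isAbortError(new TypeError("Failed to fetch"))); // false
```

In the catch block, if (!isAbortError(err)) would replace the name comparison and behave the same for every error the article's code can throw.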

Last, swap the footer button while a reply is streaming. Send becomes Stop:

frontend/app/page.tsx

{loading ? (
  <Button type="button" variant="outline" onClick={stop} aria-label="Stop response" className="h-11 rounded-xl px-4">
    <Square className="size-3.5 fill-current" aria-hidden="true" />
    <span className="hidden sm:inline">Stop</span>
  </Button>
) : (
  <Button
    type="submit"
    aria-label="Send message"
    disabled={!input.trim()}
    className="h-11 rounded-xl px-4 shadow-md shadow-primary/20"
  >
    <span className="hidden sm:inline">Send</span>
    <Send className="size-4" aria-hidden="true" />
  </Button>
)}

The Send branch is the button from Part 4 with one clause deleted: disabled used to read loading || !input.trim(), and now it's only !input.trim(). It doesn't need the loading half any more, because while a reply is in flight this branch isn't on screen at all; the Stop button is. Both buttons keep the same h-11 rounded-xl shape so the footer doesn't twitch when they swap, and both label themselves for screen readers while hiding the word on narrow screens, leaving the filled square and the paper plane to speak for themselves.

The chat card at localhost:3000, its header pill now reading 'Streaming enabled'. The user asked 'Explain recursion in one sentence.' and the assistant bubble is mid-stream, reading 'Recursion is when a function solves a problem by calling itself on a smaller' with a thin violet caret bar after the last word. In the footer, the filled Send button has been replaced by an outline button showing a small square and the word 'Stop'. — Mid-stream, Send becomes Stop. One click aborts the fetch, the words stop, and you stop paying for an answer you don't want.

Watch the backend terminal while you do it. The instant you hit Stop, Uvicorn notices the client went away and the async for loop stops pulling tokens from the model. You didn't just hide the reply; you genuinely called it off.

Give it a caret and a self-scroll

Two small touches separate "it works" from "it feels alive." First, that thin blinking bar trailing the text in every screenshot. It replaces Part 4's three-dot waiting state: once the answer is writing itself, the growing message is a better indicator than a separate loader.

Your ChatBubble from Part 4 learns one new prop, streaming, and draws the caret after the text when it's true:

frontend/app/page.tsx

function ChatBubble({ message, streaming }: { message: Message; streaming: boolean }) {
  // ...avatar and role label unchanged, down to the bubble itself...
  <div
    className={
      isUser
        ? "rounded-[1.35rem] rounded-br-md bg-primary px-4 py-3 text-sm leading-6 text-primary-foreground shadow-lg shadow-primary/15"
        : "min-h-12 rounded-[1.35rem] rounded-bl-md border bg-card px-4 py-3 text-sm leading-6 shadow-sm"
    }
  >
    {message.content}
    {streaming ? (
      <span className="ml-1 inline-block h-4 w-0.5 animate-pulse rounded-full bg-primary align-middle" aria-hidden="true" />
    ) : null}
  </div>
  // ...rest of the component unchanged...
}

The caret is an empty <span>, not a character: h-4 w-0.5 makes it a hairline bar, bg-primary paints it in the accent color, animate-pulse blinks it, and aria-hidden keeps a screen reader from announcing a decoration. The other new class is the min-h-12 on the assistant branch. Without it, a bubble holding nothing but a caret collapses to a sliver for the split second before the first token lands, and you watch the layout jump.

Now tell the bubble when to blink. Exactly one message in the list qualifies: the last one, when it's the assistant's, while loading is still true.

frontend/app/page.tsx

{messages.map((message, index) => (
  <ChatBubble
    key={`${message.role}-${index}`}
    message={message}
    streaming={loading && message.role === "assistant" && index === messages.length - 1}
  />
))}
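
If the three-part condition reads as dense inline, here it is as a standalone function with a sample conversation run through it. isStreamingBubble is a hypothetical name; the article keeps the expression inside the JSX.

```typescript
interface Message {
  role: "user" | "assistant";
  content: string;
}

// Hypothetical helper mirroring the expression passed to the streaming prop.
function isStreamingBubble(messages: Message[], index: number, loading: boolean): boolean {
  const message = messages[index];
  return (
    loading &&
    message !== undefined &&
    message.role === "assistant" &&
    index === messages.length - 1
  );
}

const thread: Message[] = [
  { role: "user", content: "hi" },
  { role: "assistant", content: "Hello!" },
  { role: "user", content: "Explain recursion" },
  { role: "assistant", content: "" }, // the empty bubble waiting for tokens
];

const whileLoading = thread.map((_, i) => isStreamingBubble(thread, i, true));
const afterStream = thread.map((_, i) => isStreamingBubble(thread, i, false));

console.log(whileLoading); // [ false, false, false, true ]
console.log(afterStream);  // [ false, false, false, false ]
```

Only the trailing assistant bubble ever gets the caret, and it loses it the moment loading goes false.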

Which leaves the thinking bubble with nothing to do. A message that writes itself says everything three bouncing dots used to say, so take the dots out in three edits: delete the ThinkingBubble function, delete the {loading ? <ThinkingBubble /> : null} line from the message list, and loosen the empty-state guard from {messages.length === 0 && !loading ? ( back to {messages.length === 0 ? (. That guard existed because loading used to go true while the message list was still empty; now sendMessage pushes the empty assistant bubble in the same setMessages call as your question, so the list is never empty while a reply is in flight. Leave the .typing-dot keyframes in globals.css. Nothing references them any more, and unused CSS isn't worth a trip to the stylesheet.

Second, auto-scroll. Part 4 already built this: a bottomRef, an empty marker div sitting after the .map(...), and a useEffect that calls scrollIntoView so new words don't slide past the bottom of the message list while you read. Streaming needs one thing changed, in the dependency array:

frontend/app/page.tsx

useEffect(() => {
  bottomRef.current?.scrollIntoView({ behavior: "smooth" });
}, [messages]);            // Part 4 had [messages, loading]

loading was in that array because the thinking bubble appearing and vanishing changed the height of the list. There's no thinking bubble now, and every token rewrites messages, so the effect fires dozens of times per reply on its own. The list follows the text down without being asked.

Two import lines, and only one new name in either of them. The React hooks all arrived in Part 4; the single thing this part adds to the top of the file is the Square icon for the Stop button. Check yours match:

frontend/app/page.tsx

import { useEffect, useRef, useState, type FormEvent } from "react";
import { Bot, CircleAlert, Send, Sparkles, Square, UserRound } from "lucide-react";

Send one more message and watch it land: the caret blinks in the empty bubble, words stream in to fill it, the list scrolls itself to follow, and the caret disappears the moment the stream closes and loading flips off in the finally. Note what does not end it: your reader loop never looks at the done envelope at all. It filters for type === "token" and lets done sail past. The connection closing is the signal.
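
Relying on the connection closing is fine for this part. If you later want to tell a reply that finished from a connection that dropped halfway, the done envelope is the hook for it. A sketch under that assumption follows; collectReply is a made-up function, and nothing in the page.tsx you just built does this.

```typescript
interface Envelope {
  type: string;
  content?: string;
}

// Hypothetical: gather the text and note whether the server said it was finished.
function collectReply(envelopes: Envelope[]): { text: string; completed: boolean } {
  let text = "";
  let completed = false;
  for (const envelope of envelopes) {
    if (envelope.type === "token") text += envelope.content ?? "";
    else if (envelope.type === "done") completed = true;
  }
  return { text, completed };
}

const tokens: Envelope[] = [
  { type: "token", content: "Re" },
  { type: "token", content: "cursion" },
];

const finished = collectReply([...tokens, { type: "done" }]);
const dropped = collectReply(tokens); // stream ended without a done envelope

console.log(finished); // { text: 'Recursion', completed: true }
console.log(dropped);  // { text: 'Recursion', completed: false }
```

Both runs produce the same text; only the completed flag differs, which is the one thing a closing connection can't tell you.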

Make the chrome tell the truth

One pass left, and it's all copy. The card still introduces itself the way it did back when it made you wait: the header pill claims a "Local workspace", the empty state offers to "figure out" something together, and the footer says answers "are generated by" your workflow. Six strings, and the app starts describing what it actually does:

frontend/app/page.tsx

// EmptyState: the eyebrow, the headline, the line under it
<p className="...">Streaming is on</p>
<h1 className="...">Read the answer as it is written.</h1>
<p className="...">
  Lattice streams each token from your LangGraph workflow, so every response feels immediate.
</p>

// The header pill, wide screens then narrow
<span className="hidden sm:inline">Streaming enabled</span>
<span className="sm:hidden">Live</span>

// The line under the composer
<p className="...">Responses stream from your local LangGraph workflow.</p>

No logic, no risk, thirty seconds of typing. It's also the difference between an app and a demo of an app: every figure in this part shows the new strings, and yours should too.

Right now you have: a backend that streams a model's reply token by token over SSE, a frontend that reads that stream through a buffer that never trips on a chunk boundary, a Stop button that genuinely cancels the work, and a UI that scrolls and blinks like the apps you use every day. Here's the full page.tsx, in case a piece drifted while you built it up:

frontend/app/page.tsx

"use client";

import { useEffect, useRef, useState, type FormEvent } from "react";
import { Bot, CircleAlert, Send, Sparkles, Square, UserRound } from "lucide-react";
import { Button } from "@/components/ui/button";
import { Input } from "@/components/ui/input";
import { Card } from "@/components/ui/card";

interface Message {
  role: "user" | "assistant";
  content: string;
}

const API_BASE = process.env.NEXT_PUBLIC_API_BASE_URL;

const STARTER_PROMPTS = [
  "Explain recursion in one sentence",
  "Give me a small Python project idea",
  "What makes LangGraph useful?",
];

function Brand() {
  return (
    <div className="flex min-w-0 items-center gap-3">
      <div className="grid size-10 shrink-0 place-items-center rounded-2xl bg-primary text-primary-foreground shadow-lg shadow-primary/20">
        <Sparkles className="size-5" aria-hidden="true" />
      </div>
      <div className="min-w-0">
        <p className="truncate text-base font-semibold tracking-tight">Lattice</p>
        <p className="truncate text-xs text-muted-foreground">A LangGraph assistant</p>
      </div>
    </div>
  );
}

function EmptyState({ onPick }: { onPick: (prompt: string) => void }) {
  return (
    <div className="mx-auto flex h-full w-full max-w-2xl flex-col items-center justify-center px-2 py-10 text-center">
      <div className="relative mb-6">
        <div className="absolute inset-0 scale-150 rounded-full bg-primary/15 blur-2xl" />
        <div className="relative grid size-16 place-items-center rounded-3xl border border-primary/20 bg-card shadow-xl shadow-primary/10">
          <Bot className="size-7 text-primary" aria-hidden="true" />
        </div>
      </div>
      <p className="mb-2 text-xs font-semibold uppercase tracking-[0.22em] text-primary">
        Streaming is on
      </p>
      <h1 className="text-balance text-3xl font-semibold tracking-tight sm:text-4xl">
        Read the answer as it is written.
      </h1>
      <p className="mt-3 max-w-lg text-pretty text-sm leading-6 text-muted-foreground sm:text-base">
        Lattice streams each token from your LangGraph workflow, so every response feels immediate.
      </p>
      <div className="mt-8 grid w-full gap-2 sm:grid-cols-3">
        {STARTER_PROMPTS.map((prompt) => (
          <button
            key={prompt}
            type="button"
            onClick={() => onPick(prompt)}
            className="rounded-2xl border bg-card/70 px-4 py-3 text-left text-sm leading-5 text-foreground shadow-sm transition hover:-translate-y-0.5 hover:border-primary/35 hover:shadow-md focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-ring"
          >
            {prompt}
          </button>
        ))}
      </div>
    </div>
  );
}

function ChatBubble({ message, streaming }: { message: Message; streaming: boolean }) {
  const isUser = message.role === "user";

  return (
    <article className={`flex items-end gap-3 ${isUser ? "justify-end" : "justify-start"}`}>
      {!isUser ? (
        <div className="grid size-8 shrink-0 place-items-center rounded-xl bg-primary/10 text-primary">
          <Bot className="size-4" aria-hidden="true" />
        </div>
      ) : null}
      <div className={`flex max-w-[82%] flex-col sm:max-w-[72%] ${isUser ? "items-end" : "items-start"}`}>
        <p className={`mb-1.5 px-1 text-[11px] font-medium uppercase tracking-[0.16em] text-muted-foreground ${isUser ? "text-right" : "text-left"}`}>
          {isUser ? "You" : "Lattice"}
        </p>
        <div
          className={
            isUser
              ? "rounded-[1.35rem] rounded-br-md bg-primary px-4 py-3 text-sm leading-6 text-primary-foreground shadow-lg shadow-primary/15"
              : "min-h-12 rounded-[1.35rem] rounded-bl-md border bg-card px-4 py-3 text-sm leading-6 shadow-sm"
          }
        >
          {message.content}
          {streaming ? (
            <span className="ml-1 inline-block h-4 w-0.5 animate-pulse rounded-full bg-primary align-middle" aria-hidden="true" />
          ) : null}
        </div>
      </div>
      {isUser ? (
        <div className="grid size-8 shrink-0 place-items-center rounded-xl bg-secondary text-secondary-foreground">
          <UserRound className="size-4" aria-hidden="true" />
        </div>
      ) : null}
    </article>
  );
}

export default function Chat() {
  const [messages, setMessages] = useState<Message[]>([]);
  const [input, setInput] = useState("");
  const [loading, setLoading] = useState(false);
  const [error, setError] = useState<string | null>(null);
  const [controller, setController] = useState<AbortController | null>(null);
  const bottomRef = useRef<HTMLDivElement>(null);

  useEffect(() => {
    bottomRef.current?.scrollIntoView({ behavior: "smooth" });
  }, [messages]);

  function appendToken(token: string) {
    setMessages((prev) => {
      const next = [...prev];
      const last = next[next.length - 1];
      next[next.length - 1] = { ...last, content: last.content + token };
      return next;
    });
  }

  function stop() {
    controller?.abort();
  }

  async function sendMessage(e: FormEvent) {
    e.preventDefault();
    const text = input.trim();
    if (!text || loading) return;

    setMessages((prev) => [
      ...prev,
      { role: "user", content: text },
      { role: "assistant", content: "" },
    ]);
    setInput("");
    setLoading(true);
    setError(null);

    const controller = new AbortController();
    setController(controller);

    try {
      const res = await fetch(`${API_BASE}/chat`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ message: text }),
        signal: controller.signal,
      });
      if (!res.ok || !res.body) throw new Error();

      const reader = res.body.getReader();
      const decoder = new TextDecoder();
      let buffer = "";

      while (true) {
        const { value, done } = await reader.read();
        if (done) break;
        buffer += decoder.decode(value, { stream: true });
        const parts = buffer.split("\n\n");
        buffer = parts.pop() ?? "";
        for (const part of parts) {
          if (!part.startsWith("data: ")) continue;
          const envelope = JSON.parse(part.slice(6));
          if (envelope.type === "token") appendToken(envelope.content);
        }
      }
    } catch (err) {
      if ((err as Error).name !== "AbortError") {
        setError("Could not reach the backend. Is it running on :8000?");
      }
    } finally {
      setLoading(false);
      setController(null);
    }
  }

  return (
    <main className="relative min-h-dvh overflow-hidden p-3 sm:p-5 lg:p-7">
      <Card className="relative mx-auto flex h-[calc(100dvh-1.5rem)] min-h-[34rem] max-w-6xl flex-col gap-0 overflow-hidden rounded-[1.75rem] border-foreground/10 bg-card/95 py-0 shadow-2xl shadow-slate-950/10 ring-1 ring-foreground/10 backdrop-blur-xl sm:h-[calc(100dvh-2.5rem)] lg:h-[calc(100dvh-3.5rem)]">
        <header className="flex h-20 shrink-0 items-center justify-between border-b px-4 sm:px-7">
          <Brand />
          <div className="flex items-center gap-2 rounded-full border bg-background/70 px-3 py-1.5 text-xs font-medium text-muted-foreground shadow-sm">
            <span className="size-2 rounded-full bg-emerald-500 ring-4 ring-emerald-500/15" />
            <span className="hidden sm:inline">Streaming enabled</span>
            <span className="sm:hidden">Live</span>
          </div>
        </header>

        <div
          className="chat-scrollbar flex-1 overflow-y-auto px-4 py-5 sm:px-7 sm:py-7"
          role="log"
          aria-live="polite"
          aria-relevant="additions text"
        >
          {messages.length === 0 ? (
            <EmptyState onPick={setInput} />
          ) : (
            <div className="mx-auto w-full max-w-3xl space-y-6">
              {messages.map((message, index) => (
                <ChatBubble
                  key={`${message.role}-${index}`}
                  message={message}
                  streaming={loading && message.role === "assistant" && index === messages.length - 1}
                />
              ))}
              <div ref={bottomRef} />
            </div>
          )}
        </div>

        <div className="shrink-0 border-t bg-card/80 p-3 sm:p-5">
          {error ? (
            <div
              role="alert"
              className="mx-auto mb-3 flex max-w-3xl items-start gap-2 rounded-xl border border-destructive/25 bg-destructive/10 px-3 py-2.5 text-sm text-destructive"
            >
              <CircleAlert className="mt-0.5 size-4 shrink-0" aria-hidden="true" />
              <span>{error}</span>
            </div>
          ) : null}
          <form
            onSubmit={sendMessage}
            className="mx-auto flex max-w-3xl items-center gap-2 rounded-2xl border bg-background/85 p-2 shadow-lg shadow-slate-950/5 transition focus-within:border-primary/45 focus-within:ring-4 focus-within:ring-primary/10"
          >
            <label htmlFor="chat-message" className="sr-only">
              Message Lattice
            </label>
            <Input
              id="chat-message"
              value={input}
              onChange={(e) => setInput(e.target.value)}
              placeholder="Message Lattice…"
              autoComplete="off"
              disabled={loading}
              className="h-11 flex-1 border-0 bg-transparent px-3 text-sm shadow-none focus-visible:ring-0"
            />
            {loading ? (
              <Button type="button" variant="outline" onClick={stop} aria-label="Stop response" className="h-11 rounded-xl px-4">
                <Square className="size-3.5 fill-current" aria-hidden="true" />
                <span className="hidden sm:inline">Stop</span>
              </Button>
            ) : (
              <Button
                type="submit"
                aria-label="Send message"
                disabled={!input.trim()}
                className="h-11 rounded-xl px-4 shadow-md shadow-primary/20"
              >
                <span className="hidden sm:inline">Send</span>
                <Send className="size-4" aria-hidden="true" />
              </Button>
            )}
          </form>
          <p className="mt-2 text-center text-[11px] text-muted-foreground">
            Responses stream from your local LangGraph workflow.
          </p>
        </div>
      </Card>
    </main>
  );
}

Tokens streaming into the assistant bubble behind a blinking caret, with Send swapping to Stop mid-stream.

What you built

Part 5

A streaming backend: /chat now returns a StreamingResponse that yields one SSE envelope per token, read straight off the graph with astream_events.
A wire format you designed on purpose: data: {type, content} envelopes framed by a blank line, built to carry Part 6's tool events without a parser rewrite.
A frontend that reads a stream: getReader() plus a buffer that splits on \n\n and keeps the unfinished tail, so a chunk boundary never breaks JSON.parse again.
The buffering bug met head-on: you know why parsing a raw chunk fails, and you know the blank-line fence is the fix.
A Stop button that truly cancels the work with an AbortController, plus a blinking caret and auto-scroll that make the whole thing feel alive.
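
As a recap of the wire format and the buffer in one place, here is a minimal sketch, not the companion repo's code. The names StreamEvent, encodeEvent and drainEvents are hypothetical. The real encoder lives in the Python backend; encodeEvent only exists here to build sample input for the parser.

```typescript
// Hypothetical envelope type, matching the {type, content} shape described above.
type StreamEvent = { type: string; content: string };

// Frame one envelope as an SSE message: "data: <json>" followed by a blank line.
function encodeEvent(event: StreamEvent): string {
  return `data: ${JSON.stringify(event)}\n\n`;
}

// Append a chunk to the buffer, parse every complete frame, and return the
// unfinished tail so the next chunk can complete it.
function drainEvents(
  buffer: string,
  chunk: string,
): { events: StreamEvent[]; rest: string } {
  const parts = (buffer + chunk).split("\n\n");
  // The last piece is either "" (the chunk ended on a fence) or half a frame.
  const rest = parts.pop() ?? "";
  const events: StreamEvent[] = [];
  for (const part of parts) {
    const line = part.trim();
    if (!line.startsWith("data:")) continue; // skip anything that is not a data line
    events.push(JSON.parse(line.slice("data:".length).trim()) as StreamEvent);
  }
  return { events, rest };
}

// Simulate a network read that cuts the second frame in half.
const wire =
  encodeEvent({ type: "token", content: "Recursion" }) +
  encodeEvent({ type: "token", content: " is" });
const cut = wire.length - 8;

let state = drainEvents("", wire.slice(0, cut));
const firstPass = state.events; // only the first frame is complete so far

state = drainEvents(state.rest, wire.slice(cut));
const secondPass = state.events; // the stashed tail plus the new chunk parse cleanly
```

The first call yields only the complete 'Recursion' frame and holds the half-written second frame in rest. The second call glues that tail to the next chunk, so JSON.parse never sees a partial envelope.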

Test yourself

Why wrap each streamed token in an envelope like {'type': 'token', 'content': '...'} instead of sending the bare token text?

Your first reader loop does JSON.parse(chunk.replace('data: ', '')) on every read and throws Unexpected end of JSON input. What's actually wrong?

After splitting the buffer on \n\n, why do you call parts.pop() and stash that last piece back in the buffer instead of parsing it?

Between WebSockets and Server-Sent Events, why does this chat use SSE for the model's reply?

The Part 3 graph didn't change at all to make streaming work. So what did?

The commit, from the project root, in any terminal that isn't hosting a server:

BASH

git add .
git commit -m "part 5: stream tokens over SSE with a Stop button"

Your bot feels fast now, but it only knows what the model already carries in its head. Ask it for today's news or to multiply two big numbers exactly, and it will cheerfully make something up. In Part 6 you'll hand it tools, real functions it can call mid-answer, and they'll ride the exact belt you just built.

The complete, tested code for this part lives in part-05-streaming in the companion repo. Code blocks with a GitHub icon link straight to the exact file; "View full file" shows the whole file in place with this section's changes highlighted.