Claude Agent SDK in Production, Part 5: Sessions: The Analyst Remembers

Ask your analyst "Which store grew fastest between January and June?" and it answers: Riverside, 70.6% growth. Now type the follow-up that every real conversation contains: "Now chart that store's weekly numbers." In every part so far, that sentence was a coin flip pointed at a confused agent. Today the analyst filters store_id == "S04" without being told, because it remembers which store you were both talking about. The entire backend change that makes this work is one line. The rest of this part is discovering how much machinery the SDK already built while nobody was looking, and closing Act I with a product.

The Beanline Analyst with a new conversations sidebar on the left. The chat shows the end of a first answer: Riverside grew fastest with 70.6% growth, from $19.7K to $33.5K, with a receipt of $0.0391 and 36 seconds. Below it the user asks: now chart that store's weekly numbers. The agent runs one Bash badge and answers with Riverside's weekly revenue, peaking at $8,070 in week 26, for $0.0114 and 16 seconds. The artifacts panel lists report.md and two charts, previewing riverside_weekly.png, a rising bar chart. — The two-message demo, from a real run. Nobody told the second turn which store. The receipt detail worth savoring: the turn WITH full memory cost less than a third of the first one.

That sidebar on the left is new too, and here's the part that should raise an eyebrow: it lists conversations this server never stored. No database was added. No sessions table, no conversations.json, no ORM. The SDK has been keeping a diary of every conversation since your first query() in Part 1, and today we finally read it.

Break it first: the goldfish with a job title

Give the Part 4 analyst the follow-up cold, in a fresh conversation, and watch a very expensive goldfish at work:

The amnesia run, verbatim, $0.0113. Note the last sentence: it offers to check a memory it does not have. The offer is real; the wiring is what's missing, and it's today's work.

This is Part 2's amnesia demo all over again, unchanged after three parts of building: every query() starts from zero, and the files have been the only memory the whole time. But read the agent's reply closely, because the failure got more interesting since Part 2. "Let me check my memory to see if there's relevant context": the claude_code preset we adopted in Part 4 knows conversations can have pasts. It's offering to do the exact thing we never wired up. The gap between that sentence and reality is precisely one option on ClaudeAgentOptions, and the SDK has been patiently waiting for us to pass it.

The diary, revisited

In Part 1 you glanced at ~/.claude/projects/ and moved on. Time to actually open it:

A dark terminal. Listing ~/.claude/projects filtered to part-05 shows two directories whose names end in the workspace ids from the running app, one shard per working directory. Listing inside one shard shows two JSONL files, the conversation and its fork. The first three lines of a session file show queue-operation records and then a user record whose content begins: which store grew fastest. A comment notes that every turn since Part 1 wrote a file like this and nobody was reading it. — The session store, on disk, from this part's real runs. One folder per working directory, one JSONL per conversation, one JSON record per event.

Three facts from that listing run today's whole agenda. One: every conversation is a JSONL file whose lines are the raw API messages: user turns, assistant turns with their tool calls, results. A complete transcript, written by the SDK whether or not you care (the concepts page has the anatomy). Two: files are named by session_id, the very id we've been dutifully emitting in session_start parcels since Part 2 and using for absolutely nothing. That bill comes due today. Three, and sneakiest: the store is sharded by working directory. The SDK files each diary under an escaped version of the cwd the agent ran in. In Part 1 that meant one shard for the whole app. But Part 4 gave every conversation its own workspace as cwd, which means every conversation now gets its own shard too. Hold that thought; it reshapes the sidebar section.
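If you want to poke at a diary yourself, a few lines of standard-library Python are enough. This is a minimal sketch, not the SDK's own reader: count_record_types is a hypothetical helper, and the three sample records are invented stand-ins shaped like the listing above (real files carry many more fields). The one assumption it leans on is what the listing shows: one JSON object per line, each with a type.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def count_record_types(transcript: Path) -> Counter:
    """Tally the `type` field of every JSON record in a JSONL transcript."""
    counts: Counter = Counter()
    with transcript.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            counts[record.get("type", "unknown")] += 1
    return counts


# Invented sample records, shaped like the listing above (not real SDK output).
SAMPLE = [
    {"type": "queue-operation"},
    {"type": "queue-operation"},
    {"type": "user", "message": {"role": "user", "content": "which store grew fastest"}},
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example-session.jsonl"
    path.write_text("\n".join(json.dumps(r) for r in SAMPLE) + "\n", encoding="utf-8")
    counts = count_record_types(path)

print(dict(counts))
```

Point the same function at a real file under ~/.claude/projects/ to see the mix of records in one of your own conversations.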

The memory switch

Here is the entire backend diff that turns amnesia into memory:

backend/app/main.py

def build_options(workspace: Path, session_id: str | None) -> ClaudeAgentOptions:
    """Part 4's options plus one line: resume. Hand the SDK a session id
    and the new turn starts with the whole diary already in its head."""
    return ClaudeAgentOptions(
        cwd=str(workspace),
        tools=["Read", "Glob", "Grep", "Bash", "Write"],
        permission_mode="bypassPermissions",
        model=MODEL,
        include_partial_messages=True,
        system_prompt={"type": "preset", "preset": "claude_code", "append": ANALYST_PROMPT},
        resume=session_id,
    )

resume=session_id tells the SDK: reload that diary, and let this turn arrive with everything in it already in context. Same session id, same file, the story continues. (resume's sibling continue_conversation reopens "whatever was most recent", which means nothing on a server handling many conversations; the concepts page draws the line.) With resume=None the behavior is exactly what you've had since Part 1: a fresh session. The request model grows the matching field:

backend/app/main.py

class ChatRequest(BaseModel):
    message: str
    workspace_id: str | None = None
    session_id: str | None = None  # the memory switch: absent = fresh start

And the client plays the same echo game it learned in Part 4 with the workspace id, now for the second id that's been riding session_start all along:

frontend/app/page.tsx

        } else if (event.type === "session_start") {
          // Both echoes, collected: the desk id (Part 4) and, at last, the
          // diary id we've been forwarding unused since Part 2.
          if (event.workspace_id) setWorkspaceId(event.workspace_id);
          setSessionId(event.session_id);

First message: no session id, server starts fresh, session_start echoes the new id, client keeps it. Every later message sends it back, and the analyst remembers. Note what the server does not do: store anything. It's still a stateless translator between HTTP and the SDK; the state lives in the SDK's files, keyed by ids the client carries. That's the whole architecture.

Run the dessert now. "Which store grew fastest between January and June?" then "Now chart that store's weekly numbers." In my recorded run the first turn cost $0.0397 and took 21 seconds; the follow-up knew that store meant Riverside, charted its weekly revenue from $2,674 in week 1 to $7,693 in week 26, and cost $0.0220 in 18 seconds. Read those numbers again: the turn carrying a full conversation's memory was cheaper than the turn that started cold. Prompt caching makes the reloaded history nearly free, and the follow-up needed fewer tool calls because the conversation already knew where the data lived. Memory isn't a luxury feature with a surcharge; wired this way, it's a discount.

Comic in three panels. Panel one: Yad, a bearded developer with headphones, leans on the analyst's desk and says: now chart that store. Panel two: the laptop analyst stands at a bookshelf of identical worn diaries, pulling out one labeled US and reading it intently. Panel three: the analyst, diary in hand, says warmly: ah yes, Tuesday, you prefer bar charts. Yad looks touched but slightly scared, hand on chest, one sweat drop. — resume, illustrated. The diary was always being written; the switch decides whether anyone opens it before speaking.

The sidebar, and the trap in the obvious API

A product with memory needs a list of conversations to come back to. The old-fashioned move is a sessions.json you maintain yourself; the SDK grew real session utilities while nobody was watching (list_sessions, get_session_info, get_session_messages, rename_session, fork_session), and hand-rolling an index now would be teaching a workaround. So the sidebar is SDK-native. But the obvious first call is a trap, and it's worth stepping in it deliberately.

list_sessions(), no arguments, on my machine: 415 sessions. Every Claude Code conversation I have ever had, on any project, anywhere on this laptop. The unscoped call reads the entire store, and a sidebar built on it would cheerfully leak the developer's whole working life into the product. This is the per-cwd sharding fact paying off: scoped with directory=, the same call returns only the diaries of that one folder. And since Part 4 made every conversation's cwd its own workspace, the app's conversations are exactly the union of its workspace shards:

backend/app/sessions.py

    rows = []
    if not WORKSPACES_ROOT.is_dir():
        return rows
    for ws in WORKSPACES_ROOT.iterdir():
        if not ws.is_dir():
            continue
        for s in list_sessions(directory=str(ws)):
            # first_prompt, not summary: the summary drifts to whatever the
            # LATEST turn was about, so two-turn chats kept renaming
            # themselves mid-conversation. The first question is the
            # stable name (until the user renames it).
            rows.append({
                "session_id": s.session_id,
                "workspace_id": ws.name,
                "title": s.custom_title or s.first_prompt or "Untitled analysis",
                "last_modified": s.last_modified,  # epoch milliseconds
            })
    rows.sort(key=lambda r: r["last_modified"], reverse=True)
    return rows

The walk is the design: our workspaces/ folder provides the scope, the SDK provides the data, and no third source of truth ever exists. Every row keeps its workspace_id next to its session_id, because reopening a conversation means restoring both the diary and the desk.
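One detail in those rows is easy to trip on: last_modified is epoch milliseconds, not seconds. Here is a small sketch of turning it into the kind of relative label a sidebar might show. The helper name and the label format are my own choices for illustration, not the repo's.

```python
def relative_time(last_modified_ms: int, now_ms: int) -> str:
    """Turn an epoch-milliseconds timestamp into a short 'time ago' label."""
    # Integer-divide by 1000 first: the SDK value is milliseconds.
    seconds = max(0, (now_ms - last_modified_ms) // 1000)
    if seconds < 60:
        return f"{seconds}s ago"
    if seconds < 3600:
        return f"{seconds // 60}m ago"
    if seconds < 86400:
        return f"{seconds // 3600}h ago"
    return f"{seconds // 86400}d ago"


now = 1_700_000_000_000  # an arbitrary "now", in milliseconds
print(relative_time(now - 1_000, now))           # 1s ago
print(relative_time(now - 5 * 60 * 1_000, now))  # 5m ago
```

Passing now_ms in explicitly, rather than reading the clock inside the function, keeps the helper trivially testable.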

Diagram titled: where the diaries live. On the left, the server's backend/workspaces folder with two workspace directories holding CSVs and generated charts. Dashed arrows labeled one shard per cwd point to the right side, the SDK's ~/.claude/projects folder, where each workspace has its own directory of JSONL session files, one marked as the conversation and one as its fork. Below, a highlighted box titled the sidebar walk shows the two-line loop calling list_sessions with directory equals each workspace. A warning underneath notes that unscoped list_sessions returned every session on the machine, 415 conversations, including the developer's own Claude Code history. — Two worlds, joined by the cwd. The sidebar never queries anything it doesn't own: our folder list scopes the SDK's store.

That comment about first_prompt earns its lines: the plan was to title rows with the SDK's summary field, until live testing showed the summary drifts. It tracks the newest turn, so after the follow-up question, both test conversations proudly renamed themselves "Now chart that store's weekly numbers." A sidebar where titles mutate mid-conversation feels haunted; first_prompt is stable, and custom_title (set by rename, below) beats both. Small field choice, real product behavior; this is why we test against the live SDK before writing prose.

The rail itself is plain React: rows with a title and a relative time, an active highlight, and two hover actions. The pencil swaps the row for an inline input; submitting calls the rename endpoint, which is rename_session wearing two lines of FastAPI (PATCH /conversations/{ws}/{sid} in the repo), lands a "Renamed." toast, and from then on custom_title wins the title contest:

frontend/components/Sidebar.tsx

                <span className="absolute right-2 top-2 hidden gap-1.5 group-hover:flex">
                  <button
                    type="button"
                    title="Rename"
                    onClick={() => {
                      setEditing(c.session_id);
                      setDraft(c.title);
                    }}
                  >
                    <PencilIcon />
                  </button>
                  <button type="button" title="Duplicate chat" onClick={() => onFork(c)}>
                    <BranchIcon />
                  </button>
                </span>

Replaying a conversation you weren't there for

Click a sidebar row and the transcript reappears, badges and all. The server rebuilds it from the diary with get_session_messages, and the mapping should feel like déjà vu, because it's Part 3's applyEvent run over the past instead of the present. Text extends the open turn. A tool_use block opens a tool block. A tool_result, which arrives inside a user message exactly as Part 1's anatomy warned, completes its block by id:

backend/app/sessions.py

    for m in get_session_messages(session_id, directory=str(workspace)):
        content = m.message.get("content")
        if m.type == "user":
            if isinstance(content, str):
                close_turn()
                messages.append({"role": "user", "text": content})
                continue
            for block in content or []:
                if block.get("type") == "tool_result" and turn:
                    for b in turn["blocks"]:
                        if b.get("id") == block.get("tool_use_id"):
                            raw = block.get("content")
                            b["result"] = raw if isinstance(raw, str) else ""
                            b["isError"] = bool(block.get("is_error"))
                            b["done"] = True
                elif _text_of(block) is not None:
                    close_turn()
                    messages.append({"role": "user", "text": block["text"]})

(The assistant half is eleven more lines in the file: text and tool_use blocks appended to the open turn, thinking blocks skipped until Part 10 renders them.) The payoff of Part 3's block model lands here: history and live streaming produce the same shape, so the frontend renders a replayed conversation with zero new components. The client's half restores the desk too, splitting the workspace's current files the same way Part 4 taught: plain files become chips, images and markdown become the artifacts panel:

frontend/app/page.tsx

  async function openConversation(c: Conversation) {
    if (working) return;
    try {
      const { messages: history, files: desk } = await fetchConversation(
        c.workspace_id,
        c.session_id,
      );
      setMessages(history as ChatMessage[]);
      setSessionId(c.session_id);
      setWorkspaceId(c.workspace_id);
      const all = desk as WorkspaceFile[];
      setFiles(all.filter((f) => f.kind === "file").map((f) => f.path));
      const outputs: Artifact[] = all
        .filter((f) => f.kind !== "file")
        .map((f) => ({ path: f.path, kind: f.kind, size: f.size, updatedAt: Date.now() }));
      setArtifacts(outputs);
      setSelectedArtifact(outputs[outputs.length - 1]?.path ?? null);
    } catch (err) {
      setToast({ message: (err as Error).message });
    }
  }
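The server half of that round trip, the replay mapping, can be tried without the SDK installed. What follows is a self-contained sketch under stated assumptions: records are plain dicts rather than the SDK's message objects, the assistant branch is my own reconstruction (not the eleven lines in the repo), the block field names (kind, id, name, done) are illustrative, and the sample transcript, including the file name sales.csv, is invented.

```python
def replay(records: list[dict]) -> list[dict]:
    """Fold stored records into chat shape: user messages and assistant
    turns, each turn holding text and tool blocks."""
    messages: list[dict] = []
    turn: dict | None = None  # the assistant turn currently being built

    def close_turn() -> None:
        nonlocal turn
        if turn is not None:
            messages.append(turn)
            turn = None

    for rec in records:
        content = rec["message"].get("content")
        if isinstance(content, str):
            if rec["type"] == "user":
                close_turn()
                messages.append({"role": "user", "text": content})
                continue
            content = [{"type": "text", "text": content}]  # normalize assistant text
        if rec["type"] == "user":
            for block in content or []:
                if block.get("type") == "tool_result" and turn:
                    # Results ride inside user records; match them by id.
                    for b in turn["blocks"]:
                        if b.get("id") == block.get("tool_use_id"):
                            raw = block.get("content")
                            b["result"] = raw if isinstance(raw, str) else ""
                            b["isError"] = bool(block.get("is_error"))
                            b["done"] = True
                elif block.get("type") == "text":
                    close_turn()
                    messages.append({"role": "user", "text": block["text"]})
        elif rec["type"] == "assistant":
            if turn is None:
                turn = {"role": "assistant", "blocks": []}
            for block in content or []:
                if block.get("type") == "text":
                    turn["blocks"].append({"kind": "text", "text": block["text"]})
                elif block.get("type") == "tool_use":
                    turn["blocks"].append(
                        {"kind": "tool", "id": block["id"], "name": block["name"], "done": False}
                    )
                # Anything else (thinking blocks, for instance) is skipped.
    close_turn()
    return messages


# Invented transcript: question, tool call, tool result, answer.
RECORDS = [
    {"type": "user", "message": {"content": "Which store grew fastest?"}},
    {"type": "assistant", "message": {"content": [
        {"type": "text", "text": "Let me look."},
        {"type": "tool_use", "id": "t1", "name": "Bash", "input": {"command": "ls"}},
    ]}},
    {"type": "user", "message": {"content": [
        {"type": "tool_result", "tool_use_id": "t1", "content": "sales.csv"},
    ]}},
    {"type": "assistant", "message": {"content": [
        {"type": "text", "text": "Riverside grew fastest."},
    ]}},
]

history = replay(RECORDS)
print(len(history), [b["kind"] for b in history[1]["blocks"]])
```

Four stored records fold into two chat messages: the tool result never becomes a message of its own, it just completes the badge that was waiting for it.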

Refresh the page mid-thought and nothing is lost: the sidebar comes back from disk, one click restores transcript, desk, and memory, and the next message resumes the same session. That's the refresh-survival Part 3 could only apologize for. (The stream of a turn in flight still dies with the tab; that harder promise is Part 9's durable streams.)

The "New analysis" button is the inverse and it's exactly as boring as it should be: clear the messages, drop both ids, empty the chips and the panel. Fresh desk, fresh diary; the server mints new ones on the next message.

Duplicate chat: sessions are a tree

Here's a move no chat product taught you to expect, and the SDK gives it away. Say the Riverside analysis is at a good checkpoint and you want to explore "what if we drop the Airport store?" without wrecking the thread you have. Photocopy the diary:

backend/app/main.py

@app.post("/conversations/{workspace_id}/{session_id}/fork")
async def fork_conversation(workspace_id: str, session_id: str) -> dict:
    """Branch the analysis: a NEW session that inherits the whole history,
    while the original stays untouched. No model call, no cost; the SDK
    copies the diary and hands back a fresh id. The desk is shared."""
    workspace = workspace_path(workspace_id)
    info = get_session_info(session_id, directory=str(workspace))
    base = (info and (info.custom_title or info.first_prompt)) or "Analysis"
    result = fork_session(session_id, directory=str(workspace), title=f"{base} (branch)")
    return {"session_id": result.session_id, "workspace_id": workspace_id}

fork_session is an offline operation: no model call, no tokens, $0.00, instant. The fork arrives with the entire history and a new id; the original never notices. I verified the inheritance the honest way, by asking the fork a question only the history could answer, and it answered from turns it was never present for. In the UI this is the branch icon, "Duplicate chat": the client calls the endpoint, opens the branch, and toasts "Duplicated. You're on the branch now." Both rows sit in the sidebar, both resumable forever, diverging from a shared past. A conversation is a line; fork makes it a tree.
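The line-becomes-tree idea can be modeled in a dozen lines. This is a toy, with a dict standing in for the session store and a hypothetical fork function; it claims nothing about how fork_session is implemented beyond what's described above: the diary is copied, the copy gets a new id, and the original is untouched.

```python
import copy
import uuid


def fork(store: dict[str, list[str]], session_id: str) -> str:
    """Toy fork: copy the diary under a new id, leave the original alone."""
    new_id = uuid.uuid4().hex
    store[new_id] = copy.deepcopy(store[session_id])
    return new_id


# Placeholder turns under the original session's id from this part's runs.
store = {"8664da65": ["grew fastest?", "chart that store", "turn three"]}

branch = fork(store, "8664da65")

# From here the two diaries diverge.
store[branch].append("what if we drop the Airport store?")
store["8664da65"].append("keep going on the original")

print(store[branch][:3] == store["8664da65"][:3])  # shared past
print(store[branch][3] != store["8664da65"][3])    # separate futures
```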

Timeline diagram using the real session ids from this part's runs. A horizontal line for session 8664da65 shows turn one, grew fastest, turn two, chart that store, and later turns each labeled resume with the same id. From turn three a curved branch labeled fork_session, copies the diary, new id, no model call, zero dollars, drops to a second line for session caf6b788, which inherits turns one to three and continues with its own new turn labeled resume with the fork's id. A note explains both branches stay resumable and share the same desk, because the fork copies the diary, not the workspace. A serif caption reads: a conversation is a line until you photocopy the diary, then it's a tree. — The fork from this part's real runs, ids and all. Everything above the curve happened once and is shared; everything after it diverges.

One honest caveat, visible right in that figure: the fork copies the diary, not the desk. Both branches keep working in the same workspace folder, so if each branch charts something new, both panels see both charts, and a report.md written by one branch overwrites the other's. For "try a different angle on this analysis" that's usually what you want (same data, shared outputs). For true isolation you'd copy the workspace too and re-point the fork's cwd, and the session store's per-cwd sharding makes that a real project rather than a flag; production apps that need it keep conversation state in their own database, which is the honest ceiling of Act I (more below).

Comic in three panels. Panel one: Yad holds a worn diary over an office photocopier and wonders: what if we drop the airport store? Panel two: the photocopier flashes and two identical laptop analysts stand side by side, each holding a copy of the diary, waving at each other. Panel three: at two small desks, each analyst writes its own chapter four, one diary labeled keep airport, the other labeled no airport, while Yad watches with crossed arms. — fork_session in one office appliance. Same past on both copies; the futures are theirs to write.

Here's the whole tree living in the product, after a rename and a duplicate, with the toast still up:

The Beanline Analyst after renaming and duplicating a conversation. The sidebar lists three rows: Riverside growth deep-dive branch, active and one second old, the original Riverside growth deep-dive, and the older conversation titled by its first prompt. The chat shows the inherited history including the Riverside answer and weekly chart turn, the artifacts panel lists three files with store_growth.png previewed, and a green toast reads: Duplicated. You're on the branch now. — Rename gave the thread a human name; Duplicate gave it a branch. The branch opened with history it inherited, not history it lived.

The bug the camera caught

Recording this part's demo video exposed a real bug that shipped in Parts 3 and 4, and it's too instructive to fix silently. The success toast from "Load sample data" stayed on screen for the entire 30-second agent run, self-dismiss timer be damned. The culprit is a classic React foot-gun:

frontend/app/page.tsx

  // A stable identity, or the toast's 6s timer resets on every render
  // and a busy stream keeps the banner up forever (found on camera).
  const dismissToast = useCallback(() => setToast(null), []);

The toast's useEffect lists onDismiss as a dependency, which is correct; the page was passing () => setToast(null) inline, which is a new function on every render. Normally you'd never notice. But a streaming turn re-renders the page dozens of times per second, so the effect re-ran constantly, and every re-run cancelled the 6-second timer and started a fresh one. The toast could only die if the app held still for six straight seconds, which a streaming app never does. One useCallback pins the identity and the timer finally runs to completion. The lesson travels: in streaming UIs, unstable identities turn from style nits into visible behavior, because "re-render" stops being an occasional event and becomes the weather.
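The timer arithmetic is easy to check with a small simulation. This is a Python model of the behavior described above, not React code: dismiss_at is a hypothetical function, and the 25 ms render interval is an illustrative stand-in for "dozens of times per second".

```python
def dismiss_at(render_times_ms: list[int], timeout_ms: int, stable_callback: bool) -> int:
    """Return the time (ms) at which the toast's timer finally fires.

    Models an effect that arms a timer on mount and, whenever its
    dependency changes identity, clears that timer and arms a fresh one.
    """
    deadline = render_times_ms[0] + timeout_ms  # armed on the first render
    for t in render_times_ms[1:]:
        if t >= deadline:
            break  # the timer already fired before this render
        if not stable_callback:
            # New function identity: the effect re-runs and restarts the clock.
            deadline = t + timeout_ms
    return deadline


# A 30-second stream re-rendering every 25 ms.
renders = list(range(0, 30_001, 25))

print(dismiss_at(renders, 6_000, stable_callback=False))  # 36000
print(dismiss_at(renders, 6_000, stable_callback=True))   # 6000
```

With the inline callback the toast outlives the whole stream and lingers six more seconds after the last render; with a pinned identity it leaves on schedule.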

The cost ritual

All real runs from building this part:

Run	Result	Cost · Time
Follow-up with no session (amnesia)	"which store... could you provide..."	$0.0113 · 4s
"Which store grew fastest?" (fresh)	Riverside +70.6%, chart + report	$0.0397 · 21s
"Now chart that store's weekly numbers." (resumed)	Riverside weekly chart, right store, no hints	$0.0220 · 18s
Duplicate chat (fork_session)	new id, full history inherited	$0.00 · instant
Rename	custom_title set	$0.00 · instant
Reopen conversation (history replay)	transcript + desk restored	$0.00 · one disk read

The shape of this table is the part's thesis. The three paid rows are model turns; everything session-shaped (fork, rename, replay, the sidebar itself) is file work the SDK does for free. And the resumed turn undercuts the fresh one, because cached history is cheap and remembered context kills redundant exploration. Compare that with the LangGraph series, where memory meant building a checkpointer and wiring threads by hand: same capability, and the contrast is the point of choosing a crate engine.
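The table's arithmetic, spelled out. The dollar figures are copied from the rows above; the labels and variable names are mine.

```python
from decimal import Decimal

# Costs from the table, in dollars. Session operations are file work: $0.
RUNS = {
    "amnesia follow-up": Decimal("0.0113"),
    "fresh question": Decimal("0.0397"),
    "resumed follow-up": Decimal("0.0220"),
    "fork": Decimal("0"),
    "rename": Decimal("0"),
    "replay": Decimal("0"),
}

total = sum(RUNS.values())
ratio = RUNS["resumed follow-up"] / RUNS["fresh question"]
saving = RUNS["fresh question"] - RUNS["resumed follow-up"]

print(f"total spend: ${total}")               # total spend: $0.0730
print(f"resumed / fresh: {ratio:.0%}")        # resumed / fresh: 55%
print(f"saved on the follow-up: ${saving}")   # saved on the follow-up: $0.0177
```

Decimal keeps the sums exact; with floats the totals would pick up binary rounding noise in the last digits.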

Act I, the curtain

Five parts ago this was uv init and an empty folder. Count what's on the table now: a terminal agent that became a streaming HTTP service (Parts 1 and 2), a product UI where every tool call is visible and expandable (Part 3), per-conversation workspaces with uploads, a hardened file pipeline, and an artifacts panel collecting charts and reports (Part 4), and now memory, a conversations sidebar, renames, and branching, all riding the SDK's own session store (Part 5). That is a complete AI data analyst, and its architecture is worth saying out loud one time: a stateless FastAPI translator, an event vocabulary that has absorbed two extensions without breaking a parser, and state that lives entirely in files the SDK manages.

Honesty about the ceiling: everything is per-machine. The diaries live in this computer's home directory, the workspaces on its disk; two servers would mean two split brains, and there's no login, so every visitor is the same "user". The production reference app this series shadows keeps conversations in Postgres with real auth for exactly these reasons. Act I's job was "works on my machine, and works well"; the gap between that and production is not more features, it's trust: what the agent may do, who approves it, what gets logged, and what survives a crash. That's Act II: a real database behind a custom tool (Part 6), risky commands pausing for a human Approve click (Part 7), hooks that see every tool call and write the audit log (Part 8), and streams that survive a refresh mid-turn (Part 9). The scissors from Part 1 finally get taken away, one guardrail at a time.

A real run, recorded: sample data loaded, the growth question answered (Riverside, +70.6%), then the follow-up "Now chart that store's weekly numbers" charted the right store from memory, $2,674 in week 1 to $7,693 in week 26, all numbers verified against the dataset.

What you built

Part 5

Conversation memory in one line: resume=session_id reloads the SDK's diary, the client echoes the id back each turn, and the server stays stateless.
A conversations sidebar with zero new storage: our workspaces folder scopes list_sessions(directory=...) per shard, dodging the unscoped call that returns every session on the machine.
History replay through the same block model as live streaming: get_session_messages mapped by the applyEvent rules, so old conversations render with zero new components.
Rename and Duplicate chat riding rename_session and fork_session: forks inherit the whole history for zero dollars, and sessions become a tree.
Act I complete: a working AI data analyst with streaming, tool visibility, workspaces, artifacts, and memory, all state in SDK-managed files.

Test yourself

Why does the sidebar call list_sessions(directory=ws) once per workspace instead of list_sessions() once?

What does resume=session_id actually change about a query() call?

What does fork_session give you, and what does it cost?

Why do sidebar rows use first_prompt (or custom_title) instead of the SDK's summary field?

When history() replays a session, where do tool results come from and how do they reach the right badge?

Commit it, from the project root:

BASH

git add backend frontend
git commit -m "part 5: sessions - memory, sidebar, rename, and fork"

Your analyst remembers, but it still trusts everyone and everything: it runs whatever shell command it likes on Part 1's scissors, and its only data source is whatever CSVs land on the desk. In Part 6 Beanline's live numbers move into a real SQLite database and the analyst gets its first custom tool to query it, read-only by construction, which is the first rung of Act II's safety ladder.

The complete, tested code for this part lives in part-05-sessions in the companion repo. Code blocks with a GitHub icon link straight to the exact file; "View full file" shows the whole file in place with this section's changes highlighted.
