Series · Claude Agent SDK in Production · Part 5 of 14
· 25 min read
Claude Agent SDK in Production, Part 5: Sessions: The Analyst Remembers
One line of backend code turns amnesia into memory. Then the SDK's own session store powers a conversations sidebar, renames, and forkable analyses, and Act I closes.
claude-agent-sdk · fastapi · nextjs · tutorial
Ask your analyst "Which store grew fastest between January and June?" and it answers: Riverside, 70.6% growth. Now type the follow-up that every real conversation contains: "Now chart that store's weekly numbers." In every part so far, that sentence was a coin flip pointed at a confused agent. Today the analyst filters store_id == "S04" without being told, because it remembers which store you were both talking about. The entire backend change that makes this work is one line. The rest of this part is discovering how much machinery the SDK already built while nobody was looking, and closing Act I with a product.
That sidebar on the left is new too, and here's the part that should raise an eyebrow: it lists conversations this server never stored. No database was added. No sessions table, no conversations.json, no ORM. The SDK has been keeping a diary of every conversation since your first query() in Part 1, and today we finally read it.
Break it first: the goldfish with a job title
Give the Part 4 analyst the follow-up cold, in a fresh conversation, and watch a very expensive goldfish at work:
This is Part 2's amnesia demo all over again, unchanged after three parts of building: every query() starts from zero, and the files have been the only memory the whole time. But read the agent's reply closely, because the failure got more interesting since Part 2. "Let me check my memory to see if there's relevant context": the claude_code preset we adopted in Part 4 knows conversations can have pasts. It's offering to do the exact thing we never wired up. The gap between that sentence and reality is precisely one option on ClaudeAgentOptions, and the SDK has been patiently waiting for us to pass it.
The diary, revisited
In Part 1 you glanced at ~/.claude/projects/ and moved on. Time to actually open it:
Three facts from that listing run today's whole agenda. One: every conversation is a JSONL file whose lines are the raw API messages: user turns, assistant turns with their tool calls, results. A complete transcript, written by the SDK whether or not you care (the concepts page has the anatomy). Two: files are named by session_id, the very id we've been dutifully emitting in session_start parcels since Part 2 and using for absolutely nothing. That bill comes due today. Three, and sneakiest: the store is sharded by working directory. The SDK files each diary under an escaped version of the cwd the agent ran in. In Part 1 that meant one shard for the whole app. But Part 4 gave every conversation its own workspace as cwd, which means every conversation now gets its own shard too. Hold that thought; it reshapes the sidebar section.
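To make the first of those facts concrete, here is a minimal sketch that reads one diary file with nothing but the standard library. It assumes each JSONL line is a JSON object carrying a top-level "type" field (such as "user" or "assistant"); that field name and the helper `summarize_diary` are illustrative assumptions, not part of the SDK.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_diary(path: Path) -> Counter:
    """Count the records in one JSONL diary by their "type" field.

    Assumption: every non-blank line is a JSON object; records without
    a "type" key are counted under "unknown".
    """
    counts: Counter = Counter()
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            counts[record.get("type", "unknown")] += 1
    return counts
```

Pointing this at any file under ~/.claude/projects/ should give a quick census of a conversation without loading the SDK at all, which is a handy sanity check before trusting the session utilities introduced later.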
The memory switch
Here is the entire backend diff that turns amnesia into memory:
```python
def build_options(workspace: Path, session_id: str | None) -> ClaudeAgentOptions:
    """Part 4's options plus one line: resume. Hand the SDK a session id
    and the new turn starts with the whole diary already in its head."""
    return ClaudeAgentOptions(
        cwd=str(workspace),
        tools=["Read", "Glob", "Grep", "Bash", "Write"],
        permission_mode="bypassPermissions",
        model=MODEL,
        include_partial_messages=True,
        system_prompt={"type": "preset", "preset": "claude_code", "append": ANALYST_PROMPT},
        resume=session_id,
    )
```

resume=session_id tells the SDK: reload that diary, and let this turn arrive with everything in it already in context. Same session id, same file, the story continues. (resume's sibling continue_conversation reopens "whatever was most recent", which means nothing on a server handling many conversations; the concepts page draws the line.) With resume=None the behavior is exactly what you've had since Part 1: a fresh session. The request model grows the matching field:
```python
class ChatRequest(BaseModel):
    message: str
    workspace_id: str | None = None
    session_id: str | None = None  # the memory switch: absent = fresh start
```

And the client plays the same echo game it learned in Part 4 with the workspace id, now for the second id that's been riding session_start all along:
```tsx
} else if (event.type === "session_start") {
  // Both echoes, collected: the desk id (Part 4) and, at last, the
  // diary id we've been forwarding unused since Part 2.
  if (event.workspace_id) setWorkspaceId(event.workspace_id);
  setSessionId(event.session_id);
```

First message: no session id, server starts fresh, session_start echoes the new id, client keeps it. Every later message sends it back, and the analyst remembers. Note what the server does not do: store anything. It's still a stateless translator between HTTP and the SDK; the state lives in the SDK's files, keyed by ids the client carries. That's the whole architecture.
Run the dessert now. "Which store grew fastest between January and June?" then "Now chart that store's weekly numbers." In my recorded run the first turn cost $0.0397 and took 21 seconds; the follow-up knew that store meant Riverside, charted its weekly revenue from $2,674 in week 1 to $7,693 in week 26, and cost $0.0220 in 18 seconds. Read those numbers again: the turn carrying a full conversation's memory was cheaper than the turn that started cold. Prompt caching makes the reloaded history nearly free, and the follow-up needed fewer tool calls because the conversation already knew where the data lived. Memory isn't a luxury feature with a surcharge; wired this way, it's a discount.
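For the skeptical, the discount is plain arithmetic on the numbers from that recorded run. This is a worked calculation only; one run is an anecdote, not a benchmark.

```python
# Figures from the recorded run quoted above.
fresh_cost, resumed_cost = 0.0397, 0.0220  # dollars
fresh_secs, resumed_secs = 21, 18          # wall-clock seconds

cost_saving = (fresh_cost - resumed_cost) / fresh_cost
time_saving = (fresh_secs - resumed_secs) / fresh_secs

print(f"resumed turn: {cost_saving:.1%} cheaper, {time_saving:.1%} faster")
# prints: resumed turn: 44.6% cheaper, 14.3% faster
```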
The sidebar, and the trap in the obvious API
A product with memory needs a list of conversations to come back to. The old-fashioned move is a sessions.json you maintain yourself; the SDK grew real session utilities while nobody was watching (list_sessions, get_session_info, get_session_messages, rename_session, fork_session), and hand-rolling an index now would be teaching a workaround. So the sidebar is SDK-native. But the obvious first call is a trap, and it's worth stepping in it deliberately.
list_sessions(), no arguments, on my machine: 415 sessions. Every Claude Code conversation I have ever had, on any project, anywhere on this laptop. The unscoped call reads the entire store, and a sidebar built on it would cheerfully leak the developer's whole working life into the product. This is the per-cwd sharding fact paying off: scoped with directory=, the same call returns only the diaries of that one folder. And since Part 4 made every conversation's cwd its own workspace, the app's conversations are exactly the union of its workspace shards:
```python
    rows = []
    if not WORKSPACES_ROOT.is_dir():
        return rows
    for ws in WORKSPACES_ROOT.iterdir():
        if not ws.is_dir():
            continue
        for s in list_sessions(directory=str(ws)):
            # first_prompt, not summary: the summary drifts to whatever the
            # LATEST turn was about, so two-turn chats kept renaming
            # themselves mid-conversation. The first question is the
            # stable name (until the user renames it).
            rows.append({
                "session_id": s.session_id,
                "workspace_id": ws.name,
                "title": s.custom_title or s.first_prompt or "Untitled analysis",
                "last_modified": s.last_modified,  # epoch milliseconds
            })
    rows.sort(key=lambda r: r["last_modified"], reverse=True)
    return rows
```

The walk is the design: our workspaces/ folder provides the scope, the SDK provides the data, and no third source of truth ever exists. Every row keeps its workspace_id next to its session_id, because reopening a conversation means restoring both the diary and the desk.
That comment about first_prompt earns its lines: the plan was to title rows with the SDK's summary field, until live testing showed the summary drifts. It tracks the newest turn, so after the follow-up question, both test conversations proudly renamed themselves "Now chart that store's weekly numbers." A sidebar where titles mutate mid-conversation feels haunted; first_prompt is stable, and custom_title (set by rename, below) beats both. Small field choice, real product behavior; this is why we test against the live SDK before writing prose.
The rail itself is plain React: rows with a title and a relative time, an active highlight, and two hover actions. The pencil swaps the row for an inline input; submitting calls the rename endpoint, which is rename_session wearing two lines of FastAPI (PATCH /conversations/{ws}/{sid} in the repo), lands a "Renamed." toast, and from then on custom_title wins the title contest:
```tsx
<span className="absolute right-2 top-2 hidden gap-1.5 group-hover:flex">
  <button
    type="button"
    title="Rename"
    onClick={() => {
      setEditing(c.session_id);
      setDraft(c.title);
    }}
  >
    <PencilIcon />
  </button>
  <button type="button" title="Duplicate chat" onClick={() => onFork(c)}>
    <BranchIcon />
  </button>
</span>
```

Replaying a conversation you weren't there for
Click a sidebar row and the transcript reappears, badges and all. The server rebuilds it from the diary with get_session_messages, and the mapping should feel like déjà vu, because it's Part 3's applyEvent run over the past instead of the present. Text extends the open turn. A tool_use block opens a tool block. A tool_result, which arrives inside a user message exactly as Part 1's anatomy warned, completes its block by id:
```python
for m in get_session_messages(session_id, directory=str(workspace)):
    content = m.message.get("content")
    if m.type == "user":
        if isinstance(content, str):
            close_turn()
            messages.append({"role": "user", "text": content})
            continue
        for block in content or []:
            if block.get("type") == "tool_result" and turn:
                for b in turn["blocks"]:
                    if b.get("id") == block.get("tool_use_id"):
                        raw = block.get("content")
                        b["result"] = raw if isinstance(raw, str) else ""
                        b["isError"] = bool(block.get("is_error"))
                        b["done"] = True
            elif _text_of(block) is not None:
                close_turn()
                messages.append({"role": "user", "text": block["text"]})
```

(The assistant half is eleven more lines in the file: text and tool_use blocks appended to the open turn, thinking blocks skipped until Part 10 renders them.) The payoff of Part 3's block model lands here: history and live streaming produce the same shape, so the frontend renders a replayed conversation with zero new components. The client's half restores the desk too, splitting the workspace's current files the same way Part 4 taught: plain files become chips, images and markdown become the artifacts panel:
```tsx
async function openConversation(c: Conversation) {
  if (working) return;
  try {
    const { messages: history, files: desk } = await fetchConversation(
      c.workspace_id,
      c.session_id,
    );
    setMessages(history as ChatMessage[]);
    setSessionId(c.session_id);
    setWorkspaceId(c.workspace_id);
    const all = desk as WorkspaceFile[];
    setFiles(all.filter((f) => f.kind === "file").map((f) => f.path));
    const outputs: Artifact[] = all
      .filter((f) => f.kind !== "file")
      .map((f) => ({ path: f.path, kind: f.kind, size: f.size, updatedAt: Date.now() }));
    setArtifacts(outputs);
    setSelectedArtifact(outputs[outputs.length - 1]?.path ?? null);
  } catch (err) {
    setToast({ message: (err as Error).message });
  }
}
```

Refresh the page mid-thought and nothing is lost: the sidebar comes back from disk, one click restores transcript, desk, and memory, and the next message resumes the same session. That's the refresh-survival Part 3 could only apologize for. (The stream of a turn in flight still dies with the tab; that harder promise is Part 9's durable streams.)
The "New analysis" button is the inverse and it's exactly as boring as it should be: clear the messages, drop both ids, empty the chips and the panel. Fresh desk, fresh diary; the server mints new ones on the next message.
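To see the replay rules from this section end to end, here is a self-contained sketch that runs over plain dicts instead of the SDK's message objects. The user half mirrors the excerpt shown earlier; the assistant half and the `open_turn` helper are my own guess at the shape, not the repo's code, and the block field names ("type", "blocks", "done") are assumptions chosen for illustration.

```python
from typing import Any


def replay(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Fold stored records into chat messages whose assistant turns hold blocks."""
    messages: list[dict[str, Any]] = []
    turn: dict[str, Any] | None = None  # the open assistant turn, if any

    def close_turn() -> None:
        nonlocal turn
        turn = None

    def open_turn() -> dict[str, Any]:
        nonlocal turn
        if turn is None:
            turn = {"role": "assistant", "blocks": []}
            messages.append(turn)
        return turn

    for rec in records:
        content = rec["message"].get("content")
        if rec["type"] == "user":
            if isinstance(content, str):
                close_turn()
                messages.append({"role": "user", "text": content})
                continue
            for block in content or []:
                if block.get("type") == "tool_result" and turn:
                    # Results ride inside user records; match them by id.
                    for b in turn["blocks"]:
                        if b.get("id") == block.get("tool_use_id"):
                            raw = block.get("content")
                            b["result"] = raw if isinstance(raw, str) else ""
                            b["isError"] = bool(block.get("is_error"))
                            b["done"] = True
                elif block.get("type") == "text":
                    close_turn()
                    messages.append({"role": "user", "text": block["text"]})
        elif rec["type"] == "assistant":
            for block in content or []:
                if block.get("type") == "text":
                    open_turn()["blocks"].append({"type": "text", "text": block["text"]})
                elif block.get("type") == "tool_use":
                    open_turn()["blocks"].append({
                        "type": "tool",
                        "id": block["id"],
                        "name": block["name"],
                        "input": block.get("input", {}),
                        "done": False,
                    })
                # Any other block type (e.g. thinking) is skipped here.
    return messages
```

Feed it a user question, an assistant record with a tool_use, a user record carrying the matching tool_result, and a final assistant text, and you should get two messages back: the question, and one assistant turn whose tool block is marked done. A tool_result does not close the turn, which is why the assistant's text before and after the tool call ends up in the same bubble.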
Duplicate chat: sessions are a tree
Here's a move no chat product taught you to expect, and the SDK gives it away. Say the Riverside analysis is at a good checkpoint and you want to explore "what if we drop the Airport store?" without wrecking the thread you have. Photocopy the diary:
```python
@app.post("/conversations/{workspace_id}/{session_id}/fork")
async def fork_conversation(workspace_id: str, session_id: str) -> dict:
    """Branch the analysis: a NEW session that inherits the whole history,
    while the original stays untouched. No model call, no cost; the SDK
    copies the diary and hands back a fresh id. The desk is shared."""
    workspace = workspace_path(workspace_id)
    info = get_session_info(session_id, directory=str(workspace))
    base = (info and (info.custom_title or info.first_prompt)) or "Analysis"
    result = fork_session(session_id, directory=str(workspace), title=f"{base} (branch)")
    return {"session_id": result.session_id, "workspace_id": workspace_id}
```

fork_session is an offline operation: no model call, no tokens, $0.00, instant. The fork arrives with the entire history and a new id; the original never notices. I verified the inheritance the honest way, by asking the fork a question only the history could answer, and it answered from turns it was never present for. In the UI this is the branch icon, "Duplicate chat": the client calls the endpoint, opens the branch, and toasts "Duplicated. You're on the branch now." Both rows sit in the sidebar, both resumable forever, diverging from a shared past. A conversation is a line; fork makes it a tree.
One honest caveat, visible right in that figure: the fork copies the diary, not the desk. Both branches keep working in the same workspace folder, so if each branch charts something new, both panels see both charts, and a report.md written by one branch overwrites the other's. For "try a different angle on this analysis" that's usually what you want (same data, shared outputs). For true isolation you'd copy the workspace too and re-point the fork's cwd, and the session store's per-cwd sharding makes that a real project rather than a flag; production apps that need it keep conversation state in their own database, which is the honest ceiling of Act I (more below).
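The caveat is easy to reproduce with a toy model. The sketch below is not how fork_session is implemented (the real call likely does more than copy a file, such as rewriting ids inside the records); `toy_fork`, the `shard` folder and the `desk` folder are hypothetical stand-ins that only illustrate the semantics: separate diaries, one shared desk.

```python
import shutil
import tempfile
import uuid
from pathlib import Path


def toy_fork(shard: Path, session_id: str) -> str:
    """Copy one diary file under a fresh id; the original is left untouched."""
    new_id = uuid.uuid4().hex
    shutil.copyfile(shard / f"{session_id}.jsonl", shard / f"{new_id}.jsonl")
    return new_id


with tempfile.TemporaryDirectory() as tmp:
    shard = Path(tmp) / "shard"  # where the diaries live
    desk = Path(tmp) / "desk"    # the shared workspace folder
    shard.mkdir()
    desk.mkdir()

    (shard / "main.jsonl").write_text('{"type": "user"}\n', encoding="utf-8")
    branch = toy_fork(shard, "main")

    # Each branch appends to its own diary...
    with (shard / f"{branch}.jsonl").open("a", encoding="utf-8") as fh:
        fh.write('{"type": "assistant"}\n')

    # ...but both write into the same desk, so the last writer wins.
    (desk / "report.md").write_text("from main", encoding="utf-8")
    (desk / "report.md").write_text("from branch", encoding="utf-8")

    main_lines = len((shard / "main.jsonl").read_text(encoding="utf-8").splitlines())
    branch_lines = len((shard / f"{branch}.jsonl").read_text(encoding="utf-8").splitlines())
    surviving_report = (desk / "report.md").read_text(encoding="utf-8")
```

The diaries diverge (one line in the original, two in the branch) while only one report.md survives on the desk, which is the overwrite described above in miniature.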
Here's the whole tree living in the product, after a rename and a duplicate, with the toast still up:
The bug the camera caught
Recording this part's demo video exposed a real bug that shipped in Parts 3 and 4, and it's too instructive to fix silently. The success toast from "Load sample data" stayed on screen for the entire 30-second agent run, self-dismiss timer be damned. The culprit is a classic React foot-gun:
```tsx
// A stable identity, or the toast's 6s timer resets on every render
// and a busy stream keeps the banner up forever (found on camera).
const dismissToast = useCallback(() => setToast(null), []);
```

The toast's useEffect lists onDismiss as a dependency, which is correct; the page was passing () => setToast(null) inline, which is a new function on every render. Normally you'd never notice. But a streaming turn re-renders the page dozens of times per second, so the effect re-ran constantly, and every re-run cancelled the 6-second timer and started a fresh one. The toast could only die if the app held still for six straight seconds, which a streaming app never does. One useCallback pins the identity and the timer finally runs to completion. The lesson travels: in streaming UIs, unstable identities turn from style nits into visible behavior, because "re-render" stops being an occasional event and becomes the weather.
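The timer arithmetic behind that bug fits in a few lines of simulation. This is a deliberately simplified model, not React: it assumes the toast mounts at t=0, that every render with an unstable callback cancels and restarts the timer, and that a timer whose deadline coincides with a render fires first. `toast_dismiss_time` is a hypothetical helper, with times in milliseconds.

```python
def toast_dismiss_time(render_times_ms: list[int], timeout_ms: int, stable_callback: bool) -> int:
    """Return the moment (ms) at which a toast shown at t=0 disappears.

    render_times_ms must be sorted ascending.
    """
    deadline = timeout_ms  # timer armed when the toast mounts
    if stable_callback:
        return deadline  # re-renders never touch the effect
    for t in render_times_ms:
        if t >= deadline:
            break  # the timer already fired before this render
        deadline = t + timeout_ms  # effect re-ran: cancel and restart
    return deadline


# A 30-second stream that re-renders every 50 ms.
stream = list(range(50, 30_001, 50))
unstable = toast_dismiss_time(stream, 6_000, stable_callback=False)
stable = toast_dismiss_time(stream, 6_000, stable_callback=True)
```

Under these assumptions the unstable version cannot dismiss until six seconds after the last render, while the stable version dismisses at six seconds regardless of how busy the stream is.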
The cost ritual
All real runs from building this part:
| Run | Result | Cost |
|---|---|---|
| Follow-up with no session (amnesia) | "which store... could you provide..." | $0.0113 · 4s |
| "Which store grew fastest?" (fresh) | Riverside +70.6%, chart + report | $0.0397 · 21s |
| "Now chart that store's weekly numbers." (resumed) | Riverside weekly chart, right store, no hints | $0.0220 · 18s |
| Duplicate chat (fork_session) | new id, full history inherited | $0.00 · instant |
| Rename | custom_title set | $0.00 · instant |
| Reopen conversation (history replay) | transcript + desk restored | $0.00 · one disk read |
The shape of this table is the part's thesis. The two paid rows are model turns; everything session-shaped (fork, rename, replay, the sidebar itself) is file work the SDK does for free. And the resumed turn undercuts the fresh one, because cached history is cheap and remembered context kills redundant exploration. Compare that with the LangGraph series, where memory meant building a checkpointer and wiring threads by hand: same capability, and the contrast is the point of choosing a crate engine.
Act I, the curtain
Five parts ago this was uv init and an empty folder. Count what's on the table now: a terminal agent that became a streaming HTTP service (Parts 1 and 2), a product UI where every tool call is visible and expandable (Part 3), per-conversation workspaces with uploads, a hardened file pipeline, and an artifacts panel collecting charts and reports (Part 4), and now memory, a conversations sidebar, renames, and branching, all riding the SDK's own session store (Part 5). That is a complete AI data analyst, and its architecture is worth saying out loud one time: a stateless FastAPI translator, an event vocabulary that has absorbed two extensions without breaking a parser, and state that lives entirely in files the SDK manages.
Honesty about the ceiling: everything is per-machine. The diaries live in this computer's home directory, the workspaces on its disk; two servers would mean two split brains, and there's no login, so every visitor is the same "user". The production reference app this series shadows keeps conversations in Postgres with real auth for exactly these reasons. Act I's job was works on my machine, works well; the gap between that and production is not more features, it's trust: what the agent may do, who approves it, what gets logged, and what survives a crash. That's Act II: a real database behind a custom tool (Part 6), risky commands pausing for a human Approve click (Part 7), hooks that see every tool call and write the audit log (Part 8), and streams that survive a refresh mid-turn (Part 9). The scissors from Part 1 finally get taken away, one guardrail at a time.
What you built
- Conversation memory in one line: resume=session_id reloads the SDK's diary, the client echoes the id back each turn, and the server stays stateless.
- A conversations sidebar with zero new storage: our workspaces folder scopes list_sessions(directory=...) per shard, dodging the unscoped call that returns every session on the machine.
- History replay through the same block model as live streaming: get_session_messages mapped by the applyEvent rules, so old conversations render with zero new components.
- Rename and Duplicate chat riding rename_session and fork_session: forks inherit the whole history for zero dollars, and sessions become a tree.
- Act I complete: a working AI data analyst with streaming, tool visibility, workspaces, artifacts, and memory, all state in SDK-managed files.
Test yourself
Why does the sidebar call list_sessions(directory=ws) once per workspace instead of list_sessions() once?
What does resume=session_id actually change about a query() call?
What does fork_session give you, and what does it cost?
Why do sidebar rows use first_prompt (or custom_title) instead of the SDK's summary field?
When history() replays a session, where do tool results come from and how do they reach the right badge?
Commit it, from the project root:
```bash
git add backend frontend
git commit -m "part 5: sessions - memory, sidebar, rename, and fork"
```

Your analyst remembers, but it still trusts everyone and everything: it runs whatever shell command it likes on Part 1's scissors, and its only data source is whatever CSVs land on the desk. In Part 6 Beanline's live numbers move into a real SQLite database and the analyst gets its first custom tool to query it, read-only by construction, which is the first rung of Act II's safety ladder.
The complete, tested code for this part lives in part-05-sessions in the companion repo.