Series · Claude Agent SDK in Production · Reference
· 9 min read
Agent SDK Concepts, in Plain Words
The reference page for the Claude Agent SDK series: every recurring idea, defined once, linked from wherever it first appears.
claude-agent-sdk · reference
How to read this page: don't. Not top to bottom, anyway. It exists so that every part of Claude Agent SDK in Production can say "chart onto this idea" with a link instead of a re-explanation. Follow a link in, read one entry, go back to what you were building. Entries are added as the series grows; if an idea from a later act isn't here yet, its part hasn't shipped.
One more routing note: this series assumes you can write a FastAPI endpoint and a React component. If you're earlier in the journey than that, LangGraph from Scratch teaches those fundamentals from zero, and it's the better place to start.
Agent loop
The loop that makes something "agentic" rather than a one-shot prompt: think, act, observe, repeat. The model reads the conversation, decides it needs information or wants to change something, calls a tool, reads the tool's result, and decides again. It exits the loop when it judges the job done and writes its answer.
The important part is who owns the loop. With a raw LLM API, you do: you parse the model's tool request, run the tool, append the result, call the API again, and handle every edge case in between. With the Agent SDK, the loop ships in the box, the same one Claude Code runs. You start a turn with one call and receive a narrated stream of everything the loop did.
A useful mental model from the series: the agent is an intern with a toolbox who keeps working until the job's done. You don't schedule the intern's every move; you give them a task, a desk, and rules about what they're allowed to touch.
Agent SDK vs Messages API
Anthropic ships two ways to build on Claude. The Messages API is the raw model: you send messages, you get one response, and everything else (tool execution, retries, context management, the loop above) is your code. The Agent SDK (claude-agent-sdk on PyPI) is the full agent runtime extracted from Claude Code: built-in tools that read and write real files and run real shell commands, session persistence, permission enforcement, hooks, subagents.
The trade is control for capability. Ten lines of SDK code replace a few hundred lines of loop-and-plumbing code, but the SDK decides things the raw API would leave to you (how tools execute, how context compacts, what the system prompt scaffolding looks like). When you need a single completion inside ordinary software, use the Messages API. When you're building a thing that does work on a computer, the SDK is the shortcut that happens to also be the production-hardened path.
Built-in tools
The SDK's tools are the agent's hands, and they're real: Read, Write, and Edit touch actual files, Bash runs actual shell commands, Glob and Grep search the filesystem, WebSearch and WebFetch reach the internet. There is no simulation layer. When the analyst in this series "runs its own pandas", the Bash tool spawned a real Python process on your machine.
You choose the toolbox per run with ClaudeAgentOptions(tools=[...]): only the tools you list exist for that agent. A smaller toolbox is safer, cheaper (tool definitions ride in the prompt), and easier to reason about. Later in the series the toolbox grows custom entries via MCP servers with the same ergonomics.
Permission modes
Every tool call passes a permission check before it runs, and permission_mode sets the default posture. The modes you'll meet, from most guarded to least:
default: tools that aren't explicitly allowed require approval; with nobody wired up to approve, they're refused.plan: the agent may only read and explore; instead of acting, it produces a plan.acceptEdits: file edits inside the workspace are auto-approved; everything else still asks.dontAsk: skips the asking; anything not explicitly allowed is denied outright.bypassPermissions: everything is allowed, no questions asked. The agent can do whatever the process's user can do.
Think of it as how much you trust a new employee: shadowing only, supervised, or keys to the building. The series runs Act I on bypassPermissions pointed at a sandbox folder, names that trade-off loudly every time, and then spends Act II building the grown-up alternative: approvals routed to a human and hooks that see everything.
Sessions and JSONL
The SDK writes a transcript of every conversation to disk whether or not you ever read it: one JSONL file per session under ~/.claude/projects/<escaped-working-directory>/<session_id>.jsonl, where each line is a JSON record of one event (a user message, an assistant message, a tool result). This is the agent's diary, and it's the entire persistence story for Act I: no database, no schema, nothing to set up.
Two details matter in practice. First, sessions are sharded per working directory: the folder the agent ran in decides which project directory its diary lands in, so an app that gives every conversation its own workspace gets one shard per workspace. Second, the SDK ships utilities to use the diaries as data: list_sessions(), get_session_info(), get_session_messages(), and rename_session(). Part 5 builds a whole conversations sidebar out of them.
Resume vs continue
Two options continue an old conversation, and they answer different questions. resume="<session_id>" says continue this exact session: the SDK reloads that diary and the new turn arrives with full memory of everything in it. continue_conversation=True says continue the most recent session in this working directory, whatever it was; it's the SDK equivalent of "reopen my last chat".
Products use resume. A server handling many users can't mean anything by "the most recent conversation", but it can store each conversation's session_id and hand the right diary back every time. continue_conversation is a convenience for single-user, single-terminal workflows.
Fork session
resume with fork_session=True continues from an old session's full history but writes everything new into a fresh session with a new id, leaving the original untouched. Photocopy the diary mid-page and let two stories continue from the same past.
This makes sessions a tree, not a line. The practical use in this series: branch an analysis to try a different angle ("what if we exclude the airport store?") without contaminating, or losing, the original thread. Both branches remain resumable forever.
System prompt presets
The SDK's default system prompt is minimal. If you want the battle-tested prompt that makes Claude Code good at multi-step tool work (how to search before editing, when to re-read files, how to recover from errors), you opt in with a preset and append your own instructions on top:
system_prompt={"type": "preset", "preset": "claude_code", "append": YOUR_RULES}The pattern worth stealing: production apps append rather than replace. Thousands of hours of prompt-hardening live in the preset; your append supplies only what's unique to your product (for our analyst: save charts as PNG, write findings to report.md, prefer tables over prose). Replacing the whole prompt means re-earning all of that hardening yourself.
Partial messages
By default the SDK yields whole messages: you hear nothing while the model writes a paragraph, then receive the finished AssistantMessage. Setting include_partial_messages=True adds StreamEvent objects between them, carrying the raw token-by-token stream from the underlying API: content_block_delta events whose text_delta payloads are the individual word-fragments as they're generated.
The layering is the thing to understand: partial events add granularity, they don't replace the messages. You still receive the complete AssistantMessage afterward, so a translator can render deltas live and use the full message as the authoritative record. Part 2 wires exactly that.
SSE
Server-sent events: the plain-HTTP way to stream. The response stays open and the server writes messages shaped data: {...}\n\n as they happen; the blank line is the delimiter. One conveyor belt, labeled parcels. It's the same wire format the LangGraph series used, and this series only ever adds new label types to the belt.
This entry stays short on purpose: LangGraph Part 5 teaches SSE from zero, including the classic buffering bug in the browser-side parser. If SSE is new, read that; this series assumes it.
Event source
EventSource is the browser's built-in SSE client: point it at a URL and it fires an event per message, reconnecting automatically when the connection drops. The catch: it only speaks GET, with no request body, which is why chat apps that POST a message and stream the answer usually parse the stream by hand with fetch and a reader instead (LangGraph Part 5 builds that parser).
This series starts with the fetch-reader for the same reason, then switches to EventSource in Part 9, at the exact moment streams become resumable GETs and the automatic reconnection (with its Last-Event-ID header) becomes the feature we want.
Cost and tokens
An agent turn is not one model call. Every think-act-observe cycle is a fresh API call carrying the conversation so far, so tokens multiply with every tool the agent reaches for: a six-step analysis can easily bill twenty times the tokens of its final answer. Two things keep this sane. Prompt caching makes the repeated context cheap (you'll see it as cache_read_input_tokens in usage data). And the SDK tells you the damage: every run ends with a ResultMessage whose total_cost_usd is the real, computed cost of the whole turn, with a usage dict beside it.
The series ritual: print total_cost_usd after every run, from Part 1's first hello onward, so cost stays a number you watch rather than a surprise you get. In Part 13 the ritual becomes policy with max_budget_usd, which stops a run that spends past its limit. One subscription note: if you authenticate with a Claude subscription login instead of an API key, runs draw on your plan's usage rather than billing per token, but total_cost_usd still reports what the turn would cost, which keeps the numbers comparable.