Series · Claude Agent SDK in Production · Part 1 of 14
· 27 min read
Claude Agent SDK in Production, Part 1: Setup and Your First Agent
One pip package contains the whole engine inside Claude Code. By the end of this page it's reading your CSVs, writing its own analysis code, and handing you the bill.
claude-agent-sdk · python · agents · tutorial
Here's a terminal transcript from the end of this part. I asked, in plain English, which store in a six-store coffee chain had the best March. An agent found three CSV files it had never seen, read them, wrote an awk one-liner to aggregate 11,000 rows, joined the store names in, and answered: Downtown, $51,319.60, beating the runner-up by $7,893.70. Every number is right. It took nine turns, 22 seconds, and 2.4 cents. The agent code is 56 lines, and you didn't write the hard part of any of them.
Where this series is going
This is Part 1 of Claude Agent SDK in Production, a fourteen-part series that builds an AI data analyst: upload a CSV, ask questions in English, get back charts and written reports. By the end of the series it looks like this, and it's running on a real server:
Act I (Parts 1 to 5) builds that product end to end: this part's terminal agent, then HTTP streaming, then the UI, then file uploads and artifacts, then memory. Act II makes it governable: custom tools, human approvals, audit hooks, streams that survive a page refresh. Act III makes it smart and shippable: plan mode, subagents, sandboxing, evals, and a real VM deployment.
One thing this series is not: a from-zero web tutorial. It assumes you can write a FastAPI endpoint and a React component. If you can't yet, that's fine, there's a whole series for you: LangGraph from Scratch teaches FastAPI, Next.js, and streaming from an empty folder, and this series links the exact part whenever it leans on one of those basics. Do that one first, then come back. This page, though, is pure Python; everyone's welcome.
What the Agent SDK actually is
Claude Code, the terminal agent people use for day-to-day programming work, runs on an engine: a loop that reads a request, decides which tool to use, runs it for real, reads the result, and goes again until the job is done. The Claude Agent SDK is that exact engine, shipped as a Python package. Not a re-implementation, not "inspired by": the pip wheel literally bundles the Claude Code runtime, so pip install claude-agent-sdk (or uv add) is the entire installation story. No Node setup, no separate CLI download.
That's a different proposition from calling a model API. With the Messages API you supervise every nail: parse the tool request, run the tool, append the result, call again. With the SDK you hire the contractor. You state the job; the loop, the tools, and the context management come with the package. This series spends fourteen parts on what that buys you and what it costs you, starting with the buying.
Setup, checklist style
The whole series was written and tested against pinned versions. When something behaves differently on your machine, check this table before blaming yourself:
| Tool | Version |
|---|---|
| Python | 3.13 |
| uv | 0.8.x |
| claude-agent-sdk | 0.2.110 (bundles Claude Code CLI 2.1.191) |
| Model | claude-haiku-4-5 |
| Node.js | 22 LTS (not needed until Part 3) |
Install uv if you don't have it (one curl command on their install page), then make the project. This series uses uv for all Python work: it creates the virtual environment, pins the lockfile, and runs scripts, all without you ever activating anything.
mkdir beanline-analyst && cd beanline-analystuv init backend --bare --python 3.13cd backenduv add "claude-agent-sdk==0.2.110" "fastapi==0.136.3" "uvicorn[standard]==0.49.0" "python-dotenv==1.1.1"--bare means "no starter files, thanks". FastAPI and uvicorn won't be touched until Part 2; installing everything now means the versions are locked together and you never interrupt a later part for a dependency. If uv itself is new to you, it replaces the python -m venv + pip install dance from LangGraph Part 1; uv run quietly uses the project's own environment every time.
Two ways to authenticate, pick one
The SDK needs a way to call Claude, and there are two:
Option A, you already pay for Claude. If Claude Code is installed and logged in on this machine, you're already done; the SDK picks up that login automatically, and usage draws on your subscription like any Claude Code session. If you have a subscription but not the CLI, install it and log in once: npm install -g @anthropic-ai/claude-code, then run claude and follow the login prompt. This is the path I used for every run in this series.
Option B, an API key. Create one at console.anthropic.com, and before anything else set a monthly budget of $5 under Settings → Limits. Then make it available to your shell:
export ANTHROPIC_API_KEY=sk-ant-your-actual-key-hereCosts in this series are real either way: the SDK reports the computed cost of every run even on a subscription, and I'll quote my actual numbers throughout.
Hello, agent
Eleven lines, no options, no model wiring, no message list:
import asyncio
from claude_agent_sdk import query
async def main(): async for message in query(prompt="What's 144 * 89?"): print(message)
asyncio.run(main())query() starts an agent turn and gives you back an async iterator. Everything the engine does arrives through that iterator as message objects, and for now we dump their raw repr to see what we're dealing with. Run it:
uv run python hello.pyThe answer is right (144 × 89 is 12,816), and three kinds of objects scrolled past: a SystemMessage handshake, an AssistantMessage with the answer, and a ResultMessage receipt. Yours may also show a few housekeeping events between them; ignore those for now, we'll formally meet the whole cast later this page.
But look at that receipt. $0.34. For one multiplication. Two things went wrong, both defaults: with no options, the SDK used the default model configured for Claude Code on my machine (a big one), and it loaded the full default toolbox, whose definitions ride along in every request. The fix for both is the options object, which is where the real agent starts.
Give it a workspace
An analyst needs something to analyze. Meet Beanline, the fictional specialty-coffee chain whose data the whole series runs on: six stores, twelve products, six months of daily sales. Three CSVs, about 11,000 rows, generated by a deterministic script so your copy is byte-for-byte identical to mine and every number in these posts reproduces on your machine.
mkdir workspace && cd workspaceBASE=https://raw.githubusercontent.com/yadneshSalvi/claude-agent-sdk-in-production/main/datacurl -sO $BASE/stores.csv -O $BASE/products.csv -O $BASE/sales.csvcd ..Peek at what arrived:
head -3 workspace/sales.csv# date,store_id,product_id,units,revenue# 2026-01-01,S01,P01,44,140.80# 2026-01-01,S01,P02,77,354.20The workspace/ folder is the agent's desk: the one place it works, with everything it needs on top. In Part 4 every conversation will get its own desk; for now there's one. Keeping the agent at a desk, rather than loose in your home directory, is more than tidiness. It's the first safety decision of the series, and it matters in about ninety seconds.
The options object
Now the real file. Create agent.py next to hello.py, starting with imports and configuration:
"""Part 1: a terminal data analyst. Ask it a question, watch it work."""
import asyncioimport sys
from claude_agent_sdk import ( AssistantMessage, ClaudeAgentOptions, ResultMessage, SystemMessage, TextBlock, ToolResultBlock, ToolUseBlock, UserMessage, query,)
MODEL = "claude-haiku-4-5"
OPTIONS = ClaudeAgentOptions( cwd="workspace", tools=["Read", "Glob", "Grep", "Bash", "Write"], permission_mode="bypassPermissions", model=MODEL,)Four decisions, each earning its keep:
model=MODEL: Haiku, the cheap end of the current lineup, as a constant so upgrading to a bigger model is a one-line change in Part 13 when we can measure whether it's worth it. Agent loops multiply tokens; default to cheap.cwd="workspace": the desk. Every relative path the agent touches resolves inside this folder.tools=[...]: the toolbox, cut down to five. The agent's tools are real:Readreads actual files,Bashruns actual shell commands. Listing exactly what the job needs makes the agent safer and cheaper, because every tool definition you don't ship is context you don't pay for. The full catalog is on the concepts page.permission_mode="bypassPermissions": the one that deserves its own section.
The permission wall
Before I explain bypassPermissions, watch what happens without it. I deleted that one line and ran the March question. Here's the run, and it's the most instructive failure in this part:
Read that ending again, because it's the most important sentence in Part 1: denied every path to actually compute, the agent eventually "calculated manually" and produced wrong numbers with full confidence. $59,278.80 is not Downtown's March revenue. Nothing crashed. No error was raised. A polite, plausible, wrong answer, delivered after burning 62 turns and 37 cents trying everything else first.
Here's what was happening. Every tool call passes a permission check, and permission_mode sets the posture. The default posture is interactive: harmless reads pass, and anything with side effects or arbitrary execution waits for a human to approve it. Claude Code has a human at the keyboard for exactly that. Our script has nobody wired up to answer, so every request for approval is refused, and the agent works around, and around, and around.
bypassPermissions removes the checkpoint entirely: every tool call is allowed, no questions asked. Which makes this the right moment for the series' recurring warning:
Put the permission_mode="bypassPermissions" line back. The wall run also quietly demonstrated something worth naming: the refusals went into the conversation and the agent adapted its plan each time. Feedback of any kind, even "no", is information the loop uses. That mechanism becomes the whole human-in-the-loop design in Act II.
Ask a real question
Now the rest of agent.py: a printing loop that turns the message stream into a readable trace, and a main you can pass questions to:
def first_value(tool_input: dict) -> str: """One short line describing a tool call, e.g. the command or file path.""" return next((str(v) for v in tool_input.values() if v), "")[:90]
async def main(prompt: str) -> None: async for message in query(prompt=prompt, options=OPTIONS): if isinstance(message, SystemMessage) and message.subtype == "init": print(f"[session {message.data['session_id']}]\n") elif isinstance(message, AssistantMessage): for block in message.content: if isinstance(block, TextBlock): print(block.text) elif isinstance(block, ToolUseBlock): print(f" -> {block.name} {first_value(block.input)}") elif isinstance(message, UserMessage) and isinstance(message.content, list): for block in message.content: if isinstance(block, ToolResultBlock) and block.is_error: print(f" !! {str(block.content)[:120]}") elif isinstance(message, ResultMessage): print(f"\n[{message.num_turns} turns · ${message.total_cost_usd:.4f}]")Don't sweat every branch yet; the next section names each message type properly. The shape is the point: one async for, a small isinstance ladder, and each kind of message becomes one line of output. Text prints as prose, tool calls print as -> lines, failed tool results print as !! lines, and the receipt prints last. Finish the file:
if __name__ == "__main__": question = " ".join(sys.argv[1:]) or ( "Which store had the highest total revenue in March? Give the store's " "name, and how much it beat the runner-up by." ) asyncio.run(main(question))And run it:
uv run python agent.pyThis is the moment the series is named after, so let it land. You did not tell it there were three files. You did not tell it the schema, or that revenue lives in sales.csv but names live in stores.csv, or that March means filtering a date column. It discovered all of that, wrote an awk aggregation I would have had to look up the syntax for, joined the names, and did the subtraction. And when it guessed a wrong file path early on, it read the error message and fixed its own mistake, the same think-act-observe loop, running on a failure instead of a success.
One honest note: the plan for this part said "watch it write pandas", and instead my agent reached for awk. It picks its own tools, and for an 11,000-row aggregation a shell one-liner is the sensible choice; that's the intern with a toolbox showing judgment, not a bug to fix. When the questions get heavier (charts in Part 4, whole reports), it reaches for Python without being asked. Ask it something yourself and see what it picks:
uv run python agent.py "Which product category makes the most money on weekends?"Anatomy of the message stream
You've now seen the stream twice: raw reprs in hello.py, and through a printing loop in agent.py. Time to learn it properly, because this stream is the series' raw material. Part 2 translates it into server-sent events; Part 3 renders it as UI; Part 7 pauses it for approvals. Four message types carry everything:
The two shapes worth staring at:
AssistantMessage.content is a list of blocks, not a string. One assistant message can carry prose (TextBlock) and an action (ToolUseBlock with a name, an input dict, and an id) side by side. Hold onto that: when we design the chat UI's data model in Part 3, assistant turns will be sequences of blocks for exactly this reason, and the design will feel inevitable instead of clever.
Tool results arrive inside a UserMessage. That surprises everyone once. From the model's point of view there are only two speakers: itself, and the world. Your typed question and a tool's output are both "the world said something", so the awk output rides back wearing a user badge, as a ToolResultBlock whose tool_use_id points at the ToolUseBlock that asked for it. That id-matching is how you'll pair spinners with results in the UI, two parts from now.
The is_error field on a tool result is True for failures like our wrong-path Read, and the model reads those failures like any other input. And the final ResultMessage is the receipt: num_turns, duration, usage token counts, the final text, and total_cost_usd. Which brings us to a habit.
The cost ritual
Every part of this series ends its runs by printing total_cost_usd, and every part quotes the real measured numbers. Here's today's ledger, from my actual runs:
| Run | What | Cost |
|---|---|---|
hello.py | default model, default toolbox | $0.3418 |
| Wall run | no permission mode, 62 flailing turns | $0.3668 |
| The March question | Haiku, five tools, nine turns | $0.0240 |
The spread is the lesson. Same SDK, same machine, same kind of question: a 14x difference, decided entirely by configuration. Model choice dominates, toolbox size compounds (every tool definition ships with every request), and failure modes burn turns. Agent cost is a product decision you make in code, which is why the habit starts on day one and why Part 13 graduates it from watching costs to enforcing budgets with max_budget_usd. The mechanics of how turns are billed live on the concepts page.
You know the version of this lesson from the wild: the demo that cost pennies all week, the "let's just run it on the full dataset" Friday, and the Saturday email from billing. Agents don't make that story worse, but they do make it faster. Print the receipt.
The diary the SDK keeps
One more discovery before we ship this part, and it costs nothing to look at. The SDK has been keeping records this whole time:
Every session you've run today is on disk, as one JSONL file per session, filed under the working directory it ran in. That [session ...] id our script prints is the filename. We're not going to use the diary yet, but notice what it means: conversation persistence already exists, for free, before we've written a single line of storage code. Part 5 turns this folder into full conversation memory and a sessions sidebar, mostly by asking the SDK to read its own diary back.
Wrap it in git
Standard series ritual, from the beanline-analyst/ project root. The .gitignore matters more than usual because agents create files when you're not looking:
.venv/__pycache__/.env.env.localnode_modules/workspaces/.DS_Storeworkspaces/ (plural) doesn't exist yet; it's the Part 4 per-conversation desks, ignored in advance so no agent-generated scratch work ever lands in a commit. Then:
git initgit add .git commit -m "part 1: a terminal analyst that answers from real data"Glance at git status before that commit, today and every day: if .env or anything you didn't recognize is staged, stop and fix the ignore file first.
What you built
Part 1- A working agent in 56 lines:
query()plusClaudeAgentOptions, with the think-act-observe loop, real tools, and context management all inherited from the Claude Code engine. - A configured toolbox:
cwdas the agent's desk,toolscut to the five the job needs,MODELpinned to Haiku, and the 14x cost spread that justifies all three. - The message stream, decoded: init handshake with a
session_id,AssistantMessageas a list of blocks, tool results arriving inUserMessagewearing a user badge, and theResultMessagereceipt. - A working knowledge of the permission wall: what
bypassPermissionstrades away, why the default mode refuses a headless agent, and the scissors debt Parts 7 and 8 repay. - The cost ritual: print
total_cost_usdevery run, quote real numbers, never be surprised by a bill again.
Test yourself
Your agent calls the Bash tool and the command's output comes back in the stream. What object carries it?
Run without any permission_mode, our headless agent produced a confident, wrong answer. What actually happened?
What's the difference between tools=['Read', ...] and bypassPermissions in ClaudeAgentOptions?
Where did the March conversation end up after the run finished?
hello.py cost $0.34 and the March analysis cost $0.024. What explains a trivial question costing 14x more than a real one?
Your analyst lives in a terminal and answers one question per process. Nothing streams, nothing remembers, and only you can use it. In Part 2 it goes behind a URL: FastAPI in front, and the message stream you decoded today translated into a server-sent-event vocabulary that the next twelve parts extend without ever breaking a client.
Every part of this series has a companion folder in the claude-agent-sdk-in-production repo: the complete, tested project exactly as it exists at the end of that part. This part's folder is part-01-first-agent. Code blocks with a GitHub icon in the header link straight to the exact file, and "View full file" shows the whole file in place with this section's lines highlighted.