Claude Agent SDK in Production, Part 1: Setup and Your First Agent

Here's a terminal transcript from the end of this part. I asked, in plain English, which store in a six-store coffee chain had the best March. An agent found three CSV files it had never seen, read them, wrote an awk one-liner to aggregate 11,000 rows, joined the store names in, and answered: Downtown, $51,319.60, beating the runner-up by $7,893.70. Every number is right. It took nine turns, 22 seconds, and 2.4 cents. The agent code is 56 lines, and you didn't write the hard part of any of them.

Where this series is going

This is Part 1 of Claude Agent SDK in Production, a fourteen-part series that builds an AI data analyst: upload a CSV, ask questions in English, get back charts and written reports. By the end of the series it looks like this, and it's running on a real server:

A browser window with the finished Beanline analyst app: on the left a chat panel with a user question, two completed tool badges for Read and Bash, a resolved approval card for writing report.md, and a streaming answer; on the right an artifacts panel with a revenue bar chart and the start of a written report. — The destination. Chat on the left with live tool activity and an approval card; deliverables on the right as they're created. Everything in this window gets built across the series.

Act I (Parts 1 to 5) builds that product end to end: this part's terminal agent, then HTTP streaming, then the UI, then file uploads and artifacts, then memory. Act II makes it governable: custom tools, human approvals, audit hooks, streams that survive a page refresh. Act III makes it smart and shippable: plan mode, subagents, sandboxing, evals, and a real VM deployment.

One thing this series is not: a from-zero web tutorial. It assumes you can write a FastAPI endpoint and a React component. If you can't yet, that's fine, there's a whole series for you: LangGraph from Scratch teaches FastAPI, Next.js, and streaming from an empty folder, and this series links the exact part whenever it leans on one of those basics. Do that one first, then come back. This page, though, is pure Python; everyone's welcome.

What the Agent SDK actually is

Claude Code, the terminal agent people use for day-to-day programming work, runs on an engine: a loop that reads a request, decides which tool to use, runs it for real, reads the result, and goes again until the job is done. The Claude Agent SDK is that exact engine, shipped as a Python package. Not a re-implementation, not "inspired by": the pip wheel literally bundles the Claude Code runtime, so pip install claude-agent-sdk (or uv add) is the entire installation story. No Node setup, no separate CLI download.

That's a different proposition from calling a model API. With the Messages API you supervise every nail: parse the tool request, run the tool, append the result, call again. With the SDK you hire the contractor. You state the job; the loop, the tools, and the context management come with the package. This series spends fourteen parts on what that buys you and what it costs you, starting with the buying.

Setup, checklist style

The whole series was written and tested against pinned versions. When something behaves differently on your machine, check this table before blaming yourself:

Tool	Version
Python	3.13
uv	0.8.x
claude-agent-sdk	0.2.110 (bundles Claude Code CLI 2.1.191)
Model	`claude-haiku-4-5`
Node.js	22 LTS (not needed until Part 3)

Install uv if you don't have it (one curl command on their install page), then make the project. This series uses uv for all Python work: it creates the virtual environment, pins the lockfile, and runs scripts, all without you ever activating anything.

BASH

mkdir beanline-analyst && cd beanline-analyst
uv init backend --bare --python 3.13
cd backend
uv add "claude-agent-sdk==0.2.110" "fastapi==0.136.3" "uvicorn[standard]==0.49.0" "python-dotenv==1.1.1"

--bare means "no starter files, thanks". FastAPI and uvicorn won't be touched until Part 2; installing everything now means the versions are locked together and you never interrupt a later part for a dependency. If uv itself is new to you, it replaces the python -m venv + pip install dance from LangGraph Part 1; uv run quietly uses the project's own environment every time.

Two ways to authenticate, pick one

The SDK needs a way to call Claude, and there are two:

Option A, you already pay for Claude. If Claude Code is installed and logged in on this machine, you're already done; the SDK picks up that login automatically, and usage draws on your subscription like any Claude Code session. If you have a subscription but not the CLI, install it and log in once: npm install -g @anthropic-ai/claude-code, then run claude and follow the login prompt. This is the path I used for every run in this series.

Option B, an API key. Create one at console.anthropic.com, and before anything else set a monthly budget of $5 under Settings → Limits. Then make it available to your shell:

~/.zshrc (or ~/.bashrc)

export ANTHROPIC_API_KEY=sk-ant-your-actual-key-here

Costs in this series are real either way: the SDK reports the computed cost of every run even on a subscription, and I'll quote my actual numbers throughout.

Hello, agent

Eleven lines, no options, no model wiring, no message list:

backend/hello.py

import asyncio

from claude_agent_sdk import query


async def main():
    async for message in query(prompt="What's 144 * 89?"):
        print(message)


asyncio.run(main())

query() starts an agent turn and gives you back an async iterator. Everything the engine does arrives through that iterator as message objects, and for now we dump their raw repr to see what we're dealing with. Run it:

BASH

uv run python hello.py

A dark terminal running uv run python hello.py. Three message objects print: a SystemMessage with subtype init carrying a session_id and model name, an AssistantMessage whose TextBlock says 12,816, and a ResultMessage with total_cost_usd=0.34184. A comment notes the bare default billed 34 cents for one multiplication. — First contact, trimmed. An init handshake, the answer, and a receipt. Note that cost: $0.34 for one multiplication. That's a lesson, and we collect it below.

The answer is right (144 × 89 is 12,816), and three kinds of objects scrolled past: a SystemMessage handshake, an AssistantMessage with the answer, and a ResultMessage receipt. Yours may also show a few housekeeping events between them; ignore those for now, we'll formally meet the whole cast later this page.

But look at that receipt. $0.34. For one multiplication. Two things went wrong, both defaults: with no options, the SDK used the default model configured for Claude Code on my machine (a big one), and it loaded the full default toolbox, whose definitions ride along in every request. The fix for both is the options object, which is where the real agent starts.

Give it a workspace

An analyst needs something to analyze. Meet Beanline, the fictional specialty-coffee chain whose data the whole series runs on: six stores, twelve products, six months of daily sales. Three CSVs, about 11,000 rows, generated by a deterministic script so your copy is byte-for-byte identical to mine and every number in these posts reproduces on your machine.

BASH

mkdir workspace && cd workspace
BASE=https://raw.githubusercontent.com/yadneshSalvi/claude-agent-sdk-in-production/main/data
curl -sO $BASE/stores.csv -O $BASE/products.csv -O $BASE/sales.csv
cd ..

Peek at what arrived:

BASH

head -3 workspace/sales.csv
# date,store_id,product_id,units,revenue
# 2026-01-01,S01,P01,44,140.80
# 2026-01-01,S01,P02,77,354.20

The workspace/ folder is the agent's desk: the one place it works, with everything it needs on top. In Part 4 every conversation will get its own desk; for now there's one. Keeping the agent at a desk, rather than loose in your home directory, is more than tidiness. It's the first safety decision of the series, and it matters in about ninety seconds.

The options object

Now the real file. Create agent.py next to hello.py, starting with imports and configuration:

backend/agent.py

"""Part 1: a terminal data analyst. Ask it a question, watch it work."""

import asyncio
import sys

from claude_agent_sdk import (
    AssistantMessage,
    ClaudeAgentOptions,
    ResultMessage,
    SystemMessage,
    TextBlock,
    ToolResultBlock,
    ToolUseBlock,
    UserMessage,
    query,
)

MODEL = "claude-haiku-4-5"

OPTIONS = ClaudeAgentOptions(
    cwd="workspace",
    tools=["Read", "Glob", "Grep", "Bash", "Write"],
    permission_mode="bypassPermissions",
    model=MODEL,
)

Four decisions, each earning its keep:

model=MODEL: Haiku, the cheap end of the current lineup, as a constant so upgrading to a bigger model is a one-line change in Part 13 when we can measure whether it's worth it. Agent loops multiply tokens; default to cheap.
cwd="workspace": the desk. Every relative path the agent touches resolves inside this folder.
tools=[...]: the toolbox, cut down to five. The agent's tools are real: Read reads actual files, Bash runs actual shell commands. Listing exactly what the job needs makes the agent safer and cheaper, because every tool definition you don't ship is context you don't pay for. The full catalog is on the concepts page.
permission_mode="bypassPermissions": the one that deserves its own section.

The permission wall

Before I explain bypassPermissions, watch what happens without it. I deleted that one line and ran the March question. Here's the run, and it's the most instructive failure in this part:

A dark terminal transcript of the agent running without a permission mode. Glob succeeds, but Read, Write, and every Bash command that computes are refused with messages like 'Claude requested permissions but you haven't granted it yet' and 'This command requires approval'. After a note that 50 more attempts were denied, the agent gives a confidently wrong final answer, $59,278.80, marked wrong in the capture, after 62 turns and $0.3668. — No permission mode, nobody to approve. Reads mostly pass, computation is refused, and after 62 turns of creative denial-dodging the agent does the math 'manually' and gets it wrong. Denied its tools, a determined agent guesses.

Read that ending again, because it's the most important sentence in Part 1: denied every path to actually compute, the agent eventually "calculated manually" and produced wrong numbers with full confidence. $59,278.80 is not Downtown's March revenue. Nothing crashed. No error was raised. A polite, plausible, wrong answer, delivered after burning 62 turns and 37 cents trying everything else first.

Comic in four panels. Panel one: a laptop wearing a tiny necktie reaches happily for a toolbox labeled TOOLS that is wrapped in chains and padlocked, while a giant red DENIED stamp lands across the scene. Panel two: the laptop, now desperate, holds up a tiny pencil while a wall of red DENIED slips piles up behind another big DENIED stamp. Panel three: the laptop, serene and confident, raises one finger and announces 'IT'S $59,278.80. I COUNTED IN MY HEAD.' Panel four: Yad, a bearded developer with headphones, spit-takes his coffee while holding the real printout, which reads $51,319.60. — The wall run, in four panels. Deny an agent every tool and it does not stop; it guesses, beautifully formatted.

Here's what was happening. Every tool call passes a permission check, and permission_mode sets the posture. The default posture is interactive: harmless reads pass, and anything with side effects or arbitrary execution waits for a human to approve it. Claude Code has a human at the keyboard for exactly that. Our script has nobody wired up to answer, so every request for approval is refused, and the agent works around, and around, and around.

bypassPermissions removes the checkpoint entirely: every tool call is allowed, no questions asked. Which makes this the right moment for the series' recurring warning:

Put the permission_mode="bypassPermissions" line back. The wall run also quietly demonstrated something worth naming: the refusals went into the conversation and the agent adapted its plan each time. Feedback of any kind, even "no", is information the loop uses. That mechanism becomes the whole human-in-the-loop design in Act II.

Ask a real question

Now the rest of agent.py: a printing loop that turns the message stream into a readable trace, and a main you can pass questions to:

backend/agent.py

def first_value(tool_input: dict) -> str:
    """One short line describing a tool call, e.g. the command or file path."""
    return next((str(v) for v in tool_input.values() if v), "")[:90]


async def main(prompt: str) -> None:
    async for message in query(prompt=prompt, options=OPTIONS):
        if isinstance(message, SystemMessage) and message.subtype == "init":
            print(f"[session {message.data['session_id']}]\n")
        elif isinstance(message, AssistantMessage):
            for block in message.content:
                if isinstance(block, TextBlock):
                    print(block.text)
                elif isinstance(block, ToolUseBlock):
                    print(f"  -> {block.name}  {first_value(block.input)}")
        elif isinstance(message, UserMessage) and isinstance(message.content, list):
            for block in message.content:
                if isinstance(block, ToolResultBlock) and block.is_error:
                    print(f"  !! {str(block.content)[:120]}")
        elif isinstance(message, ResultMessage):
            print(f"\n[{message.num_turns} turns · ${message.total_cost_usd:.4f}]")

Don't sweat every branch yet; the next section names each message type properly. The shape is the point: one async for, a small isinstance ladder, and each kind of message becomes one line of output. Text prints as prose, tool calls print as -> lines, failed tool results print as !! lines, and the receipt prints last. Finish the file:

backend/agent.py

if __name__ == "__main__":
    question = " ".join(sys.argv[1:]) or (
        "Which store had the highest total revenue in March? Give the store's "
        "name, and how much it beat the runner-up by."
    )
    asyncio.run(main(question))

And run it:

BASH

uv run python agent.py

This is the moment the series is named after, so let it land. You did not tell it there were three files. You did not tell it the schema, or that revenue lives in sales.csv but names live in stores.csv, or that March means filtering a date column. It discovered all of that, wrote an awk aggregation I would have had to look up the syntax for, joined the names, and did the subtraction. And when it guessed a wrong file path early on, it read the error message and fixed its own mistake, the same think-act-observe loop, running on a failure instead of a success.

Diagram of the agent loop. On the left, a box labeled YOUR CODE containing query(prompt, options) sends one call into a dashed region labeled INSIDE THE SDK, THE LOOP, and receives back a dashed async-for message stream. Inside the region three stations: THINK (the model reads everything so far), ACT (a tool runs for real), OBSERVE (result joins the conversation), connected in a cycle labeled repeat until the job is done. An exit arrow leads to a green box: answer plus the bill, ResultMessage. — The whole mental model of this series. Your code is one call in and a narrated stream out; the think-act-observe cycle in the middle belongs to the SDK.

One honest note: the plan for this part said "watch it write pandas", and instead my agent reached for awk. It picks its own tools, and for an 11,000-row aggregation a shell one-liner is the sensible choice; that's the intern with a toolbox showing judgment, not a bug to fix. When the questions get heavier (charts in Part 4, whole reports), it reaches for Python without being asked. Ask it something yourself and see what it picks:

BASH

uv run python agent.py "Which product category makes the most money on weekends?"

Comic in three panels. Panel one: Yad, a bearded developer with headphones, asks a laptop wearing a tiny necktie 'WHAT WAS OUR MARCH REVENUE?'. Panel two: the laptop has dived head-first into a filing cabinet drawer, papers flying, sound effect RUSTLE RUSTLE, while Yad watches wide-eyed with his coffee. Panel three: the laptop proudly presents a stack of charts taller than Yad's head and announces '$51,319.60. I ALSO MADE THREE CHARTS.' — You ask one question; the loop decides how much work it takes. Sometimes the answer comes with charts nobody ordered.

Anatomy of the message stream

You've now seen the stream twice: raw reprs in hello.py, and through a printing loop in agent.py. Time to learn it properly, because this stream is the series' raw material. Part 2 translates it into server-sent events; Part 3 renders it as UI; Part 7 pauses it for approvals. Four message types carry everything:

Annotated diagram of the four message types from one real run. A SystemMessage init card carries session_id, model, tools, and cwd, annotated as the handshake whose session_id matters in Part 5. An AssistantMessage card holds a TextBlock and a ToolUseBlock with name Bash and an awk command, annotated: a list of blocks, not a string. A UserMessage card holds a ToolResultBlock with matching tool_use_id and the awk output, annotated: tool results wear a user badge. After a marker reading times-9 rounds, a ResultMessage card shows num_turns 9, total_cost_usd 0.0240 and the final answer, annotated as the receipt. — The Rosetta stone, drawn from the actual March run. Everything the rest of the series builds is a translation of these four shapes.

The two shapes worth staring at:

AssistantMessage.content is a list of blocks, not a string. One assistant message can carry prose (TextBlock) and an action (ToolUseBlock with a name, an input dict, and an id) side by side. Hold onto that: when we design the chat UI's data model in Part 3, assistant turns will be sequences of blocks for exactly this reason, and the design will feel inevitable instead of clever.

Tool results arrive inside a UserMessage. That surprises everyone once. From the model's point of view there are only two speakers: itself, and the world. Your typed question and a tool's output are both "the world said something", so the awk output rides back wearing a user badge, as a ToolResultBlock whose tool_use_id points at the ToolUseBlock that asked for it. That id-matching is how you'll pair spinners with results in the UI, two parts from now.

The is_error field on a tool result is True for failures like our wrong-path Read, and the model reads those failures like any other input. And the final ResultMessage is the receipt: num_turns, duration, usage token counts, the final text, and total_cost_usd. Which brings us to a habit.

The cost ritual

Every part of this series ends its runs by printing total_cost_usd, and every part quotes the real measured numbers. Here's today's ledger, from my actual runs:

Run	What	Cost
`hello.py`	default model, default toolbox	$0.3418
Wall run	no permission mode, 62 flailing turns	$0.3668
The March question	Haiku, five tools, nine turns	$0.0240

The spread is the lesson. Same SDK, same machine, same kind of question: a 14x difference, decided entirely by configuration. Model choice dominates, toolbox size compounds (every tool definition ships with every request), and failure modes burn turns. Agent cost is a product decision you make in code, which is why the habit starts on day one and why Part 13 graduates it from watching costs to enforcing budgets with max_budget_usd. The mechanics of how turns are billed live on the concepts page.

You know the version of this lesson from the wild: the demo that cost pennies all week, the "let's just run it on the full dataset" Friday, and the Saturday email from billing. Agents don't make that story worse, but they do make it faster. Print the receipt.

The diary the SDK keeps

One more discovery before we ship this part, and it costs nothing to look at. The SDK has been keeping records this whole time:

A dark terminal. ls ~/.claude/projects/ shows a folder named after the workspace directory path. Inside it, a JSONL file named with the session id from the March run. head shows the first JSON line containing the original question. A comment says every turn you've ever run is in here, and Part 5 cashes this in. — The agent's diary: one folder per working directory, one JSONL file per session, one JSON line per event. We didn't build this. It came with the engine.

Every session you've run today is on disk, as one JSONL file per session, filed under the working directory it ran in. That [session ...] id our script prints is the filename. We're not going to use the diary yet, but notice what it means: conversation persistence already exists, for free, before we've written a single line of storage code. Part 5 turns this folder into full conversation memory and a sessions sidebar, mostly by asking the SDK to read its own diary back.

Wrap it in git

Standard series ritual, from the beanline-analyst/ project root. The .gitignore matters more than usual because agents create files when you're not looking:

.gitignore

.venv/
__pycache__/
.env
.env.local
node_modules/
workspaces/
.DS_Store

workspaces/ (plural) doesn't exist yet; it's the Part 4 per-conversation desks, ignored in advance so no agent-generated scratch work ever lands in a commit. Then:

BASH

git init
git add .
git commit -m "part 1: a terminal analyst that answers from real data"

Glance at git status before that commit, today and every day: if .env or anything you didn't recognize is staged, stop and fix the ignore file first.

A real run of agent.py: the question goes in, tool calls scroll as the agent investigates, and the answer lands with the cost receipt.

What you built

Part 1

A working agent in 56 lines: query() plus ClaudeAgentOptions, with the think-act-observe loop, real tools, and context management all inherited from the Claude Code engine.
A configured toolbox: cwd as the agent's desk, tools cut to the five the job needs, MODEL pinned to Haiku, and the 14x cost spread that justifies all three.
The message stream, decoded: init handshake with a session_id, AssistantMessage as a list of blocks, tool results arriving in UserMessage wearing a user badge, and the ResultMessage receipt.
A working knowledge of the permission wall: what bypassPermissions trades away, why the default mode refuses a headless agent, and the scissors debt Parts 7 and 8 repay.
The cost ritual: print total_cost_usd every run, quote real numbers, never be surprised by a bill again.

Test yourself

Score ··

Your agent calls the Bash tool and the command's output comes back in the stream. What object carries it?

Run without any permission_mode, our headless agent produced a confident, wrong answer. What actually happened?

What's the difference between tools=['Read', ...] and bypassPermissions in ClaudeAgentOptions?

Where did the March conversation end up after the run finished?

hello.py cost $0.34 and the March analysis cost $0.024. What explains a trivial question costing 14x more than a real one?

Your analyst lives in a terminal and answers one question per process. Nothing streams, nothing remembers, and only you can use it. In Part 2 it goes behind a URL: FastAPI in front, and the message stream you decoded today translated into a server-sent-event vocabulary that the next twelve parts extend without ever breaking a client.

Every part of this series has a companion folder in the claude-agent-sdk-in-production repo: the complete, tested project exactly as it exists at the end of that part. This part's folder is part-01-first-agent. Code blocks with a GitHub icon in the header link straight to the exact file, and "View full file" shows the whole file in place with this section's lines highlighted.