Series · LangGraph from Scratch · Part 8 of 8


LangGraph from Scratch, Part 8: Deploying to the Internet

Your laptop bot goes public. The frontend on Vercel, the backend on one always-awake Fly machine, and the link on your friend's phone, all in one sitting.

langgraph · fastapi · deployment · tutorial

Your bot is good now. It streams, it remembers your name, it reaches for a calculator and a search engine when it needs one. And exactly one person on Earth can use it: you, at this laptop, with two terminals running. Close the lid and it's gone.

By the end of this page that's over. The frontend lives on Vercel, the backend lives on Fly, and the whole thing answers from a URL you can text to a friend. They open it on their phone, on their couch, and your bot replies. Here's where we're headed.

Today's destination. The same chat you built on localhost, now at a public address, on a device that has never seen your laptop.

Two deployments, one quiet catch you'll meet at the end. Let's ship it.

Two homes for one app

The frontend and the backend want different things, so they go to different places. Your Next.js frontend is, after next build, a pile of static files: HTML, JavaScript, CSS. Files love a CDN, copied to servers near everyone, sitting asleep until a browser asks. That's Vercel's whole game, and the free tier is generous.

The backend is the opposite kind of thing. It's a Python process that has to stay running, hold a streaming connection open, and, the part that decides everything, keep your conversations in memory. Remember Part 7: InMemorySaver stores every thread in a plain dict inside the one running process. That memory only works if there's one process and it stays alive.

The browser loads the page once from Vercel, then talks straight to Fly for every message. Two different homes, and the backend is the one that has to stay awake.

That picture also explains a subtlety that bites people later. The browser loads the page from Vercel, but every chat message goes directly from the browser to Fly, not through Vercel. Two origins talking to each other. File that away; it's the reason a familiar error shows up near the finish line.

Getting the backend ready to ship

Right now your backend runs because your machine happens to have the right Python and the right packages installed. A server has none of that. You hand it three small files that say exactly how to build the thing from scratch, and a one-line change so it'll trust your real frontend.

First, pin your dependencies so the server installs the same versions you tested with. From inside backend/, with your virtual environment active, you could run pip freeze, but a hand-written list is cleaner for a tutorial. Create requirements.txt:

TEXT
fastapi==0.136.3
uvicorn[standard]==0.49.0
langgraph==1.2.5
langchain==1.3.9
langchain-openai==1.3.2
langchain-anthropic==1.4.6
langchain-tavily==0.2.18
python-dotenv==1.0.1

Next, a Dockerfile: a recipe a server follows to build a tiny Linux box with your app inside. It starts from a slim Python image, installs the requirements, copies your code, and runs uvicorn on port 8080.

DOCKERFILE
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8080
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8080"]

Copying requirements.txt and installing before copying the rest of your code is a deliberate order: it lets the build cache the slow install step, so editing a Python file later rebuilds in seconds instead of reinstalling everything.
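If the caching claim feels like magic, here is a toy model of it. This is a simplified sketch, not Docker's actual implementation: it assumes each layer's cache key is a hash of the previous layer's key, the instruction text, and whatever content that step copies in. The function names (`layer_keys`, `build`) are made up for illustration.

```python
import hashlib


def layer_keys(steps):
    """Return one cache key per build step.

    Each key hashes the previous key plus this step's instruction and the
    content it copies in, so a change invalidates that layer and every
    layer after it.
    """
    keys, previous = [], ""
    for instruction, content in steps:
        raw = f"{previous}|{instruction}|{content}".encode()
        previous = hashlib.sha256(raw).hexdigest()
        keys.append(previous)
    return keys


def build(requirements, code):
    """Model the three interesting steps of the Dockerfile above."""
    return layer_keys([
        ("COPY requirements.txt .", requirements),
        ("RUN pip install -r requirements.txt", ""),
        ("COPY . .", code),
    ])


before = build("fastapi", "print('v1')")
after_code_edit = build("fastapi", "print('v2')")
after_dep_edit = build("fastapi uvicorn", "print('v1')")

# Editing code leaves the first two layers (including the slow install) reusable.
reused_after_code_edit = [a == b for a, b in zip(before, after_code_edit)]
# Editing requirements changes the first key, so nothing after it can be reused.
reused_after_dep_edit = [a == b for a, b in zip(before, after_dep_edit)]
```

Flip the order in your head, with COPY . . first, and every code edit would change the first key, which forces the install to rerun each time.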

One more file, and it matters more than it looks. A .dockerignore keeps junk out of the image, including the two things that must never go in: your fat .venv folder and your .env full of secrets.

TEXT
.venv
__pycache__
*.pyc
.env

Last, the one code change. Back in Part 2 you told the backend to welcome http://localhost:3000, and the text promised that in Part 8 "the origin list gets pinned to your real frontend URL." Here's that promise kept. Instead of hard-coding the URL, read it from an environment variable so the same code runs locally and in production. Update the CORS block in main.py:

PYTHON
import os  # at the top with the other imports

# Production sets FRONTEND_ORIGIN; local dev falls back to the Next.js dev server.
FRONTEND_ORIGIN = os.environ.get("FRONTEND_ORIGIN", "http://localhost:3000")

app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000", FRONTEND_ORIGIN],
    allow_methods=["*"],
    allow_headers=["*"],
)

Locally, FRONTEND_ORIGIN isn't set, so it falls back to localhost:3000 and nothing about your dev setup changes. In production you'll set it to your Vercel URL, and the backend will welcome the live site. You don't know that URL yet, which turns out to be the whole shape of the next few steps.
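To see the fallback without starting a server, you can pull the list-building into a small function and feed it a fake environment. This is a sketch under assumptions: `allowed_origins` and `DEV_ORIGIN` are hypothetical names, not part of the tutorial's main.py, and the trailing-slash trim is an extra guard, because an origin is compared as an exact string and a pasted URL ending in "/" would not match.

```python
DEV_ORIGIN = "http://localhost:3000"


def allowed_origins(env):
    """Build the CORS allow-list from an environment mapping."""
    # An origin has no path, so drop a trailing slash from a pasted URL.
    frontend = env.get("FRONTEND_ORIGIN", DEV_ORIGIN).rstrip("/")
    # dict.fromkeys removes the duplicate when FRONTEND_ORIGIN is unset
    # and keeps the order stable.
    return list(dict.fromkeys([DEV_ORIGIN, frontend]))


local = allowed_origins({})
production = allowed_origins({"FRONTEND_ORIGIN": "https://your-app.vercel.app"})
pasted_with_slash = allowed_origins({"FRONTEND_ORIGIN": "https://your-app.vercel.app/"})
```

In the real app you would pass os.environ and hand the result to allow_origins.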

Sending the backend to Fly

Install the Fly CLI (flyctl) and sign up if you haven't; their docs have the one-liner for your OS. Then, from inside backend/, create the app without deploying yet:

BASH
fly launch --no-deploy

fly launch notices your Dockerfile and uses it, asks you to name the app and pick a region, and offers to set up a database (say no, you don't need one). It writes a fly.toml, the config file for your app. Open it and make sure of three things, two of which are the difference between memory that works and memory that doesn't:

TOML
app = "langgraph-chatbot"
primary_region = "iad"
[build]
[http_service]
internal_port = 8080 # must match uvicorn's --port in the Dockerfile
force_https = true
auto_stop_machines = "off" # keep the one machine awake; memory lives in it
auto_start_machines = true
min_machines_running = 1 # never scale to zero
[[vm]]
memory = "512mb" # 256mb is too small once the model clients load
cpu_kind = "shared"
cpus = 1

The defaults Fly writes will try to stop your machine when it's idle and scale to zero to save you money. For almost any other app that's a gift. For yours it's a memory wipe: a stopped machine is a cleared dict, so a quiet hour would erase every conversation. auto_stop_machines = "off" and min_machines_running = 1 are you telling Fly, keep my one desk exactly where it is. This is "the backend must be a single long-lived process" written as two lines of config.

Now give the server its keys. These never touch your repo or your image; they live encrypted as Fly secrets and get injected as environment variables when the app runs.

Secrets go to the platform, not the picture. Set the same keys your .env held locally; Fly hands them to the process at runtime.
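The command for each key is fly secrets set NAME=value. A forgotten key usually surfaces later as a confusing error on the first chat message, so it can help to have the backend check at startup. Here is a minimal sketch: the key names are assumptions (the conventional variables for the OpenAI, Anthropic, and Tavily clients), and `missing_keys` and `require_keys` are hypothetical helpers, not something from earlier parts.

```python
import os

# Assumed key names; trim this to whatever your .env actually contains.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "TAVILY_API_KEY")


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


def require_keys(env=os.environ):
    """Raise at startup instead of failing on the first chat message."""
    missing = missing_keys(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))


only_openai = missing_keys({"OPENAI_API_KEY": "sk-example"})
```

Calling require_keys() near the top of main.py would make a bad deploy fail its health check, and fly logs would then show the missing variable names.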

With the keys staged, ship it:

BASH
fly deploy

Fly builds your image, pushes it, boots the machine, and waits until the app reports healthy. When it's done, it prints the address your backend now answers at.

The backend is live on the public internet. That fly.dev URL is the one fact the frontend is about to need.

Prove it before you move on. Open https://your-app.fly.dev/docs in a browser and you'll see the same FastAPI documentation page you first met back in Part 2, except now it's served from a machine you don't own, in a city you didn't pick, to anyone with the link. Copy that fly.dev URL. It's the bridge to the other half.

The order you deploy in is not optional

Here's the knot. The frontend needs to know the backend's URL, so it can call it. The backend needs to know the frontend's URL, so it can welcome it through CORS. Each one needs an address the other hasn't been given yet.

It looks circular, so you break the loop with order. Backend first for its URL, then the frontend pointed at it, then the backend pointed back.

You already broke the first link: the backend is live and you have its fly.dev URL. So the frontend goes next, carrying that URL, and the backend gets pointed back at the frontend last. Three steps, in that order, and the knot falls open.

Putting the frontend on Vercel

Your code has been a git repo since Part 1, so push it to GitHub if it isn't there already:

BASH
git remote add origin https://github.com/your-name/langgraph-tutorial.git
git push -u origin main

If git answers remote origin already exists, you've pointed at a repo before; swap add for set-url and push again.

Now go to Vercel, click Add New Project, and import that repo. One screen does all the work, and it has one field beginners miss. Your repo holds both backend/ and frontend/, so you have to tell Vercel the app lives in the frontend subfolder by setting Root Directory. Then add one environment variable: NEXT_PUBLIC_API_BASE_URL, set to your fly.dev URL.

The two settings that matter, both highlighted: Root Directory pointed at frontend, and the backend's URL handed to the frontend as NEXT_PUBLIC_API_BASE_URL.

Click Deploy, watch the build run, and in a minute Vercel hands you a your-app.vercel.app URL. Open it. The page loads, the chat UI is right there, and it looks perfect. Now type a message.

The error that means you're almost there

You typed a message on the live site and nothing came back. Open the browser's developer console and there it is, in red.

The deliberate break, and an old friend. This is the exact CORS wall from Part 2, except now the two origins are real public URLs instead of two ports on your laptop.

Read it like a sentence and it's almost friendly. The browser tried to POST from https://your-app.vercel.app to your Fly backend, and the backend didn't send back an Access-Control-Allow-Origin header welcoming that origin. Of course it didn't: your allow_origins list still only knows about localhost. The backend is up and healthy; it has not been told to welcome this new address yet. This is the same lesson from Part 2, grown up. Back then it was :3000 knocking on :8000; now it's two URLs on the open internet.
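The whole exchange fits in two tiny functions. This is a simplified model, not the real middleware or a real browser: it ignores preflight requests and credentials, and both function names are invented for the illustration.

```python
def cors_headers(request_origin, allow_origins):
    """Server side: echo the origin back only if it is on the allow-list."""
    if request_origin in allow_origins:
        return {"Access-Control-Allow-Origin": request_origin}
    return {}


def browser_allows(request_origin, response_headers):
    """Browser side: hand the response to page scripts only on a match."""
    granted = response_headers.get("Access-Control-Allow-Origin")
    return granted in ("*", request_origin)


live_site = "https://your-app.vercel.app"
localhost_only = ["http://localhost:3000"]
with_live_site = ["http://localhost:3000", live_site]

# The allow-list only knows localhost, so the browser blocks the reply.
works_before = browser_allows(live_site, cors_headers(live_site, localhost_only))
# Once the live origin is on the list, the same request goes through.
works_after = browser_allows(live_site, cors_headers(live_site, with_live_site))
```

Notice that the server answers either way. The block happens in the browser, which is why the backend looks healthy while the page stays silent.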

The fix is the env var you wired earlier. Tell the backend its frontend's real origin, and Fly redeploys the moment you set it:

BASH
fly secrets set FRONTEND_ORIGIN=https://your-app.vercel.app

Setting a secret on a live app triggers a fresh deploy automatically, so a few seconds later your backend is running again with the Vercel URL in its allow_origins list. Go back to the live site and send your name. The reply streams in. It works.

It answers from anywhere now

Pull out your phone, leave your home Wi-Fi if you want, and open the vercel.app URL. Tell it your name, change the subject, ask it back. It remembers, on a device that has never once touched your laptop.

The payoff, and the same shot from the top of the page, now real. Streaming, tools, memory, all of it, served from two clouds to a phone on the couch.

That's the build. Eight parts ago you had a laptop and a blank folder; now you have an AI chatbot, on the internet, that streams and remembers and reaches for tools, and a link you can hand to anyone. Take the win. There's one honest footnote left, and it's the best teacher in the whole series.

Watch it forget you

Change anything in your backend, even a comment, and redeploy:

BASH
fly deploy

Now open your live site and ask the question you've asked all part: "What's my name?"

Gone. The redeploy started a brand-new machine with empty RAM, and every conversation went with the old one. This is Part 7's caveat, made visceral.

Blank. Not a bug; the whole truth of InMemorySaver in one moment. fly deploy doesn't restart your old machine, it builds and boots a fresh one, and a fresh machine has empty RAM. Every notebook in that dict went out with the old machine. The goldfish bowl from Part 7, the one labeled PROD, just tipped over in production exactly as promised.

Redeploy taken literally. Every fly deploy is a brand-new machine that has never met you, because the only place it kept you was the RAM you just replaced.

This isn't a flaw in what you built; it's the honest edge of the simplest possible memory, and now you've felt it. For a portfolio piece or a demo you share for an afternoon, an in-memory bot that resets on each deploy is completely fine. The day you want conversations to outlive a deploy is the day you reach for the first item below.
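What you just watched reduces to a few lines. The class below is a stand-in, not LangGraph's InMemorySaver; it assumes only what Part 7 described, that threads live in a plain dict inside the running process.

```python
class InProcessMemory:
    """Stand-in for an in-memory checkpointer: threads live in a plain dict."""

    def __init__(self):
        self.threads = {}

    def remember(self, thread_id, message):
        self.threads.setdefault(thread_id, []).append(message)

    def recall(self, thread_id):
        return self.threads.get(thread_id, [])


# The process that has been running since the last deploy.
old_machine = InProcessMemory()
old_machine.remember("thread-1", "My name is Sam.")
known_before = old_machine.recall("thread-1")

# A redeploy boots a new process, and its dict starts empty.
new_machine = InProcessMemory()
known_after = new_machine.recall("thread-1")
```

Nothing was deleted. The new process never had the data in the first place.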

Where to go from here

You have a real, deployed, full-stack AI app. That's rarer than it sounds, and every direction from here is a weekend project that teaches one more real thing.

  • Make memory survive restarts. Swap InMemorySaver for SqliteSaver (one local file) or PostgresSaver (a real database). They share the checkpointer interface from Part 7, so it's mostly a one-line change to how you build checkpointer, plus a database connection. This is the direct fix for what you just watched happen.
  • Give each visitor their own threads. Add authentication with Clerk or NextAuth so two people don't share one pile of conversations. Right now a thread is tied to whatever a browser keeps in localStorage, not to a person; real users need real accounts.
  • Move slow work off the request. For long tool chains or background jobs, a queue like Celery or RQ lets the web process stay snappy while the heavy work runs elsewhere. This is the "graduate" architecture most production agents grow into.
  • Hand it more tools. You wrote two in Part 6; the same @tool pattern, or an MCP server, or a vector search over your own documents, all plug into the exact graph you already have.
  • Try a different brain. Swap the model in graph.py for the other commercial provider, or run one locally with Ollama. The MODEL constant means it's a one-line experiment.
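To make the first item concrete, here is the idea behind a file-backed checkpointer, sketched with the standard-library sqlite3 module. It is a stand-in to show why a file survives a restart, not SqliteSaver's actual API; the class name and table layout are made up.

```python
import os
import sqlite3
import tempfile


class FileBackedMemory:
    """Stand-in for a persistent checkpointer: threads live in a SQLite file."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages (thread_id TEXT, body TEXT)"
        )

    def remember(self, thread_id, message):
        with self.conn:  # commits the insert on success
            self.conn.execute(
                "INSERT INTO messages VALUES (?, ?)", (thread_id, message)
            )

    def recall(self, thread_id):
        rows = self.conn.execute(
            "SELECT body FROM messages WHERE thread_id = ? ORDER BY rowid",
            (thread_id,),
        )
        return [body for (body,) in rows]

    def close(self):
        self.conn.close()


db_path = os.path.join(tempfile.mkdtemp(), "checkpoints.db")

before_deploy = FileBackedMemory(db_path)
before_deploy.remember("thread-1", "My name is Sam.")
before_deploy.close()  # the old process goes away

after_deploy = FileBackedMemory(db_path)  # a new process opens the same file
survived = after_deploy.recall("thread-1")
after_deploy.close()
```

One caveat if you try this on Fly: a machine's own filesystem is replaced along with the image, so the file would need to live on a persistent volume (or you'd use Postgres) to outlast a deploy.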

What you built

Part 8
  • Two deployments that fit their jobs: the frontend as static files on Vercel's CDN, and the backend as one always-awake Fly machine, split because in-memory conversations need a single long-lived process.
  • A backend packaged for production: a pinned requirements.txt, a Dockerfile, and a .dockerignore that keeps your .venv and .env out of the image entirely.
  • The backend live on Fly: fly launch, keys set with fly secrets set (never in git), fly deploy, and a fly.toml tuned to keep one machine awake so memory survives.
  • The frontend live on Vercel: imported from GitHub with Root Directory frontend and NEXT_PUBLIC_API_BASE_URL pointed at the Fly URL, baked in at build time.
  • The loop closed: CORS pinned to your real Vercel origin with FRONTEND_ORIGIN, the chat working from your phone, and a clear-eyed grasp of why a redeploy wipes memory and what fixes it.

Test yourself

01. Why does the backend go on a single always-awake Fly machine instead of a serverless platform that scales to zero?

02. The live frontend showed No 'Access-Control-Allow-Origin' header is present in the console. What caused it?

03. When does the value of NEXT_PUBLIC_API_BASE_URL actually get into your deployed frontend?

04. Why must your OpenAI or Anthropic API key never go in a NEXT_PUBLIC_ variable?

05. Everything worked, then a fly deploy made the bot forget your name. Why?

Commit the finale, from the project root:

BASH
git add .
git commit -m "part 8: deploy the frontend to Vercel and the backend to Fly"

Your bot is on the internet now, the same as any app you admire, served from two clouds to whoever you send the link. Eight parts ago it was an empty folder and a little fear of the word "agents." The next thing this bot learns, you're the one who gets to teach it.