Series · LangGraph from Scratch · Part 8 of 8
LangGraph from Scratch, Part 8: Deploying to the Internet
Your laptop bot goes public. The frontend on Vercel, the backend on one always-awake Fly machine, and the link in your friend's phone, all in one sitting.
langgraph · fastapi · deployment · tutorial
Your bot is good now. It streams, it remembers your name, it reaches for a calculator and a search engine when it needs one. And exactly one person on Earth can use it: you, at this laptop, with two terminals running. Close the lid and it's gone.
By the end of this page that's over. The frontend lives on Vercel, the backend lives on Fly, and the whole thing answers from a URL you can text to a friend. They open it on their phone, on their couch, and your bot replies. Here's where we're headed.
Two deployments, one quiet catch you'll meet at the end. Let's ship it.
Two homes for one app
The frontend and the backend want different things, so they go to different places. Your Next.js frontend is, after next build, a pile of static files: HTML, JavaScript, CSS. Files love a CDN, copied to servers near everyone, sitting asleep until a browser asks. That's Vercel's whole game, and the free tier is generous.
The backend is the opposite kind of thing. It's a Python process that has to stay running, hold a streaming connection open, and, the part that decides everything, keep your conversations in memory. Remember Part 7: InMemorySaver stores every thread in a plain dict inside the one running process. That memory only works if there's one process and it stays alive.
That split also explains a subtlety that bites people later. The browser loads the page from Vercel, but every chat message goes directly from the browser to Fly, not through Vercel. Two origins talking to each other. File that away; it's the reason a familiar error shows up near the finish line.
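If "two origins" feels abstract, here is a minimal sketch of the comparison a browser makes, assuming the usual definition of an origin as scheme plus host plus port. The helper names and the /chat path are illustrative, not part of the tutorial's code.

```python
from urllib.parse import urlsplit

# Ports a browser assumes when the URL does not spell one out.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url):
    """Return the (scheme, host, port) triple used for same-origin checks."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return (parts.scheme, parts.hostname, port)


def same_origin(a, b):
    """True only when scheme, host, and port all match."""
    return origin_of(a) == origin_of(b)


page = "https://your-app.vercel.app/"
api = "https://your-app.fly.dev/chat"  # hypothetical endpoint path

print(same_origin(page, api))  # False: different hosts
print(same_origin("http://localhost:3000", "http://localhost:8000"))  # False: different ports
```

Anything that comes back False here is a cross-origin request, which is exactly the case where the browser asks the backend for permission before showing the response.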
Getting the backend ready to ship
Right now your backend runs because your machine happens to have the right Python and the right packages installed. A server has none of that. You hand it three small files that say exactly how to build the thing from scratch, and a one-line change so it'll trust your real frontend.
First, pin your dependencies so the server installs the same versions you tested with. From inside backend/, with your virtual environment active, you could run pip freeze, but a hand-written list is cleaner for a tutorial. Create requirements.txt:
fastapi==0.136.3
uvicorn[standard]==0.49.0
langgraph==1.2.5
langchain==1.3.9
langchain-openai==1.3.2
langchain-anthropic==1.4.6
langchain-tavily==0.2.18
python-dotenv==1.0.1
Next, a Dockerfile: a recipe a server follows to build a tiny Linux box with your app inside. It starts from a slim Python image, installs the requirements, copies your code, and runs uvicorn on port 8080.
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8080
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8080"]
Copying requirements.txt and installing before copying the rest of your code is a deliberate order: it lets the build cache the slow install step, so editing a Python file later rebuilds in seconds instead of reinstalling everything.
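To see why the order matters, here is a toy model of a layered build cache. It is a sketch of the idea, not Docker's actual implementation: it assumes each layer's cache key depends on the previous layer, the instruction text, and the contents of any files that step copies in. The file hashes and the main.py name are made-up stand-ins.

```python
import hashlib


def layer_keys(steps, file_hashes):
    """Compute one cache key per build step.

    Each key mixes the parent's key, the instruction text, and the contents
    of the files the step copies, so one change invalidates that layer and
    every layer after it.
    """
    keys = []
    parent = ""
    for instruction, inputs in steps:
        material = parent + instruction + "".join(file_hashes[name] for name in inputs)
        parent = hashlib.sha256(material.encode()).hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(steps, before, after):
    """List the instructions whose cache key changed between two builds."""
    old, new = layer_keys(steps, before), layer_keys(steps, after)
    return [instr for (instr, _), o, n in zip(steps, old, new) if o != n]


# The tutorial's order: requirements first, code last.
STEPS_GOOD = [
    ("COPY requirements.txt .", ["requirements.txt"]),
    ("RUN pip install --no-cache-dir -r requirements.txt", []),
    ("COPY . .", ["requirements.txt", "main.py"]),
]
# The tempting order: copy everything, then install.
STEPS_NAIVE = [
    ("COPY . .", ["requirements.txt", "main.py"]),
    ("RUN pip install --no-cache-dir -r requirements.txt", []),
]

before = {"requirements.txt": "r1", "main.py": "m1"}
after = {"requirements.txt": "r1", "main.py": "m2"}  # only the code changed

print(rebuilt_steps(STEPS_GOOD, before, after))   # ['COPY . .']
print(rebuilt_steps(STEPS_NAIVE, before, after))  # both steps, including the slow install
```

In the good order, a code edit only reruns the final copy. In the naive order, the same edit lands above the install step and drags it along.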
One more file, and it matters more than it looks. A .dockerignore keeps junk out of the image, including the two things that must never go in: your fat .venv folder and your .env full of secrets.
.venv
__pycache__
*.pyc
.env
Last, the one code change. Back in Part 2 you told the backend to welcome http://localhost:3000, and the text promised that in Part 8 "the origin list gets pinned to your real frontend URL." Here's that promise. Instead of hard-coding the URL, read it from an environment variable so the same code runs locally and in production. Update the CORS block in main.py:
import os # at the top with the other imports
FRONTEND_ORIGIN = os.environ.get("FRONTEND_ORIGIN", "http://localhost:3000")
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000", FRONTEND_ORIGIN],
    allow_methods=["*"],
    allow_headers=["*"],
)
Locally, FRONTEND_ORIGIN isn't set, so it falls back to localhost:3000 and nothing about your dev setup changes. In production you'll set it to your Vercel URL, and the backend will welcome the live site. You don't know that URL yet, which turns out to be the whole shape of the next few steps.
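Here is a small standalone sketch of that fallback, so you can see both cases without starting the server. The allowed_origins helper is hypothetical and not in the tutorial's main.py; it also drops the duplicate localhost entry that appears when the variable is unset, which is a cosmetic choice rather than a required fix.

```python
import os

LOCAL_ORIGIN = "http://localhost:3000"


def allowed_origins(env=os.environ):
    """Build the CORS allow-list from an environment mapping."""
    frontend = env.get("FRONTEND_ORIGIN", LOCAL_ORIGIN)
    # dict.fromkeys removes duplicates while keeping the original order.
    return list(dict.fromkeys([LOCAL_ORIGIN, frontend]))


# Local development: the variable is not set.
print(allowed_origins({}))
# ['http://localhost:3000']

# Production: the variable carries the live frontend's origin.
print(allowed_origins({"FRONTEND_ORIGIN": "https://your-app.vercel.app"}))
# ['http://localhost:3000', 'https://your-app.vercel.app']
```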
Sending the backend to Fly
Install the Fly CLI (flyctl) and sign up if you haven't; their docs have the one-liner for your OS. Then, from inside backend/, create the app without deploying yet:
fly launch --no-deploy
fly launch notices your Dockerfile and uses it, asks you to name the app and pick a region, and offers to set up a database (say no, you don't need one). It writes a fly.toml, the config file for your app. Open it and make sure of three things, two of which are the difference between memory that works and memory that doesn't:
app = "langgraph-chatbot"
primary_region = "iad"
[build]
[http_service]
  internal_port = 8080          # must match uvicorn's --port in the Dockerfile
  force_https = true
  auto_stop_machines = "off"    # keep the one machine awake; memory lives in it
  auto_start_machines = true
  min_machines_running = 1      # never scale to zero
[[vm]]
  memory = "512mb"              # 256mb is too small once the model clients load
  cpu_kind = "shared"
  cpus = 1
The defaults Fly writes will try to stop your machine when it's idle and scale to zero to save you money. For almost any other app that's a gift. For yours it's a memory wipe: a stopped machine is a cleared dict, so a quiet hour would erase every conversation. auto_stop_machines = "off" and min_machines_running = 1 are you telling Fly, keep my one desk exactly where it is. This is "the backend must be a single long-lived process" written as four words of config.
Now give the server its keys with fly secrets set, passing one NAME=value pair for each API key your backend reads. These never touch your repo or your image; they live encrypted as Fly secrets and get injected as environment variables when the app runs.
With the keys staged, ship it:
fly deploy
Fly builds your image, pushes it, boots the machine, and waits until the app reports healthy. When it's done, it prints the address your backend now answers at.
Prove it before you move on. Open https://your-app.fly.dev/docs in a browser and you'll see the same FastAPI documentation page you first met back in Part 2, except now it's served from a machine you don't own, in a city you didn't pick, to anyone with the link. Copy that fly.dev URL. It's the bridge to the other half.
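The same check can be scripted, which is handy after later deploys. A minimal sketch using only the standard library; docs_is_up is a hypothetical helper, and the opener parameter exists only so the function can be exercised without a network.

```python
from urllib.request import urlopen


def docs_is_up(base_url, opener=urlopen, timeout=10):
    """Return True if <base_url>/docs answers with HTTP 200, False otherwise."""
    url = base_url.rstrip("/") + "/docs"
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError both subclass OSError, so DNS failures,
        # refused connections, timeouts, and 4xx/5xx responses all land here.
        return False


# Usage, with your own app name in place of the placeholder:
#   docs_is_up("https://your-app.fly.dev")
```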
The order you deploy in is not optional
Here's the knot. The frontend needs to know the backend's URL, so it can call it. The backend needs to know the frontend's URL, so it can welcome it through CORS. Each one needs an address the other hasn't been given yet.
You already broke the first link: the backend is live and you have its fly.dev URL. So the frontend goes next, carrying that URL, and the backend gets pointed back at the frontend last. Three steps, in that order, and the knot falls open.
Putting the frontend on Vercel
Your code has been a git repo since Part 1, so push it to GitHub if it isn't there already:
git remote add origin https://github.com/your-name/langgraph-tutorial.git
git push -u origin main
If git answers remote origin already exists, you've pointed at a repo before; swap add for set-url and push again.
Now go to Vercel, click Add New Project, and import that repo. One screen does all the work, and it has one field beginners miss. Your repo holds both backend/ and frontend/, so you have to tell Vercel the app lives in the frontend subfolder by setting Root Directory. Then add one environment variable: NEXT_PUBLIC_API_BASE_URL, set to your fly.dev URL.
Click Deploy, watch the build run, and in a minute Vercel hands you a your-app.vercel.app URL. Open it. The page loads, the chat UI is right there, and it looks perfect. Now type a message.
The error that means you're almost there
You typed a message on the live site and nothing came back. Open the browser's developer console and there it is, in red: a CORS error reporting that No 'Access-Control-Allow-Origin' header is present.
Read it like a sentence and it's almost friendly. The browser tried to POST from https://your-app.vercel.app to your Fly backend, and the backend didn't send back an Access-Control-Allow-Origin header welcoming that origin. Of course it didn't: your allow_origins list still only knows about localhost. The backend is up and healthy; it has not been told to welcome this new address yet. This is the same lesson from Part 2, grown up. Back then it was :3000 knocking on :8000; now it's two URLs on the open internet.
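The exchange is simple enough to act out in plain Python. This is a deliberately simplified sketch of an allow-list check, not the real middleware: it ignores preflight requests, credentials, and wildcards on the server side, and both function names are invented for illustration.

```python
def cors_headers_for(request_origin, allow_origins):
    """Server side: echo the origin back only if it is on the allow-list."""
    if request_origin in allow_origins:
        return {"Access-Control-Allow-Origin": request_origin}
    return {}  # no header at all, which is what the console complains about


def browser_allows(request_origin, response_headers):
    """Browser side: hand the response to the page only if the header welcomes this origin."""
    return response_headers.get("Access-Control-Allow-Origin") in (request_origin, "*")


live_site = "https://your-app.vercel.app"

# Before the fix: the backend only knows about localhost.
before = cors_headers_for(live_site, ["http://localhost:3000"])
print(before, browser_allows(live_site, before))  # {} False

# After the fix: the live origin is on the list.
after = cors_headers_for(live_site, ["http://localhost:3000", live_site])
print(browser_allows(live_site, after))  # True
```

Notice that the backend answered both times. The browser is the one that refuses to pass the first answer along, which is why the server logs look healthy while the page stays silent.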
The fix is the env var you wired earlier. Tell the backend its frontend's real origin, and Fly redeploys the moment you set it:
fly secrets set FRONTEND_ORIGIN=https://your-app.vercel.app
Setting a secret on a live app triggers a fresh deploy automatically, so a few seconds later your backend is running again with the Vercel URL in its allow_origins list. Go back to the live site and send your name. The reply streams in. It works.
It answers from anywhere now
Pull out your phone, leave your home Wi-Fi if you want, and open the vercel.app URL. Tell it your name, change the subject, ask it back. It remembers, on a device that has never once touched your laptop.
That's the build. Eight parts ago you had a laptop and a blank folder; now you have an AI chatbot, on the internet, that streams and remembers and reaches for tools, and a link you can hand to anyone. Take the win. There's one honest footnote left, and it's the best teacher in the whole series.
Watch it forget you
Change anything in your backend, even a comment, and redeploy:
fly deploy
Now open your live site and ask the question you've asked all part: "What's my name?"
Blank. Not a bug; the whole truth of InMemorySaver in one moment. fly deploy doesn't restart your old machine, it builds and boots a fresh one, and a fresh machine has empty RAM. Every notebook in that dict went out with the old machine. The goldfish bowl from Part 7, the one labeled PROD, just tipped over in production exactly as promised.
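You can reproduce the same amnesia on your laptop in a dozen lines. ToyMemorySaver below is a stand-in written for this example, not LangGraph's InMemorySaver; it only assumes what Part 7 described, that threads live in a plain dict inside one process.

```python
class ToyMemorySaver:
    """Stand-in for an in-process checkpointer: every thread lives in a dict."""

    def __init__(self):
        self.threads = {}

    def append(self, thread_id, message):
        self.threads.setdefault(thread_id, []).append(message)

    def history(self, thread_id):
        return list(self.threads.get(thread_id, []))


# The machine that was running before the deploy.
old_machine = ToyMemorySaver()
old_machine.append("thread-1", "My name is Sam.")
print(old_machine.history("thread-1"))  # ['My name is Sam.']

# The machine a fresh deploy boots: same code, brand-new dict.
new_machine = ToyMemorySaver()
print(new_machine.history("thread-1"))  # []
```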
This isn't a flaw in what you built; it's the honest edge of the simplest possible memory, and now you've felt it. For a portfolio piece or a demo you share for an afternoon, an in-memory bot that resets on each deploy is completely fine. The day you want conversations to outlive a deploy is the day you reach for the first item below.
Where to go from here
You have a real, deployed, full-stack AI app. That's rarer than it sounds, and every direction from here is a weekend project that teaches one more real thing.
- Make memory survive restarts. Swap InMemorySaver for SqliteSaver (one local file) or PostgresSaver (a real database). They share the checkpointer interface from Part 7, so it's mostly a one-line change to how you build checkpointer, plus a database connection. This is the direct fix for what you just watched happen.
- Give each visitor their own threads. Add authentication with Clerk or NextAuth so two people don't share one pile of conversations. Right now every visitor shares the same localStorage-per-browser scheme; real users need real accounts.
- Move slow work off the request. For long tool chains or background jobs, a queue like Celery or RQ lets the web process stay snappy while the heavy work runs elsewhere. This is the "graduate" architecture most production agents grow into.
- Hand it more tools. You wrote two in Part 6; the same @tool pattern, or an MCP server, or a vector search over your own documents, all plug into the exact graph you already have.
- Try a different brain. Swap the model in graph.py for the other commercial provider, or run one locally with Ollama. The MODEL constant means it's a one-line experiment.
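To get a feel for the first item before you wire in a real checkpointer, here is a toy version of the idea built on the standard-library sqlite3 module. ToySqliteSaver is invented for this sketch and is not LangGraph's SqliteSaver; it only shows why a file-backed store keeps a thread after the object that wrote it is gone.

```python
import json
import os
import sqlite3
import tempfile


class ToySqliteSaver:
    """Toy file-backed store: threads live in a SQLite file, not in RAM."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS threads "
            "(thread_id TEXT PRIMARY KEY, messages TEXT NOT NULL)"
        )

    def append(self, thread_id, message):
        messages = self.history(thread_id) + [message]
        with self.conn:  # commits on success
            self.conn.execute(
                "INSERT OR REPLACE INTO threads VALUES (?, ?)",
                (thread_id, json.dumps(messages)),
            )

    def history(self, thread_id):
        row = self.conn.execute(
            "SELECT messages FROM threads WHERE thread_id = ?", (thread_id,)
        ).fetchone()
        return json.loads(row[0]) if row else []

    def close(self):
        self.conn.close()


db_path = os.path.join(tempfile.mkdtemp(), "checkpoints.db")

first = ToySqliteSaver(db_path)  # the process before a restart
first.append("thread-1", "My name is Sam.")
first.close()

second = ToySqliteSaver(db_path)  # a new process opening the same file
print(second.history("thread-1"))  # ['My name is Sam.']
second.close()
```

One caveat worth keeping in mind: the file only helps if it lives on storage that outlasts the machine, so on a host that replaces machines on deploy you would also need a persistent volume or an external database.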
What you built
- Two deployments that fit their jobs: the frontend as static files on Vercel's CDN, and the backend as one always-awake Fly machine, split because in-memory conversations need a single long-lived process.
- A backend packaged for production: a pinned requirements.txt, a Dockerfile, and a .dockerignore that keeps your .venv and .env out of the image entirely.
- The backend live on Fly: fly launch, keys set with fly secrets set (never in git), fly deploy, and a fly.toml tuned to keep one machine awake so memory survives.
- The frontend live on Vercel: imported from GitHub with Root Directory frontend and NEXT_PUBLIC_API_BASE_URL pointed at the Fly URL, baked in at build time.
- The loop closed: CORS pinned to your real Vercel origin with FRONTEND_ORIGIN, the chat working from your phone, and a clear-eyed grasp of why a redeploy wipes memory and what fixes it.
Test yourself
Why does the backend go on a single always-awake Fly machine instead of a serverless platform that scales to zero?
The live frontend showed No 'Access-Control-Allow-Origin' header is present in the console. What caused it?
When does the value of NEXT_PUBLIC_API_BASE_URL actually get into your deployed frontend?
Why must your OpenAI or Anthropic API key never go in a NEXT_PUBLIC_ variable?
Everything worked, then a fly deploy made the bot forget your name. Why?
Commit the finale, from the project root:
git add .
git commit -m "part 8: deploy the frontend to Vercel and the backend to Fly"
Your bot is on the internet now, the same as any app you admire, served from two clouds to whoever you send the link. Eight parts ago it was an empty folder and a little fear of the word "agents." The next thing this bot learns, you're the one who gets to teach it.