LangGraph tutorial: your first multi-step agent for SaaS support automation

May 16, 2026 · 7 min · AI, LangGraph, Agents, Tutorial, Python

Short answer (60 seconds): LangGraph lets you build agents with persistent state and conditional flows in Python. You model the agent as a graph: nodes are functions (typically calling an LLM) and edges are transitions, which can be conditional. In this tutorial you build a support agent that classifies an incoming ticket, decides whether to auto-respond or escalate, drafts the response with Claude Sonnet, and pauses for human approval before sending. Expect ~3 hours to implement and USD 30-60/month in API costs at 5K tickets/month.

LangGraph was one of the fastest-growing frameworks in 2025-2026 for building AI agents in Python. The reason is practical: it combines the good parts of LangChain (integrations) with a more predictable execution model (explicit graphs instead of implicit chains).

This tutorial builds something real: a SaaS support agent that handles first contact with customers. The structure translates to almost any multi-step use case (document processing, report generation, automated onboarding).

What you'll have at the end

A Python process that:

  1. Receives a support ticket via webhook.
  2. Classifies it with Claude Haiku (category + urgency).
  3. Decides whether to auto-respond or escalate to a human.
  4. If auto-responding, drafts with Claude Sonnet using internal docs as context.
  5. If escalating, opens a task with summary in Notion/Linear.
  6. Persists each step so the flow survives crashes.

Stack: Python 3.11+, LangGraph, Anthropic SDK, Postgres for checkpointing.

Setup

```bash
# create venv and install
python -m venv .venv
source .venv/bin/activate
# langgraph-checkpoint-postgres provides langgraph.checkpoint.postgres (used in Step 5)
pip install langgraph langgraph-checkpoint-postgres langchain-anthropic psycopg python-dotenv

# environment variables
cat > .env <<'EOF'
ANTHROPIC_API_KEY=sk-ant-...
DATABASE_URL=postgresql://localhost/agent_dev
EOF
```

Step 1 · Define the State

The State is what the graph accumulates as it runs. In LangGraph it's defined as a TypedDict:

```python
# agent/state.py
from typing import TypedDict, Annotated, Sequence
from langchain_core.messages import BaseMessage
import operator


class TicketCategory(TypedDict):
    category: str  # "billing" | "technical" | "general" | "urgent"
    confidence: float
    reasoning: str


class SupportAgentState(TypedDict):
    # Input
    ticket_id: str
    ticket_text: str
    customer_email: str

    # Computed by the agent
    classification: TicketCategory | None
    action: str | None  # "auto_respond" | "escalate_human" | "ask_clarification"
    response_draft: str | None
    response_sent: bool

    # History
    messages: Annotated[Sequence[BaseMessage], operator.add]

    # Safety
    iterations: int
```

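Why TypedDict and not Pydantic: nodes return plain dicts, and a TypedDict describes that shape with no runtime cost. LangGraph can also take a Pydantic model as the state schema, but the validation adds overhead you rarely need here. If you want stronger validation, use Pydantic in the handlers that receive the initial ticket, not in the internal state.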

Step 2 · Classification node

Each node is a function that takes the current state and returns the fields it updated.

```python
# agent/nodes/classify.py
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage, SystemMessage
import json

from agent.state import SupportAgentState, TicketCategory

CLASSIFY_LLM = ChatAnthropic(
    model="claude-3-5-haiku-20241022",
    temperature=0,
    max_tokens=300,
)

SYSTEM = """You are a support ticket classifier. Given the ticket text, return JSON with:
- category: one of "billing", "technical", "general", "urgent"
- confidence: 0.0 to 1.0
- reasoning: short explanation (max 30 words)

JSON only, no other text."""


def classify_ticket(state: SupportAgentState) -> dict:
    msg = CLASSIFY_LLM.invoke([
        SystemMessage(content=SYSTEM),
        HumanMessage(content=state["ticket_text"]),
    ])
    parsed: TicketCategory = json.loads(msg.content)
    return {
        "classification": parsed,
        "iterations": state.get("iterations", 0) + 1,
    }
```

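Key detail: the function returns only the fields it changes, not the full state. LangGraph merges the update into the state for you: plain fields are overwritten, and fields annotated with a reducer (like messages with operator.add) are combined with the existing value.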
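The node above trusts `json.loads(msg.content)` to succeed. Models occasionally wrap JSON in a code fence or return an unexpected category, so a defensive parser is worth having. This is a minimal sketch: `parse_classification` and `ALLOWED_CATEGORIES` are hypothetical names, and falling back to zero confidence is an assumption chosen so that the routing step escalates anything unparseable.

```python
import json

ALLOWED_CATEGORIES = {"billing", "technical", "general", "urgent"}
FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable


def parse_classification(raw: str) -> dict:
    """Parse the classifier's reply; return a zero-confidence result on any problem."""
    fallback = {
        "category": "general",
        "confidence": 0.0,
        "reasoning": "unparseable classifier output",
    }
    text = raw.strip()
    # Strip a surrounding fence (and an optional "json" language tag) if present.
    if text.startswith(FENCE):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:]
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return fallback
    category = data.get("category") if isinstance(data, dict) else None
    if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
        return fallback
    try:
        confidence = float(data.get("confidence", 0.0))
    except (TypeError, ValueError):
        confidence = 0.0
    return {
        "category": category,
        "confidence": min(max(confidence, 0.0), 1.0),  # clamp to [0, 1]
        "reasoning": str(data.get("reasoning", "")),
    }
```

In `classify_ticket` you would call `parse_classification(msg.content)` in place of `json.loads(msg.content)`, so a malformed reply becomes an escalation instead of a crash.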

Step 3 · Conditional routing

This is where LangGraph's magic happens. Define a function that decides the next transition:

```python
# agent/routing.py
from agent.state import SupportAgentState


def route_after_classification(state: SupportAgentState) -> str:
    classification = state["classification"]
    cat = classification["category"]
    confidence = classification["confidence"]

    # Urgent tickets always go to human
    if cat == "urgent":
        return "escalate"

    # Low classification confidence → escalate
    if confidence < 0.7:
        return "escalate"

    # Billing always goes to a human (compliance)
    if cat == "billing":
        return "escalate"

    # Rest: auto-respond
    return "respond"
```
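Because the router is a pure function of the classification, it is the cheapest part of the agent to test. The sketch below restates the same rules with the threshold as a parameter (`route`, `CASES`, and `check_cases` are hypothetical names) and checks a small decision table, with no LLM or graph involved.

```python
def route(classification: dict, threshold: float = 0.7) -> str:
    """Same rules as route_after_classification, with a tunable threshold."""
    if classification["category"] == "urgent":
        return "escalate"
    if classification["confidence"] < threshold:
        return "escalate"
    if classification["category"] == "billing":
        return "escalate"
    return "respond"


# (category, confidence, expected decision)
CASES = [
    ("urgent", 0.99, "escalate"),     # urgent wins regardless of confidence
    ("technical", 0.65, "escalate"),  # below the 0.7 threshold
    ("billing", 0.95, "escalate"),    # billing always goes to a human
    ("technical", 0.75, "respond"),
    ("general", 0.70, "respond"),     # the threshold itself passes (strict <)
]


def check_cases() -> None:
    for category, confidence, expected in CASES:
        got = route({"category": category, "confidence": confidence})
        assert got == expected, (category, confidence, got)


check_cases()
```

Raising the threshold to 0.8 flips the `("technical", 0.75)` row to `escalate`, which is a quick way to preview how a stricter setting changes your auto-respond rate.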

Step 4 · Response generation node

```python
# agent/nodes/respond.py
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage, SystemMessage

from agent.state import SupportAgentState
from agent.knowledge_base import retrieve_relevant_docs

RESPONSE_LLM = ChatAnthropic(
    model="claude-3-5-sonnet-20241022",
    temperature=0.3,
    max_tokens=600,
)

SYSTEM_TEMPLATE = """You are a {company} support agent. Tone: friendly, clear, concise.

Use ONLY the information in the context below to answer. If you can't answer
with what you have, say you'll escalate to a human and DO NOT make up information.

Context:
{context}"""


def generate_response(state: SupportAgentState) -> dict:
    docs = retrieve_relevant_docs(state["ticket_text"], top_k=5)
    context = "\n\n".join([d.content for d in docs])
    system = SYSTEM_TEMPLATE.format(company="YourSaaS", context=context)

    msg = RESPONSE_LLM.invoke([
        SystemMessage(content=system),
        HumanMessage(content=state["ticket_text"]),
    ])
    return {
        "response_draft": msg.content,
        "action": "auto_respond",
    }
```

Step 5 · Assemble the graph

Now you wire all the nodes into a StateGraph:

```python
# agent/graph.py
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.postgres import PostgresSaver

from agent.state import SupportAgentState
from agent.nodes.classify import classify_ticket
from agent.nodes.respond import generate_response
from agent.nodes.escalate import escalate_to_human
from agent.nodes.send import send_response
from agent.routing import route_after_classification


def build_graph(checkpointer=None):
    builder = StateGraph(SupportAgentState)

    # Register nodes
    builder.add_node("classify", classify_ticket)
    builder.add_node("respond", generate_response)
    builder.add_node("escalate", escalate_to_human)
    builder.add_node("send", send_response)

    # Entry point
    builder.set_entry_point("classify")

    # Edges
    builder.add_conditional_edges(
        "classify",
        route_after_classification,
        {"respond": "respond", "escalate": "escalate"},
    )
    builder.add_edge("respond", "send")
    builder.add_edge("escalate", END)  # human will take over
    builder.add_edge("send", END)

    return builder.compile(checkpointer=checkpointer)
```
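The graph imports `escalate_to_human` and `send_response`, which this tutorial does not show. Here is one possible shape for them, under stated assumptions: the factories `make_escalate_node` and `make_send_node`, the helper `build_escalation_summary`, and the injected `create_task` / `send_email` callables are all hypothetical stand-ins for your Notion/Linear and email integrations.

```python
from typing import Any, Callable


def build_escalation_summary(state: dict[str, Any], max_chars: int = 280) -> dict[str, str]:
    """Build a short task title and body from the ticket and its classification."""
    classification = state.get("classification") or {}
    excerpt = state["ticket_text"].strip().replace("\n", " ")
    if len(excerpt) > max_chars:
        excerpt = excerpt[: max_chars - 3] + "..."
    category = classification.get("category", "unclassified")
    reason = classification.get("reasoning", "n/a")
    return {
        "title": f"[{category}] Ticket {state['ticket_id']}",
        "body": f"From: {state['customer_email']}\nReason: {reason}\n\n{excerpt}",
    }


def make_escalate_node(create_task: Callable[[dict[str, str]], None]):
    """Return a node that opens a task for a human and records the action."""
    def escalate_to_human(state: dict[str, Any]) -> dict[str, Any]:
        create_task(build_escalation_summary(state))
        return {"action": "escalate_human"}
    return escalate_to_human


def make_send_node(send_email: Callable[[str, str], None]):
    """Return a node that sends the (possibly human-edited) draft."""
    def send_response(state: dict[str, Any]) -> dict[str, Any]:
        send_email(state["customer_email"], state["response_draft"])
        return {"response_sent": True}
    return send_response
```

Injecting the side-effecting callables keeps the nodes testable with fakes. To match the imports in `agent/graph.py`, each module would build its node once, for example `escalate_to_human = make_escalate_node(your_task_client)`.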

Step 6 · Human-in-the-loop with interrupt

For sensitive tickets, we want a human to approve the response before sending. LangGraph supports this natively with interrupt_before:

```python
# agent/graph.py (modified)
def build_graph(checkpointer=None):
    # ... same code as above ...
    return builder.compile(
        checkpointer=checkpointer,
        interrupt_before=["send"],  # pause before sending
    )
```

Now the graph pauses before the send node, so as written every drafted response waits for approval. Your API exposes two endpoints:

```python
# api/main.py
import os

from fastapi import FastAPI
from langgraph.checkpoint.postgres import PostgresSaver

from agent.graph import build_graph

app = FastAPI()

# Note: depending on your langgraph-checkpoint-postgres version, from_conn_string
# may be a context manager, and async calls (ainvoke) may need AsyncPostgresSaver.
checkpointer = PostgresSaver.from_conn_string(os.environ["DATABASE_URL"])
graph = build_graph(checkpointer=checkpointer)


@app.post("/tickets/{ticket_id}/process")
async def process_ticket(ticket_id: str, body: dict):
    config = {"configurable": {"thread_id": ticket_id}}

    # Run up to the interrupt
    result = await graph.ainvoke(
        {"ticket_id": ticket_id, "ticket_text": body["text"], ...},
        config=config,
    )

    if result.get("response_draft"):
        return {"status": "pending_approval", "draft": result["response_draft"]}
    return {"status": "escalated"}


@app.post("/tickets/{ticket_id}/approve")
async def approve_response(ticket_id: str, body: dict):
    config = {"configurable": {"thread_id": ticket_id}}

    # If the human edits the response, update the state
    if body.get("edited_draft"):
        await graph.aupdate_state(config, {"response_draft": body["edited_draft"]})

    # Continue from the interrupt (passing None resumes from the checkpoint)
    result = await graph.ainvoke(None, config=config)
    return {"status": "sent"}
```

Step 7 · Observability

Three minimum logs per execution:

```python
# agent/observability.py
import structlog

logger = structlog.get_logger()


def log_transition(state, node_name, decision=None):
    logger.info(
        "agent_transition",
        ticket_id=state["ticket_id"],
        node=node_name,
        decision=decision,
        iterations=state.get("iterations", 0),
        classification=state.get("classification"),
    )
```
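Calling a logging helper by hand in every node is easy to forget. One option is a decorator that logs each transition and its latency automatically. This sketch uses the standard library's `logging` instead of structlog so it runs with no dependencies; `logged_node` is a hypothetical name, not a LangGraph feature.

```python
import functools
import logging
import time

logger = logging.getLogger("agent")


def logged_node(name: str):
    """Wrap a node so every call logs the node name, ticket, updated fields, and latency."""
    def decorator(node):
        @functools.wraps(node)
        def wrapper(state):
            started = time.perf_counter()
            update = node(state)
            latency_ms = (time.perf_counter() - started) * 1000
            logger.info(
                "agent_transition node=%s ticket_id=%s updated=%s latency_ms=%.1f",
                name,
                state.get("ticket_id"),
                sorted(update),  # names of the fields this node changed
                latency_ms,
            )
            return update
        return wrapper
    return decorator


@logged_node("classify")
def fake_classify(state):
    # Stand-in for the real classify_ticket node.
    return {"classification": {"category": "general"}, "iterations": 1}
```

Registering `logged_node("classify")(classify_ticket)` with `add_node` instead of the bare function would give you one log line per transition without touching the node bodies.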

And configure LangSmith for full tracing (free up to 5K traces/month):

```bash
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=ls__...
export LANGCHAIN_PROJECT=support-agent
```

Each execution leaves you with a visual timeline of every node, its input, output, latency, and tokens consumed. Essential for debugging.

Production costs

For 5,000 tickets/month with this pipeline:

| Component | Estimated monthly cost |
| --- | --- |
| Claude Haiku (classification, 5K calls) | USD 5-10 |
| Claude Sonnet (generation, ~3K calls, 60% auto-respond) | USD 25-50 |
| Python hosting (Modal/Railway) | USD 10-30 |
| Postgres (Supabase free tier usually fits) | USD 0-25 |
| Total | USD 40-115/month |

At 50K tickets/month, linear scaling puts you at roughly USD 400-1,000/month. At that volume it's worth caching classifications and keeping Sonnet output capped with max_tokens.
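The linear-scaling estimate is simple arithmetic on the table above. A small sketch (the names `BASE_COSTS_USD` and `estimate_monthly_cost` are hypothetical) makes it easy to plug in your own volume. It assumes every line item scales with ticket count, which overstates hosting and Postgres, so treat the result as an upper-leaning guess.

```python
BASE_TICKETS = 5_000

# (low, high) monthly cost in USD at 5K tickets/month, taken from the table above.
BASE_COSTS_USD = {
    "haiku_classification": (5, 10),
    "sonnet_generation": (25, 50),
    "hosting": (10, 30),
    "postgres": (0, 25),
}


def estimate_monthly_cost(tickets: int) -> tuple[float, float]:
    """Scale the 5K-ticket cost range linearly to another monthly volume."""
    factor = tickets / BASE_TICKETS
    low = sum(lo for lo, _ in BASE_COSTS_USD.values()) * factor
    high = sum(hi for _, hi in BASE_COSTS_USD.values()) * factor
    return low, high


for volume in (5_000, 20_000, 50_000):
    low, high = estimate_monthly_cost(volume)
    print(f"{volume:>6} tickets/month: USD {low:,.0f}-{high:,.0f}")
```

At 50K tickets this gives USD 400-1,150, which is where the rounded 400-1,000 figure comes from.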

Pitfalls you'll hit

  1. The agent decides "respond" when it should "escalate" — tune the confidence threshold in routing. Start conservative (0.8) and lower only if you see many unnecessary escalations.
  2. Infinite loops when you add retrying nodes — always use recursion_limit and a counter in state.
  3. The checkpointer grows unbounded — add a job that cleans up completed threads after N days.
  4. LangSmith in production without sampling — at high volume, sampling at 10% keeps visibility without blowing the free tier.
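For the sampling pitfall, a deterministic sampler is often nicer than a random one: the same ticket is always either traced or not, so a paused thread and its later resume land in the same bucket. This is a sketch under assumptions; `should_trace` is a hypothetical helper, and how you switch tracing on or off per run depends on your LangSmith setup and is not shown.

```python
import hashlib


def should_trace(thread_id: str, sample_rate: float = 0.10) -> bool:
    """Return True for a stable ~sample_rate fraction of thread IDs."""
    digest = hashlib.sha256(thread_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate


sampled = sum(should_trace(f"ticket-{i}") for i in range(10_000))
print(f"traced {sampled} of 10000 threads")
```

Hashing the thread ID instead of calling `random.random()` also means you can recompute after the fact whether a given ticket should have a trace.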

Let's talk about your case

If you're considering building an AI agent for your SaaS and want to review architecture before committing 3-4 weeks of a dev, book a 30-minute call at no cost. 30 minutes usually clarifies whether LangGraph is the right tool or whether your case is better solved with a linear script or an n8n workflow.


Frequently asked questions

LangGraph vs CrewAI vs building my own orchestrator?

LangGraph for agents with defined flows (classify → decide → act). CrewAI for multiple agents collaborating with roles. Custom code when the flow is simple (3-5 linear steps) or very specific. For SaaS support automation, LangGraph is the sweet spot — state management and human-in-the-loop come built-in.

Why Python and not TypeScript with LangChain.js?

LangGraph has functional parity between Python and JS but the community and plugin ecosystem are more mature in Python. If your SaaS is Next.js, expose the agent as a separate Python service (FastAPI/Modal) and call it from your API. The separation also helps you scale the agent independently.

How much does this agent cost to run in production?

For 5K tickets/month with classification + response: USD 30-60/month in API costs (Claude Haiku classification + Sonnet generation). For 50K tickets: USD 200-500. Hosting the Python process: USD 5-30/month on Modal/Railway/Fly.io. Postgres can be the one you already have.

How do I manage state between agent invocations?

Use LangGraph's checkpointer (Postgres or SQLite). Each time the graph runs, it persists the state keyed by thread_id. If the process crashes, you can resume from the last checkpoint. This is essential for human-in-the-loop, where the agent may wait hours or days for human input.

When is it NOT a good idea to use agents?

When a single LLM call solves the problem (simple classification, extraction). When the flow steps are always the same (a linear script is simpler and more debuggable). When you need synchronous response under 500ms (multi-step agents take 2-10 seconds). Agents shine when the flow branches conditionally.

How do I prevent the agent from infinite-looping?

Three safeguards: (1) recursion_limit in graph config (default 25 is usually enough); (2) an iteration counter in state that the agent checks before continuing; (3) total thread timeout (15-30 min max). Log when each fires to detect prompts that break.
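The first two safeguards can be sketched as follows. The scenario is hypothetical: it assumes you added a validation node that can loop back to `respond`, and a `draft_ok` field that is not part of the state defined in this tutorial. The `recursion_limit` key sits at the top level of the run config, next to `configurable`.

```python
MAX_ITERATIONS = 3  # assumption: tune to how many retries you can afford

# Safeguard 1: graph-level cap on steps, passed with every invoke call.
RUN_CONFIG = {
    "recursion_limit": 25,
    "configurable": {"thread_id": "ticket-123"},
}


def route_after_validation(state: dict, max_iterations: int = MAX_ITERATIONS) -> str:
    """Safeguard 2: stop retrying once the iteration counter hits the cap."""
    if state.get("draft_ok"):
        return "send"
    if state.get("iterations", 0) >= max_iterations:
        # Give up and hand the ticket to a human instead of looping again.
        return "escalate"
    return "respond"


print(route_after_validation({"draft_ok": False, "iterations": 3}))
```

For the counter to work, every node inside the loop has to increment `iterations`, the way `classify_ticket` does in Step 2.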