LangChain Agents: The Gotchas Nobody Mentions

LangChain agents explained for production: how the loop works, building one with LangGraph, the real costs, and the gotchas that break demos at scale.

LangChain Agents: The Gotchas Nobody Mentions

Updated June 2026. Rewritten for the current LangChain, where the old AgentExecutor has given way to LangGraph, with working code and the production problems the tutorials skip.

LangChain agents are language models wired into a loop that lets them choose and call tools instead of only producing text. Give the model a goal and a set of tools, and it decides which tool to use, reads the result, and decides what to do next, until the job is done. That is the whole idea, and it is genuinely useful. It is also where a lot of teams lose weeks, because the gap between a working demo and a reliable production agent is wider than it looks.

We are Osher Digital, a Brisbane AI and automation consultancy, and we have shipped agent systems across recruitment, healthcare, finance, and professional services. We have built on LangChain, on the provider SDKs directly, and on LangGraph. This piece is the honest version: what LangChain agents are, how to build one with the current libraries, and the specific things that break once real traffic hits.

If you want the broader framework picture first, read our guide to LangChain in 2026. For the concept of agents independent of any framework, see what an AI agent is, and for the fully autonomous end, autonomous AI agents. This article is specifically about the agent abstraction inside LangChain.


What LangChain Agents Are

A plain language model answers questions. A LangChain agent answers questions and acts on the answers. The difference is the loop. You hand the model a set of tools, each one a function with a name, a description, and typed inputs. The model reads the goal, picks a tool, you run it, you feed the result back, and the model decides whether it is done or needs another step.

Three parts make it work, and they are worth naming precisely because people blame the wrong one when things go wrong. There is the model, which does the reasoning and the choosing. There are the tools, which are the only way the agent touches the outside world. And there is the runtime that drives the loop, runs the chosen tool, and passes the result back. In current LangChain that runtime is LangGraph; in older code it was the AgentExecutor, and the distinction matters more than it sounds, as we will get to.

The reason the abstraction is popular is that it is model-agnostic. The same agent code can run on Claude Sonnet 4.5, GPT-4.1, or an open model, and you can swap between them without rewriting the loop. That flexibility is real and it is the strongest argument for using LangChain rather than a single provider’s SDK.


How a LangChain Agent Works: the ReAct Loop

Under most LangChain agents is a pattern called ReAct, short for Reason and Act. The model alternates between a thought (what should I do next?) and an action (call this tool with these arguments), reading each result before the next thought. It is not magic. It is a structured loop that turns one big goal into a sequence of small, checkable steps.

Picture an agent asked to find a company’s latest filing and summarise the risk section. Thought: I need the filing, so I should search. Action: web search. Observation: a results list. Thought: the first result is the filing, I should fetch it. Action: fetch the document. Observation: the text. Thought: I have what I need, summarise the risk section. The loop ends when the model decides the goal is met. Each step is visible, which is exactly why ReAct is the pattern you can actually debug.

The catch hiding in that tidy description is that nothing forces the loop to terminate. An agent that keeps deciding it needs one more search will keep searching until you stop it. That is the first production problem, and we will come back to it.


Building a LangChain Agent in 2026

Here is a minimal but real LangChain agent using the current libraries. It uses create_react_agent from LangGraph’s prebuilt module, which is the supported way to build one today.

from langchain_anthropic import ChatAnthropic
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent

@tool
def get_stock_price(ticker: str) -> str:
    """Return the latest stock price for an ASX ticker, e.g. 'CBA'."""
    # call your real financial data API here
    return fetch_price(ticker)

@tool
def search_news(query: str) -> str:
    """Search recent news articles and return the top headlines."""
    return tavily_search(query, max_results=5)

model = ChatAnthropic(model="claude-sonnet-4-5", temperature=0)

agent = create_react_agent(
    model=model,
    tools=[get_stock_price, search_news],
    prompt="You are a research assistant. Use tools to find facts. "
           "Never invent prices or headlines. If a tool fails, say so.",
)

result = agent.invoke(
    {"messages": [{"role": "user", "content": "Latest on Commonwealth Bank?"}]},
    config={"recursion_limit": 8},
)
print(result["messages"][-1].content)

A few things in that snippet are deliberate, not decoration. The tool descriptions are written for the model to read, because tool selection accuracy lives and dies on them. The prompt explicitly forbids inventing data, because a model with tools will still happily hallucinate if you let it. And recursion_limit is set, because without it the loop has no hard ceiling. That single config value is the difference between an agent that stops and one that runs your bill up overnight.

The build process we follow is short and boring on purpose:

  1. Define one narrow job. A focused agent beats a general one every time. “Research a company” is fine; “be a business assistant” is not.
  2. Write the tools and their descriptions carefully. The description is the interface the model reasons over. Vague descriptions cause wrong tool choices.
  3. Give it the smallest toolset that works. Every extra tool is another chance to pick the wrong one.
  4. Set a recursion limit and a timeout. Before anything else, cap the loop.
  5. Build an evaluation set. A dozen real inputs with known good outputs, run on every change, so you catch regressions when you tweak the prompt or switch models.

The official LangChain documentation and the LangGraph docs are kept current and are worth reading before you commit to a structure.


AgentExecutor Is Out, LangGraph Is In

If you are following an older tutorial, it almost certainly uses AgentExecutor and initialize_agent. Those are the legacy path now. The current LangChain steers you to LangGraph for anything beyond a toy, and the reason is sound: the old executor hid the loop inside a black box that was painful to customise, while LangGraph models the agent as an explicit graph of nodes and edges you can see, branch, and interrupt.

In practice this matters the first time you need a human-in-the-loop checkpoint, a custom branch when a tool fails, or persistent memory across turns. With the old executor those were fights. With LangGraph they are first-class features. If you are starting fresh, start on LangGraph. If you are maintaining an AgentExecutor system that works, you do not need to panic-migrate, but do not build anything new on it.


The LangChain Agent Gotchas Nobody Mentions

The demos always work. Production is where the character of these systems shows. The problems we hit most, in rough order of how often they bite:

  • Loops that will not end. An agent fixates on a tool and calls it again and again, or oscillates between two. Without a recursion limit it runs until something external kills it. We cap every agent and alert when one hits the cap, because hitting it usually means a deeper prompt or tool problem.
  • Wrong tool selection. Give an agent two tools with overlapping descriptions and it will sometimes grab the wrong one. The fix is almost never a smarter model; it is sharper, more distinct tool descriptions. We have measurably improved a struggling agent just by rewriting four tool descriptions.
  • Giving up at the first failure. A tool returns an error and the agent stops rather than trying an alternative. You have to design the recovery path explicitly; the model will not invent resilience on its own.
  • Confident hallucination with tools available. A model with a search tool will still occasionally answer from memory instead of searching. The prompt has to forbid it, and your evals have to test for it.

The single longest debugging session we have lost to an agent was none of the glamorous failures. It was a tool that returned an empty string on a particular input, which the model interpreted as “no data, try a different approach,” sending it down a branch that looked like a reasoning failure but was actually a silent tool bug. The lesson, learned the hard way: log every tool call and its raw result, because most “the model is being dumb” incidents are really “the tool returned something the model could not use.”


Cost, Latency, and the Hidden Bill

Every step of the loop is a model call, and every model call is money and time. A simple question that a single API call would answer in a second might take a five-step agent eight seconds and five times the tokens. That trade is worth it when the task genuinely needs tool use and multi-step reasoning. It is pure waste when a fixed script would have done the job.

Concrete numbers from our own work: a research agent doing three to five tool calls per run on Claude Sonnet 4.5 lands around 0.03 to 0.08 AUD per run depending on how much context it drags through the loop. That sounds trivial until it runs ten thousand times a day, at which point it is a real line item. We have caught a single misconfigured agent quietly running up a four-figure monthly bill because a loop limit was missing and a subset of inputs sent it spinning. Set spend alerts on day one, not after the invoice.

The levers that keep cost sane: a recursion limit, trimming tool results to the fields the model needs rather than passing whole API responses, and using a smaller model for simple routing while reserving the larger one for genuine reasoning. None of these are exotic. All of them get skipped in the rush to ship.


When Not to Use a LangChain Agent

The most useful thing we can tell you about agents is when to skip them. If the task follows a fixed sequence of steps, you do not need an agent; you need a workflow, and a deterministic one will be faster, cheaper, and far easier to debug. Building an agent to do a job a plain script could do is a common and expensive mistake.

Skip the agent when the steps are known in advance, when latency or cost per run is tight, or when a wrong action carries real consequences and you cannot put a human in the loop. Reserve agents for genuinely open-ended tasks where the right sequence of actions depends on what the agent discovers along the way. That is the narrow band where the loop earns its overhead. For the structured-workflow alternative, see our piece on building agents in Python, which covers when a simple loop beats a framework.


LangChain Agents vs Calling the SDK Directly

A fair question once you have built a few: do you even need LangChain? The Anthropic and OpenAI SDKs both support tool use natively, and the core ReAct loop is about thirty lines of code. For a single-provider, single-purpose agent, writing that loop yourself gives you full control and one fewer dependency on a fast-moving library.

LangChain and LangGraph earn their place when you want provider portability, when you need the orchestration features (memory, human-in-the-loop, multi-agent graphs) without building them, or when a team benefits from a shared, documented structure. Our usual path is to prototype on LangGraph and, if the use case turns out to be narrow and stable, sometimes replace it with a hand-written loop in production. There is no universally right answer, only the one that fits how much you value control versus convenience. We help clients make exactly that call in our AI agent development work, and we are happy to talk it through; you can book a call to do that.


A Checklist Before You Ship a LangChain Agent

Before a LangChain agent goes anywhere near real users or real money, run it against this. Each item maps to a failure we have actually cleaned up after.

  1. Recursion limit and timeout set. The loop has a hard ceiling and you alert when it is hit.
  2. Tool descriptions reviewed. Distinct, specific, written for the model to read, with no overlap that invites a wrong pick.
  3. Failure paths defined. Every tool can error; the agent has an explicit recovery branch rather than giving up.
  4. Full logging on. Every tool call and its raw result is logged with an identifier you can trace.
  5. Evaluation set and spend alert in place. A dozen real cases run on every change, and a billing alarm that fires before the invoice does.

Frequently Asked Questions

What are LangChain agents?

LangChain agents are language models placed inside a loop that lets them choose and call tools rather than only producing text. You give the model a goal and a set of tools; it picks a tool, reads the result, and decides the next step until the task is done. The model does the reasoning, the tools touch the outside world, and a runtime drives the loop.

What is the difference between a LangChain agent and a chain?

A chain runs a fixed sequence of steps in a set order, like a recipe. An agent decides the steps dynamically based on what it learns from each tool result, like a cook improvising toward a goal. Use a chain when the steps are known in advance; use an agent only when the right sequence genuinely depends on what it discovers.

What is the agent executor in LangChain?

The AgentExecutor was the older runtime that drove the agent loop: it took the model’s chosen action, ran the tool, fed the result back, and repeated. It is now the legacy path. Current LangChain uses LangGraph, which models the agent as an explicit, customisable graph and makes features like human-in-the-loop and persistent memory far easier.

Do LangChain agents work with any language model?

Largely yes; LangChain is model-agnostic, so the same agent code can run on Claude Sonnet 4.5, GPT-4.1, or an open model with a one-line change. Reliability of tool selection varies between models, so it is worth running your evaluation set against a couple of options rather than assuming they behave identically.

How much do LangChain agents cost to run?

Each step in the loop is a model call, so cost scales with the number of steps. A research agent doing three to five tool calls per run on Claude Sonnet 4.5 typically costs around 0.03 to 0.08 AUD per run. That is minor until volume is high, where a missing loop limit can quietly produce a four-figure monthly bill, so set spend alerts and a recursion cap from the start.

Why does my LangChain agent get stuck in a loop?

Usually because nothing forces the loop to end and the model keeps deciding it needs one more step. Set a recursion limit so the loop has a hard ceiling, and treat hitting that limit as a signal of a deeper problem, often an unclear prompt or a tool returning results the model cannot act on. Logging every tool call and its raw result is how you find the real cause.

Do I need LangChain to build an agent?

No. The Anthropic and OpenAI SDKs support tool use natively, and the core ReAct loop is around thirty lines of code. LangChain and LangGraph are worth it for provider portability and built-in orchestration like memory and human-in-the-loop. For a narrow, single-provider agent, a hand-written loop is often simpler and gives you more control.

Are LangChain agents ready for production?

Yes, with discipline. The framework is production-capable, but reliability comes from your engineering: recursion limits, sharp tool descriptions, explicit failure recovery, thorough logging, and an evaluation set you run on every change. The teams that struggle are usually the ones that shipped the demo and skipped that work.


LangChain agents are a sharp tool for genuinely open-ended tasks and an expensive way to do what a script could. The framework is not the hard part; the loop limits, tool descriptions, logging, and evals are. Get those right and an agent is reliable. Skip them and you have a demo that embarrasses you in production. If you want a second pair of eyes on an agent you are building, or help deciding whether you need one at all, get in touch with our team.

Ready to streamline your operations?

Get in touch for a free consultation to see how we can streamline your operations and increase your productivity.