OpenAI AgentKit Explained: What It Replaces and What It Doesn’t

OpenAI AgentKit promises to make agent building easier. We unpack what it actually replaces, where it falls short, and the tools we still reach for first.

Updated May 2026. Reworked to reflect what AgentKit looks like alongside the OpenAI Agents SDK, the Apps SDK, and the broader 2026 agent stack we deploy for clients.

OpenAI AgentKit landed in late 2025 as a visual-builder-plus-runtime for production agents. Six months in, the question we get most often is not “what is AgentKit”. It is “should I use AgentKit, or the Agents SDK, or just keep building agents the way we already do”. Those are three different decisions.

We are Osher Digital, a Brisbane-based AI consultancy that ships agents into production for clients in healthcare, recruitment, and professional services. We have built agents on the Agents SDK, on AgentKit’s drag-and-drop builder, on LangGraph, and on raw API calls behind a FastAPI server. This is the comparison guide we wish we’d had when AgentKit shipped.

For background on the broader pattern, our piece on what an AI agent actually is covers the vocabulary. For an alternative implementation path, see our walkthrough of building an AI agent with Claude, which uses raw SDK code.


What AgentKit actually is in 2026

AgentKit is three things bundled together. A web-based visual builder called Agent Builder where you drag nodes onto a canvas to define an agent’s behaviour. A managed runtime that hosts the agents you build and exposes them via API or a hosted chat surface. And a set of evaluator and observability tools that record traces, score outputs, and let you compare versions.

The visual builder is the most talked-about piece. It lets non-developers compose agents with tools, guardrails, and routing logic in a UI that looks a lot like an n8n workflow editor. Underneath, it generates the same primitives the OpenAI Agents SDK exposes in code: an agent definition, a list of tools, handoff rules between agents, and a set of input and output guardrails.

That is the important detail. AgentKit and the Agents SDK are not separate things. AgentKit is a visual editor and a hosted runtime on top of the same primitives the Agents SDK gives you in Python or TypeScript. If you build something in Agent Builder, you can export it; if you build it in code, you can debug the runs in the same dashboard. The choice is not what you build with; it is which surface you want to build on.

Where AgentKit replaces the old stack

For three jobs, AgentKit is now the obvious answer.

First, prototyping. The visual builder gets you from “I want an agent that does X” to a working prototype in roughly twenty minutes if X is well-bounded. We use it constantly for client discovery sessions. You can sit in a meeting, listen to a process, build an agent that approximates it, and demo it before the meeting ends. Try doing that in code.

Second, replacing the old Assistants API. The Assistants API was always a slightly awkward middle ground: more managed than raw chat completions, less flexible than building your own loop. OpenAI is sunsetting it. If you still have Assistants API code in production, AgentKit or the Agents SDK is your migration target.

Third, agents that need OpenAI’s hosted tools. The web search, file search, and code interpreter tools come baked in and run on OpenAI infrastructure. If you were stitching together SerpAPI plus a Python sandbox plus a vector store yourself, you can collapse a lot of that into AgentKit and stop maintaining the glue.

Where it falls short for production work

AgentKit is genuinely useful. It is also not the right answer for every agent we build, and pretending otherwise has burned us once or twice.

The visual builder hides complexity until it can’t. For a single agent with five or six tools, the canvas is clear. By the time you have three agents handing off to each other, four guardrails, and a branching router, the diagram becomes harder to read than the equivalent code. Worse, version control on a visual workflow is fragile. Diffing two canvas snapshots is awful. We move anything past a certain complexity threshold from Agent Builder into Agents SDK code, where we can review pull requests like normal humans.

Lock-in matters more than the marketing implies. The Agents SDK is open source and runs anywhere. AgentKit’s hosted runtime is OpenAI infrastructure. If you build there, you have committed to OpenAI for both the model and the orchestration layer. We have one client where regulatory constraints meant we could not host the orchestrator outside our own VPC, and AgentKit was a non-starter for that reason alone.

Multi-provider routing is awkward. If your agent should use Claude for one task and GPT-4.1 for another (which we do constantly, because claude-sonnet-4-5 is materially better at structured extraction and gpt-4.1 is faster for simple reasoning), AgentKit fights you. The Agents SDK is more accommodating, and a custom build is the cleanest of all.
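To make that concrete, here is a minimal sketch of the routing layer in a custom build. The task labels and model names are illustrative, not a fixed API; the point is that the routing decision lives in a small table you own rather than inside a vendor's runtime.

```python
# Per-task model routing in a custom build. Task labels are whatever
# your pipeline uses; provider/model pairs are illustrative.
TASK_ROUTES = {
    "structured_extraction": ("anthropic", "claude-sonnet-4-5"),
    "simple_reasoning": ("openai", "gpt-4.1"),
}

def route(task_type: str) -> tuple[str, str]:
    """Return (provider, model) for a task, defaulting to OpenAI."""
    return TASK_ROUTES.get(task_type, ("openai", "gpt-4.1"))
```

Swapping a model for one task is then a one-line change, and the table doubles as documentation of which model owns which job.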

AgentKit vs the Agents SDK vs build it yourself

The framing we use with clients is roughly this. Agent Builder is for prototypes, internal tools, and agents that a non-developer needs to maintain. The Agents SDK is for production agents your engineering team will own. A custom build with raw SDK calls and your own orchestration is for agents with unusual requirements: multi-provider, on-prem, weird state machines, or anything where you need to debug the orchestration loop itself.

For the talent marketplace platform we built last year, we used the Agents SDK in TypeScript with about forty lines of glue around the loop and a Postgres-backed memory layer. The visual builder would have made the first day faster and made every day after that slower. For an internal “summarise overnight ticket activity” agent we shipped for a healthcare client’s operations team, Agent Builder was perfect. The agent has three tools, runs once a day, and the operations lead can edit it without filing a developer ticket.

A working example: the kind of agent worth building in AgentKit

The agents that thrive in AgentKit share a shape. They are bounded, they don’t depend on systems OpenAI can’t reach, and the people who own them benefit from being able to edit them without a developer.

Our reference example is a research-and-summarise agent. The user gives it a topic. The agent uses OpenAI’s hosted web search tool to find five recent articles, the file search tool to check whether the company has any internal documents on the same topic, and a custom HTTP tool that posts a structured summary into a Slack channel. Total build time the first time we did this: under an hour in Agent Builder. Total maintenance over six months: a single change when the Slack tool URL moved.

If you want to build something equivalent in code instead, the same pattern in the Agents SDK looks roughly like this:

from agents import Agent, Runner, WebSearchTool, FileSearchTool, function_tool
import httpx

@function_tool
def post_to_slack(channel: str, summary: str) -> str:
    """Post a research summary to a Slack channel."""
    httpx.post(
        "https://hooks.slack.com/services/...",
        json={"channel": channel, "text": summary},
        timeout=10,
    )
    return "posted"

research_agent = Agent(
    name="research_agent",
    model="gpt-4.1",
    instructions=(
        "You research a topic, summarise five recent sources, "
        "check internal docs for context, and post the result to Slack."
    ),
    tools=[
        WebSearchTool(),
        FileSearchTool(vector_store_ids=["vs_internal_docs"]),
        post_to_slack,
    ],
)

result = Runner.run_sync(
    research_agent,
    "Summarise this week's developments in EU AI Act enforcement.",
)
print(result.final_output)

Same primitives, same observability. The only thing you give up by writing it in code is the ability to hand the canvas to a non-developer. If that’s not a constraint you have, the SDK is the better long-term home.

Things that broke for us in production

Three patterns we have hit and fixed.

Tool descriptions matter more than the docs imply. Two agents we built had subtle accuracy regressions when we shortened a tool description from a paragraph to a sentence. The model uses the description to decide when to call the tool, and a vague description is worse than no description. Our rule now: every tool gets a description that includes a one-line purpose, the input shape, and at least one example of when to use it.
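For illustration, here is that rule applied to a hypothetical order-lookup tool. The function and field names are made up; the shape of the docstring, with a one-line purpose, the input shape, and guidance on when to use it, is the point.

```python
def lookup_order(order_id: str) -> dict:
    """Fetch the current status of a customer order by its ID.

    Input: order_id, a string like "ORD-10442".
    Use this when the user asks where their order is or whether it
    has shipped; do not use it for refund or billing questions.
    """
    # Illustrative stub: a real tool would call the order system here.
    return {"order_id": order_id, "status": "shipped"}
```

Whatever framework you use, this docstring (or description field) is what the model reads when deciding whether to call the tool, so it earns the extra lines.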

Long-running tool calls eat your runtime quotas. AgentKit’s hosted runtime has timeout behaviour that is reasonable for synchronous agents and harsh for anything that calls a tool which takes more than thirty seconds. We had a workflow that called a third-party enrichment API which sometimes took ninety seconds. The agent looked like it was hanging. The fix was to break the long call into a queued background job and have the agent poll for the result. Worth designing for upfront.
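A minimal sketch of that queue-and-poll pattern, with an in-memory dict standing in for a real job queue (SQS, Celery, or a database table in production; all names here are illustrative):

```python
import uuid

# In-memory stand-in for a job queue. A background worker would pick
# up pending jobs and make the slow 90-second call there.
JOBS: dict[str, dict] = {}

def start_enrichment(record_id: str) -> str:
    """Kick off the slow enrichment and return a job ID immediately."""
    job_id = str(uuid.uuid4())
    JOBS[job_id] = {"status": "pending", "record_id": record_id, "result": None}
    return job_id

def check_enrichment(job_id: str) -> dict:
    """Poll for the result; the agent calls this on a later turn."""
    return JOBS.get(job_id, {"status": "unknown"})
```

Exposed as two fast tools, the agent kicks the job off on one turn and checks it on the next, and nothing ever looks like it is hanging.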

Cost runaway is real if you let agents loop without bounds. Set a maximum step count on every agent. We default to twelve. An agent that genuinely needs more than twelve steps almost always has a tool design problem, not a step-count problem.
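In the Agents SDK the cap is the max_turns argument on the runner; in a custom loop it is a few lines. A minimal sketch of the bounded loop (agent_step is a stand-in for whatever drives one turn of your agent):

```python
MAX_STEPS = 12

def run_with_cap(agent_step, state, max_steps: int = MAX_STEPS):
    """Drive an agent loop, refusing to exceed the step budget.

    agent_step(state) returns (new_state, done). Raising on overrun
    beats silently truncating: you want the failure in your logs.
    """
    for _ in range(max_steps):
        state, done = agent_step(state)
        if done:
            return state
    raise RuntimeError(f"agent exceeded {max_steps} steps")
```

The exception is the feature: every overrun becomes a visible signal that a tool needs redesigning, not a quiet extra dollar on the bill.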

When AgentKit is not the right choice

Skip it if you need to mix providers. We use Claude for most extraction work because it is materially better at structured outputs over messy real-world documents. AgentKit is OpenAI-only at the model layer.

Skip it if your agent has to run inside your own VPC. Healthcare clients with My Health Records data, regulated finance work under APRA CPS 234, and anyone with a strict data residency requirement that names ap-southeast-2 specifically will need to host the orchestration themselves.

Skip it if your agent is really an integration workflow with one LLM step in the middle. n8n or Make is the right tool there. We cover the boundary between the two in our piece on building AI workflows in n8n. AgentKit shines when the LLM is making most of the decisions; it is overkill when it is just doing one classification call inside a deterministic pipeline.

Cost and pricing notes

AgentKit itself carries no separate runtime subscription. You pay for model tokens, hosted-tool calls (web search, file search, and code interpreter each have their own per-call pricing), and any storage you use for vector stores or files.

For a typical small-to-medium agent doing a few hundred runs per day, expect monthly costs in the $80 to $400 USD range, which is roughly $120 to $620 AUD at current exchange rates. The hosted tools are the expensive part: a web search call is materially more expensive than the chat completion that triggered it. We tend to cache search results aggressively for any agent that runs more than a few times a day.
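As a sketch of that caching pattern, here is a minimal TTL cache keyed on the query string, with an in-memory dict standing in for Redis or similar once more than one process is running (the TTL value is our choice, not a requirement):

```python
import time

# TTL cache for search results. Stale results are acceptable for a
# research agent; six hours is an illustrative window.
_CACHE: dict[str, tuple[float, list]] = {}
TTL_SECONDS = 6 * 60 * 60

def cached_search(query: str, search_fn) -> list:
    """Return cached results for a query, calling search_fn on a miss."""
    now = time.time()
    hit = _CACHE.get(query)
    if hit and now - hit[0] < TTL_SECONDS:
        return hit[1]
    results = search_fn(query)
    _CACHE[query] = (now, results)
    return results
```

For an agent that runs dozens of times a day on overlapping topics, this alone can remove most of the web search line item.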

If your agent volume is high enough that costs become a real line item, multi-provider routing through the Agents SDK or a custom build is worth the extra engineering work. We have moved a couple of clients off pure OpenAI for the routing parts of their pipeline and cut the model-usage portion of their bill by 30 to 60 percent.

Getting started without overbuilding

The advice we give every team starting out: build the smallest possible agent that does one useful thing. Not three things. One. Ship it. Use it for two weeks. Then add the next capability.

What you need to begin: an OpenAI account with API access, a payment method on file, and a clear bounded problem. The bounded problem is the part most people get wrong. “An agent that handles customer support” is not bounded. “An agent that drafts a reply to refund-request emails for review by a human” is bounded. Start there.

If you would like a hand picking the right first agent for your team, book a call and we will work through the candidates with you.


Frequently Asked Questions

Is OpenAI AgentKit open source?

Parts of it are. The Agents SDK underneath AgentKit is open source and runs on your own infrastructure. The visual Agent Builder and the hosted runtime are OpenAI-managed services and are not open source. If “open source AgentKit” is what you searched for, what you actually want is the Agents SDK on GitHub under the openai organisation.

How is AgentKit different from the Assistants API?

The Assistants API was OpenAI’s earlier managed agent surface. AgentKit replaces it with a more flexible runtime, a visual builder, and the Agents SDK underneath. Anything you built on Assistants API should migrate. The migration path is documented in the Agents SDK docs and is usually a few days of work for a single agent.

Do I need to be a developer to use AgentKit?

For Agent Builder, no. The visual canvas is genuinely usable by people who have never written code, provided they understand the agent’s task well. For anything beyond a small prototype, you will eventually want a developer to wire up custom tools, set up evaluations, and own the deployment lifecycle. Treat Agent Builder as a tool that lowers the floor, not one that removes the ceiling.

What does running an AgentKit-powered agent cost?

You pay for model tokens, hosted tool calls, and storage. For a small agent running a few hundred times a day, that lands in the $80 to $400 USD per month range (roughly $120 to $620 AUD). High-volume agents and any agent that uses web search heavily will run higher. Set a step limit and a daily spend cap before you go to production.

Should I use AgentKit instead of LangChain or LangGraph?

Different problems. LangChain and LangGraph are open-source orchestration frameworks that work with any provider. AgentKit is OpenAI-specific. If you are committed to the OpenAI stack, AgentKit is simpler. If you want provider portability, want to host the orchestrator yourself, or have complex graph-shaped workflows, LangGraph is the more honest choice. We use both, on different projects.

Can AgentKit use models other than OpenAI’s?

The hosted runtime is OpenAI-only at the model layer. The underlying Agents SDK has community-maintained adapters for other providers, but support is uneven and we would not bet a production system on it. If you need multi-provider routing, build on the SDK directly or use a different orchestration layer.

Does AgentKit work for organisations with Australian data residency requirements?

OpenAI offers data residency in selected regions but not currently in Sydney for the AgentKit runtime. If you have a hard requirement to keep data in ap-southeast-2 or similar, host the orchestration yourself using the Agents SDK and call OpenAI models from there. For anything genuinely sensitive (health data under My Health Records, APRA-regulated workloads), this is the route we recommend.

When is AgentKit the right tool and when is it overkill?

Right tool: agents where the LLM is making most of the decisions, where the task is bounded, and where the people maintaining it benefit from a visual surface. Overkill: workflows that are 90 percent integration plumbing with one classification step in the middle. For those, an integration platform like n8n with an LLM call inside one node is simpler, cheaper, and easier to debug.


Where to from here

AgentKit is a real improvement on the older Assistants API and a credible competitor to the open-source agent frameworks for OpenAI-committed teams. It is not the right answer for every agent. Match the tool to the shape of the problem and the team that has to maintain it.

If you want help working out which agents are worth building first and which ones are worth not building at all, get in touch. We have built enough of these to know which ones earn their keep.

Ready to streamline your operations?

Get in touch for a free consultation to see how we can streamline your operations and increase your productivity.