How to Build an AI Agent with Claude AI: A Practical Guide for Australian Businesses
Learn the step-by-step process of creating a powerful AI agent using Claude.ai. Discover key concepts, best practices, and practical applications for your AI projects.
Updated February 2026. This article has been reviewed and updated to reflect the latest information.
AI agents are no longer experimental. They are running in production right now, processing documents, handling customer queries, and automating workflows inside real businesses. We know because we have built them.
At Osher Digital, we are a Brisbane-based AI consultancy that has deployed Claude-based agents across healthcare, recruitment, finance, and professional services. This guide comes from that hands-on experience. It covers the architecture decisions, code patterns, and hard-won lessons that separate a demo from a production system.
This guide is aimed at developers evaluating Claude for a project and technical decision-makers weighing options for AI agent development. It should give you a practical foundation to build on.
What Are AI Agents and Why They Matter
An AI agent is software that can perceive its environment, reason about what to do, and take autonomous action to achieve a goal. That sounds academic, so here is the practical version: an agent is an AI system that can actually do things, rather than only answering questions.
A chatbot tells you about your return policy. An agent reads the customer’s order history, checks whether the item is eligible for return, initiates the refund, updates the CRM, and sends the confirmation email. It reasons through the problem and takes action across multiple systems.
For Australian businesses, agents matter because they automate the work that sits between your existing systems: judgment calls, data entry, routing decisions, document review. Work that is too complex for simple rule-based automation but too repetitive for your best people to be doing all day.
We have seen agents cut application processing from 30 minutes to under 30 seconds. We have seen document classification agents give healthcare staff hours back every day that were previously spent on manual data entry. These are measured results from custom AI development projects we have delivered for Australian organisations, not hypotheticals.
Why Claude Is Well-Suited for Agent Development
There are several capable foundation models available for building agents. We have built agents on Claude, GPT-4, and open-source models. We keep coming back to Claude for agent work, and here is why.
Long Context Window
Claude supports a 200,000-token context window. For agent development, this matters more than you might think. Your agent can hold an entire document, a full conversation history, and a set of tool definitions in context at once, without needing complex chunking strategies. When we build document processing agents, being able to feed in a 50-page PDF and have the model reason about it in a single pass simplifies the architecture considerably.
Native Tool Use
Claude’s tool use (function calling) capability is solid and well-integrated. The model is trained to understand when to call a tool, what arguments to pass, and how to interpret the result. This is the foundation of agentic behaviour, and Claude handles it reliably enough that we trust it for workloads of thousands of requests a day.
Extended Thinking
Claude’s extended thinking feature lets the model reason through complex, multi-step problems before responding. For agents that need to plan a sequence of actions or handle ambiguous inputs, extended thinking produces noticeably better results than a standard single-pass response.
Safety and Controllability
Claude is built to follow instructions precisely and to refuse actions that fall outside its defined scope. In an agent context, this means you can set clear boundaries for what the agent should and should not do, and expect it to respect them. When an agent has access to your CRM, your database, and your email system, that kind of controllability matters a lot.
Claude’s Tool Use: The Foundation of Agentic Behaviour
Tool use is what turns a language model into an agent. Instead of only generating text, the model can call functions that interact with external systems. Here is how it works in Claude.
You define tools as JSON schemas that describe what each tool does, what parameters it accepts, and what it returns. When you send a message to Claude along with these tool definitions, the model decides whether to call a tool, selects the right one, and provides the arguments.
```python
import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "name": "lookup_customer",
        "description": "Look up a customer record by email address. Returns customer details including name, account status, and recent orders.",
        "input_schema": {
            "type": "object",
            "properties": {
                "email": {
                    "type": "string",
                    "description": "The customer's email address"
                }
            },
            "required": ["email"]
        }
    },
    {
        "name": "create_support_ticket",
        "description": "Create a new support ticket in the helpdesk system.",
        "input_schema": {
            "type": "object",
            "properties": {
                "customer_email": {
                    "type": "string",
                    "description": "The customer's email address"
                },
                "subject": {
                    "type": "string",
                    "description": "Ticket subject line"
                },
                "priority": {
                    "type": "string",
                    "enum": ["low", "medium", "high", "urgent"],
                    "description": "Ticket priority level"
                },
                "description": {
                    "type": "string",
                    "description": "Detailed description of the issue"
                }
            },
            "required": ["customer_email", "subject", "priority", "description"]
        }
    }
]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[
        {
            "role": "user",
            "content": "Customer [email protected] is reporting that her last order arrived damaged. She's quite upset. Can you look up her account and create a ticket?"
        }
    ]
)
```
Claude will respond with a tool_use content block, selecting lookup_customer first and providing the email as an argument. You execute the function, return the result, and Claude continues reasoning, likely calling create_support_ticket next with appropriate details including a high priority given the customer’s sentiment.
The point is that Claude decides the sequence. You define the tools. The model figures out the plan.
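The execution side of that loop is your code, not Claude's. A minimal dispatcher might look like the sketch below — the `lookup_customer` and `create_support_ticket` implementations here are illustrative stubs standing in for your real CRM and helpdesk calls:

```python
# Minimal tool dispatcher: maps tool names from Claude's tool_use block
# to local functions. The implementations are stubs; in production they
# would call your CRM and helpdesk APIs.

def lookup_customer(email: str) -> dict:
    # Stub: replace with a real CRM query.
    return {"email": email, "account_status": "active", "recent_orders": 3}

def create_support_ticket(customer_email: str, subject: str,
                          priority: str, description: str) -> dict:
    # Stub: replace with a real helpdesk API call.
    return {"ticket_id": "TKT-1001", "priority": priority}

TOOL_REGISTRY = {
    "lookup_customer": lookup_customer,
    "create_support_ticket": create_support_ticket,
}

def execute_tool(name: str, tool_input: dict):
    """Dispatch a tool_use block to the matching local function."""
    if name not in TOOL_REGISTRY:
        # Return the error as the tool result so Claude can see it and recover.
        return {"error": f"Unknown tool: {name}"}
    try:
        return TOOL_REGISTRY[name](**tool_input)
    except Exception as e:
        return {"error": str(e)}
```

Returning errors as tool results, rather than raising, lets the model observe the failure and adjust its plan instead of crashing the loop.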
Architecture Patterns for Claude-Based Agents
There are several established patterns for structuring AI agents. The right choice depends on your use case complexity and reliability requirements.
ReAct (Reason + Act)
The ReAct pattern is the workhorse of agent architectures. The agent operates in a loop: observe the current state, reason about what to do next, take an action (call a tool), observe the result, and repeat until the task is complete.
```python
def react_agent(task: str, tools: list, max_steps: int = 10):
    messages = [{"role": "user", "content": task}]

    for step in range(max_steps):
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            tools=tools,
            messages=messages,
        )

        # Check if the agent wants to use a tool
        if response.stop_reason == "tool_use":
            # Extract tool call details
            tool_block = next(
                b for b in response.content if b.type == "tool_use"
            )

            # Execute the tool
            tool_result = execute_tool(tool_block.name, tool_block.input)

            # Feed the result back into the conversation
            messages.append({"role": "assistant", "content": response.content})
            messages.append({
                "role": "user",
                "content": [
                    {
                        "type": "tool_result",
                        "tool_use_id": tool_block.id,
                        "content": str(tool_result),
                    }
                ],
            })
        else:
            # Agent has finished; return the final response
            return response.content

    return "Agent reached maximum steps without completing the task."
```
This is the pattern we use for most single-purpose agents. It is simple, it works well, and it is easy to debug.
Plan-and-Execute
For more complex tasks, we use a plan-and-execute pattern. The agent first creates an explicit plan (a list of steps), then executes each step in sequence. This works well when the task is well-defined and the agent needs to coordinate multiple actions.
The advantage is visibility. You can log the plan, let a human review it before execution, and retry individual steps if something fails. We use this pattern for document processing pipelines where the agent needs to extract, validate, and transform data before loading it into multiple systems.
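The skeleton is straightforward. In the sketch below, `make_plan` and `run_action` are stubs — in a real deployment the plan comes from a Claude call that returns structured JSON, and `run_action` dispatches to your tool layer:

```python
import json

def run_action(action: str, data):
    # Stub for the tool layer; replace with real tool execution.
    return f"{action}:done"

def make_plan(task: str) -> list:
    # Stub: in practice this is a Claude call that returns a JSON list
    # of steps for the task.
    return [
        {"step": 1, "action": "extract", "input": task},
        {"step": 2, "action": "validate", "input": None},
        {"step": 3, "action": "load", "input": None},
    ]

def execute_step(step: dict, max_attempts: int = 2):
    """Run one step, retrying it independently of the rest of the plan."""
    for attempt in range(max_attempts):
        try:
            return run_action(step["action"], step["input"])
        except Exception:
            if attempt == max_attempts - 1:
                raise

def plan_and_execute(task: str) -> list:
    plan = make_plan(task)
    # The explicit plan can be logged or shown to a human before execution.
    print("Plan:", json.dumps(plan, indent=2))
    return [execute_step(step) for step in plan]
```

Because each step is executed and retried individually, a failure partway through does not force you to restart the whole task.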
Multi-Agent Orchestration
For complex workflows, we deploy multiple specialised agents that collaborate. A router agent receives the incoming request and delegates to the appropriate specialist. A document extraction agent handles the parsing. A validation agent checks results against business rules, and a human-in-the-loop agent manages escalations.
Each agent has a focused role, a limited tool set, and clear boundaries. This separation of concerns makes the system easier to test and debug. It also means you can use different models for different agents: Claude Opus for complex reasoning, Sonnet for high-volume processing, Haiku for simple routing.
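A minimal sketch of the routing layer, with the classification step stubbed out (in practice it is a cheap Haiku call) and hypothetical specialist names:

```python
# Router sketch: a cheap model classifies the incoming request, then the
# work is delegated to a specialist agent on an appropriately sized model.

SPECIALISTS = {
    "document":   {"agent": "document_extraction", "tier": "sonnet"},
    "validation": {"agent": "validation",          "tier": "haiku"},
    "complex":    {"agent": "escalation",          "tier": "opus"},
}

def classify_request(request: str) -> str:
    # Stub: in practice a Claude Haiku call returns one of the keys above.
    if "pdf" in request.lower():
        return "document"
    return "complex"

def route(request: str) -> dict:
    """Pick a specialist agent (and model tier) for an incoming request."""
    category = classify_request(request)
    # Unknown categories fall through to the escalation specialist.
    specialist = SPECIALISTS.get(category, SPECIALISTS["complex"])
    return {"request": request, **specialist}
```

The key design choice is that the router never does the work itself; it only decides who does, which keeps each specialist's prompt and tool set small.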
Practical Example: Building a Document Processing Agent
Document processing is one of the most common agent use cases we build for Australian businesses. Here is a simplified version of the architecture we deploy.
```python
import anthropic
import json

client = anthropic.Anthropic()

DOCUMENT_TOOLS = [
    {
        "name": "extract_text_from_pdf",
        "description": "Extract raw text content from a PDF document.",
        "input_schema": {
            "type": "object",
            "properties": {
                "file_path": {"type": "string", "description": "Path to the PDF file"}
            },
            "required": ["file_path"]
        }
    },
    {
        "name": "classify_document",
        "description": "Classify the document type based on its content. Returns one of: invoice, contract, medical_record, resume, correspondence, other.",
        "input_schema": {
            "type": "object",
            "properties": {
                "text_content": {"type": "string", "description": "The extracted text from the document"}
            },
            "required": ["text_content"]
        }
    },
    {
        "name": "extract_structured_data",
        "description": "Extract structured fields from a document based on its type.",
        "input_schema": {
            "type": "object",
            "properties": {
                "text_content": {"type": "string"},
                "document_type": {"type": "string"},
                "required_fields": {
                    "type": "array",
                    "items": {"type": "string"},
                    "description": "List of fields to extract"
                }
            },
            "required": ["text_content", "document_type", "required_fields"]
        }
    },
    {
        "name": "save_to_database",
        "description": "Save the extracted structured data to the business database.",
        "input_schema": {
            "type": "object",
            "properties": {
                "document_type": {"type": "string"},
                "extracted_data": {"type": "object"},
                "confidence_score": {"type": "number"}
            },
            "required": ["document_type", "extracted_data", "confidence_score"]
        }
    }
]

SYSTEM_PROMPT = """You are a document processing agent for an Australian healthcare provider.

Your job is to process incoming documents by:
1. Extracting text from the PDF
2. Classifying the document type
3. Extracting the relevant structured data based on the document type
4. Saving the results to the database

For medical records, extract: patient_name, date_of_birth, medicare_number, diagnosis, treatment_plan.
For invoices, extract: vendor_name, abn, invoice_number, amount, due_date.

If confidence is below 0.85, flag the document for human review instead of saving directly.
Always use Australian date format (DD/MM/YYYY) when processing dates."""
```
This agent handles the full pipeline: read, classify, extract, store. The system prompt defines the business rules, including the confidence threshold that triggers human review. In production, we add error handling, retries, logging, and audit trails, but the core pattern stays the same.
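That confidence threshold is worth enforcing in code as well as in the prompt, so a model that misjudges its own confidence cannot write a low-confidence record directly. A minimal sketch, where the review queue and database write are hypothetical stubs:

```python
CONFIDENCE_THRESHOLD = 0.85  # matches the rule in the system prompt

def queue_for_human_review(document_type: str, data: dict, confidence: float):
    pass  # stub: push to your review queue

def save_record(document_type: str, data: dict, confidence: float):
    pass  # stub: write to your database

def route_extraction(document_type: str, data: dict, confidence: float) -> str:
    """Enforce the human-review rule deterministically, outside the model."""
    if confidence < CONFIDENCE_THRESHOLD:
        queue_for_human_review(document_type, data, confidence)
        return "human_review"
    save_record(document_type, data, confidence)
    return "saved"
```

Prompts shape behaviour, but hard business rules like this one belong in deterministic code.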
We deployed a similar architecture for a healthcare client where the agent classifies incoming patient documents and routes them into the correct workflow. Staff who were spending hours on manual sorting now spend minutes reviewing the agent’s flagged exceptions.
Practical Example: Building a Customer Support Agent
Customer support agents are the other use case we deploy most frequently. The architecture is similar but with a focus on conversational flow and system integration.
What matters is giving the agent access to the right tools: customer lookup, order history, knowledge base search, ticket creation, and escalation. The system prompt defines the agent’s personality, scope, and escalation rules.
SUPPORT_SYSTEM_PROMPT = """You are a customer support agent for [Company Name].
You have access to the customer database, order system, and knowledge base.
Rules:
- Always verify the customer's identity before accessing account details.
- You can process refunds up to $200 AUD without escalation.
- Refunds over $200 AUD must be escalated to a human agent.
- Never share one customer's data with another customer.
- If you are unsure about a policy, search the knowledge base before guessing.
- Use Australian English and be professional but warm.
- If the customer is frustrated, acknowledge their frustration before problem-solving."""
The important detail is the guardrails. The agent has a clear dollar limit for autonomous action. It knows when to escalate. It has a defined verification process. These boundaries are what make an agent production-ready, not just a demo.
Integrating Claude Agents with Business Systems
An agent is only as useful as the systems it can interact with. In practice, this means integrating with CRMs (HubSpot, Salesforce), databases (PostgreSQL, Supabase), communication tools (Slack, email), and internal APIs.
We handle integrations through a tool layer that abstracts the external system behind a clean interface. The agent calls lookup_customer, and the tool layer handles authentication, error handling, rate limiting, and data transformation for the specific CRM underneath. This means you can swap out a CRM without retraining or reconfiguring the agent itself.
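A sketch of that tool layer, with a hypothetical HubSpot adapter behind a stable interface — the adapter classes and field names here are illustrative, not a real HubSpot client:

```python
# Tool-layer sketch: the agent only ever sees lookup_customer; the CRM
# adapter underneath can be swapped without touching the agent.

class CRMAdapter:
    """Interface every CRM integration must implement."""
    def get_customer(self, email: str) -> dict:
        raise NotImplementedError

class HubSpotAdapter(CRMAdapter):
    def get_customer(self, email: str) -> dict:
        # Stub: a real implementation would call the HubSpot API here,
        # handling auth, rate limits, and field mapping.
        return {"email": email, "source": "hubspot"}

class SalesforceAdapter(CRMAdapter):
    def get_customer(self, email: str) -> dict:
        # Stub: same interface, different system underneath.
        return {"email": email, "source": "salesforce"}

class ToolLayer:
    def __init__(self, crm: CRMAdapter):
        self.crm = crm

    def lookup_customer(self, email: str) -> dict:
        """The stable function the agent's tool definition points at."""
        return self.crm.get_customer(email)
```

Swapping CRMs is then a one-line change at construction time; the agent's tool definitions and prompts stay untouched.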
Model Context Protocol (MCP) has simplified this considerably in 2026. MCP provides a standardised way for agents to connect to external data sources and tools. Instead of writing custom integration code for every system, you can use MCP servers that expose databases, APIs, and business applications through a consistent interface. Claude has native MCP support, so your agent can discover and use tools dynamically.
Using Claude Agents with n8n
For many of our clients, agents do not operate in isolation. They are triggered by events in business workflows. A new document arrives in a shared folder. A customer submits a form. An invoice is received via email.
We use n8n to orchestrate these triggers and connect them to Claude-based agents. n8n handles the workflow layer (watching for events, routing data, managing retries) while Claude handles the intelligence layer (reasoning, extraction, decision-making).
This separation works well because n8n is good at connecting systems and managing workflow state, while Claude handles the parts that require understanding and judgment. You get the reliability of a workflow engine paired with the intelligence of a reasoning model.
Error Handling and Fallback Strategies
Agents fail. APIs time out. Models hallucinate. External systems return unexpected data. Building a reliable agent means planning for failure at every step.
Our standard error handling pattern includes:
- Retry with exponential backoff for transient API failures. Anthropic’s API occasionally returns 529 (overloaded) errors. A simple retry with backoff handles this gracefully.
- Fallback models. If your primary model (Opus) is unavailable or hitting rate limits, fall back to Sonnet. The task may be handled with slightly less sophistication, but it does not stall.
- Confidence thresholds. The agent should assess its own confidence and escalate to a human when it is unsure. This is especially important for high-stakes decisions like financial processing or medical document classification.
- Structured output validation. When the agent extracts data, validate it against a schema before writing it to your database. Catch malformed outputs before they cause downstream problems.
- Circuit breakers. If an external system is consistently failing, stop calling it and alert your team rather than burning through API credits on retries that will not succeed.
```python
import time

from anthropic import APIError, RateLimitError

def call_claude_with_retry(messages, tools, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=4096,
                tools=tools,
                messages=messages,
            )
            return response
        except RateLimitError:
            time.sleep(2 ** attempt)
        except APIError:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)

    raise Exception("Max retries exceeded calling Claude API")
```
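Structured output validation can be as simple as a typed field check before the database write. A sketch, using the invoice fields from the document agent's system prompt above:

```python
# Validate extracted data against a simple schema before it reaches the
# database. Field names match the invoice fields defined in the system prompt.

INVOICE_SCHEMA = {
    "vendor_name": str,
    "abn": str,
    "invoice_number": str,
    "amount": (int, float),
    "due_date": str,
}

def validate_extraction(data: dict, schema: dict) -> list:
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    for field, expected_type in schema.items():
        if field not in data:
            errors.append(f"missing field: {field}")
        elif not isinstance(data[field], expected_type):
            errors.append(f"wrong type for {field}")
    return errors
```

Any non-empty error list should route the document to human review rather than the database. For production we typically use a full schema library such as Pydantic, but the principle is the same: the model's output is untrusted input until validated.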
Cost Management
Claude API costs can surprise you if you are not paying attention. Here is how we keep costs under control for production agents.
Model selection matters. Not every task needs Opus. We use a tiered approach: Haiku for routing and classification, Sonnet for most agent tasks, Opus only when the reasoning genuinely demands it. The cost difference between Haiku and Opus is large, and for many tasks the output quality is comparable.
Prompt caching reduces costs by caching the system prompt and tool definitions across requests. If your agent handles hundreds of requests per day with the same system prompt, caching can meaningfully cut your input token costs. Anthropic’s prompt caching is straightforward to set up and the savings add up fast.
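A sketch of what caching looks like in a request. The `cache_control` markers follow Anthropic's prompt caching API at the time of writing; the system prompt and tool definition below are minimal placeholders, and the API call itself is shown commented:

```python
# Prompt caching sketch: mark the static prefix (system prompt and tool
# definitions) with cache_control so repeated requests reuse the cached
# prefix instead of re-billing the full input tokens.

SYSTEM_PROMPT = "You are a document processing agent..."  # your long, stable prompt

system_blocks = [
    {
        "type": "text",
        "text": SYSTEM_PROMPT,
        # Everything up to and including this block is cached.
        "cache_control": {"type": "ephemeral"},
    }
]

tools = [
    {
        "name": "lookup_customer",
        "description": "Look up a customer record by email.",
        "input_schema": {
            "type": "object",
            "properties": {"email": {"type": "string"}},
            "required": ["email"],
        },
        # Marking the last tool caches the whole tool list.
        "cache_control": {"type": "ephemeral"},
    }
]

# The call itself is unchanged apart from the marked blocks:
# client.messages.create(model="claude-sonnet-4-20250514", max_tokens=1024,
#                        system=system_blocks, tools=tools, messages=messages)
```

Cache reads are billed at a fraction of normal input token rates, which is where the savings come from on high-volume agents.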
Token management. Be deliberate about what you put in the context window. Summarise long conversation histories rather than including every message verbatim. Trim tool results to the relevant fields rather than passing back entire API responses. Every unnecessary token costs money at scale.
Batching. For non-time-sensitive workloads like overnight document processing, use Anthropic’s message batches API. Batch processing offers a 50% discount on token costs compared to real-time requests.
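A sketch of a batch submission — the request shape follows Anthropic's Message Batches API at the time of writing, with the document paths as hypothetical examples and the API call shown commented:

```python
# Message Batches sketch: queue many independent requests for discounted,
# non-real-time processing. Each request carries a custom_id so results
# can be matched back to their source document.

documents = ["invoice_001.pdf", "invoice_002.pdf", "invoice_003.pdf"]

batch_requests = [
    {
        "custom_id": f"doc-{i}",  # your key for matching results back
        "params": {
            "model": "claude-sonnet-4-20250514",
            "max_tokens": 1024,
            "messages": [
                {"role": "user", "content": f"Process document: {path}"}
            ],
        },
    }
    for i, path in enumerate(documents)
]

# batch = client.messages.batches.create(requests=batch_requests)
# Results are retrieved later by polling the batch; each result carries
# the custom_id you supplied above.
```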
Testing and Evaluation
Testing agents is harder than testing traditional software because outputs are non-deterministic. Here is the approach we use.
Unit test your tools. The tool layer (the functions your agent calls) should be tested like any other code. Mock the external systems and verify that your tool functions handle edge cases correctly.
Evaluation datasets. Build a set of representative inputs with expected outputs. Run your agent against these regularly and measure accuracy, tool selection correctness, and task completion rate. We track these metrics over time to catch regressions when we update prompts or switch models.
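A minimal evaluation harness can be just a labelled list and a scoring loop. In this sketch, `run_agent` is a stub standing in for your real agent entry point, and the dataset entries are illustrative:

```python
# Minimal evaluation harness: run the agent over a labelled dataset and
# track task-level accuracy, logging failures for inspection.

def run_agent(task: str) -> str:
    # Stub: in practice this invokes the full agent loop.
    return "invoice" if "invoice" in task.lower() else "other"

EVAL_SET = [
    {"input": "Invoice from Acme Pty Ltd", "expected": "invoice"},
    {"input": "Patient referral letter",   "expected": "medical_record"},
    {"input": "Tax invoice #1042",         "expected": "invoice"},
]

def evaluate(dataset: list) -> float:
    """Return accuracy over the dataset; print each failing case."""
    correct = 0
    for case in dataset:
        output = run_agent(case["input"])
        if output == case["expected"]:
            correct += 1
        else:
            print(f"FAIL: {case['input']!r} -> {output!r}, "
                  f"expected {case['expected']!r}")
    return correct / len(dataset)
```

Run this on every prompt or model change and track the score over time; a sudden drop is your regression signal.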
Human review loops. For the first few weeks of any production deployment, have a human review a sample of the agent’s outputs. This catches issues that automated tests miss and builds confidence in the system before you fully trust it.
Adversarial testing. Deliberately try to break the agent. Feed it ambiguous inputs, conflicting instructions, and edge cases. Verify that it fails gracefully and escalates appropriately rather than producing confident but wrong outputs.
Production Deployment Considerations
Moving an agent from a Jupyter notebook to production involves a few things that are easy to overlook.
Logging and observability. Log every agent interaction: the full message history, tool calls, tool results, and final output. When something goes wrong in production (and it will), you need to be able to reconstruct exactly what happened.
Rate limiting. Anthropic enforces rate limits on their API. Design your system to handle rate limits gracefully, especially if you have multiple agents or workflows calling the API concurrently. A queue-based architecture works well here.
Latency budgets. An agentic loop that makes four tool calls in sequence will be noticeably slower than a single API call. Set expectations with stakeholders and design your UX around this. For customer-facing agents, streaming the response helps manage perceived wait times.
Version control for prompts. Treat your system prompts and tool definitions like code. Version them, review changes, and deploy them through your standard CI/CD pipeline. A small prompt change can completely alter agent behaviour.
Australian Data Processing Considerations
If you are building agents for Australian businesses, data handling is something you need to get right.
Anthropic’s API processes data on servers in the United States. For many use cases this is fine, but if you are handling sensitive personal information, health data, or government data, you need to consider the implications under the Privacy Act 1988, the Australian Privacy Principles, and any industry-specific regulations like the My Health Records Act.
Practical approaches we use:
- Data minimisation. Only send the data the agent needs to do its job. If you are classifying a document, you may not need to send the entire document to the API. Extract and send only the relevant sections.
- Anonymisation before API calls. Strip personally identifiable information before sending data to Claude. Have the agent work with de-identified data and re-associate it with the original records on your side.
- Self-hosted processing for sensitive data. For clients with strict data sovereignty requirements, we build hybrid architectures where sensitive data is processed locally (using open-source models or on-premise infrastructure) and Claude handles only the non-sensitive reasoning tasks.
- Data processing agreements. Anthropic offers a Data Processing Addendum (DPA) for enterprise customers. If you are processing personal information, make sure you have this in place.
These are not theoretical concerns. We work with healthcare providers and financial services firms in Australia where getting data handling wrong has real regulatory consequences.
When to Build Custom vs Use Pre-Built Frameworks
The AI agent ecosystem has matured quickly. There are now several frameworks (LangChain, LangGraph, CrewAI, Anthropic’s own agent SDK) that provide scaffolding for building agents. Should you use them?
Use a framework when:
- You want to move fast and the framework’s architecture matches your use case.
- You need built-in features like conversation memory, agent orchestration, or tool management that would take significant effort to build from scratch.
- Your team is new to agent development and the framework provides useful guardrails and patterns.
Build custom when:
- You need precise control over the agent loop, error handling, and retry logic.
- Your use case has specific performance or cost requirements, and a general-purpose framework would add unnecessary overhead.
- You are building a production system that will run for years and you want to minimise dependencies on fast-moving open-source projects.
In our experience, most production agents end up somewhere in between. We often start with a framework for prototyping, then progressively replace framework components with custom code as we optimise for the specific use case. The core agentic loop (the ReAct pattern above) is simple enough that building it from scratch is often less work than learning and working around a framework’s opinions.
Getting Started
If you are ready to build your first Claude-based agent, here is where to start:
- Define a narrow, high-value use case. Do not try to build an agent that does everything. Pick one repetitive, time-consuming task and automate it well.
- Map the tools. What systems does the agent need to interact with? What data does it need to read and write? Define your tool set before you write any agent code.
- Start with Sonnet. It is the best balance of capability and cost for most agent tasks. Only move to Opus if Sonnet genuinely cannot handle the reasoning complexity.
- Build the evaluation set first. Know how you will measure success before you start building. This saves enormous time later.
- Plan for failure. Every tool call can fail. Every API can time out. Build error handling from day one.
If you want help building production-grade AI agents for your business, book a call with our team. We have deployed Claude-based agents across healthcare, recruitment, property, and professional services in Australia, and we can help you identify the right use case and architecture for your needs.
Frequently Asked Questions
How much does it cost to run a Claude-based AI agent in production?
Costs vary a lot depending on the model, volume, and complexity. As a rough guide, a document processing agent handling 500 documents per day using Claude Sonnet with prompt caching typically costs between $50 and $200 AUD per month in API fees. High-volume customer support agents can range from $200 to $1,000 AUD per month depending on conversation length and tool usage. We help clients bring costs down through model selection, caching, and batching. Talk to our team about estimating costs for your specific use case.
Can Claude AI agents handle Australian-specific requirements like Medicare numbers and ABNs?
Yes. Claude handles Australian data formats well, including Medicare numbers, ABNs, ACNs, Australian phone numbers, addresses, and date formats (DD/MM/YYYY). We build validation layers into our agents that verify these formats against known patterns and, where applicable, check digits. For healthcare and financial services clients, we also implement data handling practices that align with the Privacy Act and relevant industry regulations.
How long does it take to build and deploy a Claude-based AI agent?
A focused, single-purpose agent (for example, a document classification agent) can be built and deployed in two to four weeks. More complex multi-agent systems with multiple integrations typically take six to twelve weeks. The timeline depends on the complexity of the business rules, the number of systems being integrated, and how much testing and validation is required. Our AI agent development process includes discovery, prototyping, evaluation, and production hardening.
Is it safe to send business data to the Claude API?
Anthropic does not train on data sent through the API by default. Data is encrypted in transit and at rest, and Anthropic provides a Data Processing Addendum for enterprise customers. That said, API data is processed on servers in the United States, which matters if you have data sovereignty requirements. For sensitive data, we use anonymisation techniques and hybrid architectures that keep sensitive information on your infrastructure while using Claude for the reasoning tasks. We help Australian businesses work through these considerations as part of our AI consulting services.
What is the difference between an AI agent and a chatbot?
A chatbot responds to messages. An AI agent takes action. A chatbot might tell a customer about the return policy. An agent reads the customer’s order, checks eligibility, processes the refund, updates the database, and sends the confirmation, all without human intervention. Agents use tool calling to interact with external systems, reasoning to plan multi-step workflows, and feedback loops to recover from errors. They are a different category of software from chatbots, and they are what we focus on at Osher Digital.
Can Claude agents integrate with our existing business systems?
Yes. We build Claude agents that integrate with CRMs (HubSpot, Salesforce), databases (PostgreSQL, Supabase, MySQL), communication platforms (Slack, Microsoft Teams, email), workflow tools (n8n, Make), and custom internal APIs. MCP has standardised a lot of this integration work, and we have built integration layers for many different systems across Australian businesses.
Building AI agents that actually work in production takes more than good prompts. You need the right architecture, solid error handling, and a thorough understanding of the business process you are automating. If you want to explore what a Claude-based agent could do for your organisation, get in touch with our team. We are based in Brisbane and work with businesses across Australia.