Build AI Workflows in n8n: Patterns That Work in Production
Six AI workflow patterns we run in n8n every day: classify and route, extract and validate, conversational agents, RAG, scheduled research, human-in-the-loop. Real configs, real costs.
Updated May 2026. Rewritten around the AI Agent node, vector store nodes, and the six workflow patterns we build for clients.
n8n’s AI nodes have moved from interesting to essential. Two years ago we built AI workflows in n8n by chaining HTTP Request nodes to raw OpenAI calls. Today the AI Agent node, the vector store nodes, and native model providers cover most of what we used to wire by hand.
Osher Digital is a Brisbane-based AI and automation consultancy. We run n8n in production for clients in healthcare, recruitment, finance, and professional services. This guide is the working list of AI workflow patterns we ship for them, plus the configs that have survived contact with real users.
If you are new to n8n itself, the official docs are excellent. If you are new to AI agents in general, our guide to what an AI agent is covers the basics. This article is the next step: practical patterns you can copy.
What an AI Workflow Looks Like in n8n
An AI workflow in n8n is a normal workflow with one or more nodes that call a language model. The model adds judgment to a pipeline that would otherwise be pure rules. Where a classical n8n workflow says “if subject contains ‘invoice’ then forward to accounts”, an AI workflow says “ask the model what kind of email this is, then route based on the answer”.
Two things have changed since 2024. First, the AI Agent node lets you give a model a set of tools (other n8n nodes) and let it decide which to call. This is real agentic behaviour inside the workflow, not a chained pipeline. Second, the vector store nodes (Pinecone, Qdrant, PGVector, Supabase Vector) give you retrieval-augmented generation without leaving n8n.
Across our client work, AI workflows fall into six patterns. Almost everything we build is a variation on one of them.
Pattern 1: Classify and Route
The most common AI workflow we build. An item arrives (email, form submission, document, support ticket). The model classifies it. The workflow routes it.
Concrete example: a healthcare client receives 200 patient documents per day via email and shared folders. The workflow watches the inbox, runs each attachment through gpt-4o-mini with a system prompt listing 14 document types, and writes the result to the right queue in their case management system.
The shape of the workflow:
Trigger (IMAP / Webhook / Schedule) →
Extract Text (PDF Extract or Code) →
OpenAI (classify with structured output) →
Switch (route by classification) →
[Branches for each type]
Use the OpenAI node’s “Structured Output” mode and define a JSON schema with an enum of allowed classifications. This stops the model returning “It looks like a medical record” and forces a clean machine-readable answer. We also have the model self-report a confidence score in the same schema; anything below 0.85 routes to a human review queue, which catches the genuinely ambiguous cases without flooding the queue with normal items.
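A minimal version of that schema, as we would hand it to the node (the type names here are illustrative; the real enum lists all 14 document types):

{
  "type": "object",
  "properties": {
    "document_type": {
      "type": "string",
      "enum": ["referral_letter", "pathology_result", "discharge_summary", "other"]
    },
    "confidence": { "type": "number", "minimum": 0, "maximum": 1 }
  },
  "required": ["document_type", "confidence"],
  "additionalProperties": false
}

Treat the confidence value as the model’s self-assessment, not a calibrated probability. It is a rough signal, but it is enough to separate the obviously weird inputs from the routine ones.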
Cost at 200 documents per day on gpt-4o-mini: about $4 USD per month. Throughput: 500 docs per hour comfortably, limited by IMAP polling more than the model.
Pattern 2: Extract and Validate
Pull structured data out of unstructured text or images. Validate it against business rules. Write it to a database.
Real example: an accounting practice processes 500+ invoices per month from supplier PDFs. The workflow uses gpt-4.1 with vision to extract invoice number, ABN, line items, GST, and due date, then validates the ABN against the ABR lookup API before writing to Xero.
Trigger (Folder watch / Email) →
Convert to PDF Image (if needed) →
OpenAI Chat (vision, structured output schema) →
Code (validate fields, run ABR lookup) →
Switch (valid → write to Xero, invalid → human review)
The validation step is where most workflows fail in production. Models hallucinate ABNs occasionally. Date formats drift. Line item totals do not always match the invoice total. We always run a Code node after the model that checks the obvious arithmetic, formats, and any external references. Anything failing validation goes to a queue, never directly into the source-of-truth system.
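As a sketch, the validation Code node for an invoice workflow like this one might look as follows. Field names are illustrative, the ABN check is the standard ATO checksum, and the ABR lookup itself belongs in a separate HTTP Request step:

// n8n Code node, "Run Once for All Items" mode
const results = [];

for (const item of $input.all()) {
  const inv = item.json; // output of the extraction step (illustrative field names)
  const errors = [];

  // 1. Arithmetic: line items must sum to the invoice total (allow 1c rounding)
  const lineSum = (inv.line_items || []).reduce((s, li) => s + li.amount, 0);
  if (Math.abs(lineSum - inv.total) > 0.01) {
    errors.push(`line items sum to ${lineSum}, invoice total is ${inv.total}`);
  }

  // 2. Dates: the due date must at least parse
  if (isNaN(Date.parse(inv.due_date))) {
    errors.push(`unparseable due date: ${inv.due_date}`);
  }

  // 3. ABN checksum (ATO algorithm: subtract 1 from the first digit,
  //    apply the 11 weights, sum must divide evenly by 89)
  const weights = [10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19];
  const digits = String(inv.abn || '').replace(/\s/g, '');
  if (!/^\d{11}$/.test(digits)) {
    errors.push('ABN is not 11 digits');
  } else {
    const sum = weights.reduce(
      (s, w, i) => s + w * (Number(digits[i]) - (i === 0 ? 1 : 0)), 0);
    if (sum % 89 !== 0) errors.push('ABN fails checksum');
  }

  results.push({ json: { ...inv, valid: errors.length === 0, errors } });
}

return results;

The Switch node after this routes on valid, and the errors array gives the human reviewer something concrete to look at.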
Hit rate on automatic processing in production: about 88-92% straight-through, with the remaining 8-12% needing a human glance. That review share should shrink as models improve, but plan for human review on day one.
Pattern 3: Conversational Agent with Tools
Use the AI Agent node to build a chatbot or assistant that can call other n8n nodes as tools. This is where n8n stops being a workflow tool and starts being an agent runtime.
Real example: a recruitment client built an internal assistant that consultants chat with via Slack. It can search the candidate database (Postgres tool), read the job spec (Google Drive tool), check calendar availability (Google Calendar tool), and draft outreach emails (Gmail tool). The agent decides which tools to call.
Slack Trigger (message in channel) →
AI Agent (with attached tools below) →
- Postgres tool: search candidates
- Google Drive tool: read documents
- Google Calendar tool: check availability
- Gmail tool: draft outreach
→ Slack Send (response with attachments)
The system prompt is where the work is. We spent two days iterating on it for that client. Specific rules (“Never send an email without showing me the draft first”, “When a candidate has the same name as another, always disambiguate by ID”) matter more than model choice. Use the Window Buffer Memory node attached to the agent for conversation continuity.
The biggest gotcha: tool descriptions. n8n auto-generates tool descriptions from the node’s purpose, but they are often too vague. Override the description for every tool with a specific sentence about when to use it and what it returns. Without this, the agent calls the wrong tool 30% of the time.
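A hypothetical before-and-after for the candidate search tool above. Auto-generated: “Searches the database.” Overridden: “Search the candidates table by skill, location, or name. Returns up to 20 rows with candidate ID, name, current role, and last contact date. Use for any question about specific candidates; do not use for job specs, which live in Google Drive.” The second version tells the agent when to reach for the tool, what comes back, and when not to use it, which is exactly what tool selection runs on.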
Pattern 4: RAG over Internal Documents
Index a corpus of documents, retrieve the relevant chunks for a question, and answer using only the retrieved context. Retrieval-augmented generation, all inside n8n.
Real example: a legal firm with thousands of past matter documents. Lawyers ask “have we drafted a clause like this before?” via a custom UI. The workflow embeds the question, retrieves the top 8 matching clauses from Qdrant, and asks gpt-4.1 to synthesise an answer with citations to source documents.
Two workflows are involved. One indexes documents (runs nightly): read documents from source, chunk them, embed each chunk with text-embedding-3-large, write to vector store. The other answers questions (runs on demand): embed the query, fetch top-K matches, pass to AI Agent with the retrieved context.
# Indexing workflow
Schedule Trigger →
Google Drive (list files modified since last run) →
Loop →
Read Binary →
Default Data Loader (chunk) →
Embeddings OpenAI →
Vector Store Qdrant (insert)
# Query workflow
Webhook Trigger →
Vector Store Qdrant (similarity search, top 8) →
AI Agent (with retrieved context in system message) →
Respond to Webhook
Chunk size matters. Default 1000 tokens with 200 overlap is fine for prose. For code or structured data, smaller chunks (300-500 tokens) work better. We always run a recall test after indexing: ask 20 questions we know the answers to and check that the right chunks come back.
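A minimal recall test, run outside n8n, assuming a Qdrant collection named clauses, a payload field source on each chunk, and Node 18+ for the built-in fetch (save as recall-test.mjs):

// recall-test.mjs — run with: node recall-test.mjs
// Assumes QDRANT_URL and OPENAI_API_KEY in the environment.
const tests = [
  // { question: 'indemnity cap for subcontractors?', expectedDoc: 'matter-1234.pdf' },
  // ...fill in ~20 of these from questions you know the answers to
];

async function embed(text) {
  const res = await fetch('https://api.openai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ model: 'text-embedding-3-large', input: text }),
  });
  return (await res.json()).data[0].embedding;
}

async function search(vector) {
  const res = await fetch(`${process.env.QDRANT_URL}/collections/clauses/points/search`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ vector, limit: 8, with_payload: true }),
  });
  return (await res.json()).result;
}

let hits = 0;
for (const t of tests) {
  const matches = await search(await embed(t.question));
  if (matches.some((m) => m.payload.source === t.expectedDoc)) hits++;
  else console.log(`MISS: ${t.question}`);
}
console.log(`recall@8: ${hits}/${tests.length}`);

If the right chunks stop coming back after a re-index, look at chunking before blaming the model.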
Pattern 5: Scheduled Research and Reporting
Run on a schedule, gather information from external sources, summarise, send a report. The unsexy pattern that produces the most ROI per hour of build time.
Real example: a financial services client tracks 40 competitors’ websites and announcement feeds. Every Monday at 7am, n8n fetches each, diffs against last week’s snapshot, summarises significant changes with gpt-4.1, and sends a Slack digest to their strategy team. Build time was three days. They had been paying a consultant six figures a year to do roughly the same thing.
Cron (Mon 0700) →
Read snapshots from S3 →
Loop over 40 sites →
HTTP Request →
Code (diff vs last snapshot) →
OpenAI (summarise material changes) →
Aggregate →
Slack Send
The trick is the diff step before the model sees anything. Sending the full page to the model every week burns money on summarising things that did not change. We hash sections of each page and only feed the model the sections whose hash differs from last run. Cost drops by 90%.
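A sketch of that diff step as an n8n Code node, assuming built-in modules are enabled for Code nodes and that the previous run’s hashes ride along on the item from the S3 read:

// n8n Code node: pass through only the sections whose hash changed
const crypto = require('crypto');

const { html, previousHashes = {} } = $input.first().json;

// Naive sectioning: split on h2 headings. Adjust per site.
const sections = html.split(/<h2[\s>]/i);

const changed = [];
const currentHashes = {};

sections.forEach((section, i) => {
  const hash = crypto.createHash('sha256').update(section).digest('hex');
  currentHashes[i] = hash;
  if (previousHashes[i] !== hash) changed.push(section); // only these reach the model
});

return [{ json: { changed, currentHashes } }];

currentHashes goes back to S3 as this week’s snapshot; changed (often empty) is all the model ever sees.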
Pattern 6: Human in the Loop
The model proposes. The human disposes. Use this any time the consequences of a wrong action are bigger than the inconvenience of a manual approval.
Real example: a property management client wanted AI to draft tenant communications (rent reminders, maintenance follow-ups, lease renewals) but the operations manager wanted veto power. The workflow drafts the email, writes it to a Slack channel as a thread, and waits for a thumbs-up reaction before sending. Drafts that are not approved within 4 hours fall through to a human-write queue.
Trigger →
AI Agent (draft message) →
Slack Send (post draft for review) →
Wait (Slack reaction with timeout) →
Switch (approved / rejected / timeout) →
[Send / discard / queue]
The Wait node’s “wait for webhook” mode is what makes this work. Slack reactions trigger a webhook back into n8n that resumes the original execution. Done right, the operations manager approves 30 emails in 4 minutes a day and the AI handles the drafting work that used to take her two hours.
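One wiring detail: while an execution sits at a Wait node in “wait for webhook” mode, n8n exposes the callback URL for that specific execution as $execution.resumeUrl. We include it when posting the draft and store it against the Slack message timestamp (the lookup table is our convention, not an n8n feature), so the reaction handler is its own small workflow:

Webhook (Slack reaction event) →
Postgres (look up resume URL by message timestamp) →
HTTP Request (POST to the resume URL with { decision: "approved" })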
The Building Blocks
The n8n nodes you will reach for most often when building AI workflows:
- AI Agent: the agent runtime; attach a chat model, tools, and memory
- OpenAI Chat Model and the other model provider nodes: single model calls, with structured output for classification and extraction
- Embeddings OpenAI: turn text into vectors for indexing and querying
- Vector store nodes (Qdrant, PGVector, Pinecone, Supabase Vector): insert and similarity search
- Default Data Loader: chunk documents before embedding
- Window Buffer Memory: conversation continuity for agents
- Code: validation, diffing, cost logging, everything the model output needs checked against
- Switch: route on the model’s answer
- Wait: pause for human approval
- HTTP Request: every system without a native node
Model choice in 2026: gpt-4.1 and claude-sonnet-4-5 for quality, gpt-4o-mini for high-volume cheap work, claude-haiku-4-5 when you want something cheaper than 4o-mini for simple tasks. We default to claude-sonnet-4-5 for agent work because its tool selection accuracy is the best we have measured. For pure summarisation, gpt-4o-mini wins on cost per useful output.
When n8n Is Not the Right Tool
Not every AI workload belongs in n8n. The honest list:
High concurrency real-time chat. n8n’s execution model is fine for hundreds of concurrent workflow runs but not thousands. For a public-facing chatbot at scale, write the agent in Python or TypeScript and run it on a proper application server. Our Python guide covers that path.
Sub-second latency requirements. n8n adds 100-300ms of overhead per workflow run on top of the model latency. For interactive UIs where the whole response has to land inside a couple of seconds, that overhead matters.
Deeply custom retrieval pipelines. If your RAG needs reranking, hybrid search, query rewriting, or multiple retrievers wired together, you will fight n8n’s vector store nodes. LangChain, LlamaIndex, or a hand-built pipeline gives you more control.
Workflows you cannot version. n8n workflow JSON is exportable but messy. For workflows that need rigorous version control, code review, and CI/CD, a code-first agent framework serves that need better. n8n’s Git integration helps but does not eliminate the gap.
For most business AI workflows, n8n is the right tool. We pick it for any workflow that needs to integrate with multiple business systems, where the visual canvas helps non-developers understand what is happening, and where the operational footprint is reasonable. That covers about 80% of what we build.
Costs and Hosting
For a moderate AI workflow workload (a few thousand model calls per day across all workflows), realistic monthly costs in AUD:
- n8n self-hosted on a VPS: about $30
- Model API spend (OpenAI / Anthropic): the biggest line item and the biggest variable
- Vector store: $0 if you run PGVector on Postgres you already have or Qdrant in a single Docker container; managed Pinecone adds its own line item
Total for a small business running 4-6 AI workflows: $150-$450 AUD per month all-in. The biggest variable is model spend; high-volume vision tasks (invoice OCR, document classification) drive cost up faster than anything else. Build a cost dashboard in n8n itself if this matters to you; it takes a Schedule node, an API call to your provider’s usage endpoint, and a Postgres write. We have that running for every client over $200 a month in API spend.
If you want help choosing the right pattern and getting the first workflow shipped, book a call with our team. We typically deliver a first working AI workflow in 2-3 weeks of focused work.
Frequently Asked Questions
How do I build an AI workflow in n8n?
Start with a trigger (webhook, schedule, email, form), add a model node (OpenAI Chat Model or AI Agent), wire its output into the rest of your workflow logic. For simple sequential pipelines, chain the model node with Set, Switch, and HTTP Request nodes. For multi-step reasoning, use the AI Agent node with attached tools. Pick the pattern that matches your problem from the six listed in this article and adapt the structure.
Can I use agentic AI with n8n?
Yes. The AI Agent node is purpose-built for this. Attach a chat model, give the agent a set of tools (which can be any other n8n node, plus Postgres, HTTP Request, vector store search, custom code), and the model decides which tools to call to answer a query. We use this pattern for internal assistants, customer-support agents, and research agents that browse a known document corpus.
Can AI generate n8n workflows for me?
Partially. Tools that turn natural language into n8n workflow JSON exist (n8n’s own AI assistant included) but the output usually needs human cleanup. They handle the obvious nodes well and miss the production details (error handling, rate limiting, validation steps). We treat them as scaffolding generators, not finished work. Faster than starting from scratch; not yet a substitute for understanding the workflow you are building.
How much does running AI workflows in n8n cost?
For a small business running a handful of AI workflows: $150-$450 AUD per month all-in, mostly driven by API costs to OpenAI or Anthropic. Self-hosting n8n on a $30 AUD per month VPS keeps the platform cost flat. Higher volume workloads (10K+ model calls per day) push that into the low thousands monthly. Cost per useful output is almost always lower than human handling once you cross 50 items per day.
Should I use the AI Agent node or chain nodes manually?
Use the AI Agent node when the model needs to decide which actions to take. Chain nodes manually when the sequence is fixed and known. Classification, extraction, and summarisation are usually fixed-sequence workflows where chaining is cleaner. Conversational agents and research agents are dynamic and need the AI Agent node. The wrong choice usually shows up as either over-engineered (Agent node when a Switch would do) or under-engineered (manual chain that cannot handle the variation in inputs).
Which vector store should I use with n8n?
If you already run Postgres, use PGVector. Zero extra infrastructure and good enough for tens of millions of vectors. If you want a dedicated vector store with strong filtering, Qdrant self-hosted is excellent and runs in a single Docker container. Pinecone is the easiest managed option but you pay for the convenience. We pick PGVector for most clients and Qdrant when the corpus exceeds a few million chunks.
How do I monitor AI workflow costs in n8n?
Add a small Code node after each model call that calculates approximate cost from the token usage in the response and writes it to a Postgres or Google Sheets log. Schedule a daily summary workflow that aggregates the log and sends it to Slack. Provider dashboards show total spend but not per-workflow breakdown, which is what you actually want when one workflow blows up the bill.
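A sketch of that per-call logger as a Code node. The token-usage path and the prices are assumptions; check one real execution’s output and your provider’s current price sheet:

// n8n Code node placed directly after the model call
// The exact path to token usage varies by node and version —
// inspect one execution's JSON before trusting this.
const usage = $input.first().json.usage || {};
const inputTokens = usage.prompt_tokens ?? 0;
const outputTokens = usage.completion_tokens ?? 0;

// Illustrative USD prices per million tokens; keep them in one place and update them.
const PRICE = { input: 0.15, output: 0.6 };

const costUsd =
  (inputTokens / 1e6) * PRICE.input + (outputTokens / 1e6) * PRICE.output;

return [{
  json: {
    workflow: $workflow.name, // n8n built-in: metadata for the current workflow
    at: new Date().toISOString(),
    inputTokens,
    outputTokens,
    costUsd,
  },
}];

Write that item to Postgres or Google Sheets and the daily summary workflow has everything it needs.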
How do I handle errors in AI workflows?
Three layers. First, set “Continue on Error” on any node that calls an external service so one failure does not stop the run. Second, validate model outputs in a Code node and route invalid outputs to a human review queue. Third, wire an Error Trigger workflow that catches workflow-level failures and posts to Slack with execution context. The third one is the difference between finding out about a broken workflow at 9am and finding out three days later when the user complains.
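The shape of that third layer, set as the error workflow in each production workflow’s settings:

Error Trigger →
Code (collect workflow name, execution ID, failed node, error message) →
Slack Send (alerts channel)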
If you want help building production-grade AI workflows in n8n that handle real volume, integrate with your existing systems, and survive the messy edge cases, get in touch with our team. We are based in Brisbane and ship n8n + AI workflows for businesses across Australia.