Updated May 2026. Rewritten as a practitioner view of where AI extraction has replaced classical RPA for invoice automation, with the stack we actually deploy and the parts that still break.

If you searched for “automate invoice processing with RPA” two years ago, the right answer was a UiPath or Automation Anywhere bot doing OCR plus form-filling. In 2026 that answer is mostly wrong. The honest version of this guide is that pure rule-based RPA for invoice processing is the worst part of a modern AP automation stack. AI extraction does the hard work. RPA, where it survives at all, handles the boring last-mile clicks into a legacy accounting system.

Osher Digital is a Brisbane-based AI and automation consultancy. We build and operate invoice automation for AP teams in healthcare, professional services, and recruitment. The systems we ship now look nothing like the screenshots in vendor decks. They are smaller, cheaper, and they handle the unstructured PDF that broke every classical bot we inherited.

This is the guide we wish existed when clients ask “should we use RPA for invoice processing?”. We will cover where pure RPA still earns a place, the AI-first stack we deploy, the production numbers we see, the things that will absolutely break, and the migration path off a UiPath bot that nobody on the AP team trusts anymore. If you want our broader take on the agent side of this stack, see our Claude AI agent guide. For the workflow tooling we glue this together with, see our n8n consultants page.

What Classical RPA Got Right, and What It Misses Now

Classical RPA tools (UiPath, Automation Anywhere, Blue Prism) earned their reputation because they did one thing better than people: they typed into systems that had no API. If your invoice workflow ended in MYOB AccountRight 2014, or Sage 50, or a thirty-year-old ERP screen with no integration surface, an RPA bot was the only thing that could close the loop without a six-figure middleware project.

The bit RPA never did well was the extraction. OCR plus regex plus templated zonal extraction works for one supplier with one invoice format. It falls apart the moment a vendor changes their layout, sends a multi-page invoice with a remittance advice, or invoices you from a different entity. Every AP automation rollout we have seen in the wild has the same scar tissue: a “happy path” bot that hits exception queues on twenty to forty percent of invoices.

The Claude Sonnet 4.5 and GPT-4.1 generation of multimodal models killed the extraction problem. You hand the model the PDF, ask for structured JSON with vendor name, invoice number, date, line items, GST, and total, and it gives you back the right answer on a layout it has never seen before. The improvement is not marginal. We measure it. Pure-RPA pipelines we replaced were hitting 65 to 75 percent straight-through. AI-first pipelines on the same supplier mix hit 88 to 94 percent.

The AI-First Invoice Processing Stack We Deploy

The shape of the stack has settled. It looks roughly the same across the clients we ship for, with the model choice and the destination ERP being the two main variables.

Inbound capture. A shared mailbox (invoices@) and a supplier portal scraper. We pull PDFs out of email attachments. For suppliers who still print, we run incoming mail through a desk scanner with auto-OCR. EDI feeds from larger suppliers come straight in via a separate path.

Pre-processing. Convert everything to PDF. Split multi-invoice attachments into single-invoice files using a classifier model. Reject obvious non-invoices (statements, marketing) before they cost extraction tokens.

Extraction. Claude Sonnet 4.5 (default) or GPT-4.1 hits the PDF with a structured-output schema. We define the schema with Pydantic on the Python side or JSON Schema if we are running this from an n8n AI node. The model returns a typed object. No regex, no templates, no per-supplier configuration.

Validation. Three layers. Schema validation catches type mismatches. Business-rule validation reconciles line items against the stated total, checks GST is 10 percent of the GST-eligible portion, and confirms the vendor exists in the supplier master. Cross-record checks catch duplicate invoice numbers from the same vendor.
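
The business-rule and cross-record layers can be sketched in a few lines of plain Python. This is a minimal illustration, not our production validator: the ExtractedInvoice shape, the five-cent tolerance, and the in-memory supplier_master and seen_keys sets are assumptions made for the example.

```python
from dataclasses import dataclass
from decimal import Decimal

GST_RATE = Decimal("0.10")    # Australian GST, as stated above
TOLERANCE = Decimal("0.05")   # assumed rounding tolerance, in dollars


@dataclass
class ExtractedInvoice:       # hypothetical shape, for illustration only
    supplier_abn: str
    invoice_number: str
    line_totals: list         # GST-exclusive line amounts (Decimal)
    gst_eligible: Decimal     # portion of the subtotal that attracts GST
    gst: Decimal
    total: Decimal


def business_rule_flags(inv, supplier_master, seen_keys):
    """Return a list of human-readable flags; an empty list means pass."""
    flags = []
    lines = sum(inv.line_totals, Decimal("0"))
    if abs(lines + inv.gst - inv.total) > TOLERANCE:
        flags.append("total does not reconcile to lines + GST")
    if abs(inv.gst_eligible * GST_RATE - inv.gst) > TOLERANCE:
        flags.append("GST is not 10 percent of the GST-eligible portion")
    if inv.supplier_abn not in supplier_master:
        flags.append("supplier not in supplier master")
    # Multi-entity clients also need the billed entity in this key.
    if (inv.supplier_abn, inv.invoice_number) in seen_keys:
        flags.append("duplicate invoice number for this supplier")
    return flags
```

Returning every flag rather than stopping at the first one means the approver sees all the problems with an invoice in a single pass.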

Match. Three-way match against the PO and goods receipt where available. PO number lookup, line-item matching, tolerance check on price and quantity. This is the part where the AI is doing real reasoning, not just extraction, and it is where the quality gap between a $5,000 product and an in-house build is smallest.

Approval routing. A workflow tool (we use n8n for most clients, native ERP workflow for Xero shops) sends the invoice to the right approver based on cost centre, amount, and PO match status. Approvers see the original PDF, the extracted fields, and any validation flags.
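
Whatever workflow tool runs it, the routing decision itself is a small pure function. The sketch below assumes a made-up approver table, a made-up escalation limit, and made-up role names; the real values come from each client's delegation-of-authority policy.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical approver table and escalation limit, for illustration only.
COST_CENTRE_APPROVERS = {"IT": "it-manager", "CLINICAL": "clinical-ops-lead"}
ESCALATION_LIMIT = Decimal("20000")


@dataclass
class RoutableInvoice:
    cost_centre: str
    amount: Decimal
    po_matched: bool


def pick_approver(inv: RoutableInvoice) -> str:
    """Return the approver role for an invoice."""
    if not inv.po_matched:
        # No clean PO match: AP reviews it before a budget owner sees it.
        return "ap-review"
    if inv.amount > ESCALATION_LIMIT:
        return "finance-director"
    # Unknown cost centres fall back to the AP manager rather than failing.
    return COST_CENTRE_APPROVERS.get(inv.cost_centre, "ap-manager")
```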

Post to ERP. If the ERP has an API (Xero, MYOB Business, NetSuite, modern Dynamics), this is a clean HTTP call. If it does not, this is where an RPA bot still earns a paycheck.

Where Classical RPA Still Earns Its Place

The legitimate use cases for a UiPath or Power Automate Desktop bot in 2026:

The final write into a legacy ERP with no API. MYOB Premier Classic, old Pronto, older versions of SAP B1 without RFC enabled, custom in-house systems built before REST existed.
Driving a banking portal that does not support open banking or CSV upload (rarer every quarter, but it happens).
Coexisting with a still-running RPA program that has political momentum. We do not start fights we cannot win. If there is an RPA centre of excellence, we plug AI extraction in front of the bots and let them keep their last-mile click work.

What we do not do anymore: write a UiPath bot to extract fields off an invoice PDF. That work is gone. A 200-line Python script using the Anthropic SDK plus Pydantic will beat any zonal-OCR template solution we have ever seen, on any supplier mix, with no per-vendor tuning.

A Working Pipeline (Python and Claude)

This is the shape of the extraction step. It is not the whole pipeline, but it is the bit most “how to” articles skip.

from anthropic import Anthropic
from pydantic import BaseModel, Field, field_validator
from decimal import Decimal
import base64

class LineItem(BaseModel):
    description: str
    quantity: Decimal
    unit_price: Decimal
    line_total: Decimal

class Invoice(BaseModel):
    supplier_name: str
    supplier_abn: str | None
    invoice_number: str
    invoice_date: str # ISO 8601
    due_date: str | None
    line_items: list[LineItem]
    subtotal: Decimal
    gst: Decimal
    total: Decimal
    currency: str  # ISO 4217 code read from the invoice; no default, so AUD is never assumed

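    @field_validator("total")
    @classmethod
    def total_matches_lines(cls, v, info):
        # info.data only holds fields that already validated. If line_items or
        # gst failed, skip reconciliation so their own errors are reported
        # instead of a KeyError from this validator.
        if "line_items" not in info.data or "gst" not in info.data:
            return v
        lines = sum(li.line_total for li in info.data["line_items"])
        # Allow five cents of rounding drift between lines + GST and the total.
        if abs(v - (lines + info.data["gst"])) > Decimal("0.05"):
            raise ValueError("Total does not reconcile to lines + GST")
        return v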

def extract(pdf_bytes: bytes) -> Invoice:
    client = Anthropic()
    pdf_b64 = base64.standard_b64encode(pdf_bytes).decode()

    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=2000,
        messages=[{
            "role": "user",
            "content": [
                {"type": "document", "source": {
                    "type": "base64",
                    "media_type": "application/pdf",
                    "data": pdf_b64
                }},
                {"type": "text", "text": (
                    "Extract this invoice into the schema. "
                    "If a field is not present, use null. "
                    "Return only JSON matching the schema."
                )}
            ]
        }],
        tools=[{
            "name": "record_invoice",
            "input_schema": Invoice.model_json_schema(),
        }],
        tool_choice={"type": "tool", "name": "record_invoice"},
    )

    tool_use = next(b for b in response.content if b.type == "tool_use")
    return Invoice.model_validate(tool_use.input)

The reconciling validator is the bit that separates a working system from a demo. The model will hallucinate a total occasionally. The validator catches it. Failed invoices go to a human queue. Everything else flows through.

Cost: What This Stack Actually Costs to Run

The number that surprises people the most. For an AP team processing 500 invoices per day, the runtime cost looks like this in 2026:

Claude Sonnet 4.5 extraction: about 8,000 input tokens and 600 output tokens per invoice. At current pricing that is around $0.04 USD per invoice. 500 per day is roughly $20 USD per day, or $600 USD per month. Call it $930 AUD per month at today’s exchange.
Workflow tool (self-hosted n8n on a small Sydney VPS): $40 AUD per month all in.
Optional supplier portal scrapers (Apify or similar) if you cannot get invoices into the mailbox: $50 to $200 AUD per month.
Total runtime: $1,000 to $1,200 AUD per month at 500 invoices per day. That is around $0.07 per invoice in compute and tooling.
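
The extraction line can be reproduced from the token counts. The sketch below assumes list prices of $3 USD per million input tokens and $15 USD per million output tokens, a 30-day month, and an exchange rate of 1.55 AUD per USD; all three are assumptions, so check them against the current price sheet before relying on the result.

```python
from decimal import Decimal

# Assumed list prices in USD per million tokens; verify before use.
INPUT_USD_PER_MTOK = Decimal("3")
OUTPUT_USD_PER_MTOK = Decimal("15")
USD_TO_AUD = Decimal("1.55")   # assumed exchange rate


def monthly_extraction_cost(invoices_per_day, input_tokens=8000,
                            output_tokens=600, days=30):
    """Return (USD per invoice, USD per month, AUD per month)."""
    per_invoice = (input_tokens * INPUT_USD_PER_MTOK
                   + output_tokens * OUTPUT_USD_PER_MTOK) / Decimal(1_000_000)
    monthly_usd = per_invoice * invoices_per_day * days
    return per_invoice, monthly_usd, monthly_usd * USD_TO_AUD


per_invoice, usd, aud = monthly_extraction_cost(500)
print(per_invoice, usd, aud)
```

Under those assumptions the raw token arithmetic comes out a little under the rounded $0.04 per invoice used above, so the monthly figures in this section sit on the conservative side.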

Compare to a classical RPA seat. UiPath Studio plus unattended robot licensing typically lands in the $15,000 to $40,000 AUD per year range once you account for orchestrator, scaling, and support. Plus the developer cost to maintain per-vendor templates. Plus the AP staff time on the exception queue, which on a pure-RPA pipeline tends to be the dominant operational cost.

Build cost for the AI-first pipeline runs $25,000 to $60,000 AUD depending on ERP complexity and how many edge cases you want covered before launch. We have done it cheaper for clients with one clean ERP and one mailbox. We have spent more on clients with five entities and a Pronto install.

Accuracy: The Numbers We See in Production

One client, mid-size professional services firm. 400 to 600 invoices per day, mixed suppliers, mostly tax invoices in AUD, some USD and SGD for software and contractors.

Straight-through processing (no human touch): 89 percent.
Sent to exception queue for human review: 11 percent.
Of the straight-through, error rate post-audit (random sample): 0.4 percent.
Of the exception queue, real exceptions (vs. false flags): about 70 percent.

The exception queue is non-negotiable. Anyone selling you a 99 percent straight-through automation is either lying or has cherry-picked a vendor with one invoice format. Real-world invoice automation has weird stuff. A handwritten amendment. A credit note attached to an invoice. A multi-page invoice where page two of three is the remittance advice. You design for the exception queue, not against it.

Things That Will Break in Production

This is the list of debugging sessions we have actually had, in approximate order of how much time they cost us.

Duplicate invoice numbers across entities. Two of our clients have multiple legal entities. Same supplier sends invoice number 4521 to entity A in January and to entity B in February. The naive dedup check flags the second one. Fix is to dedup on (supplier_abn, invoice_number, billed_entity), not just the first two.
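
A minimal sketch of the three-part key. The normalisation rules (stripping spaces from the ABN, upper-casing the invoice number) and the in-memory set are assumptions for the example; a real build would back this with a unique index in the database.

```python
def dedup_key(supplier_abn: str, invoice_number: str, billed_entity: str):
    """Build the dedup key from supplier, invoice number, and billed entity."""
    return (
        "".join(supplier_abn.split()),    # "11 111 111 111" -> "11111111111"
        invoice_number.strip().upper(),
        billed_entity.strip().lower(),
    )


class DuplicateChecker:
    """In-memory stand-in for a unique index on the three-part key."""

    def __init__(self) -> None:
        self._seen = set()

    def is_duplicate(self, supplier_abn, invoice_number, billed_entity) -> bool:
        key = dedup_key(supplier_abn, invoice_number, billed_entity)
        if key in self._seen:
            return True
        self._seen.add(key)
        return False
```

With this key, invoice 4521 sent to entity A and invoice 4521 sent to entity B are two distinct records, while a resend of either one is still caught.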

GST-inclusive vs GST-exclusive totals. Some suppliers list the line total as GST-inclusive. Some list it exclusive and add GST as a separate line. The model gets this right most of the time but not always. The reconciliation validator catches the wrong ones.

Foreign currency invoices. A USD invoice landing in an AUD-default AP queue. The fix is explicit currency detection in the schema. Falling back to a default currency does not work because the supplier’s address country and the billing currency do not always agree.

PDF rendering failures. About one in 500 PDFs is malformed in a way that the model cannot parse. We catch these and fall back to a Tesseract OCR pass plus a text-only extraction prompt. About 80 percent of those then succeed.

Photographed invoices from suppliers. A handful of suppliers send PDFs that are actually scanned photographs of printed invoices, with shadows. The model handles these surprisingly well now. Two years ago they were unrecoverable.

The accounting system going down. Xero has scheduled maintenance. The pipeline has to handle “post to ERP” failing gracefully, queueing for retry, alerting if the queue grows. This is operational plumbing, not AI work, and it is where most fresh builds fall over in week three.
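
The plumbing is not complicated, it just has to exist. Here is a minimal sketch of a retry queue with exponential backoff and a depth alert. The class name, the post and alert callables, and the default thresholds are all hypothetical, and a real build would persist the queue rather than hold it in memory.

```python
import time
from collections import deque
from typing import Callable, Optional


class ErpPostQueue:
    """Queue invoices for posting; retry failures with exponential backoff."""

    def __init__(self, post: Callable[[dict], None], alert: Callable[[str], None],
                 alert_depth: int = 50, base_delay: float = 2.0,
                 max_delay: float = 300.0):
        self.post, self.alert = post, alert
        self.alert_depth = alert_depth
        self.base_delay, self.max_delay = base_delay, max_delay
        self.pending = deque()   # items are (invoice, attempts, not_before)

    def submit(self, invoice: dict) -> None:
        self.pending.append((invoice, 0, 0.0))

    def drain(self, now: Optional[float] = None) -> int:
        """Attempt every due item once. Returns the number posted."""
        now = time.monotonic() if now is None else now
        posted = 0
        for _ in range(len(self.pending)):
            invoice, attempts, not_before = self.pending.popleft()
            if now < not_before:
                self.pending.append((invoice, attempts, not_before))
                continue
            try:
                self.post(invoice)
                posted += 1
            except Exception:
                # Broad catch for the sketch; catch the HTTP client's own
                # errors in a real build.
                delay = min(self.base_delay * 2 ** attempts, self.max_delay)
                self.pending.append((invoice, attempts + 1, now + delay))
        if len(self.pending) > self.alert_depth:
            self.alert(f"ERP post queue depth is {len(self.pending)}")
        return posted
```

Calling drain on a schedule (every minute, say) is enough. The backoff keeps a maintenance window from turning into a retry storm, and the depth alert tells someone when the outage has outlasted the queue.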

When Not to Automate Invoice Processing at All

Some AP volumes do not justify the build. Under 50 invoices per month, the math does not work. A bookkeeper paid for two hours a week will handle this cheaper and better than any automation we could ship. If you are at 100 invoices a month with a single ERP and a single mailbox, you are probably better off with a packaged product (Hubdoc, Dext, native Xero AP capture) than a custom build.

The sweet spot for a custom build is around 300 invoices per day and up, especially if the AP team is currently three or more FTEs. Below that, the off-the-shelf SaaS products are good enough and an order of magnitude cheaper to start.

One more case: if your supplier base sends 95 percent of invoices through a single supplier portal that already supports EDI or API integration, that integration will beat anything you build. The work then is the last 5 percent, not the whole thing.

Migration: Moving Off a Failed RPA Bot

The most common engagement we walk into. A previous program built a UiPath or Power Automate Desktop bot for invoice extraction two years ago. The AP team has stopped trusting it. They double-check every invoice anyway. The bot is technically running but it is not saving anyone time.

The migration we run looks like this. Week one: stand up the AI extraction pipeline in parallel, feeding the same mailbox. Output goes to a shadow queue. We compare extracted fields against the bot’s output, side by side, for 200 to 400 invoices. The AP team sees both. This is where buy-in happens, because the AI extraction is visibly better in the first 50 invoices.
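
The shadow-queue comparison does not need anything clever. A minimal sketch, assuming both the bot and the AI pipeline emit one flat dict per invoice and that the compared field names below match your schema (both are assumptions for the example):

```python
# Headline fields to compare; the names are assumed, adjust to your schema.
FIELDS = ("supplier_name", "invoice_number", "invoice_date", "total")


def diff_fields(bot: dict, ai: dict) -> dict:
    """Return {field: (bot_value, ai_value)} where the two outputs disagree."""
    return {f: (bot.get(f), ai.get(f)) for f in FIELDS if bot.get(f) != ai.get(f)}


def agreement_rate(pairs) -> float:
    """Share of invoices where bot and AI agree on every compared field."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    agree = sum(1 for bot, ai in pairs if not diff_fields(bot, ai))
    return agree / len(pairs)
```

A disagreement only says the two outputs differ, not which one is right. The AP team settles that against the original PDF, and that review is what builds the trust.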

Weeks two to three: AI extraction becomes the primary. The legacy bot keeps running for the “post to ERP” step only, until we either replace it with an API call or accept that the bot’s last-mile click work is fine and leave it alone. Week four: shut down the bot’s extraction logic. Update the runbook. Hand the operations to the AP manager. If you want help walking through the same kind of migration, book a call and we can scope it.

Frequently Asked Questions

Is RPA still the right tool for invoice processing in 2026?

For extraction, no. AI extraction with Claude Sonnet 4.5 or GPT-4.1 produces materially better results than any classical RPA tool we have evaluated, with no per-vendor template work. RPA is still useful as the last-mile mechanism for posting into a legacy ERP that has no API. If your accounting system is on the modern web (Xero, MYOB Business, NetSuite, Dynamics 365 Business Central) you do not need RPA at all.

How much does automated invoice processing cost in 2026?

For a custom AI-first build, $25,000 to $60,000 AUD to deliver, plus $1,000 to $1,200 AUD per month runtime for a 500-invoice-per-day pipeline. For off-the-shelf SaaS at small volume, products like Hubdoc, Dext, or native Xero AP capture cost $30 to $80 AUD per user per month and cover most needs under 1,000 invoices a month. The custom build starts to pay back at around 300 invoices per day, or once it saves roughly one FTE on the AP team.

What accuracy can we expect from AI invoice extraction?

Field-level accuracy on a mixed supplier base sits around 96 to 98 percent for the headline fields (vendor, invoice number, date, total). Line items are noisier, around 92 to 95 percent. Straight-through processing, which is the whole-invoice metric you actually care about, lands in the 85 to 92 percent range in our production deployments. The remaining 8 to 15 percent hits an exception queue for human review. Anyone promising 99 percent straight-through is selling.

Can this handle multi-page invoices and remittance advices?

Yes, and this was the biggest single weak point of classical OCR-template RPA. Modern multimodal models read the whole document and reason about which page is the invoice and which is the remittance. We still pre-split obviously separate documents in the same attachment, but the model handles the mixed-page case well. Test it on your worst suppliers before signing off.

How do we handle foreign currency invoices?

The extraction schema explicitly captures the invoice currency rather than assuming AUD. The validator checks the total reconciles in the invoice currency. The ERP write step looks up an FX rate (RBA daily rate is fine for accounting purposes; banks have their own when paying). The trap is letting the AP system default everything to AUD; that produces correct-looking but financially wrong records.
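
A minimal sketch of the conversion at the ERP write step. It assumes the rate table is keyed by currency code and stores units of foreign currency per one AUD (the convention we understand the RBA table to use; confirm it for whatever source you load). The function name and table shape are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

CENTS = Decimal("0.01")


def to_aud(total: Decimal, currency: str, foreign_per_aud: dict) -> Decimal:
    """Convert an invoice-currency total to AUD, rounded to cents."""
    if currency == "AUD":
        return total.quantize(CENTS, rounding=ROUND_HALF_UP)
    if currency not in foreign_per_aud:
        # Refuse to guess: silently treating the amount as AUD is the trap.
        raise KeyError(f"no FX rate loaded for {currency}")
    return (total / foreign_per_aud[currency]).quantize(CENTS, rounding=ROUND_HALF_UP)
```

Raising on a missing rate sends the invoice to the exception queue instead of posting a number that looks right and is not.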

What about three-way match against purchase orders?

This is where AI helps materially over classical RPA. The model can reason about partial deliveries, price changes from a quote, and minor description mismatches between the PO and the invoice line. We still write business rules around tolerance (typically 5 percent or $50, whichever is lower). The model produces a match recommendation; the rules decide whether it goes through or hits the exception queue.
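
The split between recommendation and rule is easy to show. A minimal sketch, using the 5 percent or $50 tolerance above and assuming the model's recommendation arrives as a simple boolean (the function names and return labels are hypothetical):

```python
from decimal import Decimal

PCT_TOLERANCE = Decimal("0.05")   # 5 percent of the PO amount
ABS_TOLERANCE = Decimal("50")     # $50, in the invoice currency


def within_tolerance(po_amount: Decimal, invoice_amount: Decimal) -> bool:
    """True if the variance is inside 5 percent or $50, whichever is lower."""
    allowed = min(po_amount * PCT_TOLERANCE, ABS_TOLERANCE)
    return abs(invoice_amount - po_amount) <= allowed


def decide(model_recommends_match: bool, po_amount: Decimal,
           invoice_amount: Decimal) -> str:
    """The model recommends; the rule decides."""
    if model_recommends_match and within_tolerance(po_amount, invoice_amount):
        return "post"
    return "exception_queue"
```

Keeping the tolerance outside the model means finance can change it without touching a prompt, and an auditor can read the rule.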

How long does it take to deploy?

Four to eight weeks for a clean rollout, longer if the destination ERP needs API work or there are multiple entities. The first two weeks are scoping, extraction prototype, and validator design. Weeks three to five are integration and the shadow-mode parallel run. Weeks six to eight are cutover, exception-queue tuning, and handover to AP. The exception-queue tuning is the bit that takes longer than people expect.

What about data residency for Australian businesses?

The hosted Anthropic and OpenAI APIs process the document in the US. For businesses bound by stricter data residency requirements (regulated healthcare, some government work, APP-sensitive workloads), the options are AWS Bedrock in ap-southeast-2 with Claude available in the region, or a self-hosted open-weight vision-capable model (for example, Llama 3.2 90B Vision) on an Australian GPU instance. The accuracy gap on invoice extraction is real but not huge in 2026; expect a couple of percentage points of straight-through difference.

If you are weighing an AP automation build, or trying to figure out what to do with a UiPath bot that nobody on the AP team trusts anymore, get in touch. We build and operate AI-first invoice automation for AP teams across healthcare, professional services, and recruitment, and we are happy to scope what your stack should actually look like.

Automate Invoice Processing: Why AI Extraction Beats RPA in 2026