Your AI Finance Pilot Has No Owner After Launch

Phil Bolton · May 20, 2026 · 3 min read

A founder I work with runs a 38-person logistics platform at $9M revenue. Last September she funded an AP automation pilot. Three vendors evaluated, one selected, ninety-day rollout. Day 91 the agent was live, categorizing invoices, routing approvals, posting to her ledger at 94% accuracy. Her controller, who ran the pilot, moved back to month-end close and forecasting. The agent kept running.

By February the accuracy was at 71%. New vendors weren't in the training set. A pricing change at one supplier shifted the invoice format and the extractor started picking up the wrong total field. Nobody owned the model. Her controller didn't have a budgeted hour for it. In March they turned the agent off and went back to manual entry.

What the pilot didn't include

A March 2026 survey of 650 enterprise technology leaders found 78% have AI agent pilots running and only 14% have reached production scale. IDC put the pilot-to-production failure rate at 88%. Deloitte ran a separate count at 89%. The numbers are consistent enough that this is a structural problem, not a vendor problem.

The story is the same at every company size. A pilot has a named owner, a budget line, a timeline, and a success metric. The day the pilot ends, all four of those go away. The agent becomes ambient software that nobody is responsible for, running against a business that keeps changing.

For a growing company, the gap shows up faster than it does at enterprise scale. A 38-person company adds five new vendors a quarter. Each one breaks a piece of the model that was tuned on the original supplier list. There's no team to absorb the drift.

Where the work actually lives

Production AI agents in finance need three categories of ongoing work that don't fit in any existing job description.

Prompt and rule curation. When the chart of accounts changes or a new vendor type enters AP, somebody needs to update the agent's instructions. Most growing companies treat this as a one-time configuration. It isn't.

Eval maintenance. The agent should be graded weekly against a fresh sample of its own output. Without that loop, accuracy degrades quietly until somebody downstream notices a problem that's already three months old.
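One way to picture that weekly loop is the minimal sketch below. It assumes a human reviewer re-categorizes a random sample of the invoices the agent handled that week. The names `GradedInvoice`, `draw_weekly_sample`, and `weekly_accuracy` are hypothetical, not part of any vendor's product.

```python
import random
from dataclasses import dataclass


@dataclass
class GradedInvoice:
    invoice_id: str
    agent_category: str      # what the agent posted to the ledger
    reviewer_category: str   # what the human reviewer says is correct


def draw_weekly_sample(invoice_ids, sample_size, seed=None):
    """Pick a random sample of this week's agent-processed invoices for review."""
    rng = random.Random(seed)
    ids = list(invoice_ids)
    return rng.sample(ids, min(sample_size, len(ids)))


def weekly_accuracy(graded):
    """Share of sampled invoices where the agent matched the reviewer."""
    if not graded:
        raise ValueError("no graded invoices this week")
    correct = sum(1 for g in graded if g.agent_category == g.reviewer_category)
    return correct / len(graded)
```

Logging the result every week gives the owner a trend line, so a slide from 94% toward 71% shows up as it happens instead of after the fact.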

Exception triage. Every agent flags decisions it's not sure about. Those flags need a human within twenty-four hours, not at the next month-end close. If the queue isn't worked, the agent learns the wrong thing from silence.
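The twenty-four-hour window can be checked mechanically. The sketch below assumes the agent's open flags can be exported as a mapping from a flag ID to the time the flag was raised. `TRIAGE_SLA` and `overdue_flags` are hypothetical names used for illustration.

```python
from datetime import datetime, timedelta

TRIAGE_SLA = timedelta(hours=24)  # the twenty-four-hour window from the text


def overdue_flags(open_flags, now):
    """Return IDs of flags that have waited longer than the SLA, oldest first.

    open_flags: dict mapping a flag ID to the datetime the agent raised it.
    """
    late = [
        (raised_at, flag_id)
        for flag_id, raised_at in open_flags.items()
        if now - raised_at > TRIAGE_SLA
    ]
    return [flag_id for _, flag_id in sorted(late)]
```

Running a check like this each morning turns an unworked queue into a visible list with a name attached to it.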

A finance AI pilot is funded like a software project. The agent in production behaves like a junior team member who needs weekly feedback. Companies that don't staff for the second one lose the first one within a year.

What to budget for before the pilot starts

Two questions to answer in writing before any finance AI pilot kicks off.

Who owns the agent on day 92, and how many hours a week are costed into their role? A controller pulling four hours a week for curation, evals, and exception triage is realistic for a single agent. Six agents on the same controller is not.
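The arithmetic behind that is short. The sketch below assumes the four hours scale linearly with the number of agents and that the owner works a 40-hour week. Both are assumptions for illustration, not figures from a client.

```python
HOURS_PER_AGENT_PER_WEEK = 4   # from the text: curation, evals, exception triage
WORK_WEEK_HOURS = 40           # assumption: a standard full-time week


def owner_load(agent_count):
    """Weekly hours and share of the owner's week, assuming linear scaling."""
    hours = agent_count * HOURS_PER_AGENT_PER_WEEK
    return hours, hours / WORK_WEEK_HOURS


for agents in (1, 6):
    hours, share = owner_load(agents)
    print(f"{agents} agent(s): {hours} h/week, {share:.0%} of the week")
```

Under those assumptions, one agent takes 10% of the controller's week and six take 60%, which leaves little room for month-end close and forecasting.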

What are the kill criteria? Define the accuracy floor and the response-time limit before the pilot ends. If the agent breaches either, the budget for re-training kicks in automatically or the agent gets turned off. Drift without a decision rule is the most expensive mode of failure.
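Written down, a decision rule like this can be as small as the sketch below. It treats the response-time criterion as a maximum number of hours and uses placeholder thresholds. `KillCriteria` and `decide` are hypothetical names, and the real floors are for each company to set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KillCriteria:
    accuracy_floor: float     # placeholder, e.g. 0.85
    max_triage_hours: float   # placeholder, e.g. 24.0


def decide(criteria, accuracy, triage_hours, retrain_budget_available):
    """Return 'keep', 'retrain', or 'turn_off' for this review period."""
    breached = (
        accuracy < criteria.accuracy_floor
        or triage_hours > criteria.max_triage_hours
    )
    if not breached:
        return "keep"
    # A breach always forces one of two outcomes; drifting on is not an option.
    return "retrain" if retrain_budget_available else "turn_off"
```

The point of the rule is that a breach always produces a decision. With a floor of 0.85, the 71% accuracy from the story above would have triggered re-training or shutdown well before March.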

A finance AI agent isn't a piece of software you install. It's a process that needs a curator. The pilot funded the install. Nobody funded the curator. That's the gap between the 78% running pilots and the 14% at production scale.

Phil Bolton

Founder & Principal at Manitou Advisory
