Operations

The 90% Discount Your AI Product Isn't Taking

Phil Bolton · June 25, 2026 · 3 min read

A founder I work with runs a software company, around $6M in revenue, with an AI feature he bolted on last year. His model bill kept climbing and he'd made peace with it as the price of adoption. We pulled the numbers. His cache hit rate was close to zero. Every call was paying full freight for the same block of context the product ships on every request. Fixing that one thing cut his model spend by about two-thirds. Nothing about the feature changed. The bill did.

The biggest variable cost in the product is set by people finance never talks to

When you call a model, you pay per token. Most AI features send a large, repetitive chunk of context every time: instructions, examples, formatting rules, the stuff that never changes between requests. Cache that chunk and the provider charges you a sliver to read it again. With Anthropic, a cached input token reads at $0.30 per million against $3.00 for fresh computation. That's a 90% cut on the part of the bill that repeats most.

One catch. Caching only works if the prompt is built for it, with the stable content up front and the variable content at the end. Build it the other way and nothing caches. Two teams can ship an identical feature, and the one that ordered its prompts wrong pays roughly ten times as much to run it. That decision lives in engineering. The cost lands in COGS.

Finance is reading the symptom, not the cause

A rising model bill looks exactly like growth. More usage, more spend, margin gives up a point or two, everyone nods and moves on. The line gets waved through month after month as the cost of doing AI business.

But your chart of accounts has no row for cache hit rate. Hosting and API charges drop into one COGS bucket, and that bucket can't tell you whether you paid $3.00 or $0.30 for the same work. You watch the total rise. You don't see that most of the increase was avoidable. The number that would explain it sits in a usage dashboard your finance team has never opened.

Gross margin is an engineering output now, not only an accounting one. The single biggest lever on your AI cost line is a design choice made by someone who has never seen your P&L.

Put it on the close

You don't need to learn prompt engineering. You need one figure on the monthly margin review: cache hit rate, pulled from the model provider's console, sitting next to the COGS line it explains. Ask engineering for it. If it's low, you've found margin that costs nothing to recover. If it's high, you've ruled out the cheapest fix and can stop guessing at the rest.

This is the kind of thing that hides for a year because no one owns the seam between the model bill and the income statement. Engineering sees a latency number. Finance sees a cost going up. Neither sees the 90% sitting on the table between them.

Your cheapest token is the one you've already paid for. Most AI products buy it twice.

Phil Bolton

Founder & Principal at Manitou Advisory

More from the blog

Strategy

Your Best Customer Is About to Buy Fewer Seats

Per-seat pricing assumes one human equals one unit of value. AI agents break that assumption, and the contraction hides inside your healthiest accounts.

Phil BoltonJun 26, 20263 min read

Operations

Nobody Approved the Spend. It Came Four Cents at a Time.

Approval thresholds were built to catch the big invoice. Agent-driven spend fragments into thousands of sub-dollar charges that never cross the line.

Phil BoltonJun 24, 20263 min read

Operations

Your Platform Credit Line Shrinks the Week You Need It

Embedded working capital grows when your business is strong and contracts when it weakens. That's backwards from how a credit line should behave, and most founders don't notice until the offer is gone.

Phil BoltonJun 23, 20262 min read

Want to talk about your finance setup?

We help growing companies build the right finance function.

Book a Call →