Business Viability and AI Economics
The cannibalisation paradox, the margin trap, inference cost modelling, and pricing strategies that survive the shift from SaaS to Service-as-a-Software.
TL;DR
- AI inference costs dropped 10–50x since early 2025. The constraint shifted from "can we afford it?" to "can we make it reliable?"
- Per-seat pricing creates a cannibalisation paradox: the better your AI agents work, the fewer seats your customer buys. The fix is pricing per outcome, not per login.
- Bolting AI onto a legacy platform as an optional copilot adds COGS without removing workflow cost. If the AI is optional, it's a margin trap.
The AI economics shift
Two years ago, the first question in every AI business case was "can we afford the inference?" That question is largely retired.
Inference costs dropped 10–50x between early 2025 and late 2026. Prompt caching (which avoids reprocessing repeated context) delivers an additional 90% cost reduction on cached tokens. Open-weight model families such as Llama, Mistral, and DeepSeek closed the capability gap with proprietary APIs for most production tasks. Fine-tuning on proprietary data became accessible to teams without dedicated ML engineers.
The cost barrier to adding AI features disappeared for most use cases. What replaced it is harder: reliability. A feature that works 92% of the time sounds impressive until you calculate the damage of the other 8% at scale. The viable AI product is the one that handles failure gracefully, not the one with the lowest inference bill.
This changes the viability question. Market sizing and TAM/SAM/SOM analysis still matter (any good product management resource covers the mechanics), but the AI-specific viability questions are different. They centre on data, margins, pricing, and build strategy.
Data feasibility and the data flywheel
In traditional software, feasibility risk is "can our engineers build it?" For AI products, the gating question is "do we have the data to build it?" You can always find engineers. You cannot always find data.
The data feasibility report
Before an AI bet proceeds, document answers to four questions:
Acquisition and quality. Do you have this data? Is it sufficient in volume and representative (unbiased)? What's the plan to clean, label, and maintain quality over time?
Provenance and legality. Where did this data come from? Do you have the legal rights, licences, and permissions to use it for training a commercial model?
Privacy and ethics. Does this data contain PII or other sensitive information? What's the plan for anonymisation, de-identification, and aggregation?
Bias audit. What inherent biases exist in this data? What's the mitigation plan to ensure the model doesn't amplify them?
This report is the central artefact that unlocks the rest of the business case. You cannot choose a pricing model or build strategy until you've de-risked data acquisition.
The data flywheel
Data feasibility answers whether you can build the product. The data flywheel determines whether you can defend it.
Each user interaction generates signal: corrections, selections, feedback, usage patterns. That signal improves the model. A better model improves the product. A better product attracts more users. More users generate more signal. This is the AI-era moat, and it compounds.
Companies with a functioning data flywheel get better at a rate their competitors cannot match by spending more on compute alone. The flywheel is why first-mover advantage matters more in AI products than in traditional SaaS: you're not just acquiring users, you're acquiring training data.
When assessing viability, ask two flywheel questions:
- Does usage generate data that improves the model? If your product's AI operates on static data with no feedback loop, you have no flywheel. Competitors with identical models can replicate your product.
- Is the improvement measurable and compounding? Track model accuracy or output quality against usage volume. If the curve is flat, your flywheel is broken. If it's climbing, you're building a defensible position.
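The second flywheel question can be made operational with a simple trend check: fit a slope to quality-versus-usage data and see whether it is positive. A minimal sketch, where the function name and the quarterly data points are illustrative assumptions, not real measurements:

```python
# Hypothetical flywheel health check: is output quality climbing with usage?
# The (usage_volume, accuracy) pairs below are made up for illustration.

def flywheel_slope(points):
    """Least-squares slope of accuracy vs. cumulative usage volume.

    points: list of (usage_volume, accuracy) tuples.
    A positive slope suggests the flywheel is compounding;
    a flat or negative slope suggests it is broken.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    return cov / var

quarterly = [(10_000, 0.88), (40_000, 0.91), (90_000, 0.93), (160_000, 0.94)]
print(f"slope: {flywheel_slope(quarterly):.2e} accuracy points per interaction")
```

A per-interaction slope is crude (expect diminishing returns at scale), but it distinguishes a flat curve from a climbing one, which is the decision that matters.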
The margin trap
Most established software companies are adding AI the wrong way. They bolt a "copilot" sidebar onto a platform architected a decade ago. Every product demo has a chatbot. Every roadmap has an "AI Assistant" workstream. Every earnings call mentions generative AI fifteen times.
The problem: they've added inference cost without removing workflow cost. The user still navigates the same screens, fills out the same forms, follows the same multi-step processes. The copilot auto-fills a few fields. Margins shrink. The feature looks modern. The P&L looks worse.
This is the margin trap. If the AI is optional (a sidebar the user can ignore), you've added COGS to your platform without changing the value equation. The user experience is incrementally better. The unit economics are materially worse.
The distinction that matters:
| Approach | Value proposition | Cost impact | Margin effect |
|---|---|---|---|
| Copilot (assistance) | Do work faster | Inference added on top of existing platform cost | Margin compression |
| Agent (replacement) | Do work for you | Inference replaces platform + human cost | Margin expansion |
A copilot helps you fill out a form faster. An agent eliminates the form. One adds a cost layer. The other removes a cost layer.
The litmus test: remove the AI from your product. Does the product still work? If yes, the AI is a feature, not a strategy. Now imagine the inverse: build the product where the AI is the product, where removing it means the product doesn't function. That's the difference between a margin trap and a viable AI business.
The cannibalisation paradox
Per-seat pricing powered a generation of billion-dollar SaaS companies. The logic was elegant: customer hires more humans, you sell more seats, revenue grows. Headcount and ARR moved in lockstep.
Agentic AI inverts the logic. You build autonomous agents. The customer needs fewer humans. You sell fewer seats. Revenue shrinks. The more successful your AI product, the more it cannibalises your seat-based revenue.
This is the Cannibalisation Paradox. Every efficiency gain you ship is a seat your customer no longer needs. The better your product gets, the less they pay you. No product leader wants to present that slide at a board meeting, but if you're building agentic capabilities on a per-seat model, that's the trajectory.
The fix: stop selling access and start selling outcomes. Shift from Software-as-a-Service to Service-as-a-Software.
When you sell a tool (Salesforce, Jira, Figma), you charge for the login. When you sell a result (the work itself, completed autonomously), you charge for the completed task. Tickets resolved. Contracts reviewed. Reports generated. Leads qualified. The pricing unit should be the smallest meaningful outcome your agent delivers, something the customer already understands and already values.
Price anchor: what did this task cost when a human did it? Your price should be meaningfully less than human cost, meaningfully more than inference cost. The spread is your margin, and unlike seat-based margin, it scales with volume.
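The spread logic above can be sanity-checked with a few lines of arithmetic. A minimal sketch, where the ticket-resolution figures (human cost, inference cost, price) are all hypothetical examples:

```python
# Illustrative per-outcome price check: the price should sit well below the
# human cost and well above the inference cost. All figures are assumptions.

def outcome_margin(price, inference_cost, overhead=0.0):
    """Gross margin per completed task at a given per-outcome price."""
    return (price - inference_cost - overhead) / price

human_cost_per_ticket = 8.00   # assumed fully loaded cost of a human resolution
inference_cost = 0.05          # assumed blended model cost per resolution
price = 2.00                   # per-resolution price, inside the spread

assert inference_cost < price < human_cost_per_ticket
print(f"customer saves {1 - price / human_cost_per_ticket:.0%} per ticket")
print(f"gross margin {outcome_margin(price, inference_cost):.0%}")
```

With these illustrative numbers the customer saves 75% per ticket while the margin stays above 90%, which is the shape of deal a seat-based model cannot offer.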
Pricing for AI products
The COGS formula
Before setting any price, model the cost per query. Prompt caching changes the math significantly for products with repeated context (system prompts, document templates, recurring workflows).
COGS per Query = (Input Tokens × Cost/Token × Cache Miss Rate) + (Input Tokens × Cached Cost/Token × Cache Hit Rate) + (Output Tokens × Cost/Token) + Infrastructure Overhead
For a product with 80% cache hit rate and a model charging $3/M input tokens, $0.30/M cached tokens, and $15/M output tokens:
- 2,000 input tokens: (400 uncached × $0.000003) + (1,600 cached × $0.0000003) = $0.00168 of input cost, versus $0.006 with no caching, plus output cost
- The cached portion costs 90% less than uncached
At scale, prompt caching can reduce your input token costs by 70–90%. If your viability model doesn't account for caching, you're overestimating COGS and potentially killing features that are commercially viable.
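The formula above is easy to turn into a reusable calculator. A sketch using the example rates from this section ($3/M input, $0.30/M cached input, $15/M output); the 500-token output count is an illustrative assumption:

```python
# COGS-per-query sketch, implementing the formula in this section.
# Rates are dollars per million tokens; defaults mirror the worked example.

def cogs_per_query(input_tokens, output_tokens, cache_hit_rate,
                   input_rate=3.00, cached_rate=0.30, output_rate=15.00,
                   overhead=0.0):
    """Cost per query in dollars."""
    per_tok = lambda rate: rate / 1_000_000
    uncached = input_tokens * (1 - cache_hit_rate) * per_tok(input_rate)
    cached = input_tokens * cache_hit_rate * per_tok(cached_rate)
    output = output_tokens * per_tok(output_rate)
    return uncached + cached + output + overhead

# 2,000 input tokens, 500 output tokens (illustrative), 80% cache hit rate
cost = cogs_per_query(2_000, 500, 0.80)
no_cache = cogs_per_query(2_000, 500, 0.0)
print(f"${cost:.5f} per query vs ${no_cache:.5f} without caching")
```

Multiply the per-query figure by projected daily volume before committing to a price; this is the number the viability model should start from.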
AI pricing models
| Model | How it works | When it fits | Risk |
|---|---|---|---|
| Pure usage-based | Charge per unit (API call, token, query) | Developer tools, infrastructure products | Customer anxiety over unpredictable bills; revenue volatility |
| Outcome-based (Service-as-a-Software) | Charge per successful result (ticket resolved, report generated) | Agentic products replacing human work | Defining "success" is hard; you absorb the cost of failures |
| Stand-alone add-on | AI features sold as a separate subscription tier | Quick monetisation of AI on an existing platform | Creates adoption friction; risks becoming the "optional copilot" |
| Hybrid: platform fee + metered outcomes | Flat base fee for access, metered charge for autonomous work done | Most agentic products transitioning from SaaS | More complex to build and communicate |
| Hybrid: seat + credit pool | Each seat contributes to a shared pool of AI usage credits | Teams transitioning gradually from per-seat | Power users exhaust the pool; doesn't solve the cannibalisation paradox long-term |
For most agentic products, the hybrid platform fee + metered outcomes model is the right starting point. The platform fee protects baseline ARR and gives finance predictable revenue. The metered layer captures value as the agent handles more volume. Example: "Includes 500 autonomous ticket resolutions per month. $2 per resolution thereafter."
This model solves the cannibalisation paradox because revenue grows with agent output, not headcount. It also creates natural expansion revenue: as agents prove themselves, customers route more volume through them without a sales conversation.
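The hybrid model's billing logic is simple enough to sketch directly. The included volume and overage rate below mirror the example in this section; the platform fee is a hypothetical figure:

```python
# Sketch of the hybrid model: flat platform fee plus metered outcomes.
# Included volume and $2 overage rate follow the example above;
# the $1,500 platform fee is an illustrative assumption.

def monthly_invoice(resolutions, platform_fee=1_500.00,
                    included=500, overage_rate=2.00):
    """Platform fee, plus a metered charge per resolution beyond the pool."""
    overage = max(0, resolutions - included) * overage_rate
    return platform_fee + overage

print(monthly_invoice(500))    # all volume inside the included pool
print(monthly_invoice(1_200))  # 700 metered resolutions on top of the fee
```

Note the expansion dynamic: invoice growth tracks agent output, so a customer routing more tickets through the agent grows revenue without a renegotiation.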
The internal alignment problem
Pricing changes fail when internal incentives don't follow. If engineering builds features that reduce human workload while sales is incentivised to increase seat count, your company is at war with itself.
Realign three things simultaneously:
- Sales incentives. Commission on consumption revenue and platform expansion, not seat count.
- Customer success metrics. Measure outcomes delivered, not daily active users and logins.
- Product metrics. Track work completed autonomously. A user who spends less time in your product because the agent handled everything is a success, not a churn risk.
The audit tax
Multi-agent architectures (where a manager model audits worker model outputs) introduce a cost multiplier that most teams don't model until it's too late.
The math
Typical 2026 cost structure:
- Worker model (small, efficient, task-specific): ~$0.20/M tokens
- Manager model (reasoning-heavy, validates outputs): ~$5.00/M tokens
The manager is 25x more expensive. If you audit every worker output ("micromanager architecture"), the per-task cost increase is roughly 2,500%.
| Audit rate | Cost per task | Relative cost |
|---|---|---|
| 0% (worker only) | $0.0002 | 1x |
| 100% (every output) | $0.0052 | 26x |
| 20% (spot-check) | $0.0012 | 6x |
At 100,000 tasks per day, the difference between 100% audit and spot-check is $520 versus $120 per day. Annualised, that's a $146,000 margin difference on a single workflow.
The spot-check architecture
Route high-confidence outputs directly to the user or next step. Only escalate low-confidence outputs to the manager model.
If your worker produces high-confidence outputs 80% of the time:
Blended cost = (0.80 × $0.0002) + (0.20 × $0.0052) = $0.0012 per task
That's roughly a 77% cost reduction from full audit, with minimal reliability loss if your confidence scoring is well-calibrated.
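The blended-cost calculation generalises to any audit rate. A sketch using the 2026 figures from this section (worker ~$0.0002/task, manager audit adding ~$0.0050/task):

```python
# Blended cost per task at a given audit rate, using the figures above.

WORKER_COST = 0.0002   # every task runs through the worker model
MANAGER_COST = 0.0050  # added only for tasks escalated to the manager

def blended_cost(audit_rate):
    """Cost per task when `audit_rate` of worker outputs are escalated."""
    return WORKER_COST + audit_rate * MANAGER_COST

for rate in (0.0, 0.20, 1.0):
    daily = blended_cost(rate) * 100_000  # 100k tasks/day, as in the example
    print(f"audit {rate:>4.0%}: ${blended_cost(rate):.4f}/task, ${daily:,.0f}/day")
```

Plot this against your actual escalation rate before committing to an architecture; the audit rate is a dial, and each notch has a visible P&L cost.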
Four approaches to confidence scoring (combine them):
- Model-native confidence. Ask the worker to rate uncertainty, or generate multiple candidates and measure agreement.
- Rule-based validation. For structured outputs, validate against known constraints. Nearly free.
- Historical calibration. Track actual accuracy against confidence scores over time. Adjust thresholds based on observed performance.
- Domain heuristics. Route known hard inputs (long documents, ambiguous language) to the manager proactively.
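Two of the signals above, rule-based validation and agreement across multiple candidates, combine naturally into a routing decision. A minimal sketch, assuming the worker returns JSON with a `status` field and that you can sample several candidates per task; both assumptions are illustrative, not a prescribed schema:

```python
# Spot-check routing sketch: rule-based validation plus candidate agreement.
# The JSON-with-"status" rule and the 0.8 threshold are illustrative assumptions.

import json

def rule_valid(output: str) -> bool:
    """Structured-output check: must parse as JSON containing a 'status' field."""
    try:
        return "status" in json.loads(output)
    except (ValueError, TypeError):
        return False

def route(candidates: list[str], agreement_threshold: float = 0.8):
    """Return ('accept', output) for high-confidence results,
    or ('escalate', None) to send the task to the manager model."""
    valid = [c for c in candidates if rule_valid(c)]
    if not valid:
        return ("escalate", None)
    top = max(set(valid), key=valid.count)
    agreement = valid.count(top) / len(candidates)
    if agreement >= agreement_threshold:
        return ("accept", top)
    return ("escalate", None)

# Four of five candidates agree on the same valid output: accepted.
print(route(['{"status": "ok"}'] * 4 + ['{"status": "retry"}']))
```

Sampling multiple candidates costs extra worker inference, but with a 25x price gap between worker and manager, a few cheap samples that avoid one escalation still come out ahead.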
Pricing reliability as a tier
Enterprise customers who need 99%+ accuracy are asking for a higher audit rate, which means higher inference cost. Offer configurable reliability: let the customer set the confidence threshold based on their risk tolerance and budget. Low-risk use cases get the cheap tier. High-stakes use cases pay for the premium audit rate. A single reliability tier at a single price means you're either overcharging low-risk customers or subsidising high-risk ones.
Buy, build, or route
The traditional buy/build/partner framework treated the decision as a one-time, mutually exclusive choice. In 2026, the right framing is "route," because most production AI products use multiple models for different tasks.
What changed
Open-weight models closed the gap. Llama, Mistral, and DeepSeek deliver production-quality results for most tasks. You no longer need a proprietary API for everything.
Prompt caching changed the economics. A 90% cost reduction on cached tokens makes API-based approaches viable at scales where self-hosting was previously the only option.
Fine-tuning became accessible. You don't need a team of ML engineers to fine-tune a model on proprietary data. The tooling matured. The cost dropped. A product team can run a fine-tuning job in an afternoon.
The routing layer emerged. Instead of choosing one model, production systems route different tasks to different models based on complexity, cost, and latency requirements. Simple classification goes to a small, fast model. Complex reasoning goes to a frontier model. The routing layer is the architectural decision, not the model selection.
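The routing layer described above can start as little more than a lookup table. A minimal sketch; the model names, per-token prices, and task categories are placeholders, not real endpoints:

```python
# Minimal routing-layer sketch: tasks go to different models by complexity.
# Model names, prices, and task types below are illustrative placeholders.

from dataclasses import dataclass

@dataclass
class ModelRoute:
    name: str
    cost_per_m_tokens: float

ROUTES = {
    "classify": ModelRoute("small-fast-model", 0.20),
    "extract":  ModelRoute("small-fast-model", 0.20),
    "reason":   ModelRoute("frontier-model", 5.00),
}

def pick_route(task_type: str) -> ModelRoute:
    """Route by task type; unknown tasks default to the frontier model."""
    return ROUTES.get(task_type, ROUTES["reason"])

print(pick_route("classify").name)
print(pick_route("plan-quarterly-strategy").name)
```

Even with a single model behind it, this indirection is what makes later migration incremental: swapping a model is a one-line change to the table, not a re-architecture.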
The 2026 decision matrix
| Factor | API (proprietary) | Open-weight (self-hosted) | Routed (multi-model) |
|---|---|---|---|
| Time to market | Days | Weeks (infra setup) | Weeks (routing logic) |
| Cost at low volume | Low (pay per token) | High (GPU allocation) | Medium |
| Cost at high volume | High (no volume ceiling on spend) | Low (amortised infra) | Lowest (optimised per task) |
| Data privacy | Data leaves your environment | Data stays internal | Configurable per route |
| Customisation | Limited to prompting | Full (fine-tuning, architecture) | Full per model in the stack |
| Vendor lock-in | High | None | Low (models are swappable) |
| Strategic moat | None (competitors use same API) | Moderate (fine-tuned on your data) | High (routing logic + fine-tuned models + data flywheel) |
The default recommendation for most products: Start with a proprietary API to validate the use case fast. Build routing logic early (even if you only have one model behind it) so you can swap and add models without re-architecting. Fine-tune an open-weight model on your proprietary data once you have enough usage data to make it worthwhile. The routing layer lets you migrate incrementally rather than making a single high-stakes bet.
The strategic moat isn't the model. It's the combination of routing logic, fine-tuned models trained on proprietary data, and the data flywheel that makes both better over time.
What commercially rigorous AI PMs look like
| Behaviour | What it looks like in practice |
|---|---|
| Models COGS before features | Runs the cost-per-query formula before writing the PRD, not after launch |
| Treats reliability as a pricing lever | Offers tiered audit rates rather than promising blanket accuracy |
| Monitors the flywheel | Tracks model accuracy against usage volume to prove the compounding loop |
| Stress-tests the cannibalisation math | Models what happens to revenue when the agent handles 50%, then 80% of the workload |
| Prices on outcomes, not access | Defines the work unit the customer values and builds pricing around it |
| Builds for replacement, not assistance | Asks "should this workflow exist?" before asking "how do we add AI to this workflow?" |
| Designs the routing layer early | Architects for multi-model from day one, even if shipping with a single model |
The anti-pattern: viability theatre
The PM who runs a TAM analysis, picks a pricing model from a textbook, and calls the business case "validated." No COGS modelling. No cannibalisation analysis. No data feasibility report. No understanding of whether the AI feature adds cost on top of the existing platform or replaces cost within it.
Viability theatre produces impressive slide decks and catastrophic P&L surprises. The AI feature launches, usage grows, inference costs spike, margins collapse, and leadership pulls funding. The feature was never commercially unviable. It was never commercially analysed.
Do the math first. Then build.