Business Viability and AI Economics
The cannibalisation paradox, the margin trap, inference cost modelling, and pricing strategies that survive the shift from SaaS to Service-as-a-Software.
TL;DR
- AI inference costs dropped 10–50x since early 2025. The constraint shifted from "can we afford it?" to "can we make it reliable?"
- Per-seat pricing creates a cannibalisation paradox: the better your AI agents work, the fewer seats your customer buys. The fix is pricing per outcome, not per login.
- Bolting AI onto a legacy platform as an optional copilot adds COGS without removing workflow cost. If the AI is optional, it's a margin trap.
The AI economics shift
Two years ago, the first question in every AI business case was "can we afford the inference?" That question is largely retired.
Inference costs dropped 10–50x between early 2025 and late 2026. Prompt caching (which avoids reprocessing repeated context) delivers an additional 90% cost reduction on cached tokens. Open-weight model families such as Llama, Mistral, and DeepSeek closed the capability gap with proprietary APIs for most production tasks. Fine-tuning on proprietary data became accessible to teams without dedicated ML engineers.
The cost barrier to adding AI features disappeared for most use cases. What replaced it is harder: reliability. A feature that works 92% of the time sounds impressive until you calculate the damage of the other 8% at scale. The viable AI product is the one that handles failure gracefully, not the one with the lowest inference bill.
This changes the viability question. Market sizing and TAM/SAM/SOM analysis still matter (any good product management resource covers the mechanics), but the AI-specific viability questions are different. They centre on data, margins, pricing, and build strategy.
Data feasibility and the data flywheel
In traditional software, feasibility risk is "can our engineers build it?" For AI products, the gating question is "do we have the data to build it?" You can always find engineers. You cannot always find data.
The data feasibility report
Before an AI bet proceeds, document answers to four questions:
Acquisition and quality. Do you have this data? Is it sufficient in volume and representative (unbiased)? What's the plan to clean, label, and maintain quality over time?
Provenance and legality. Where did this data come from? Do you have the legal rights, licences, and permissions to use it for training a commercial model?
Privacy and ethics. Does this data contain PII or other sensitive information? What's the plan for anonymisation, de-identification, and aggregation?
Bias audit. What inherent biases exist in this data? What's the mitigation plan to ensure the model doesn't amplify them?
This report is the central artefact that unlocks the rest of the business case. You cannot choose a pricing model or build strategy until you've de-risked data acquisition.
The data flywheel
Data feasibility answers whether you can build the product. The data flywheel determines whether you can defend it.
Each user interaction generates signal: corrections, selections, feedback, usage patterns. That signal improves the model. A better model improves the product. A better product attracts more users. More users generate more signal. This is the AI-era moat, and it compounds.
Companies with a functioning data flywheel get better at a rate their competitors cannot match by spending more on compute alone. The flywheel is why first-mover advantage matters more in AI products than in traditional SaaS: you're not just acquiring users, you're acquiring training data.
When assessing viability, ask two flywheel questions:
- Does usage generate data that improves the model? If your product's AI operates on static data with no feedback loop, you have no flywheel. Competitors with identical models can replicate your product.
- Is the improvement measurable and compounding? Track model accuracy or output quality against usage volume. If the curve is flat, your flywheel is broken. If it's climbing, you're building a defensible position.
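The second flywheel question can be made operational with a simple trend check: fit a slope to quality-versus-usage data and see whether it is positive. A minimal sketch, where the function name and the quarterly data points are illustrative assumptions, not real measurements:

```python
# Hypothetical flywheel health check: is output quality climbing with usage?
# The (usage_volume, accuracy) pairs below are made up for illustration.

def flywheel_slope(points):
    """Least-squares slope of accuracy vs. cumulative usage volume.

    points: list of (usage_volume, accuracy) tuples.
    A positive slope suggests the flywheel is compounding;
    a flat or negative slope suggests it is broken.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    return cov / var

quarterly = [(10_000, 0.88), (40_000, 0.91), (90_000, 0.93), (160_000, 0.94)]
print(f"slope: {flywheel_slope(quarterly):.2e} accuracy points per interaction")
```

A per-interaction slope is crude (expect diminishing returns at scale), but it distinguishes a flat curve from a climbing one, which is the decision that matters.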
The margin trap
Most established software companies are adding AI the wrong way. They bolt a "copilot" sidebar onto a platform architected a decade ago. Every product demo has a chatbot. Every roadmap has an "AI Assistant" workstream. Every earnings call mentions generative AI fifteen times.
The problem: they've added inference cost without removing workflow cost. The user still navigates the same screens, fills out the same forms, follows the same multi-step processes. The copilot auto-fills a few fields. Margins shrink. The feature looks modern. The P&L looks worse.
This is the margin trap. If the AI is optional (a sidebar the user can ignore), you've added COGS to your platform without changing the value equation. The user experience is incrementally better. The unit economics are materially worse.
The distinction that matters:
| Approach | Value proposition | Cost impact | Margin effect |
|---|---|---|---|
| Copilot (assistance) | Do work faster | Inference added on top of existing platform cost | Margin compression |
| Agent (replacement) | Do work for you | Inference replaces platform + human cost | Margin expansion |
A copilot helps you fill out a form faster. An agent eliminates the form. One adds a cost layer. The other removes a cost layer.
The litmus test: remove the AI from your product. Does the product still work? If yes, the AI is a feature, not a strategy. Now imagine the inverse: build the product where the AI is the product, where removing it means the product doesn't function. That's the difference between a margin trap and a viable AI business.
The cannibalisation paradox
Per-seat pricing powered a generation of billion-dollar SaaS companies. The logic was elegant: customer hires more humans, you sell more seats, revenue grows. Headcount and ARR moved in lockstep.
Agentic AI inverts the logic. You build autonomous agents. The customer needs fewer humans. You sell fewer seats. Revenue shrinks. The more successful your AI product, the more it cannibalises your seat-based revenue.
This is the Cannibalisation Paradox. Every efficiency gain you ship is a seat your customer no longer needs. The better your product gets, the less they pay you. No product leader wants to present that slide at a board meeting, but if you're building agentic capabilities on a per-seat model, that's the trajectory.
The fix: stop selling access and start selling outcomes. Shift from Software-as-a-Service to Service-as-a-Software.
When you sell a tool (Salesforce, Jira, Figma), you charge for the login. When you sell a result (the work itself, completed autonomously), you charge for the completed task. Tickets resolved. Contracts reviewed. Reports generated. Leads qualified. The pricing unit should be the smallest meaningful outcome your agent delivers, something the customer already understands and already values.
Price anchor: what did this task cost when a human did it? Your price should be meaningfully less than human cost, meaningfully more than inference cost. The spread is your margin, and unlike seat-based margin, it scales with volume.
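The spread logic above can be sanity-checked with a few lines of arithmetic. A minimal sketch, where the ticket-resolution figures (human cost, inference cost, price) are all hypothetical examples:

```python
# Illustrative per-outcome price check: the price should sit well below the
# human cost and well above the inference cost. All figures are assumptions.

def outcome_margin(price, inference_cost, overhead=0.0):
    """Gross margin per completed task at a given per-outcome price."""
    return (price - inference_cost - overhead) / price

human_cost_per_ticket = 8.00   # assumed fully loaded cost of a human resolution
inference_cost = 0.05          # assumed blended model cost per resolution
price = 2.00                   # per-resolution price, inside the spread

assert inference_cost < price < human_cost_per_ticket
print(f"customer saves {1 - price / human_cost_per_ticket:.0%} per ticket")
print(f"gross margin {outcome_margin(price, inference_cost):.0%}")
```

With these illustrative numbers the customer saves 75% per ticket while the margin stays above 90%, which is the shape of deal a seat-based model cannot offer.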
Pricing for AI products
The COGS formula
Before setting any price, model the cost per query. Prompt caching changes the math significantly for products with repeated context (system prompts, document templates, recurring workflows).
COGS per Query = (Input Tokens × Cost/Token × Cache Miss Rate) + (Input Tokens × Cached Cost/Token × Cache Hit Rate) + (Output Tokens × Cost/Token) + Infrastructure Overhead
For a product with 80% cache hit rate and a model charging $3/M input tokens, $0.30/M cached tokens, and $15/M output tokens:
- 2,000 input tokens: (400 uncached × $0.000003) + (1,600 cached × $0.0000003) = $0.00168 of input cost, versus $0.006 with no caching, plus output cost
- The cached portion costs 90% less than uncached
At scale, prompt caching can reduce your input token costs by 70–90%. If your viability model doesn't account for caching, you're overestimating COGS and potentially killing features that are commercially viable.
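The formula above is easy to turn into a reusable calculator. A sketch using the example rates from this section ($3/M input, $0.30/M cached input, $15/M output); the 500-token output count is an illustrative assumption:

```python
# COGS-per-query sketch, implementing the formula in this section.
# Rates are dollars per million tokens; defaults mirror the worked example.

def cogs_per_query(input_tokens, output_tokens, cache_hit_rate,
                   input_rate=3.00, cached_rate=0.30, output_rate=15.00,
                   overhead=0.0):
    """Cost per query in dollars."""
    per_tok = lambda rate: rate / 1_000_000
    uncached = input_tokens * (1 - cache_hit_rate) * per_tok(input_rate)
    cached = input_tokens * cache_hit_rate * per_tok(cached_rate)
    output = output_tokens * per_tok(output_rate)
    return uncached + cached + output + overhead

# 2,000 input tokens, 500 output tokens (illustrative), 80% cache hit rate
cost = cogs_per_query(2_000, 500, 0.80)
no_cache = cogs_per_query(2_000, 500, 0.0)
print(f"${cost:.5f} per query vs ${no_cache:.5f} without caching")
```

Multiply the per-query figure by projected daily volume before committing to a price; this is the number the viability model should start from.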
AI pricing models
| Model | How it works | When it fits | Risk |
|---|---|---|---|
| Pure usage-based | Charge per unit (API call, token, query) | Developer tools, infrastructure products | Customer anxiety over unpredictable bills; revenue volatility |
| Outcome-based (Service-as-a-Software) | Charge per successful result (ticket resolved, report generated) | Agentic products replacing human work | Defining "success" is hard; you absorb the cost of failures |
| Stand-alone add-on | AI features sold as a separate subscription tier | Quick monetisation of AI on an existing platform | Creates adoption friction; risks becoming the "optional copilot" |
| Hybrid: platform fee + metered outcomes | Flat base fee for access, metered charge for autonomous work done | Most agentic products transitioning from SaaS | More complex to build and communicate |
| Hybrid: seat + credit pool | Each seat contributes to a shared pool of AI usage credits | Teams transitioning gradually from per-seat | Power users exhaust the pool; doesn't solve the cannibalisation paradox long-term |
For most agentic products, the hybrid platform fee + metered outcomes model is the right starting point. The platform fee protects baseline ARR and gives finance predictable revenue. The metered layer captures value as the agent handles more volume. Example: "Includes 500 autonomous ticket resolutions per month. $2 per resolution thereafter."
This model solves the cannibalisation paradox because revenue grows with agent output, not headcount. It also creates natural expansion revenue: as agents prove themselves, customers route more volume through them without a sales conversation.
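The hybrid model's billing logic is simple enough to sketch directly. The included volume and overage rate below mirror the example in this section; the platform fee is a hypothetical figure:

```python
# Sketch of the hybrid model: flat platform fee plus metered outcomes.
# Included volume and $2 overage rate follow the example above;
# the $1,500 platform fee is an illustrative assumption.

def monthly_invoice(resolutions, platform_fee=1_500.00,
                    included=500, overage_rate=2.00):
    """Platform fee, plus a metered charge per resolution beyond the pool."""
    overage = max(0, resolutions - included) * overage_rate
    return platform_fee + overage

print(monthly_invoice(500))    # all volume inside the included pool
print(monthly_invoice(1_200))  # 700 metered resolutions on top of the fee
```

Note the expansion dynamic: invoice growth tracks agent output, so a customer routing more tickets through the agent grows revenue without a renegotiation.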
The internal alignment problem
Pricing changes fail when internal incentives don't follow. If engineering builds features that reduce human workload while sales is incentivised to increase seat count, your company is at war with itself.
Realign three things simultaneously:
- Sales incentives. Commission on consumption revenue and platform expansion, not seat count.
- Customer success metrics. Measure outcomes delivered, not daily active users and logins.
- Product metrics. Track work completed autonomously. A user who spends less time in your product because the agent handled everything is a success, not a churn risk.
The audit tax
Multi-agent architectures (where a manager model audits worker model outputs) introduce a cost multiplier that most teams don't model until it's too late.
The math
Typical 2026 cost structure:
- Worker model (small, efficient, task-specific): ~$0.20/M tokens
- Manager model (reasoning-heavy, validates outputs): ~$5.00/M tokens
The manager is 25x more expensive. If you audit every worker output ("micromanager architecture"), the per-task cost increase is roughly 2,500%.
| Audit rate | Cost per task | Relative cost |
|---|---|---|
| 0% (worker only) | $0.0002 | 1x |
| 100% (every output) | $0.0052 | 26x |
| 20% (spot-check) | $0.0012 | 6x |
At 100,000 tasks per day, the difference between 100% audit and spot-check is $520 versus $120 per day. Annualised, that's a $146,000 margin difference on a single workflow.
The spot-check architecture
Route high-confidence outputs directly to the user or next step. Only escalate low-confidence outputs to the manager model.
If your worker produces high-confidence outputs 80% of the time:
Blended cost = (0.80 × $0.0002) + (0.20 × $0.0052) = $0.0012 per task
That's roughly a 77% cost reduction from full audit, with minimal reliability loss if your confidence scoring is well-calibrated.
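The blended-cost calculation generalises to any audit rate. A sketch using the 2026 figures from this section (worker ~$0.0002/task, manager audit adding ~$0.0050/task):

```python
# Blended cost per task at a given audit rate, using the figures above.

WORKER_COST = 0.0002   # every task runs through the worker model
MANAGER_COST = 0.0050  # added only for tasks escalated to the manager

def blended_cost(audit_rate):
    """Cost per task when `audit_rate` of worker outputs are escalated."""
    return WORKER_COST + audit_rate * MANAGER_COST

for rate in (0.0, 0.20, 1.0):
    daily = blended_cost(rate) * 100_000  # 100k tasks/day, as in the example
    print(f"audit {rate:>4.0%}: ${blended_cost(rate):.4f}/task, ${daily:,.0f}/day")
```

Plot this against your actual escalation rate before committing to an architecture; the audit rate is a dial, and each notch has a visible P&L cost.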
Four approaches to confidence scoring (combine them):
- Model-native confidence. Ask the worker to rate uncertainty, or generate multiple candidates and measure agreement.
- Rule-based validation. For structured outputs, validate against known constraints. Nearly free.
- Historical calibration. Track actual accuracy against confidence scores over time. Adjust thresholds based on observed performance.
- Domain heuristics. Route known hard inputs (long documents, ambiguous language) to the manager proactively.
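Two of the signals above, rule-based validation and agreement across multiple candidates, combine naturally into a routing decision. A minimal sketch, assuming the worker returns JSON with a `status` field and that you can sample several candidates per task; both assumptions are illustrative, not a prescribed schema:

```python
# Spot-check routing sketch: rule-based validation plus candidate agreement.
# The JSON-with-"status" rule and the 0.8 threshold are illustrative assumptions.

import json

def rule_valid(output: str) -> bool:
    """Structured-output check: must parse as JSON containing a 'status' field."""
    try:
        return "status" in json.loads(output)
    except (ValueError, TypeError):
        return False

def route(candidates: list[str], agreement_threshold: float = 0.8):
    """Return ('accept', output) for high-confidence results,
    or ('escalate', None) to send the task to the manager model."""
    valid = [c for c in candidates if rule_valid(c)]
    if not valid:
        return ("escalate", None)
    top = max(set(valid), key=valid.count)
    agreement = valid.count(top) / len(candidates)
    if agreement >= agreement_threshold:
        return ("accept", top)
    return ("escalate", None)

# Four of five candidates agree on the same valid output: accepted.
print(route(['{"status": "ok"}'] * 4 + ['{"status": "retry"}']))
```

Sampling multiple candidates costs extra worker inference, but with a 25x price gap between worker and manager, a few cheap samples that avoid one escalation still come out ahead.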
Pricing reliability as a tier
Enterprise customers who need 99%+ accuracy are asking for a higher audit rate, which means higher inference cost. Offer configurable reliability: let the customer set the confidence threshold based on their risk tolerance and budget. Low-risk use cases get the cheap tier. High-stakes use cases pay for the premium audit rate. A single reliability tier at a single price means you're either overcharging low-risk customers or subsidising high-risk ones.
Buy, build, or route
The traditional buy/build/partner framework treated the decision as a one-time, mutually exclusive choice. In 2026, the right framing is "route," because most production AI products use multiple models for different tasks.
What changed
Open-weight models closed the gap. Llama, Mistral, and DeepSeek deliver production-quality results for most tasks. You no longer need a proprietary API for everything.
Prompt caching changed the economics. A 90% cost reduction on cached tokens makes API-based approaches viable at scales where self-hosting was previously the only option.
Fine-tuning became accessible. You don't need a team of ML engineers to fine-tune a model on proprietary data. The tooling matured. The cost dropped. A product team can run a fine-tuning job in an afternoon.
The routing layer emerged. Instead of choosing one model, production systems route different tasks to different models based on complexity, cost, and latency requirements. Simple classification goes to a small, fast model. Complex reasoning goes to a frontier model. The routing layer is the architectural decision, not the model selection.
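The routing layer described above can start as little more than a lookup table. A minimal sketch; the model names, per-token prices, and task categories are placeholders, not real endpoints:

```python
# Minimal routing-layer sketch: tasks go to different models by complexity.
# Model names, prices, and task types below are illustrative placeholders.

from dataclasses import dataclass

@dataclass
class ModelRoute:
    name: str
    cost_per_m_tokens: float

ROUTES = {
    "classify": ModelRoute("small-fast-model", 0.20),
    "extract":  ModelRoute("small-fast-model", 0.20),
    "reason":   ModelRoute("frontier-model", 5.00),
}

def pick_route(task_type: str) -> ModelRoute:
    """Route by task type; unknown tasks default to the frontier model."""
    return ROUTES.get(task_type, ROUTES["reason"])

print(pick_route("classify").name)
print(pick_route("plan-quarterly-strategy").name)
```

Even with a single model behind it, this indirection is what makes later migration incremental: swapping a model is a one-line change to the table, not a re-architecture.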
The 2026 decision matrix
| Factor | API (proprietary) | Open-weight (self-hosted) | Routed (multi-model) |
|---|---|---|---|
| Time to market | Days | Weeks (infra setup) | Weeks (routing logic) |
| Cost at low volume | Low (pay per token) | High (GPU allocation) | Medium |
| Cost at high volume | High (no volume ceiling on spend) | Low (amortised infra) | Lowest (optimised per task) |
| Data privacy | Data leaves your environment | Data stays internal | Configurable per route |
| Customisation | Limited to prompting | Full (fine-tuning, architecture) | Full per model in the stack |
| Vendor lock-in | High | None | Low (models are swappable) |
| Strategic moat | None (competitors use same API) | Moderate (fine-tuned on your data) | High (routing logic + fine-tuned models + data flywheel) |
The default recommendation for most products: Start with a proprietary API to validate the use case fast. Build routing logic early (even if you only have one model behind it) so you can swap and add models without re-architecting. Fine-tune an open-weight model on your proprietary data once you have enough usage data to make it worthwhile. The routing layer lets you migrate incrementally rather than making a single high-stakes bet.
The strategic moat isn't the model. It's the combination of routing logic, fine-tuned models trained on proprietary data, and the data flywheel that makes both better over time.
What commercially rigorous AI PMs look like
| Behaviour | What it looks like in practice |
|---|---|
| Models COGS before features | Runs the cost-per-query formula before writing the PRD, not after launch |
| Treats reliability as a pricing lever | Offers tiered audit rates rather than promising blanket accuracy |
| Monitors the flywheel | Tracks model accuracy against usage volume to prove the compounding loop |
| Stress-tests the cannibalisation math | Models what happens to revenue when the agent handles 50%, then 80% of the workload |
| Prices on outcomes, not access | Defines the work unit the customer values and builds pricing around it |
| Builds for replacement, not assistance | Asks "should this workflow exist?" before asking "how do we add AI to this workflow?" |
| Designs the routing layer early | Architects for multi-model from day one, even if shipping with a single model |
The anti-pattern: viability theatre
The PM who runs a TAM analysis, picks a pricing model from a textbook, and calls the business case "validated." No COGS modelling. No cannibalisation analysis. No data feasibility report. No understanding of whether the AI feature adds cost on top of the existing platform or replaces cost within it.
Viability theatre produces impressive slide decks and catastrophic P&L surprises. The AI feature launches, usage grows, inference costs spike, margins collapse, and leadership pulls funding. The feature was never commercially unviable. It was never commercially analysed.
Do the math first. Then build.