Build for the Model That Doesn't Exist Yet
TL;DR
- AI products built for today's model capabilities will be obsolete or over-engineered within six months
- The winning strategy is to build product architecture that improves automatically as models improve, not architecture that compensates for model weaknesses
- I built two production platforms on this principle and watched features go from "barely works" to "reliable" without changing a line of product code
Most product development assumes a stable technology foundation. You know what the database can do. You know what the framework supports. You design features within those constraints and ship them.
AI product development doesn't work that way. The foundation moves under you every three to six months, and it moves in one direction: up. The model you're building on today will be meaningfully worse than the model available when your product ships. And the model available at ship will be meaningfully worse than the model your users are running on six months later.
This creates a product design challenge with no historical precedent. You're building on a capability curve, not a capability snapshot.
What building on a moving target looks like
When I started building OpenChair in late 2025, the models available were good enough for basic tasks and unreliable for complex ones. Multi-step workflows with tool use would derail after four or five steps. Structured output was inconsistent. Long-context performance degraded noticeably past 30,000 tokens.
I had two choices. Build elaborate scaffolding to compensate for these weaknesses (retry logic, output parsers, step-by-step orchestration, fallback chains). Or build minimal scaffolding and bet that the weaknesses would be resolved by model improvements within months.
I chose the second path, and it was uncomfortable.
In the early weeks, features that relied on multi-step reasoning failed maybe 30% of the time. The AI appointment booking system would sometimes lose context mid-conversation. The business analytics summaries occasionally produced nonsensical interpretations. It wasn't production quality. I shipped it anyway, with appropriate guardrails, because I was building for the model six months out, not the model of December 2025.
By March 2026, without changing my product code, those same features had failure rates under 5%. The model improved. My architecture was designed to benefit from that improvement automatically, because I hadn't built compensating mechanisms that would now fight the model's native capabilities.
The scaffolding trap
The alternative approach, heavy scaffolding, creates a specific problem: it works well today and becomes technical debt tomorrow.
If you build a retry chain that catches malformed JSON output and re-prompts the model with stricter formatting instructions, that chain solves a real problem with current models. But when the next model generates valid JSON 99.5% of the time, your retry chain is now unnecessary complexity that adds latency, cost, and maintenance burden. Worse, the retry logic might actually degrade performance if the re-prompt pattern conflicts with the model's improved native capabilities.
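To make the trap concrete, here is a minimal sketch of that retry chain. It is illustrative, not a recommended implementation: `generate` is a stand-in for whatever model call you use, and the whole function is exactly the kind of compensating scaffolding that turns into dead weight once the model emits valid JSON natively.

```python
import json

def call_with_json_retry(generate, prompt, max_retries=2):
    # Scaffolding pattern: re-prompt with stricter formatting
    # instructions until the model returns valid JSON.
    # Solves a real problem today; pure latency and maintenance
    # burden once the model conforms 99.5% of the time.
    attempt_prompt = prompt
    for _ in range(max_retries + 1):
        raw = generate(attempt_prompt)
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            attempt_prompt = prompt + "\nRespond with valid JSON only, no prose."
    raise ValueError("model never produced valid JSON")
```

With a capable model the loop body runs once and the retry branch is dead code you still have to own, test, and pay latency for on the error path.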
I've seen this play out across multi-model orchestration architectures. Teams build elaborate routing logic to send simple tasks to cheaper models and complex tasks to expensive ones. That routing layer is a snapshot of today's cost-capability tradeoffs. Six months later, the cheap model handles the complex tasks fine, but the routing layer still exists, adding latency and operational complexity.
The principle: every piece of scaffolding you build to compensate for model weaknesses is a bet against model improvement. Some of those bets will be correct (models won't solve everything). Most of them won't.
How to build for the future model
The practical question is: how do you build architecture that improves with the model instead of fighting it?
Give the model tools, not instructions. Instead of hard-coding a workflow ("first query the database, then summarise the results, then format the output"), give the model a database query tool, a summarisation prompt, and a formatting template, and let it decide the order and approach. Today's model might fumble the orchestration. Tomorrow's model will nail it. Your architecture benefits either way because you're not locked into a specific execution pattern.
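A rough sketch of the difference, with the model call stubbed out: `choose_next_action` stands in for whatever your model API returns when asked to pick the next tool call (the names and shapes here are assumptions for illustration, not any specific vendor's API). The point is that the loop hard-codes nothing about order.

```python
def run_agent(choose_next_action, tools, task, max_steps=10):
    # Instead of hard-coding "query, then summarise, then format",
    # expose the tools and let the model pick the sequence.
    # choose_next_action(history) returns (tool_name, args),
    # or None when the model decides the task is complete.
    history = [("task", task)]
    for _ in range(max_steps):
        action = choose_next_action(history)
        if action is None:
            break
        name, args = action
        result = tools[name](**args)
        history.append((name, result))
    return history
```

Today's model might fumble the sequencing inside this loop; tomorrow's model gets it right, and the loop itself never changes.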
Design for capability graduation. Some features will be marginal with current models and strong with future ones. Build them anyway, but gate them appropriately. I built an AI-powered business insights feature for OpenChair that initially required human review before surfacing to the business owner. As the model improved, I didn't need to rebuild the feature. I just adjusted the confidence threshold for automated surfacing. The architecture supported graduation from human-in-the-loop to fully automated without structural changes.
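The gating mechanism can be as simple as a threshold check. This is a sketch of the shape, not the OpenChair code: graduation from human-in-the-loop to fully automated is a change to `auto_threshold`, not a rebuild.

```python
def surface_insight(insight, confidence, auto_threshold):
    # Route marginal outputs to human review; confident outputs
    # surface automatically. As the model improves and confidence
    # rises, more traffic graduates without structural changes.
    if confidence >= auto_threshold:
        return {"insight": insight, "route": "auto_surface"}
    return {"insight": insight, "route": "human_review"}
```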
Keep your prompt layer thin. Long, detailed system prompts that specify exact behaviours are another form of scaffolding. They work with the model they were written for and often break with the next model because the new model interprets instructions differently. Keep system prompts focused on role definition and constraints, not step-by-step procedures. Let the model bring its own reasoning to the task.
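As a contrived illustration of the contrast (both prompts are invented for this example, not taken from a real product):

```python
# Thin: role and constraints only; the model brings its own reasoning.
THIN_SYSTEM_PROMPT = (
    "You are a booking assistant for a hair salon. "
    "Never confirm an appointment without an explicit date, time, and service. "
    "If you are unsure, ask the customer rather than guessing."
)

# Heavy: a step-by-step procedure baked in, tuned to one model's quirks,
# and likely to fight the next model's improved native behaviour.
HEAVY_SYSTEM_PROMPT = (
    "Step 1: Greet the customer. Step 2: Ask for the service. "
    "Step 3: Ask for the date. Step 4: Ask for the time. "
    "Step 5: Repeat the details back. Step 6: Confirm."
)
```

The thin prompt survives a model upgrade untouched; the heavy one encodes an execution pattern that the next model may handle better on its own.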
Use evals as your early warning system. If you're building for a future model, you need to know when the future arrives. Eval suites that test your product across model versions tell you exactly when a capability has graduated from unreliable to dependable. Run your eval suite against new models the day they drop. The features that suddenly pass are features ready for promotion.
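The "capability graduation" check itself is a small diff over eval results. A minimal sketch, assuming you store pass rates per model per feature (the data shape and threshold here are illustrative):

```python
def graduated_features(eval_results, old_model, new_model, pass_threshold=0.95):
    # eval_results: {model_name: {feature: pass_rate}}.
    # Returns features below threshold on the old model and at or
    # above it on the new one: candidates for promotion from
    # human-in-the-loop to automated.
    old = eval_results[old_model]
    new = eval_results[new_model]
    return [
        feature for feature, rate in new.items()
        if old.get(feature, 0.0) < pass_threshold and rate >= pass_threshold
    ]
```

Run this the day a new model drops; whatever it returns is your promotion list.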
The uncomfortable middle
Building for the future model means living with a product that's not fully polished today. This is deeply uncomfortable for product people trained in the "ship when it's ready" school.
The resolution is appropriate guardrails rather than delayed shipping. For the features that work today, ship them with confidence. For the features that are close but not quite there, ship them with human-in-the-loop checkpoints, clear user expectations, and monitoring that tells you when the model has caught up.
At Cotality, I learned this principle in a different context. We shipped property valuation tools that initially required manual valuer review for every output. Over time, as the models and data improved, we graduated to automated confidence scoring with human review only for edge cases. The architecture was designed for graduation from day one, even though the initial user experience included the manual step.
AI products should follow the same pattern. Design the autopilot. Ship the copilot. Graduate as the model earns trust.
What to bet on
If you're building an AI product today and want to bet on the model six months from now, here are the capability trajectories that seem most reliable:
Tool use will get significantly better. Models are improving rapidly at selecting the right tool, using tools in sequence, and recovering from tool use errors. Build features that rely on tool use even if today's reliability isn't perfect.
Long-running tasks will become viable. The duration a model can operate autonomously before going off-track is extending from minutes to hours. Build architectures that support long-running agent tasks even if you currently need to break them into shorter segments.
Multi-modal understanding will converge. Text, image, audio, and code understanding are converging in capability. Build products that pass multiple modalities to the model even if the model currently handles them unevenly.
Structured output will become reliable. JSON, function calling, and schema-conformant output are all improving rapidly. Don't build elaborate output parsing. Trust the model to conform to structure and add minimal validation for safety.
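"Minimal validation for safety" can look like this sketch: trust the model to conform to the schema, verify only the keys you depend on, and let a hard failure surface rather than auto-repairing. (`required_keys` and the function name are illustrative.)

```python
import json

def parse_model_output(raw, required_keys):
    # No elaborate parsing or repair chain: parse once, then check
    # that the keys downstream code depends on are present.
    data = json.loads(raw)
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"model output missing keys: {missing}")
    return data
```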
The products that win won't be the ones that solved today's model limitations most cleverly. They'll be the ones whose architecture lets tomorrow's model do its best work. Build the platform. Let the model grow into it.
Frequently Asked Questions
How do you know which model improvements to bet on?
Follow the scaling laws. Capabilities that are improving on benchmark curves (tool use, long-context reasoning, structured output) are safe bets. Capabilities that are plateauing or inconsistently measured are riskier bets. When in doubt, read the research papers from the major labs, specifically their capability evaluations across model sizes.
What if the model doesn't improve in the direction you bet on?
That's the risk. Mitigate it by keeping your architecture modular so you can add scaffolding later if the model doesn't graduate. The cost of adding scaffolding to a clean architecture is much lower than the cost of removing scaffolding from a cluttered one.
Does this mean you should never optimise for current model performance?
Optimise your prompts and tool descriptions for the current model. Don't optimise your architecture for the current model's weaknesses. Prompts are cheap to update. Architecture is expensive to restructure.
Logan Lincoln
Product executive and AI builder based in Brisbane, Australia. Nine years in regulated B2B SaaS, currently shipping production AI platforms.