The Data Moat Illusion: Why Incumbent Proptech Companies Are Vulnerable

TL;DR
- The proptech industry has operated on a simple premise for decades: whoever holds the most historical, structured property data wins.
- Generative AI models can now instantly extract high-fidelity structured data from public, unstructured sources like listing photos and satellite imagery.
- Data hoarding is no longer a sustainable moat. The new moat is workflow orchestration, speed, and deep integration into user decision-making processes.

Spend any time in the proptech data industry and the competitive logic becomes clear fast: moats were built on data scarcity. I learned this during my years at Cotality (formerly CoreLogic), watching how incumbents accumulated decades of structured property records that competitors couldn't replicate.
If you wanted to build an Automated Valuation Model (AVM) or a predictive market tool, you needed a massive proprietary database. You needed a small army of researchers, complex municipal data agreements, and decades of historical sales records to know how many bedrooms a house had, when it was last renovated, and what it sold for in 2011.
That structured data was hard to assemble, which made it an impenetrable moat against new entrants.
That moat is now an illusion. Multi-modal AI models can extract the same structured intelligence from public listing photos and satellite imagery that incumbents spent decades assembling. We have moved from data scarcity to data abundance, and every competitive assumption built on the old model needs rethinking.
Unstructured Data Is the New Database
The structured databases that incumbents spent decades building are now being commoditised by visual intelligence.
Today, you don't need a historical database to know a property's current condition. A multi-modal foundation model can process the 30 photos in a typical real estate listing and instantly generate a structured dataset richer than anything in legacy municipal records.
The AI can detect that the roof is slate and likely needs repairs soon. It can assess that the kitchen renovation is "builder grade" rather than premium. It can cross-reference street view and satellite imagery to calculate the actual tree canopy coverage over the backyard, or estimate the ambient noise level based on proximity to main roads.
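To make that concrete, here's a minimal sketch of that extraction step, assuming an OpenAI-style multi-modal API. The photo URLs, attribute schema, and prompt are illustrative, not a production pipeline.

```python
# Minimal sketch: extract structured property attributes from listing photos
# with a multi-modal model. Assumes the OpenAI Python SDK; the schema fields,
# photo URLs, and prompt are illustrative only.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PHOTO_URLS = [
    "https://example.com/listing/123/photo_01.jpg",  # hypothetical listing photos
    "https://example.com/listing/123/photo_02.jpg",
]

PROMPT = (
    "You are a property analyst. From these listing photos, return JSON with: "
    "roof_material, roof_condition (good/fair/poor), kitchen_grade "
    "(builder/mid/premium), estimated_bedrooms, visible_renovations (list)."
)

response = client.chat.completions.create(
    model="gpt-4o",  # any multi-modal model with vision input
    response_format={"type": "json_object"},
    messages=[{
        "role": "user",
        "content": [{"type": "text", "text": PROMPT}]
        + [{"type": "image_url", "image_url": {"url": u}} for u in PHOTO_URLS],
    }],
)

attributes = json.loads(response.choices[0].message.content)
print(attributes)  # e.g. {"roof_material": "slate", "kitchen_grade": "builder", ...}
```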
This is the evaporation of the data moat. When AI extraction makes unstructured public visual data functionally equivalent to proprietary structured databases, the barrier to entry collapses. Startups no longer need to spend a decade acquiring municipal property records to build an accurate AVM; they just need access to current listing photos and a strong reasoning model.
The Shift to Workflow Orchestration
When data becomes a commodity, value moves up the stack to workflow.
The winner in the next decade of proptech won't be the company with the biggest database. It will be the company that embeds real-time contextual intelligence directly into the user's actual tasks.
As I wrote previously, code is a commodity, and your moat is trust. When every platform has access to the exact same high-fidelity property insights, the only way to retain users is to execute tasks on their behalf.
This means shifting from being a system of record to becoming an agentic workflow. If a buyer's agent is using your software, they don't just want a list of recent sales; they want the software to draft the competitive offer strategy based on the visual condition of the home compared to the neighbourhood baseline.
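Here's a sketch of what that step could look like inside the orchestration layer, assuming the same OpenAI-style API as above. The function name, inputs, and prompt are hypothetical; the point is the workflow shape, not the model.

```python
# Sketch of the workflow step described above: turn extracted condition data
# plus recent comparables into a drafted offer strategy. Function name, inputs,
# and prompt are illustrative placeholders.
import json
from openai import OpenAI

client = OpenAI()

def draft_offer_strategy(subject_attributes: dict, comparable_sales: list[dict]) -> str:
    prompt = (
        "You are assisting a buyer's agent. Subject property (AI-extracted "
        f"condition data): {json.dumps(subject_attributes)}\n"
        f"Recent comparable sales: {json.dumps(comparable_sales)}\n"
        "Draft a one-page offer strategy: suggested opening offer, walk-away "
        "price, and condition-based negotiation points."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Usage: feed in the attributes extracted from the listing photos plus a comps feed.
# strategy = draft_offer_strategy(attributes, comps)
```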
Incumbents are sitting on vast reservoirs of proprietary data, feeling secure. Data that can be inferred from a JPEG is not a moat. It's a head start. And head starts evaporate quickly when startups are rewriting architectures in days.
The Incumbent Response Playbook
When incumbents feel disruption approaching, they follow a predictable sequence: add a feature first, announce a partnership second, acquire a startup third, then wonder why none of it shifted their competitive position.
The acquisition is usually the most expensive mistake. Buying a $50m AI startup and folding it into a 15-year-old codebase doesn't make you AI-native. It makes you an incumbent with a new line in the org chart. The underlying database architecture (designed for structured records rather than vector embeddings, batch queries rather than real-time inference) doesn't get replaced in an integration. It gets worked around.
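A toy illustration of that gap, assuming an in-memory vector index and random placeholder embeddings: the legacy row answers column lookups, while the AI-native layer answers similarity queries in real time.

```python
# Sketch of the architectural gap: legacy schemas store fixed attributes per
# parcel; an AI-native layer stores embeddings and answers similarity queries
# on demand. Embedding values here are random placeholders, not real encodings.
import numpy as np

rng = np.random.default_rng(0)

# Legacy-style record: one row of fixed, structured fields.
legacy_record = {"parcel_id": "QLD-123", "beds": 3, "baths": 2, "last_sale": 2011}

# AI-native layer: an embedding per listing photo set (from a vision encoder
# in practice), queried by similarity rather than by column.
listing_ids = ["QLD-123", "QLD-456", "QLD-789"]
embeddings = rng.normal(size=(3, 512))
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

def most_similar(query_vec: np.ndarray, k: int = 2) -> list[str]:
    query_vec = query_vec / np.linalg.norm(query_vec)
    scores = embeddings @ query_vec          # cosine similarity against the index
    top = np.argsort(scores)[::-1][:k]
    return [listing_ids[i] for i in top]

print(most_similar(rng.normal(size=512)))
```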
Partnerships are more honest about this constraint. Redfin contracting Sierra AI to power its conversational layer is effectively saying: our architecture can't do this natively. That's rational. It's also not a moat. Any portal can sign the same contract.
The companies that survive the next five years in proptech won't be the ones that added the most AI features to their existing products. They'll be the ones that treated AI as a foundation layer and rebuilt from there. Or the startups that never had legacy architecture to protect in the first place.
What the New Moat Actually Looks Like
The new moat in proptech has three layers.
The first is workflow depth. Genuine integration into the daily tasks of buyers' agents, lenders, and conveyancers, not a chat interface bolted over a map. The platform that sits inside the workflow, generating offer strategies, flagging valuation risks, and drafting disclosure summaries, becomes far harder to remove than a database subscription. Every workflow that runs through the platform trains it to be better at that workflow. Raw property data can't generate that compounding value.
The second is preference calibration. Platforms that accumulate a rich model of user preferences (what buyers respond to visually, how they trade off proximity against space, what price ranges they actually engage with versus filter for) hold an asset no structured property database can replicate. Visual preferences in particular are difficult to articulate but easy to observe. A platform that watches 10,000 buyers browse for six months knows more about housing taste than any attribute schema.
The third is trust in model outputs. When the AI's condition assessment consistently outperforms the traditional AVM, lenders stop second-guessing it. That trust is earned in small increments over months. It compounds. Once a lender trusts your model, switching to a competitor's means rebuilding that trust from scratch.
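One way to make "consistently outperforms" measurable is a rolling error comparison against settled sales. Here's a sketch with synthetic numbers, using median absolute percentage error (MdAPE) per monthly cohort as the assumed metric.

```python
# Sketch of how outperformance could be tracked: MdAPE of the vision-informed
# model versus the legacy AVM, per monthly cohort. Prices and predictions are
# synthetic; the error spreads are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
months = ["2025-01", "2025-02", "2025-03"]

def mdape(predicted: np.ndarray, actual: np.ndarray) -> float:
    return float(np.median(np.abs(predicted - actual) / actual))

for month in months:
    actual = rng.uniform(400_000, 1_200_000, size=500)        # settled sale prices
    legacy_avm = actual * rng.normal(1.0, 0.09, size=500)      # ~9% error spread
    vision_model = actual * rng.normal(1.0, 0.05, size=500)    # ~5% error spread
    print(month, f"AVM MdAPE={mdape(legacy_avm, actual):.3f}",
          f"vision MdAPE={mdape(vision_model, actual):.3f}")
```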
That's the new data moat. Not records. Relationships, calibration, and validated model trust.
Logan Lincoln
Product executive and AI builder based in Brisbane, Australia. Nine years in regulated B2B SaaS; currently shipping production AI platforms. Written from hands-on experience building AI products.


