Product Lifecycle & Process

Discovery, Feedback, and the Problem Queue

Why the backlog is a problem collection, not a task queue; continuous discovery; feedback channels; lighthouse users; and AI-native risk assessment.


TL;DR

  • The product backlog is a collection of problems, not a task queue. Its job is to make the next bet obvious.
  • The delivery backlog doesn't need management. It needs to be short. When you can ship an item in an afternoon, grooming a two-week queue of sub-tasks is pure overhead.
  • Discovery is not a phase. It runs continuously in parallel with delivery.
  • Feedback from multiple channels, triangulated and synthesised, is the raw material for every good product decision.

The problem with most backlogs isn't that they mix strategy with execution. It's that they treat bets like tasks.

A backlog item should answer a high-level question: what problem are we solving next? Not: what ticket should I pick up? When you collapse those two things into one list, you get the graveyard: feature requests, half-formed ideas, urgent bugs, and strategic initiatives all competing for attention in the same queue. Decisions get made by gut feel or whoever shouts loudest.

The fix is structural: separate strategy from execution with two distinct backlogs. But there's a more important point underneath that. When an AI-native team can close multiple backlog items in an afternoon, the delivery backlog stops needing active management. What demands attention is the product backlog: keeping it sharp enough that the next problem is always obvious without a meeting.

The product backlog: the "what" and "why"

The product backlog is your long-term, ever-evolving space for strategic thinking. Owned by the product manager, it answers: "What should we invest in and why?"

It contains:

  • Ideas – vague, one-line summaries capturing an initial thought or piece of feedback
  • Problems and opportunities – validated customer problems the team is considering
  • Solutions and hypotheses – potential solutions and experiments designed to test assumptions
  • Insights – evidence linked to each item, sourced from user feedback, customer conversations, research, and product analytics

Every item functions as a central object for collecting the evidence that supports it. This creates a clear, defensible rationale for decisions.

Organising the product backlog

Two pragmatic frameworks:

Boulders, rocks, and pebbles. Balance investment across work sizes. Boulders are large strategic bets with high potential and uncertainty. Rocks are medium investments like new features. Pebbles are small, straightforward tasks.

Long, medium, and short lists. Organise by maturity. The long list is "someday, maybe." The medium list is attractive opportunities you could legitimately invest in. The short list is what the team has committed to exploring, and this forms the roadmap.

One practical note on pebbles: if an item takes less than a day to ship, it probably shouldn't live in the product backlog. It needs a decision and execution, not a ticket. The backlog earns its keep as a home for boulders and rocks. Tracking pebbles there is a sign you're managing a to-do list, not a strategy.

The delivery backlog: the "how" and "when"

The delivery backlog is a short-term, tactical tool owned by the engineering team. It answers one question: what are we doing right now and this week?

That scope is intentional. In an AI-native team, the gap between "problem identified" and "shipped" can be a single afternoon. You can close multiple delivery items before the next planning session. A deep, elaborately groomed delivery backlog is a sign you've spent more time planning items than building them.

A healthy delivery backlog contains at most two weeks of itemised work. If it's longer, something has gone wrong: either your items are too granular, or you're grooming a queue that execution will outpace.

For complex, multi-sprint initiatives, the traditional structure still applies: epics, stories, tasks, sub-tasks, bugs. For smaller bets, a problem statement and a clear acceptance criterion are enough to start. Don't write three layers of tickets for something that will ship tomorrow.


Discovery: finding the right thing to build

Discovery is the continuous process of deeply understanding customers and their problems. It's the most critical stage of the product lifecycle and the antidote to risk: validating an idea with a simple prototype is orders of magnitude cheaper than building something nobody uses.

Discovery is not a one-off activity. It's a constant, iterative loop running in parallel with delivery.

Three discovery principles

  1. Minimise waste. Time and resources are your most valuable assets. Validate to build, not build to validate. The faster you can fail, the faster you can learn.

  2. De-risk every idea. Before committing resources, assess product risks early and continuously against key risk categories.

  3. Embrace rapid experimentation. Discovery is not about debate. It's about evidence. Use prototypes to test ideas quickly with users and stakeholders.

The risk framework

For any initiative, assess risk across two dimensions:

Traditional software risks:

  • Value risk – Will the customer buy or choose to use it?
  • Usability risk – Can the user figure out how to use it?
  • Feasibility risk – Can engineers build it with the time, technology, and skills available?
  • Viability risk – Will this solution work for the business?

AI-native risks (mandatory for AI-first features):

  • Probabilistic risk – The model is non-deterministic and will be wrong. It can produce plausible-sounding falsehoods. Test for frequency and impact.
  • Reliability risk – Compound accuracy degrades fast in agentic workflows (the 95% trap). Map the full chain and test end-to-end, not step-by-step.
  • Cost risk – Inference is not free. A feature that works brilliantly in a demo can be economically unviable at 10,000 daily users. Model the per-query cost at projected scale before committing to a solution.
  • Integration risk – AI features rarely exist in isolation. They need tool access (MCP), data connectivity, and orchestration across systems. Test that the model can reliably call the right tools, handle errors from external systems, and maintain context across multi-step workflows.
  • Ethical and bias risk – The model will reproduce biases in its training data. Test for and mitigate harmful biases.
  • Adversarial and security risk – The model is a new attack surface. Test for vulnerabilities like prompt injection.
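Two of these risks, reliability and cost, reduce to simple arithmetic you can run before committing to a solution. A minimal sketch; the step counts, token volumes, prices, and traffic figures below are illustrative assumptions, not figures from any real deployment:

```python
# Illustrative numbers only: per-step accuracy, token prices, and
# traffic are assumptions, not data from a real system.

def chain_accuracy(per_step: float, steps: int) -> float:
    """End-to-end success rate of an agentic chain where every step must succeed."""
    return per_step ** steps

# The "95% trap": each step looks fine in isolation, the chain does not.
for steps in (1, 5, 10, 20):
    print(f"{steps:>2} steps at 95% each -> {chain_accuracy(0.95, steps):.1%} end-to-end")
# 20 steps at 95% each comes out below 36% end-to-end.

def monthly_inference_cost(queries_per_day: int,
                           tokens_per_query: int,
                           price_per_million_tokens: float) -> float:
    """Rough monthly inference bill at projected scale (30-day month)."""
    daily = queries_per_day * tokens_per_query / 1_000_000 * price_per_million_tokens
    return daily * 30

# A demo with 50 users is cheap; the same feature at 10,000 daily users is not.
cost = monthly_inference_cost(queries_per_day=10_000,
                              tokens_per_query=4_000,
                              price_per_million_tokens=5.0)
print(f"~${cost:,.0f}/month at 10,000 daily users")  # ~$6,000/month
```

The point of the exercise is not the exact figures but the shape of the curves: chain accuracy decays exponentially with step count, and inference cost scales linearly with traffic, so both need to be modelled at projected scale, not demo scale.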

The idea lifecycle

Every idea moves through four stages. Teams move between them as they learn. Many ideas get abandoned. That's the point.

1. Wonder: Is this problem worth solving?

Problem framing. Use the Working Backwards press release to articulate the customer's pain and the value of solving it before writing code. Use the Five Whys to uncover root causes. Often the strongest signals come from watching users misuse your product to do something it wasn't designed for. That mismatch between intended use and actual use is latent demand, and it's the most reliable source of new-product ideas.

2. Explore: Have we found the right solution?

Assumption testing and prototyping. Use lightweight methods to de-risk:

  • Prototypes, from sketches to interactive mockups
  • Concierge MVPs, where you manually perform the service before automating
  • Wizard of Oz tests, where you fake automation with a human backend

For AI-native products, the explore phase has compressed dramatically. Add:

  • AI Wizard of Oz tests – Pipe a user's request to an off-the-shelf LLM API to test value proposition and usability for minimal cost. In 2026, this is a core discovery technique. A product builder with access to AI coding tools can vibe-code a working prototype in hours, not weeks. The prototype won't be production-grade, but it will answer the two questions that matter: "Do users want this?" and "Can an LLM deliver acceptable quality?" This collapses the gap between wonder and explore. You can go from "interesting problem" to "users testing a working prototype" in a single day. Prototyping has replaced research as the fastest path through discovery.
  • AI red teaming – A cross-functional group actively tries to break the prototype model before heavy engineering begins.
  • LLM evaluation ("evals") framework – A prompt set covering common and edge cases plus a rubric defining what "good" looks like. This becomes the acceptance criteria for AI features. See evaluation frameworks for the full methodology.
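The evals framework can start far simpler than teams expect. A minimal sketch of the shape, a prompt set plus a rubric check per case, where `call_model`, the example prompts, and the pass threshold are all hypothetical stand-ins rather than a real API:

```python
# Minimal shape of an eval set: prompts covering common and edge cases,
# each with a rubric check. All names and cases here are illustrative.

EVAL_CASES = [
    {"prompt": "Summarise this refund policy in one sentence: ...",
     "must_contain": ["refund"]},          # common case
    {"prompt": "Summarise an empty document.",
     "must_contain": ["no content"]},      # edge case
]

def passes_rubric(output: str, must_contain: list[str]) -> bool:
    """Cheapest possible rubric: required phrases present (case-insensitive)."""
    return all(phrase.lower() in output.lower() for phrase in must_contain)

def run_evals(call_model, cases=EVAL_CASES, threshold=0.9) -> bool:
    """Returns True if the pass rate meets the acceptance threshold."""
    passed = sum(passes_rubric(call_model(c["prompt"]), c["must_contain"])
                 for c in cases)
    rate = passed / len(cases)
    print(f"{passed}/{len(cases)} cases passed ({rate:.0%})")
    return rate >= threshold

# Usage with a fake model, to show the acceptance-criteria role:
fake_model = lambda prompt: "There is no content; the refund policy is empty."
run_evals(fake_model)
```

Real rubrics grow into graded scores and LLM-as-judge checks, but even this string-matching version gives an AI feature something it otherwise lacks: a pass/fail gate that runs on every change.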

3. Make: Are we building it right?

Development work. Connect solutions to delivery tickets and track progress.

4. Impact: Is it delivering value?

Measuring results and iterating continuously. A solution is never truly "done."

Discovery rituals that work

Engineers in customer interviews. For any significant initiative, engineers join customer interviews alongside the PM and designer. When an engineer hears the customer's raw pain directly, they can identify cheaper or faster alternatives that achieve the same user outcome.

The eleven-star framework. Before converging on a solution, imagine the full spectrum of the user experience, from one star (a disaster) to eleven stars (magical, impossible). From that ambitious vision, work backward to define a realistic but exceptional product.

Separate exploration from convergence. Use a "design critique" forum (designers only, psychologically safe space for work-in-progress) and an "alignment review" forum (full cross-functional team, formal gate from Explore to Make).


Continuous feedback

Discovery gives you direction. Feedback keeps you honest. Without continuous feedback, you prioritise based on gut feel or whoever shouts loudest. With it, you make evidence-based decisions that compound over time.

Feedback channels

A balanced view requires multiple sources. No single channel tells the whole story.

Inbound user feedback. Use an in-app feedback collector or a dedicated portal for unstructured feedback. This creates a managed space, prevents backlog clutter, and lets you close the loop by notifying users when you ship something based on their input.

Internal feedback channels. Sales, support, and customer success teams hear things the product team never will. Use dedicated Slack or Teams channels where stakeholders share customer feedback, product gaps, and competitive intelligence. This prevents the product team from drowning in direct messages and keeps feedback organised.

Surveys. Target specific user cohorts based on segmentation or product usage. Use recurring surveys (monthly CSAT) and one-off surveys for specific questions. Surveys often produce more thoughtful feedback than ad-hoc conversations.

The triangulation pitfall. Over-relying on any single channel creates blind spots. Enterprise sales feedback skews toward the needs of your largest accounts. Support tickets skew toward friction and bugs. In-app feedback skews toward power users. You must use all channels together. Any insight validated by only one source should be treated as a hypothesis, not a fact.

Lighthouse users

Lighthouse users are a small, dedicated group who act as co-creators of your solutions. Working closely with them lets you move faster and get richer, more contextual insights than trying to please thousands of users at once.

Selecting lighthouse users. The best candidates share these attributes:

  • Clear communicators who can articulate their problems and experiences
  • In your target segment, representing the customer you're building for
  • Strongly affected by the problem, with real skin in the game
  • Open to new ways of working, willing to try new approaches rather than just requesting specific features
  • Comfortable with early-stage products, tolerating bugs in exchange for influence

Working with them. Treat them as partners, not guinea pigs. Set up dedicated communication channels. Conduct regular check-ins. Share early designs and prototypes before writing code. Provide early access to new features for rapid learning. Close the feedback loop by showing them how their input shaped the product.

The relationship is reciprocal. They get early access and influence over a product they depend on. You get signal quality that no survey or analytics dashboard can match.

Turning feedback into insight

Raw feedback is not insight. It becomes insight through a deliberate process:

  1. Aggregate – group similar feedback to identify patterns
  2. Triangulate – cross-reference across channels to validate
  3. Quantify – attach frequency and impact data to qualitative themes
  4. Synthesise – distil patterns into actionable problem statements
  5. Prioritise – feed the strongest insights into your product backlog with evidence attached
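The first three steps can be sketched as a few lines of code. A toy version, assuming feedback items tagged with a theme and a source channel (the themes and channels below are made-up examples):

```python
# Aggregate by theme, triangulate by counting distinct channels, and
# quantify with frequency. All feedback items here are invented examples.
from collections import defaultdict

feedback = [
    {"theme": "slow export", "channel": "support"},
    {"theme": "slow export", "channel": "in-app"},
    {"theme": "slow export", "channel": "sales"},
    {"theme": "dark mode",   "channel": "in-app"},
]

themes = defaultdict(lambda: {"count": 0, "channels": set()})
for item in feedback:
    entry = themes[item["theme"]]
    entry["count"] += 1
    entry["channels"].add(item["channel"])

for theme, data in sorted(themes.items(), key=lambda kv: -kv[1]["count"]):
    # A theme seen in only one channel stays a hypothesis, not a fact.
    status = "validated" if len(data["channels"]) >= 2 else "hypothesis"
    print(f"{theme}: {data['count']} reports across "
          f"{len(data['channels'])} channel(s) -> {status}")
```

The synthesis and prioritisation steps stay human work, but automating the aggregation and triangulation makes the single-source blind spot visible instead of implicit.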

The goal is a continuous flow of evidence-based insight that directly informs planning and prioritisation.

v2.1 · Updated Apr 2026