
Your Agent Forgets Everything After Turn 9
Here is the number that should terrify every team building agentic products: LLMs exhibit a 39% average performance drop in multi-turn conversations versus single-turn interactions.
That finding comes from Microsoft Research and Salesforce Research, which tested every major LLM (GPT-4o, Claude, Gemini, DeepSeek) and found the same pattern: single-turn accuracy of 95% collapses in real multi-turn flows, with failure emerging after an average of 9.21 turns.
The three failure modes are consistent across all models:
- Premature assumption: The agent jumps to conclusions based on partial context instead of asking clarifying questions
- Prior response anchoring: The agent over-relies on its own previous answers, compounding early errors
- Correction failure: When users try to correct the agent, it cannot properly integrate the correction into its reasoning
Reasoning models like o3 and DeepSeek-R1 degrade equally. Known remediations (agent-like concatenation, lower temperature) are ineffective. This is not a model problem. It is an architecture problem.
The Multi-Turn Problem in Numbers
Before designing solutions, understand the scale of what breaks when conversations go beyond a single exchange.
| Metric | What it measures | Source |
|---|---|---|
| 39% | Average performance drop, multi-turn vs single-turn | Microsoft Research + Salesforce, 2025 |
| 9.21 | Average turns before a failure mode triggers | Psychology Today, Feb 2026 |
| 40%+ | Agentic AI projects predicted to be canceled by end of 2027 | Gartner, June 2025 |
| 1m 38s | Average chatbot-only conversation length | Fullview, 2025 |
The Gartner prediction is stark: over 40% of agentic AI projects will be canceled by end of 2027 due to escalating costs, unclear value, or inadequate risk controls. Multi-turn failure is a primary driver. When your agent loses context at turn 9, every subsequent interaction costs more (in tokens, latency, and user trust) while delivering less.
Meanwhile, the average chatbot-only conversation lasts 1 minute 38 seconds. With live chat handover, it extends to 15 minutes 21 seconds. Users abandon agents that lose context. They stay with systems that remember.
The Conversation State Machine
Every production agent system, whether it is Salesforce Agentforce, Microsoft Copilot Studio, or a custom build, follows the same fundamental architecture: a perception, reasoning, action, observation loop that runs until a stop condition is met.
| Phase | What it does | Production anchor |
|---|---|---|
| Perception | Interpret user input in full conversation context | Salesforce Atlas Reasoning Engine |
| Reasoning | Plan the response using tools, knowledge, and history | Copilot Studio generative orchestration |
| Action | Execute the chosen action, gated by trust patterns | Intent Preview from Part 2 |
| Observation | Evaluate the result; decide next step or stop | State checkpoint + Action Audit |

The loop continues until the task is complete, the user exits, or an escalation trigger fires.
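The loop can be sketched in a few lines. This is a hedged illustration, not any vendor's API: every function and class name here (AgentState, perceive, reason, act, observe, run_loop) is invented for the example, and each phase is stubbed.

```python
# Minimal sketch of the perception -> reasoning -> action -> observation
# loop. All names are illustrative, not a vendor API; phases are stubbed.
from dataclasses import dataclass, field

@dataclass
class AgentState:
    # Checkpoint log: one entry per turn (plan + result).
    history: list[dict] = field(default_factory=list)

def perceive(state: AgentState, user_input: str) -> dict:
    # Interpret the new message against accumulated history (stubbed).
    return {"input": user_input, "prior_turns": len(state.history)}

def reason(state: AgentState, perception: dict) -> str:
    # Plan an action from tools, knowledge, and history (stubbed).
    return f"respond_to:{perception['input']}"

def act(plan: str) -> str:
    # Execute the chosen action; in production this is where trust
    # patterns such as Intent Preview gate the call.
    return f"executed {plan}"

def observe(state: AgentState, plan: str, result: str) -> bool:
    # Checkpoint the turn, then evaluate stop conditions.
    state.history.append({"plan": plan, "result": result})
    return "exit" in plan or "escalate" in result

def run_loop(state: AgentState, inputs: list[str], max_turns: int = 50) -> AgentState:
    for user_input in inputs[:max_turns]:
        perception = perceive(state, user_input)
        plan = reason(state, perception)
        result = act(plan)
        if observe(state, plan, result):
            break
    return state

state = run_loop(AgentState(), ["hello", "please exit"])
# two turns logged; the loop stops on the user's exit
```

The important structural point is that observation writes a checkpoint before deciding whether to continue, so every turn leaves an auditable trace.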
Perception: Reading the Full Signal
Perception is not just parsing the user's latest message. It is interpreting that message in the context of everything that came before: previous turns, user profile data, active task state, and environmental signals (time of day, device, location).
The design challenge is deciding what context to include. Too little context and the agent makes premature assumptions (failure mode 1). Too much context and you hit token limits, increasing latency and cost.
Production pattern: Salesforce Agentforce solves this with hybrid reasoning, combining deterministic workflows (scripted paths for known scenarios) with flexible LLM reasoning via the Atlas Reasoning Engine. The script handles structured context; the LLM handles ambiguity.
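One way to make the too-little/too-much tradeoff explicit is a priority-ordered context budget: include the most important pieces first and skip anything that would blow the token budget. The sketch below is an illustrative assumption, not a platform feature; the 4-characters-per-token estimate and the priority ordering are placeholders.

```python
# Hedged sketch: assemble perception context under a token budget,
# highest-priority pieces first. The chars-per-token heuristic and the
# priority values are illustrative assumptions, not a real tokenizer.
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough heuristic

def build_context(pieces: list[tuple[int, str]], budget: int) -> list[str]:
    """pieces: (priority, text) pairs; lower number = more important."""
    selected, used = [], 0
    for _, text in sorted(pieces, key=lambda p: p[0]):
        cost = estimate_tokens(text)
        if used + cost > budget:
            continue  # skip pieces that do not fit; keep trying smaller ones
        selected.append(text)
        used += cost
    return selected

context = build_context(
    [(0, "Current goal: refund order #4521"),
     (1, "Last user message: 'it still has not arrived'"),
     (2, "Rolling summary of turns 1-8"),
     (3, "Full transcript of turns 1-8 ...")],
    budget=30,
)
# goal, last message, and summary fit; the raw transcript is dropped
```

The deterministic pieces (goal, entities) get top priority; the raw transcript is the first thing sacrificed, which mirrors the hybrid split between scripted structure and LLM ambiguity handling.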
Reasoning: Planning Before Acting
The reasoning phase is where the agent decides what to do. In single-turn interactions, this is straightforward. In multi-turn flows, reasoning must account for accumulated context, unresolved threads, and potential contradictions between earlier and later user statements.
Production pattern: Microsoft Copilot Studio's generative orchestration uses an LLM planning layer that interprets intent, breaks down complex requests into steps, selects appropriate tools, and executes multi-step plans with guardrails. The planner operates at a higher abstraction level than the executor.
Action: Executing with Reversibility
The action phase is where the agent changes the world: sending an email, updating a record, making an API call. In Part 2, we covered the Intent Preview pattern that gates actions through user approval. In multi-turn flows, the additional challenge is ensuring that actions from turn 3 remain coherent with context established in turn 1.
Observation: Closing the Loop
After acting, the agent observes the result and decides: is the task complete, should I continue, or should I escalate? This phase is where prior response anchoring (failure mode 2) causes the most damage. The agent observes its own action and over-indexes on it in subsequent reasoning.
Design principle: Treat each observation as a checkpoint. Log the state, the action taken, and the result. This log becomes the Action Audit trail from Part 2 and the foundation for state recovery (covered below).
Context Persistence: The Three Memory Layers
The difference between an agent that works for one turn and an agent that works for fifty turns is memory architecture. Production systems need three distinct memory layers, each serving a different purpose.
| Layer | Scope | Purpose | Production anchor |
|---|---|---|---|
| Short-term memory | Active session | Structured scratchpad for the current conversation | Google Agent Engine Sessions |
| Long-term memory | Cross-session | User profiles and historical patterns that persist | Amazon Bedrock AgentCore Memory |
| Episodic memory | Semantic retrieval | Retrievable past interactions organized by topic | Google Memory Bank |
Short-Term Memory: The Working Scratchpad
Short-term memory holds the active conversation state: what the user said, what the agent planned, what actions were taken, and what remains unresolved. This is the context window in its most direct form.
The problem: Naive implementations dump the entire conversation transcript into the context window. As GetMaxim's research on context management shows, naive truncation (dropping oldest messages) discards still-relevant information, creating experiences where agents "forget" previously discussed topics even when the conversation is well within token limits.
Production pattern: Instead of raw transcript, maintain a structured scratchpad: current goal, active entities, unresolved questions, and a rolling summary of prior turns. This is what Google's Agent Engine implements through its Sessions feature (now GA), keeping working context separate from raw conversation history.
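A structured scratchpad of this kind can be sketched as a small dataclass. The field names, the keep-the-last-N-turns policy, and the naive summarization (truncating displaced turns) are illustrative assumptions, not the Sessions API.

```python
# Sketch of a structured short-term scratchpad: goal, entities, unresolved
# questions, and a rolling summary instead of a raw transcript. Field names
# and the crude fold-into-summary step are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Scratchpad:
    current_goal: str = ""
    active_entities: dict[str, str] = field(default_factory=dict)
    unresolved: list[str] = field(default_factory=list)
    rolling_summary: str = ""
    recent_turns: list[str] = field(default_factory=list)
    keep_verbatim: int = 4  # only the last N turns stay as raw text

    def add_turn(self, turn: str) -> None:
        self.recent_turns.append(turn)
        while len(self.recent_turns) > self.keep_verbatim:
            oldest = self.recent_turns.pop(0)
            # Fold the displaced turn into the summary instead of dropping
            # it, so the agent does not "forget" still-relevant topics.
            self.rolling_summary += f" {oldest[:60]}"

pad = Scratchpad(current_goal="schedule onboarding call")
for i in range(6):
    pad.add_turn(f"turn {i}: ...")
# recent_turns now holds turns 2-5; turns 0-1 live in rolling_summary
```

The contrast with naive truncation is the `while` branch: old turns are compressed rather than discarded, so nothing relevant silently disappears.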
Long-Term Memory: The User Profile
Long-term memory persists across sessions. It stores user preferences, historical patterns, past decisions, and relationship context. When a user returns after a week, the agent should know their communication style, their project context, and their trust level (from the progressive autonomy model in Part 2).
Production pattern: Amazon Bedrock AgentCore Memory provides a managed memory service that eliminates the infrastructure complexity of building cross-session persistence. It separates storage from retrieval, letting agents access long-term context without manual state management.
Episodic Memory: The Retrievable Past
Episodic memory is the most sophisticated layer. It enables the agent to semantically search past interactions for relevant precedents. "Last time you asked about pricing, I recommended the enterprise tier based on your team size. Has that changed?"
This is not keyword search. It is semantic retrieval that understands the meaning of past conversations and surfaces relevant episodes based on the current context.
Production pattern: Google's Memory Bank uses a topic-based approach where agents organize and recall past interactions by subject matter rather than chronology. This means an agent can pull relevant context from a conversation that happened three months ago if the topic matches.
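The retrieval step can be illustrated with a toy scorer. Production systems use learned embedding vectors; plain word-count cosine similarity stands in here so the sketch stays dependency-free. The episode texts are invented for the example.

```python
# Toy sketch of episodic retrieval: score stored episodes against the
# current query by cosine similarity over word counts. Real systems use
# embeddings; word overlap is a dependency-free stand-in. Episodes are
# invented illustrations.
import math
import re
from collections import Counter

def tokenize(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(episodes: list[str], query: str) -> str:
    qv = tokenize(query)
    return max(episodes, key=lambda e: cosine(tokenize(e), qv))

episodes = [
    "user asked about pricing; recommended enterprise tier for a 40-person team",
    "user reported a login bug on the mobile app",
    "user requested a demo of the analytics dashboard",
]
best = retrieve(episodes, "what pricing tier fits my team now?")
# the pricing episode wins despite sharing no exact sentence with the query
```

Swapping `tokenize` for an embedding model turns this into genuine semantic retrieval; the ranking structure stays the same.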
Handoff Topologies: How Agents Pass Conversations
When a single agent cannot handle a conversation alone (it needs specialized knowledge, human judgment, or simply reaches its capability boundary), the conversation must transfer. The architecture of that transfer determines whether the user experiences a seamless continuation or a frustrating restart.
| Topology | Routing model | Production anchor |
|---|---|---|
| Hub-and-Spoke | Central orchestrator routes all conversations | Microsoft Copilot Studio |
| Mesh | Agents communicate directly with peers | Anthropic MCP |
| Swarm | Lightweight peer-to-peer handoffs | OpenAI Agents SDK |

Whatever the topology, every transfer carries the handoff contract: conversation summary + active entities + current goal + trust level + action history.
Hub-and-Spoke: The Central Orchestrator
A central orchestrator agent receives all conversations and routes them to specialized agents or human operators. The hub maintains the master context and ensures nothing is lost in transit.
Strengths: Complete visibility, consistent context management, easy to audit and monitor. Weaknesses: Single point of failure, potential bottleneck at scale, latency for every routing decision.
Production example: Microsoft Copilot Studio's multi-agent orchestration uses this pattern. A primary agent delegates tasks to specialized agents across Azure AI Agents Service, Microsoft Fabric, and M365. The orchestrator maintains the conversation thread.
Mesh: Direct Agent-to-Agent
In a mesh topology, agents communicate directly with each other without a central coordinator. Each agent knows which peers can handle which capabilities and routes conversations accordingly.
Strengths: No single point of failure, lower latency for direct handoffs, scales horizontally. Weaknesses: Harder to maintain consistent context across the mesh, more complex to audit, potential for circular routing.
Production example: Anthropic's MCP (Model Context Protocol), now under the Linux Foundation, enables this pattern by providing a standardized protocol for agent-to-agent communication and tool use. Agents track inputs, tool outputs, and intermediate states across interactions through the protocol rather than a central hub.
Swarm: Lightweight Peer Handoffs
The swarm pattern uses minimal-overhead handoffs between peer agents. There is no central orchestrator and no complex routing logic. An agent that recognizes it cannot handle a request simply transfers the conversation (with full context) to the appropriate peer.
Strengths: Minimal overhead, fastest handoff latency, simple mental model for developers. Weaknesses: Limited global visibility, context preservation depends on each agent implementing the protocol correctly.
Production example: OpenAI's Agents SDK (released March 2025, production successor to Swarm) implements this as a core primitive. A TriageAgent analyzes the query and calls transfer_to_support() or transfer_to_sales(), passing the full conversation context in the handoff.
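The swarm handoff shape can be sketched without the SDK. This is not the Agents SDK API: the `Agent`, `transfer_to`, and `triage` names, and the keyword-based routing, are illustrative assumptions.

```python
# Sketch of a swarm-style handoff: a triage step transfers the whole
# conversation to a peer agent in one call. All names and the keyword
# routing are illustrative, not the OpenAI Agents SDK's own primitives.
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    handles: set[str]  # topics this agent can serve

    def respond(self, conversation: list[str]) -> str:
        return f"{self.name} handling: {conversation[-1]}"

def transfer_to(target: Agent, conversation: list[str]) -> str:
    # The full conversation travels with the handoff; the user is
    # never asked to repeat anything.
    return target.respond(conversation)

def triage(message: str, conversation: list[str], peers: list[Agent]) -> str:
    conversation.append(message)
    topic = "sales" if "pricing" in message else "support"
    target = next(a for a in peers if topic in a.handles)
    return transfer_to(target, conversation)

peers = [Agent("SupportAgent", {"support"}), Agent("SalesAgent", {"sales"})]
reply = triage("what is your pricing for 50 seats?", [], peers)
# the sales peer answers with the full conversation in hand
```

The swarm weakness noted above is visible here: context preservation depends entirely on `transfer_to` passing the whole conversation, with no central hub to enforce it.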
| Topology | Control | Resilience | Latency | Best For |
|---|---|---|---|---|
| Hub-and-Spoke | Highest | Lowest (single point of failure) | Higher | Regulated industries, audit requirements |
| Mesh | Medium | Highest (no single point of failure) | Medium | Complex multi-domain systems, resilience-critical |
| Swarm | Lowest | Medium | Lowest | Speed-critical, developer-friendly, rapid iteration |
The Handoff Contract
Regardless of topology, every handoff must include what we call the handoff contract: the minimum context that must transfer for the receiving agent (or human) to continue without asking the user to repeat anything.
The contract includes:
- Conversation summary: What has been discussed, decided, and left unresolved
- Active entities: People, products, dates, and reference numbers mentioned
- Current goal: What the user is trying to accomplish
- Trust level: The user's current autonomy settings and trust phase (from Part 2)
- Action history: What the transferring agent already tried
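The contract can be made concrete as a structured payload. The field names mirror the five elements above; the `to_prompt()` rendering and the example values are illustrative assumptions.

```python
# The handoff contract as a structured payload. Fields mirror the five
# contract elements; to_prompt() and the sample values are illustrative.
from dataclasses import dataclass

@dataclass
class HandoffContract:
    conversation_summary: str
    active_entities: dict[str, str]
    current_goal: str
    trust_level: str          # e.g. "supervised" (Part 2 trust phases)
    action_history: list[str]

    def to_prompt(self) -> str:
        # Render the contract for the receiving agent's context window.
        entities = ", ".join(f"{k}={v}" for k, v in self.active_entities.items())
        return (f"Summary: {self.conversation_summary}\n"
                f"Entities: {entities}\n"
                f"Goal: {self.current_goal}\n"
                f"Trust: {self.trust_level}\n"
                f"Tried: {'; '.join(self.action_history)}")

contract = HandoffContract(
    conversation_summary="Customer wants refund for order #4521; courier lost parcel.",
    active_entities={"order": "#4521", "customer": "J. Rivera"},
    current_goal="issue refund",
    trust_level="supervised",
    action_history=["checked tracking", "confirmed loss with courier"],
)
```

Typing the contract this way makes a broken handoff a validation error at transfer time rather than a confused agent three turns later.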
Production example: Intercom Fin implements automatic handoff when it detects frustration, repeated loops, or explicit human request. Internal notes and the context panel carry the full conversation history to human agents. The user never has to repeat their issue.
State Recovery: When Context Breaks
Context will degrade. Conversations will exceed limits. Agents will accumulate errors that compound across turns. The question is not whether state recovery is needed but how elegantly the system handles it.
The Context Degradation Problem
Research on multi-turn reliability shows three mechanisms by which context degrades:
- Token overflow: The conversation exceeds the model's context window, forcing truncation
- Error compounding: Small inaccuracies in early turns become large errors by turn 10+
- Intent drift: The agent gradually shifts its understanding of what the user wants, anchoring on its own prior responses rather than the user's actual corrections
Four Recovery Strategies
| Strategy | When to Use | How It Works | Production Example |
|---|---|---|---|
| Conversation Summarization | Approaching token limits | Compress older turns into structured summaries preserving key entities and decisions | Amazon Bedrock executionId state tracking |
| State Checkpointing | Before irreversible actions | Save full conversation state at decision points, enabling rollback | Salesforce Agentforce Testing Center replay |
| Context Rewind | After detecting polluted context | Rewind to a known-good conversation point, discarding contaminated turns | Google Vertex AI Agent Builder rewind feature |
| Subagent Spawning | Context limits approaching | Spawn fresh agent with clean context, transfer structured summary (not raw transcript) | Anthropic MCP subagent spawning pattern |
Context Rewind deserves special attention. Google Vertex AI Agent Builder now supports rewinding to any conversation point to remove "polluted" context. When the agent detects that accumulated errors are degrading performance, it can roll back to a checkpoint where the context was still clean, then replay only the essential information.
This is the conversation equivalent of the undo pattern from Part 2's Action Audit, but applied to the agent's own reasoning state rather than its external actions.
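The checkpoint-and-rewind mechanic can be sketched in a few lines. This is a generic illustration, not Vertex AI's implementation; the class and method names are invented, and deciding when context is "polluted" is left to the caller.

```python
# Sketch of state checkpointing with context rewind: snapshot at decision
# points, roll back to the last known-good snapshot when context is judged
# polluted. Names are illustrative; pollution detection is out of scope.
import copy

class ConversationState:
    def __init__(self) -> None:
        self.turns: list[str] = []
        self._checkpoints: list[list[str]] = []

    def add_turn(self, turn: str) -> None:
        self.turns.append(turn)

    def checkpoint(self) -> None:
        # Deep-copy so later turns cannot mutate the saved snapshot.
        self._checkpoints.append(copy.deepcopy(self.turns))

    def rewind(self) -> None:
        # Discard contaminated turns; restore the last clean snapshot.
        if self._checkpoints:
            self.turns = self._checkpoints.pop()

state = ConversationState()
state.add_turn("user: book a flight to Lisbon")
state.checkpoint()                              # known-good point
state.add_turn("agent: booked flight to Lison") # error begins compounding
state.add_turn("agent: confirming Lison itinerary")
state.rewind()                                  # back to the clean context
# state.turns again holds only the original user request
```

Checkpointing before irreversible actions (the second strategy in the table) uses the same snapshot primitive; rewind just chooses when to restore it.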
Designing for Graceful Degradation
Not every conversation needs perfect context for 50 turns. The key design question is: what is the minimum context required for each turn to be useful?
For a customer support agent, the minimum context might be: the customer's name, their issue category, and the last action taken. For a coding assistant, it might be: the current file, the task description, and the last three changes. For a sales agent, it might be: the prospect's company, their stage in the pipeline, and their stated objections.
Design your context persistence layer around these minimum viable contexts rather than trying to maintain the entire conversation history indefinitely.
Building Conversation Flows: The Implementation Stack
Here is how the pieces fit together in a production system. Each layer builds on the one below it:
| Layer | Purpose | Components |
|---|---|---|
| 1. State Machine | Core interaction loop | Perception, Reasoning, Action, Observation cycle with stop conditions |
| 2. Memory | Context persistence | Short-term scratchpad + long-term profiles + episodic retrieval |
| 3. Handoff | Agent-to-agent/human transfer | Topology selection + handoff contract + context packaging |
| 4. Recovery | Graceful degradation | Summarization + checkpointing + rewind + subagent spawning |
| 5. Trust Integration | User control layer | Intent Preview + Autonomy Dial + Escalation Pathway (from Part 2) |
The trust patterns from Part 2 sit on top of the conversation architecture, not alongside it. Intent Preview operates at the Action phase of the state machine. The Autonomy Dial configures how much of the Reasoning phase the user sees. Escalation Pathway is a specialized handoff from agent to human. Trust and conversation architecture are the same system at different levels of abstraction.
What the Major Platforms Get Right (and Wrong)
Salesforce Agentforce
Gets right: Hybrid reasoning (deterministic + LLM), the Testing Center that replays full conversations turn by turn with cumulative history. This is how you test multi-turn flows: by replaying real conversations and verifying that context persists correctly at each turn.
Watch out for: Tight coupling to the Salesforce ecosystem. If your conversation flows span multiple platforms, the context persistence model may not extend cleanly.
Microsoft Copilot Studio
Gets right: Generative orchestration that separates planning from execution. Multi-agent orchestration across Azure services. The hub-and-spoke model provides excellent auditability.
Watch out for: The central orchestrator can become a bottleneck. For high-throughput scenarios, the routing latency adds up.
Google Vertex AI Agent Builder
Gets right: Memory architecture (both short-term and long-term memory now GA). Context rewind is a category-defining feature. The Agent Designer provides a visual canvas for orchestrating agent/subagent flows.
Watch out for: The visual canvas can mask complexity. Multi-turn flows that look simple in the designer can have subtle state management issues that only surface in production.
OpenAI Agents SDK
Gets right: Simplicity. The handoff primitive is elegant: one function call transfers the entire conversation context. Client-side framework gives full control over orchestration and state.
Watch out for: Minimal built-in memory management. You own the context persistence layer entirely, which is both power and responsibility.
Conversation Architecture Is the Product
The models will get better at multi-turn. Context windows will expand. But the fundamental architectural challenges of memory, handoff, and state recovery are not model problems. They are design problems that require deliberate engineering.
The teams that build robust conversation architecture now will own the experience layer of agentic AI. The ones that rely on larger context windows to paper over architectural gaps will keep hitting the same 39% degradation wall, just at turn 19 instead of turn 9.
Ready to architect your agent conversation flows?
- Full-Stack AI Services - Conversation architecture for production agent systems
- Read Part 2: Trust Design Patterns - The six patterns that keep humans in control
- Contact Us - Start your conversation flow audit
