Chanl
Agent Architecture

The Multi-Agent Pattern That Actually Works in Production

Gartner reports a 1,445% surge in multi-agent system inquiries. Here are the orchestration patterns that actually work when real customers call -- and why most teams pick the wrong one.

Dean Grover, Co-founder
March 20, 2026
14 min read
Diagram showing interconnected AI agents coordinating a complex customer service workflow

A customer calls. She wants a refund for a damaged product, a replacement shipped overnight, and a callback scheduled with a manager to discuss her account. Three tasks. Three different backend systems. One conversation that needs to feel seamless.

This is the moment single-agent architectures break. Not because the model is dumb, but because one agent trying to hold refund policies, inventory queries, and scheduling logic simultaneously starts dropping context by turn four. The refund amount is wrong. The replacement ships to the old address. The callback never gets scheduled.

Gartner saw a 1,445% surge in multi-agent system inquiries from Q1 2024 to Q2 2025. By end of 2026, 40% of enterprise applications will embed task-specific AI agents, up from less than 5% in 2025. The question has shifted from "should we use multiple agents?" to "which orchestration pattern won't collapse under production traffic?"

This article answers that question. We'll follow our customer's three-part request through every major orchestration pattern, show you exactly where each one breaks, and land on the architecture that's actually surviving in production systems today.

The Patterns at a Glance

Before diving in, here's the landscape. Each pattern handles our customer's three-part request differently.

| Pattern | How It Routes | Latency | Cost | Debuggability | Best For |
| --- | --- | --- | --- | --- | --- |
| Flat routing | Classifier picks one specialist | Low | Low | Hard | Single-intent requests |
| Sequential pipeline | Agent A then B then C | High | Medium | Easy | Dependent steps |
| Hierarchical | Orchestrator delegates dynamically | Medium | Medium | Best | Complex multi-step requests |
| Plan-and-execute | Expensive planner, cheap executors | Medium | Lowest | Good | Cost-sensitive at scale |

If you're new to multi-agent systems, start with our guide on building an agent orchestrator from scratch. It covers the fundamentals. This article picks up where that leaves off: what happens when those patterns meet real traffic.

Flat Routing: Fast but Fragile

The simplest multi-agent pattern. A classifier looks at the customer's message and routes it to one specialist agent.

[Diagram] Flat routing: a classifier picks one specialist per message
```typescript
// Flat routing -- fast, but can only handle one intent per message
async function routeToSpecialist(message: string) {
  // Cheap model classifies intent (costs ~$0.001 per call)
  const intent = await classify(message, {
    model: "gpt-4o-mini",
    categories: ["refund", "replacement", "scheduling", "general"],
  });

  // Route to the one specialist that matches
  return specialists[intent].handle(message);
}
```

Where it works. Single-intent messages. "I want a refund" goes to the refund agent. Fast, cheap, done.

Where it breaks. Our customer said three things in one message. The classifier picks "refund" because it appears first. The replacement and callback requests vanish. She repeats herself. The agent apologizes and handles the replacement. The callback still never happens.

This is the most common production failure mode for flat routing: multi-intent messages get truncated to single intents. Research from Maxim AI found that specification failures, where the system misunderstands what the user actually needs, account for approximately 42% of multi-agent failures.
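One partial mitigation is to let the classifier return every intent it finds rather than the first match; even if the router then has to fan out or escalate, detecting all intents stops requests from silently vanishing. A minimal sketch, with a keyword table standing in for the model call (the `Intent` type and `INTENT_KEYWORDS` table are illustrative, not a real API):

```typescript
// Multi-intent detection sketch. In production the model call would return
// every matching category; here a keyword stub stands in for the model.
type Intent = "refund" | "replacement" | "scheduling";

const INTENT_KEYWORDS: Record<Intent, string[]> = {
  refund: ["refund", "money back"],
  replacement: ["replacement", "replace", "new one"],
  scheduling: ["callback", "call me", "schedule"],
};

// Return ALL intents present in the message, not just the first match
function detectIntents(message: string): Intent[] {
  const lower = message.toLowerCase();
  return (Object.keys(INTENT_KEYWORDS) as Intent[]).filter((intent) =>
    INTENT_KEYWORDS[intent].some((kw) => lower.includes(kw))
  );
}
```

Our customer's message would yield all three intents instead of collapsing to "refund" -- which is exactly the signal a router needs to hand the request to a heavier pattern.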

Flat routing works for chatbots that handle one question at a time. It does not work for customer service, where real conversations are messy, multi-part, and context-dependent.

Sequential Pipeline: Predictable but Slow

Pipeline the request through specialists in order. Each agent's output feeds into the next.

[Diagram] Sequential pipeline: each agent processes and passes forward
```typescript
// Sequential pipeline -- predictable, but every step waits for the last
async function sequentialPipeline(request: CustomerRequest) {
  // Step 1: Process refund (2-3 seconds)
  const refundResult = await refundAgent.handle(request);

  // Step 2: Order replacement -- needs refund context to avoid double-charging
  const replacementResult = await replacementAgent.handle({
    ...request,
    refundConfirmation: refundResult,
  });

  // Step 3: Schedule callback -- needs both prior results for summary
  const callbackResult = await schedulingAgent.handle({
    ...request,
    refundConfirmation: refundResult,
    replacementConfirmation: replacementResult,
  });

  return mergeResults(refundResult, replacementResult, callbackResult);
}
```

Where it works. When steps genuinely depend on each other. The replacement agent needs to know the refund was processed before shipping (to avoid double-charging). The scheduling agent needs both confirmations to summarize what happened.

Where it breaks. Our customer is still waiting. Latency compounds. Three agents running sequentially means 6-9 seconds of wall-clock time. On a voice call, that's dead air. On chat, she's already typed "hello?" and "are you there?"

Worse, the pipeline assumes a fixed order. What if the replacement is out of stock? The scheduling agent still runs, scheduling a callback about a replacement that won't arrive. The pipeline has no way to adapt.

Sequential pipelines are great for batch processing. For real-time customer conversations, the rigidity is a liability.

Hierarchical: Production's Favorite

Here's the pattern that actually survives production. An orchestrator agent receives the full request, decomposes it into subtasks, delegates each to a specialist, and merges the results.

[Diagram] Hierarchical orchestration: one orchestrator owns the outcome
```typescript
// Hierarchical orchestration -- the orchestrator owns the full lifecycle
async function orchestrate(message: string, context: ConversationContext) {
  // Step 1: Orchestrator decomposes into subtasks
  // Uses a capable model because decomposition is the hardest part
  const plan = await orchestrator.decompose(message, {
    model: "claude-sonnet-4-20250514",
    availableAgents: ["refund", "replacement", "scheduling"],
    conversationHistory: context.history,
  });

  // Step 2: Execute subtasks (parallel when independent, sequential when dependent)
  const results: Record<string, unknown> = {};
  const pending = [...plan.steps];
  while (pending.length > 0) {
    const step = pending.shift();
    if (step.dependsOn && !(step.dependsOn in results)) {
      // Dependency not finished yet -- re-queue the step instead of
      // silently dropping it
      pending.push(step);
      continue;
    }

    // Delegate to specialist with only the context it needs
    results[step.id] = await specialists[step.agent].handle({
      task: step.description,
      context: step.dependsOn ? results[step.dependsOn] : null,
    });
  }

  // Step 3: Orchestrator merges results into one coherent response
  return orchestrator.synthesize(results, context);
}
```

Why this wins in production. Three properties that the other patterns lack:

Clear accountability. The orchestrator owns the outcome. When the replacement is out of stock, the orchestrator catches it and adapts: offering a store credit instead, say, and updating the callback topic to match. No rigid pipeline to derail.

Debuggable traces. Every delegation is a logged event: orchestrator decided to send subtask X to agent Y with context Z. When something goes wrong at 3am, you can replay the exact decision chain. GitHub's engineering team found that treating agents like distributed system components, with typed handoffs and explicit state contracts, is the key to reliability.

Graceful degradation. If the scheduling service is down, the orchestrator handles the refund and replacement, then tells our customer: "I've processed your refund and replacement. Our scheduling system is temporarily unavailable. I'll have someone call you within 2 hours." Two out of three requests handled. That's a partial success, not a total failure.
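That degradation behavior can be sketched as a wrapper that turns a failed subtask into a fallback message instead of an exception. This is a minimal illustration; the agent names, messages, and `SubtaskResult` shape are assumptions, not part of any framework:

```typescript
// Graceful degradation sketch: a failed subtask becomes a fallback
// message in the final response instead of failing the whole request.
type SubtaskResult =
  | { status: "done"; agent: string; summary: string }
  | { status: "failed"; agent: string; fallback: string };

async function runWithFallback(
  agent: string,
  task: () => Promise<string>,
  fallback: string
): Promise<SubtaskResult> {
  try {
    return { status: "done", agent, summary: await task() };
  } catch {
    // Specialist or its backing service is down -- degrade, don't abort
    return { status: "failed", agent, fallback };
  }
}

// Merge partial successes and fallbacks into one customer-facing reply
function synthesizePartial(results: SubtaskResult[]): string {
  return results
    .map((r) => (r.status === "done" ? r.summary : r.fallback))
    .join(" ");
}
```

Two completed subtasks plus one fallback line produces exactly the "two out of three" response described above.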

Microsoft's Azure Architecture Center documents this as the recommended pattern for production agent systems, with the orchestrator maintaining state and the specialists remaining stateless and focused.

Plan-and-Execute: The Cost Killer

Here's where it gets interesting. Hierarchical orchestration works, but it's expensive. The orchestrator runs a capable model for every customer interaction. At scale, those tokens add up.

Plan-and-execute splits the architecture into two tiers: an expensive model that thinks, and cheap models that do.

```typescript
// Plan-and-execute -- expensive model plans, cheap models execute
async function planAndExecute(message: string, context: ConversationContext) {
  // PLANNER: Claude Sonnet decomposes the request ($0.003/1K input tokens)
  // This is the only expensive call -- it happens once per request
  const plan = await planner.createPlan(message, {
    model: "claude-sonnet-4-20250514",
    availableTools: ["process_refund", "check_inventory", "order_replacement",
                     "schedule_callback", "lookup_customer"],
  });

  // EXECUTOR: GPT-4o-mini runs each step ($0.00015/1K input tokens)
  // 20x cheaper per token -- and most steps are simple tool calls
  const results = {};
  for (const step of plan.steps) {
    results[step.id] = await executor.run(step, {
      model: "gpt-4o-mini", // Classification and tool calls don't need Sonnet
      tools: step.requiredTools,
      priorResults: step.dependencies.map((d) => results[d]),
    });
  }

  // SYNTHESIZER: Back to Sonnet for the customer-facing response
  // Merging three results into natural language needs the bigger model
  return synthesizer.compose(results, {
    model: "claude-sonnet-4-20250514",
    tone: context.customerSentiment,
  });
}
```

The math. Our customer's request generates roughly 3,000 tokens across the three specialist steps. Running everything on Claude Sonnet: ~$0.009. Running the plan-and-execute split with GPT-4o-mini for execution: ~$0.0015. That's an 83% cost reduction on a single request. At 100,000 daily customer interactions, that's the difference between $900/day and $150/day.
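The arithmetic above can be checked directly. The per-1K-token rates are the ones quoted in the code comments; the exact split of planner and synthesizer tokens is an illustrative assumption (a few hundred tokens total), not a measured figure:

```typescript
// Cost comparison sketch. Rates are the per-1K-input-token prices quoted
// above; the token split between tiers is an illustrative assumption.
const SONNET_PER_1K = 0.003; // planner + synthesizer tier
const MINI_PER_1K = 0.00015; // executor tier, 20x cheaper

// Everything on the capable model
function costAllSonnet(totalTokens: number): number {
  return (totalTokens / 1000) * SONNET_PER_1K;
}

// Plan-and-execute: only planning and synthesis hit the expensive model
function costSplit(
  plannerTokens: number,
  executorTokens: number,
  synthTokens: number
): number {
  return (
    ((plannerTokens + synthTokens) / 1000) * SONNET_PER_1K +
    (executorTokens / 1000) * MINI_PER_1K
  );
}
```

With ~3,000 execution tokens and a few hundred tokens of planning and synthesis, the totals land at roughly $0.009 all-Sonnet versus ~$0.0015 split, matching the figures above.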

The insight is that most execution steps are simple: call an API, extract a field, classify a status. These tasks don't need frontier-model reasoning. A routing classification call costs ~$0.0025 and can redirect 30% of tasks to models that are 90% cheaper. The planner is the only step that needs to understand the full complexity of the request.

Conventional wisdom says you should use your best model everywhere for quality. The data says most execution steps are so simple that a model 20x cheaper handles them identically. Save the expensive reasoning for the one step that actually needs it.

When to use it. Plan-and-execute shines when you have high request volume (the savings compound), most execution steps are tool calls or structured extraction, and your quality requirements are met by smaller models for individual steps. If every step requires nuanced reasoning, the cost savings evaporate because you can't downgrade the executor model.

The Passing Ships Problem

Every multi-agent pattern shares one insidious failure mode. We call it the "passing ships" problem, and it's the reason most teams hit a wall around month three of production.

Here's how it manifests. Our customer's refund agent processes a $47.99 refund. The replacement agent, running in parallel, checks inventory and finds the item is discontinued. It substitutes a similar product at $52.99. The scheduling agent books a callback for "replacement follow-up."

From each agent's perspective, the job is done. From the customer's perspective, she was charged $5.00 extra without being asked, and the callback is about the wrong thing. The agents were ships passing in the night, each doing its job correctly in isolation, collectively producing a broken experience.

```typescript
// THE PROBLEM: Agents can't see each other's decisions
// Each agent operates on its own snapshot of reality

// Refund agent sees: customer wants $47.99 back ✓
// Replacement agent sees: item discontinued, substitute available ✓
// Scheduling agent sees: customer wants callback about replacement ✓

// Nobody sees: the substitute costs more, and the callback topic is now wrong

// THE FIX: Shared scratchpad with real-time writes
const scratchpad = new SharedState(requestId);

async function executeWithSharedState(agent, task) {
  // Agent reads latest state before starting -- sees what others have done
  const currentState = await scratchpad.read();

  const result = await agent.handle({
    task,
    sharedContext: currentState, // Full picture, not just its own slice
  });

  // Agent writes result back -- other agents see it immediately
  await scratchpad.write(agent.id, result);

  return result;
}
```

The fix is structural, not algorithmic. Every agent reads from and writes to a shared scratchpad before and after execution. The orchestrator checks for conflicts before merging results. If the replacement agent changes the product, the orchestrator re-runs the refund calculation and updates the callback topic.
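The orchestrator's pre-merge conflict check can be sketched as a pure function over the scratchpad. The field names (`refundAmount`, `substitutePrice`, `callbackTopic`) are illustrative, not a prescribed schema:

```typescript
// Conflict detection sketch: the orchestrator inspects the shared
// scratchpad before merging and flags cross-agent inconsistencies.
interface Scratchpad {
  refundAmount?: number; // written by the refund agent
  substitutePrice?: number; // written by the replacement agent
  callbackTopic?: string; // written by the scheduling agent
}

type Conflict = { field: string; reason: string };

function detectConflicts(pad: Scratchpad): Conflict[] {
  const conflicts: Conflict[] = [];
  if (
    pad.refundAmount !== undefined &&
    pad.substitutePrice !== undefined &&
    pad.substitutePrice > pad.refundAmount
  ) {
    // Substitute costs more than the refund covers -- needs customer consent
    conflicts.push({
      field: "substitutePrice",
      reason: `substitute ($${pad.substitutePrice}) exceeds refund ($${pad.refundAmount})`,
    });
  }
  return conflicts;
}
```

Any non-empty result routes back to the orchestrator for a re-plan instead of straight to the customer.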

GitHub's multi-agent reliability analysis found that most failures trace back to missing structural components: shared state, ordering assumptions, and implicit handoffs. The agents aren't broken. The connections between them are.

This is also where monitoring and observability become non-negotiable. You need to see the full trace across all agents, not just individual agent logs, to catch these cross-agent coordination failures before customers do.

Framework Comparison for Production

You've picked an orchestration pattern. Now you need to implement it. Here's how the major frameworks compare for production multi-agent systems in 2026.

| Framework | Orchestration Style | Production Readiness | Best For | Watch Out For |
| --- | --- | --- | --- | --- |
| LangGraph | Graph-based state machines | Most battle-tested | Complex branching, rollback, deterministic execution | Steep learning curve (graph theory required) |
| CrewAI | Role-based teams | Good, less mature monitoring | Rapid prototyping, team-based workflows | 40% faster to deploy but harder to debug at scale |
| AutoGen | Conversational agents | Production-ready (maintenance mode) | Multi-party dialogues, consensus | Microsoft shifted focus to Agent Framework |
| OpenAI Agents SDK | Built-in handoffs | Growing ecosystem | OpenAI-native stacks | Vendor lock-in to OpenAI models |
| Microsoft Agent Framework | Enterprise orchestration | New, actively developed | Azure-native enterprise | Early stage, API surface still evolving |

The honest recommendation. If you need production reliability today, LangGraph gives you the most control over state, branching, and error recovery. If you need to ship a prototype in a week, CrewAI's role-based abstraction gets you there fastest. If you're building on Azure, Microsoft's Agent Framework is the natural fit but expect to be an early adopter.

If your system is 2-4 agents with a clear workflow, you might not need a framework at all. A 150-line orchestrator with explicit handoffs is easier to debug than any framework's abstractions.

What Actually Breaks at Scale

We've covered the patterns. Here's what production teaches you that documentation doesn't.

Cascading hallucinations. Agent A hallucinates a policy ("refunds over $100 require manager approval"). Agent B, receiving this as context, treats it as fact and escalates unnecessarily. Agent C schedules a manager callback that shouldn't exist. One hallucination, three agents deep, creates a customer experience that's confidently wrong at every step.

The fix: Each agent validates its inputs against ground truth. The refund agent checks the actual refund policy, not what the orchestrator summarized. Tool integrations that connect agents to authoritative data sources prevent agents from operating on stale or fabricated context.
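A sketch of that validation step, with a hard-coded policy table standing in for the authoritative policy service (`POLICY_TRUTH` and the claim shape are illustrative assumptions):

```typescript
// Ground-truth validation sketch: reject policy claims passed in by other
// agents unless they match the authoritative source.
const POLICY_TRUTH: Record<string, number> = {
  managerApprovalThreshold: 500, // the actual policy, not what an agent claims
};

interface PolicyClaim {
  key: string;
  value: number;
}

// Returns false for any claim that disagrees with the source of truth,
// stopping a hallucination before it cascades to downstream agents
function validateClaim(claim: PolicyClaim): boolean {
  return POLICY_TRUTH[claim.key] === claim.value;
}
```

The hallucinated "refunds over $100 require manager approval" fails this check at agent B, so agents B and C never act on it.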

State drift under concurrency. Two agents read the customer's account balance at the same time. Both proceed as if the balance is $100. One issues a $47.99 refund, the other places a $52.99 order. The account goes negative. This is the distributed systems version of a race condition, and it's endemic to parallel agent execution.

The fix: Optimistic locking on shared state. Agents claim resources before modifying them. The orchestrator detects conflicts and retries.
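Optimistic locking on shared state can be sketched with a version counter: each write carries the version it read, and a stale version means another agent got there first. The `VersionedBalance` class is an illustration, not a library API:

```typescript
// Optimistic locking sketch: writes carry the version they read;
// a version mismatch means another agent modified the state first.
class VersionedBalance {
  private version = 0;
  constructor(private balance: number) {}

  read(): { balance: number; version: number } {
    return { balance: this.balance, version: this.version };
  }

  // Returns false when the caller's snapshot is stale -- caller must
  // re-read and retry instead of writing over a concurrent change
  tryApply(delta: number, expectedVersion: number): boolean {
    if (expectedVersion !== this.version) return false; // conflict detected
    this.balance += delta;
    this.version += 1;
    return true;
  }
}
```

In the race above, the second agent's write fails fast, and the orchestrator retries it against the post-refund balance instead of driving the account negative.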

Context window exhaustion. A five-agent system where each agent passes its full output to the next agent hits context limits by agent three. The later agents are operating on truncated input, missing critical details from the early agents.

The fix: Structured handoffs. Each agent produces a summary (50-100 tokens) alongside its full output. Downstream agents receive summaries by default, with the option to request full context for specific fields.
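A structured handoff of this kind is easy to sketch: each agent emits a short summary alongside its full output, and downstream context is built from summaries unless a specific agent's full output is requested. The `Handoff` shape is an illustrative assumption:

```typescript
// Structured handoff sketch: summaries travel by default, full outputs
// only on request -- keeping downstream context windows small.
interface Handoff {
  agentId: string;
  summary: string; // ~50-100 tokens, always passed downstream
  fullOutput: string; // retrievable on demand, never passed by default
}

function buildContext(handoffs: Handoff[], expand: string[] = []): string {
  return handoffs
    .map((h) =>
      expand.includes(h.agentId)
        ? `${h.agentId}: ${h.fullOutput}`
        : `${h.agentId}: ${h.summary}`
    )
    .join("\n");
}
```

A five-agent chain then grows by a few hundred tokens per hop instead of by each agent's entire transcript.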

Evaluation blind spots. You test each agent individually. They all pass. The system fails in production because nobody tested the handoffs. Scenario testing with AI personas that simulate multi-part customer requests is the only reliable way to catch cross-agent failures before production.

The Pattern Decision Tree

Start here when choosing an orchestration pattern.

Decision tree for choosing an orchestration pattern:

- How many intents per message?
  - Always one → Flat Routing
  - Often multiple → Do steps depend on each other?
    - Yes, strict order → Sequential Pipeline
    - Partially or no → Cost sensitivity?
      - Token budget matters → Plan-and-Execute
      - Reliability over cost → Hierarchical

For most customer-facing production systems, the answer is hierarchical with plan-and-execute optimization. The orchestrator handles decomposition and conflict resolution using a capable model. Individual specialists execute using the cheapest model that can handle their specific task. Memory ensures context persists across the full interaction, so no agent starts from zero.

The autonomous AI agent market is projected to reach $8.5 billion by end of 2026, with Deloitte noting that enterprises that orchestrate agents well could push that figure 15-30% higher. But Gartner also warns that over 40% of agentic AI projects will be canceled by end of 2027 due to escalating costs and inadequate risk controls.

The difference between the projects that survive and the ones that get canceled? The surviving ones pick a pattern that matches their actual complexity, build observability in from day one, and resist the temptation to add agents when better prompts or tools would solve the problem.

Our customer got her refund, her replacement, and her callback. Three agents, one orchestrator, and a shared scratchpad that kept them all on the same page. That's not a demo. That's Tuesday.

Build multi-agent systems with shared tools, memory, and monitoring

Chanl gives every agent access to the same tools, knowledge base, and persistent memory -- then monitors the full orchestration trace across all agents in production.

Start building
Dean Grover, Co-founder

Building the platform for AI agents at Chanl — tools, testing, and observability for customer experience.

Learn Agentic AI

One lesson a week — practical techniques for building, testing, and shipping AI agents. From prompt engineering to production monitoring. Learn by doing.

500+ engineers subscribed
