Chanl
Agent Architecture

Claude 4.6 broke our production agent in two hours — here's what's worth the migration

A practical developer guide to Claude 4.6 — adaptive thinking, 1M context, compaction API, tool search, and structured outputs. Real code examples in TypeScript and Python for building production AI agents.

Dean Grover, Co-founder
March 15, 2026
20 min read

I upgraded a production agent from Claude Sonnet 4.5 to Claude Opus 4.6 on a Tuesday morning. By Tuesday afternoon, every request was returning 400 errors. Not some requests — all of them. The agent had been prefilling assistant messages to steer responses, and that pattern silently broke in 4.6. No deprecation warning in the 4.5 response. No heads-up in the migration guide I skimmed. Just a hard 400.

I wasn't alone. In February 2026, LiveKit filed GitHub issue #4907: Claude 4.6's prefilling removal immediately broke their entire Claude integration for voice and video agent pipelines. There was no deprecation with a grace period, just a hard 400 error that crashed production agents on day one. If one of the largest real-time communication platforms got caught off guard, you can assume plenty of smaller teams did too.

That's the kind of thing this article exists for. Anthropic released Claude 4.6 with a list of features that sounds transformative — adaptive thinking, a million-token context window, automatic compaction, tool search — and most of it genuinely is. But if you're building AI agents in production, what matters is knowing what breaks, what costs what, and where the new features actually change your code.

Prerequisites and setup

You'll need Node.js 18+ or Python 3.10+, an Anthropic API key, and a terminal. Install the SDK for your language:

bash
# TypeScript
npm install @anthropic-ai/sdk
 
# Python
pip install anthropic

Create a .env file with your API key:

text
ANTHROPIC_API_KEY=sk-ant-...

The model IDs you'll use throughout this article:

| Model | ID | Release Date |
|---|---|---|
| Claude Opus 4.6 | claude-opus-4-6-20260205 | Feb 5, 2026 |
| Claude Sonnet 4.6 | claude-sonnet-4-6-20260217 | Feb 17, 2026 |

If you're new to Claude's tool use API, the tool system deep dive covers the execution loop mechanics.

What changed in Claude 4.6 (the TL;DR)

Claude 4.6 is the largest single-release API surface change since Claude 3 introduced tool use. Nine features shipped across Opus and Sonnet, three things broke, and one pricing decision eliminated the main objection to large context windows.

Here's the full feature matrix — Opus 4.6 vs Sonnet 4.6 vs the 4.5 models they replace:

| Feature | Opus 4.6 | Sonnet 4.6 | Opus 4.5 | Sonnet 4.5 |
|---|---|---|---|---|
| Context window | 1M (GA) | 1M (GA) | 200K | 200K |
| Max output tokens | 128K | 64K | 64K | 16K |
| Adaptive thinking | Yes | Yes | No | No |
| Compaction API | Yes | Yes | No | No |
| Tool search | Yes | Yes | Yes | Yes |
| Structured outputs | GA | GA | Beta | Beta |
| Web search tool | v20260209 | v20260209 | v20250305 | v20250305 |
| Fast mode | Preview | No | No | No |
| Data residency | Yes | Yes | No | No |
| Agent Teams | Preview | No | No | No |

What broke

Three changes will bite you if you upgrade without reading the docs:

  1. Prefilling assistant messages — Returns a 400 error. No fallback, no flag to re-enable.
  2. thinking: {type: "enabled"} with budget_tokens — Deprecated. Still works today, will be removed.
  3. output_format — Moved to output_config.format. The old key still works but logs a deprecation warning.

If you're running agents in production, check your code for all three before upgrading. The prefill one is the silent killer — it worked fine on 4.5.
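The check for all three patterns can be automated before you flip the model ID. Here's a minimal sketch of a preflight function; the `checkRequestFor46` helper and its request shape are my own illustration, not an official lint rule, so adapt the types to however your code builds requests:

```typescript
// Hypothetical preflight check for the three 4.6-breaking patterns.
// The shapes below are simplified stand-ins for your own request builder.
type Message = { role: "user" | "assistant"; content: unknown };

interface RequestBody {
  messages: Message[];
  thinking?: { type: string; budget_tokens?: number };
  output_format?: unknown;
  [key: string]: unknown;
}

function checkRequestFor46(body: RequestBody): string[] {
  const warnings: string[] = [];

  // 1. Trailing assistant message is a prefill: hard 400 on 4.6
  const last = body.messages[body.messages.length - 1];
  if (last?.role === "assistant") {
    warnings.push("prefill: last message has role 'assistant' (400 on 4.6)");
  }

  // 2. Fixed thinking budget: deprecated, will be removed
  if (body.thinking?.type === "enabled" && body.thinking.budget_tokens != null) {
    warnings.push("thinking: replace budget_tokens with {type: 'adaptive', effort}");
  }

  // 3. Top-level output_format: moved to output_config.format
  if (body.output_format !== undefined) {
    warnings.push("output_format: move to output_config.format");
  }

  return warnings;
}
```

Run it over every request your test suite generates and fail CI on any warning; it's a few minutes of work that would have caught my Tuesday-afternoon outage.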

Adaptive thinking: the end of budget_tokens

Adaptive thinking is the single most important change for agent developers. It replaces the guesswork of setting a fixed thinking budget with a system that scales reasoning depth automatically based on what Claude is actually being asked to do.

The old way (deprecated)

Previously, you had to guess how many tokens Claude should spend thinking:

typescript
// OLD — deprecated on 4.6
const response = await anthropic.messages.create({
  model: "claude-opus-4-5-20250129",
  max_tokens: 16000,
  thinking: {
    type: "enabled",
    budget_tokens: 10000,
  },
  messages: [{ role: "user", content: "Look up order #ORD-48291" }],
});

The problem: a simple order lookup doesn't need 10,000 tokens of reasoning. But a complex multi-tool workflow — "compare this customer's purchase history against our return policy and recommend the best resolution" — might need every token. With a fixed budget, you either overspend on simple queries or underthink complex ones.

The new way: effort levels

typescript
import Anthropic from "@anthropic-ai/sdk";
 
const anthropic = new Anthropic();
 
// Simple tool call — low effort
const simpleResponse = await anthropic.messages.create({
  model: "claude-opus-4-6-20260205",
  max_tokens: 16000,
  thinking: {
    type: "adaptive",
    effort: "low",
  },
  messages: [{ role: "user", content: "What time is it in Tokyo?" }],
});
 
// Complex reasoning — high effort (default)
const complexResponse = await anthropic.messages.create({
  model: "claude-opus-4-6-20260205",
  max_tokens: 16000,
  thinking: {
    type: "adaptive",
    effort: "high",
  },
  messages: [
    {
      role: "user",
      content:
        "Review this customer's last 5 interactions, identify the recurring issue, and draft a resolution plan that addresses the root cause.",
    },
  ],
});

The Python equivalent:

python
import anthropic
 
client = anthropic.Anthropic()
 
# Adaptive thinking with effort control
response = client.messages.create(
    model="claude-opus-4-6-20260205",
    max_tokens=16000,
    thinking={
        "type": "adaptive",
        "effort": "high",
    },
    messages=[
        {
            "role": "user",
            "content": "Analyze this support transcript and identify where the agent should have escalated.",
        }
    ],
)
 
# Access thinking content
for block in response.content:
    if block.type == "thinking":
        print(f"Reasoning: {block.thinking}")
    elif block.type == "text":
        print(f"Response: {block.text}")

When to use each effort level

| Effort | Use Case | Agent Example |
|---|---|---|
| low | Simple lookups, classification | "What's this customer's plan tier?" |
| medium | Standard tool selection, single-step tasks | "Cancel this order and send confirmation" |
| high | Multi-step reasoning, policy interpretation | "This customer wants a refund but is outside the window — what are our options?" |
| max | Complex analysis, multi-tool orchestration | "Review all interactions from this week, identify systemic issues, and draft a report" |

For most agent workloads, high (the default) handles the sweet spot: Claude thinks deeply when the query is complex and skips unnecessary reasoning for straightforward requests. Set low for high-throughput, latency-sensitive operations like real-time classification. Use max sparingly: it burns through output tokens, so reserve it for analysis tasks that genuinely need deep reasoning.
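If you route effort per request rather than hardcoding one level, the table above reduces to a small classifier. A sketch, with the caveat that the keyword heuristics here are placeholders of my own, not a recommended production classifier (most teams would use a cheap model call or request metadata instead):

```typescript
// Illustrative effort router. The regex heuristics are assumptions for
// demonstration only; replace with your own signals.
type Effort = "low" | "medium" | "high" | "max";

function selectEffort(query: string, toolCount: number): Effort {
  const q = query.toLowerCase();
  // Short factual lookups stay cheap
  if (q.length < 60 && /\b(what|who|when|status|time)\b/.test(q)) return "low";
  // Report-style, multi-document asks get the deep end
  if (/\b(all interactions|systemic|draft a report)\b/.test(q)) return "max";
  // Policy interpretation and trade-off questions
  if (/\b(options|policy|refund|recommend)\b/.test(q)) return "high";
  // Default: medium when multiple tools are in play, low otherwise
  return toolCount > 1 ? "medium" : "low";
}
```

The returned value drops straight into `thinking: { type: "adaptive", effort: selectEffort(query, tools.length) }`.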

1M tokens at standard pricing

Claude 4.6 ships with a 1-million-token context window — and unlike previous long-context options, there's no premium pricing. You pay the same rate per token whether you're using 10K tokens or 900K.

What this means for agent architecture

The 1M context window changes three architectural patterns:

Full conversation history. Instead of summarizing or truncating old messages, agents can maintain the complete conversation — including all tool calls, results, and reasoning — for sessions that run into the hundreds of turns. This is the difference between an agent that "remembers" what happened ten minutes ago and one that actually has the full record.

RAG with less chunking pressure. With 200K tokens, you had to aggressively chunk and rank documents before injecting them. With 1M, you can include entire documents, full policy manuals, or complete knowledge base sections. The RAG architecture patterns still apply — you still want retrieval over brute-force context stuffing — but the ceiling is dramatically higher.
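With the higher ceiling, "retrieval" can become "rank, then pack whole documents until the budget runs out" instead of fine-grained chunking. A sketch of that packing step; the 4-characters-per-token estimate is a rough heuristic of mine, not a real tokenizer, so treat the budget as approximate:

```typescript
// Sketch: pack ranked documents whole into a token budget.
// approxTokens uses a crude chars/4 estimate, not a real tokenizer.
interface Doc { id: string; text: string; score: number }

const approxTokens = (text: string): number => Math.ceil(text.length / 4);

function packDocs(docs: Doc[], budgetTokens: number): Doc[] {
  const ranked = [...docs].sort((a, b) => b.score - a.score);
  const packed: Doc[] = [];
  let used = 0;
  for (const doc of ranked) {
    const cost = approxTokens(doc.text);
    if (used + cost > budgetTokens) continue; // too big, try smaller docs
    packed.push(doc);
    used += cost;
  }
  return packed;
}
```

Greedy packing by score is deliberately simple; if two documents overlap heavily you'd want deduplication before this step.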

Multi-agent context sharing. When one agent hands off to another, the receiving agent can ingest the full history of the previous conversation without lossy summarization. The customer doesn't have to repeat anything.

Pricing comparison

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context Window |
|---|---|---|---|
| Opus 4.6 | $5 | $25 | 1M |
| Sonnet 4.6 | $3 | $15 | 1M |
| GPT-4o | $2.50 | $10 | 128K |
| Gemini 1.5 Pro | $1.25 | $5 | 2M |

Cost analysis: worked example

Instead of saying "the math works," here's the actual math for a customer support agent:

text
Average session: 15,000 input tokens + 3,000 output tokens
Volume: 1,000 sessions/day, 30 days/month
 
Sonnet 4.6:
  Input:  $3/1M  × 15,000 × 1,000 × 30 = $1,350/month
  Output: $15/1M × 3,000  × 1,000 × 30 = $1,350/month
  Total: $2,700/month
 
Opus 4.6:
  Input:  $5/1M  × 15,000 × 1,000 × 30 = $2,250/month
  Output: $25/1M × 3,000  × 1,000 × 30 = $2,250/month
  Total: $4,500/month
 
With adaptive thinking on Sonnet (est. 40% of sessions skip extended thinking):
  Output savings: ~$540/month
  Effective total: ~$2,160/month

Sonnet at $2,700/month handles 30,000 customer conversations. That's $0.09 per conversation. For most support operations, that's a fraction of what a human agent costs per ticket. The no-premium pricing on the 1M context window is what makes this viable — previously, long-context requests came with a multiplier that broke the economics at high volume.
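The worked example reduces to a small function you can keep next to your capacity planning. The function shape is mine; only the rates come from the pricing table above:

```typescript
// The cost math above as a reusable function. Rates are USD per 1M tokens.
interface Rates { inputPerM: number; outputPerM: number }

function monthlyCost(
  rates: Rates,
  inputTokensPerSession: number,
  outputTokensPerSession: number,
  sessionsPerDay: number,
  days = 30
): number {
  const sessions = sessionsPerDay * days;
  const input = (rates.inputPerM * inputTokensPerSession * sessions) / 1_000_000;
  const output = (rates.outputPerM * outputTokensPerSession * sessions) / 1_000_000;
  return input + output;
}
```

With the Sonnet rates ({ inputPerM: 3, outputPerM: 15 }) and the session profile above, this returns 2700, matching the hand calculation.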

The compaction API: infinite conversations for agents

Even with 1M tokens, long-running agent sessions will eventually hit the ceiling. The compaction API solves this by automatically summarizing older context when you approach the limit — enabling conversations that run indefinitely.

Compaction is server-side and automatic — you enable it and Claude handles the summarization.

How it works

[Sequence diagram: Agent ↔ Claude API with compaction. Turns 1-50 (tools, reasoning, responses) grow the context to 180K tokens; compaction summarizes turns 1-30, leaving 95K (turns 31-51 kept in full). Turns 52-100 push the context back to 190K; compaction summarizes turns 1-60, leaving 85K (compact summary + recent turns).]
Compaction automatically summarizes older context as conversations grow, keeping the active context focused

Code example

Compaction works with the standard messages API — you enable it and the API handles the rest:

typescript
import Anthropic from "@anthropic-ai/sdk";
 
const anthropic = new Anthropic();
 
// Long-running agent conversation with compaction
async function runAgentLoop(
  conversationHistory: Anthropic.MessageParam[]
) {
  const response = await anthropic.messages.create({
    model: "claude-opus-4-6-20260205",
    max_tokens: 8192,
    thinking: { type: "adaptive", effort: "high" },
    // Enable compaction for long-running sessions
    compaction: { enabled: true },
    system:
      "You are a customer support agent with access to order, billing, and account tools.",
    messages: conversationHistory,
    tools: supportTools,
  });
 
  return response;
}

Before compaction, the standard approach was to manually summarize conversations on the client side — writing your own summarization prompts, deciding what to keep, managing the context budget yourself. That pattern still works if you need fine-grained control over what gets preserved. But for most agent workflows, server-side compaction is less code and better results.

If you've built a custom memory system for your agents, compaction complements it. Compaction handles the in-conversation context window. Your persistent memory system handles cross-conversation recall — customer preferences, resolution history, learned facts. They solve different problems.
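For the cases where you do need client-side control, the pattern is small enough to sketch. Everything here is my own illustration: the token estimate is a crude chars/4 heuristic, and `summarize` is a stub that in practice would be its own model call with your retention rules:

```typescript
// Client-side alternative sketch: keep recent turns verbatim, fold the
// rest into one summary message. `summarize` is a stub for your own
// summarization call.
interface Turn { role: "user" | "assistant"; content: string }

const approxTokens = (turns: Turn[]): number =>
  turns.reduce((n, t) => n + Math.ceil(t.content.length / 4), 0);

function compactClientSide(
  turns: Turn[],
  limitTokens: number,
  keepRecent: number,
  summarize: (old: Turn[]) => string
): Turn[] {
  // Under the limit (or too short to split): leave the history alone
  if (approxTokens(turns) <= limitTokens || turns.length <= keepRecent) {
    return turns;
  }
  const old = turns.slice(0, turns.length - keepRecent);
  const recent = turns.slice(-keepRecent);
  return [
    { role: "user", content: `Summary of earlier turns: ${summarize(old)}` },
    ...recent,
  ];
}
```

Because you write `summarize`, you decide what must survive verbatim (order IDs, quoted amounts, audit-relevant statements), which is exactly the control server-side compaction doesn't give you.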

Tool search: 85% fewer tokens for large tool libraries

If your agent has more than a handful of tools, you've felt this problem: every tool definition eats context tokens. An agent with 30 tools might burn 15K-20K tokens on tool definitions alone before the conversation even starts.

Tool search fixes this by letting you defer tool loading. Instead of dumping all 30 tool definitions into the context, you mark most of them as deferred. Claude gets a single "tool search" tool plus your critical, always-needed tools. When Claude needs a deferred tool, it searches dynamically.

The defer_loading pattern

typescript
import Anthropic from "@anthropic-ai/sdk";
 
const anthropic = new Anthropic();
 
// Define tools — most are deferred
const tools: Anthropic.Tool[] = [
  // Always loaded — core tools used in every conversation
  {
    name: "lookup_customer",
    description: "Look up a customer by email, phone, or account ID",
    input_schema: {
      type: "object" as const,
      properties: {
        identifier: {
          type: "string",
          description: "Email, phone number, or account ID",
        },
      },
      required: ["identifier"],
    },
  },
  // Deferred — only loaded when Claude searches for them
  {
    name: "process_refund",
    description:
      "Process a refund for a specific order. Requires order ID and reason.",
    input_schema: {
      type: "object" as const,
      properties: {
        orderId: { type: "string" },
        reason: { type: "string" },
        amount: { type: "number" },
      },
      required: ["orderId", "reason"],
    },
    // @ts-expect-error — defer_loading is a new field
    defer_loading: true,
  },
  {
    name: "schedule_callback",
    description:
      "Schedule a callback for a customer at a specific time.",
    input_schema: {
      type: "object" as const,
      properties: {
        customerId: { type: "string" },
        scheduledTime: { type: "string", format: "date-time" },
        reason: { type: "string" },
      },
      required: ["customerId", "scheduledTime"],
    },
    // @ts-expect-error — defer_loading is a new field
    defer_loading: true,
  },
  // ... 25 more deferred tools
];
 
const response = await anthropic.messages.create({
  model: "claude-opus-4-6-20260205",
  max_tokens: 4096,
  thinking: { type: "adaptive" },
  tools,
  messages: [
    {
      role: "user",
      content: "I need to return order #ORD-77123",
    },
  ],
});

Claude sees lookup_customer (always loaded) and the tool search capability. When the user mentions a return, Claude searches for refund-related tools, discovers process_refund, and uses it — without the other 25+ tool definitions ever entering the context.
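You can estimate what deferral buys you before restructuring anything. This is a back-of-envelope sketch; the chars/4 token estimate and the flat 40-token schema overhead are assumptions of mine, and real savings depend on the tokenizer and your schemas:

```typescript
// Rough estimate of context tokens saved by deferring tools.
// chars/4 and the +40 schema overhead are assumed, not measured.
interface ToolDef { name: string; description: string; defer_loading?: boolean }

const defTokens = (t: ToolDef): number =>
  Math.ceil((t.name.length + t.description.length) / 4) + 40;

function deferredSavings(tools: ToolDef[]): {
  loaded: number;
  deferred: number;
  savedTokens: number;
} {
  let loaded = 0;
  let deferred = 0;
  let savedTokens = 0;
  for (const t of tools) {
    if (t.defer_loading) {
      deferred++;
      savedTokens += defTokens(t); // this definition never enters context up front
    } else {
      loaded++;
    }
  }
  return { loaded, deferred, savedTokens };
}
```

Run it over your real tool list: if the deferred share of tokens is small, tool search isn't worth the indirection for you yet.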

The numbers

Anthropic's internal testing showed accuracy improved with tool search, not just token efficiency:

| Model | Without tool search | With tool search |
|---|---|---|
| Opus 4 | 49% | 74% |
| Opus 4.5 | 79.5% | 88.1% |

Note: Most benchmarks cited in this article come from Anthropic's own documentation. Independent third-party benchmarks for Claude 4.6's agent features are still limited as of March 2026.

Fewer tools in context means less confusion about which tool to pick. For agents managing complex tool libraries — especially those using MCP to expose tools from multiple servers — this is a meaningful architectural improvement.

Structured outputs: finally GA

Structured outputs — getting Claude to return valid JSON matching a specific schema — graduated from beta to GA on Claude 4.6. The API change is small but matters: output_format moved to output_config.format.

Before (beta)

typescript
// OLD — beta header required, output_format at top level
const response = await anthropic.messages.create({
  model: "claude-sonnet-4-5-20250514",
  max_tokens: 1024,
  // Required beta header
  betas: ["structured-outputs-2025-01-24"],
  messages: [{ role: "user", content: "Extract the customer issue" }],
  // Old location
  output_format: {
    type: "json_schema",
    json_schema: {
      name: "customer_issue",
      schema: issueSchema,
    },
  },
});

After (GA)

typescript
// NEW — no beta header, output_config.format
const response = await anthropic.messages.create({
  model: "claude-opus-4-6-20260205",
  max_tokens: 1024,
  messages: [
    {
      role: "user",
      content: "Extract the customer issue from this transcript",
    },
  ],
  output_config: {
    format: {
      type: "json_schema",
      json_schema: {
        name: "customer_issue",
        schema: {
          type: "object",
          properties: {
            category: {
              type: "string",
              enum: [
                "billing",
                "shipping",
                "product",
                "account",
                "other",
              ],
            },
            severity: {
              type: "string",
              enum: ["low", "medium", "high", "critical"],
            },
            summary: { type: "string" },
            actionRequired: { type: "boolean" },
          },
          required: [
            "category",
            "severity",
            "summary",
            "actionRequired",
          ],
        },
      },
    },
  },
});

The Python equivalent:

python
response = client.messages.create(
    model="claude-opus-4-6-20260205",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Extract the customer issue from this transcript"}
    ],
    output_config={
        "format": {
            "type": "json_schema",
            "json_schema": {
                "name": "customer_issue",
                "schema": {
                    "type": "object",
                    "properties": {
                        "category": {
                            "type": "string",
                            "enum": ["billing", "shipping", "product", "account", "other"],
                        },
                        "severity": {
                            "type": "string",
                            "enum": ["low", "medium", "high", "critical"],
                        },
                        "summary": {"type": "string"},
                        "actionRequired": {"type": "boolean"},
                    },
                    "required": ["category", "severity", "summary", "actionRequired"],
                },
            },
        }
    },
)

For agent builders, structured outputs eliminate the "parse and pray" pattern. When your agent needs to return structured tool results to a downstream system — a CRM update, a ticket creation, an analytics event — you get guaranteed schema compliance instead of hoping the JSON is valid.
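Even with schema compliance guaranteed, a narrow runtime guard at the boundary where model output enters your systems is cheap insurance (against your own schema drifting out of sync with the TypeScript types, if nothing else). A sketch for the customer_issue schema above; the guard function itself is my own, not part of the SDK:

```typescript
// Runtime guard for the customer_issue schema, mirroring the JSON Schema
// in the request. Hand-rolled for illustration; a validator library
// generated from the same schema would be the sturdier choice.
interface CustomerIssue {
  category: "billing" | "shipping" | "product" | "account" | "other";
  severity: "low" | "medium" | "high" | "critical";
  summary: string;
  actionRequired: boolean;
}

function parseCustomerIssue(raw: string): CustomerIssue {
  const value = JSON.parse(raw) as Partial<CustomerIssue>;
  const categories: string[] = ["billing", "shipping", "product", "account", "other"];
  const severities: string[] = ["low", "medium", "high", "critical"];
  if (
    typeof value.category !== "string" || !categories.includes(value.category) ||
    typeof value.severity !== "string" || !severities.includes(value.severity) ||
    typeof value.summary !== "string" ||
    typeof value.actionRequired !== "boolean"
  ) {
    throw new Error("response did not match customer_issue schema");
  }
  return value as CustomerIssue;
}
```

Keeping the guard next to the schema means a future field rename fails loudly in one place instead of silently corrupting a CRM record downstream.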

Web search and code execution

Claude 4.6 adds native web search and code execution as first-party tools. Combined with Claude's multimodal capabilities for processing images, PDFs, and documents, this makes Claude 4.6 agents capable of research-driven workflows.

This matters for agents that need real-time data — stock prices, shipping status from carrier APIs, current product availability — that can't be pre-loaded into the agent's knowledge base.

typescript
const response = await anthropic.messages.create({
  model: "claude-opus-4-6-20260205",
  max_tokens: 4096,
  thinking: { type: "adaptive" },
  tools: [
    {
      type: "web_search_20260209",
      name: "web_search",
      // Dynamic filtering enabled by default on 4.6
    },
  ],
  messages: [
    {
      role: "user",
      content:
        "What are the current shipping rates for FedEx Ground from New York to Los Angeles?",
    },
  ],
});

The code execution layer runs for free when paired with web search or web fetch — Claude filters results programmatically before they consume context tokens, improving both accuracy and cost efficiency.

Opus 4.6 vs Sonnet 4.6: which one for your agent?

Sonnet 4.6 dropped twelve days after Opus 4.6, and the benchmarks tell a story most developers don't expect: Sonnet is close enough to Opus that the default choice should be Sonnet.

| Benchmark | Opus 4.6 | Sonnet 4.6 | Gap |
|---|---|---|---|
| SWE-bench Verified | 80.8% | 79.6% | 1.2% |
| OSWorld (GUI automation) | 72.7% | 72.5% | 0.2% |
| Math | n/a | 89% (vs 62% on Sonnet 4.5) | n/a |
| Speed | 20-30 t/s | 40-60 t/s | 2x faster |
| Cost (input/output) | $5 / $25 | $3 / $15 | ~40% cheaper |
| Max output tokens | 128K | 64K | 2x on Opus |

Decision framework

Start with Sonnet 4.6 when:

  • Your agent handles high-volume, latency-sensitive conversations
  • Tool selection and basic reasoning are the primary tasks
  • You need to keep costs predictable at scale
  • 64K output tokens is sufficient (it usually is)

Escalate to Opus 4.6 when:

  • You need deep multi-step reasoning across many documents
  • 128K output tokens matters (long analysis, large code generation)
  • You're using Agent Teams for coordinated multi-agent work
  • Complex policy interpretation or nuanced decision-making

The practical pattern: run Sonnet as your default, route complex requests to Opus. Most agent platforms support model routing — classify the incoming request, pick the model. Simple order lookup? Sonnet. "Review my entire account history and recommend a plan change"? Opus.
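The routing step itself can start as a few lines. The signals below are placeholders of my own; in practice most teams replace them with a small, cheap classifier call:

```typescript
// Route-by-complexity sketch. The regex signals and the docs threshold
// are illustrative assumptions, not a tuned policy.
const OPUS = "claude-opus-4-6-20260205";
const SONNET = "claude-sonnet-4-6-20260217";

function pickModel(query: string, attachedDocs: number): string {
  const deepSignals = [
    /entire account history/i,
    /recommend a plan/i,
    /analy[sz]e/i,
    /audit/i,
  ];
  const needsDepth = deepSignals.some((re) => re.test(query)) || attachedDocs > 3;
  return needsDepth ? OPUS : SONNET;
}
```

The useful property is that misrouting is cheap in one direction: sending a simple query to Opus wastes a little money, while sending a hard query to Sonnet mostly costs a retry, so bias the thresholds toward Sonnet.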

Breaking changes and migration

Three things will break your code if you upgrade to 4.6 model IDs without changes. Here's each one with the fix.

1. Prefilling assistant messages (hard break)

If you've been using assistant message prefilling to guide Claude's output format or behavior, it will not work on 4.6. The API returns a 400 error.

This is more widespread than it sounds. Prefilling was a common workaround for several patterns that now have first-class solutions:

JSON mode workaround. Before structured outputs existed, the standard way to get JSON from Claude was to prefill the opening brace:

typescript
// BREAKS on 4.6 — returns 400
messages: [
  { role: "user", content: "Classify this ticket" },
  { role: "assistant", content: '{"category": "' }, // prefill to force JSON
]

Persona injection. Prefilling was used to force Claude into a specific voice or persona from the first token:

typescript
// BREAKS on 4.6 — returns 400
messages: [
  { role: "user", content: "Help me with my order" },
  { role: "assistant", content: "Hi! I'm Alex from support. " }, // prefill persona
]
 
// FIX: System prompt handles persona
messages: [
  { role: "user", content: "Help me with my order" }
],
system: "You are Alex from support. Always introduce yourself by name."

Format enforcement in streaming. Some implementations prefilled format markers to ensure streaming responses started with the expected structure — a header, a particular greeting, or a structured preamble.

Function calling setup. Before native tool use was mature, prefilling was used to steer Claude toward calling specific functions by starting the response with a function call pattern.

Where to look in your codebase: Search for any role: "assistant" message that appears as the last element in the messages array. That's a prefill. Also check for helper functions that append assistant messages before API calls — these are often buried in utility layers.

Fix for JSON mode: Use structured outputs (now GA):

typescript
// WORKS on 4.6 — structured outputs replace prefilling for JSON
const response = await anthropic.messages.create({
  model: "claude-opus-4-6-20260205",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Classify this ticket" }],
  output_config: {
    format: {
      type: "json_schema",
      json_schema: {
        name: "ticket_classification",
        schema: {
          type: "object",
          properties: {
            category: {
              type: "string",
              enum: ["billing", "shipping", "product", "account"],
            },
          },
          required: ["category"],
        },
      },
    },
  },
});

Fix for persona injection: Move to system prompts. They're more reliable than prefilling ever was, and they work across turns without needing to re-inject.

Fix for format enforcement: Structured outputs handle this natively. If you need free-form text with a specific structure, describe the format in your system prompt — Claude 4.6 follows formatting instructions more reliably than 4.5 did.

2. budget_tokens deprecation (soft break)

Your code won't break today, but it will when Anthropic removes support:

typescript
// DEPRECATED — still works, but switch now
thinking: { type: "enabled", budget_tokens: 10000 }
 
// REPLACEMENT
thinking: { type: "adaptive", effort: "high" }

3. output_format rename (soft break)

typescript
// DEPRECATED
output_format: { type: "json_schema", ... }
 
// REPLACEMENT
output_config: { format: { type: "json_schema", ... } }

Migration checklist

  • Search codebase for prefilled assistant messages (role: assistant with partial content)
  • Replace all assistant prefills with structured outputs or system prompts
  • Replace thinking.type enabled + budget_tokens with adaptive thinking
  • Move output_format to output_config.format
  • Update model IDs to claude-opus-4-6-20260205 or claude-sonnet-4-6-20260217
  • Test all tool definitions still work (no schema changes, but verify)
  • Run full agent test suite against new model before deploying

What can go wrong

Every feature above has edge cases and unknowns that are worth considering before you build your architecture around them.

Autonomous agents with unrestricted tool access. The stakes of getting agent architecture wrong extend beyond bad responses. In December 2025, as reported by the Financial Times in February 2026, Amazon's Kiro AI agent was given a straightforward task to fix a minor issue in AWS Cost Explorer. With operator-level permissions and no mandatory peer review for AI-initiated changes, Kiro autonomously deleted and recreated an AWS production environment, triggering a 13-hour outage. This is the extreme version of every tool-use failure mode in this article — an agent with the right tools, the right permissions, and no behavioral guardrails to prevent catastrophic actions. As Claude 4.6 makes agents more capable, the gap between what an agent can do and what it should do gets more consequential.

Compaction: what gets lost? When compaction summarizes older turns, you lose the verbatim content. For most support conversations, this is fine — the summary preserves intent and key facts. But if your agent needs exact quotes, specific numbers from earlier in the conversation, or auditability of what was said in turn 12, you can't rely on compaction to preserve it. There's no API to inspect what the compacted summary contains or to control what gets prioritized during summarization. If you need that level of control, client-side summarization with your own prompts is still the right approach.

Adaptive thinking: debugging the black box. Adaptive thinking decides how much to reason based on query complexity, but you can't see the heuristic. When Claude under-thinks a complex query (produces a shallow answer with effort: "high"), or over-thinks a simple lookup (burns tokens reasoning about a straightforward classification), your only lever is changing the effort level. There's no way to inspect why Claude chose a particular thinking depth, which makes debugging inconsistent behavior harder than it was with explicit budget_tokens.

Web search: accuracy and freshness. The web search tool is useful but not infallible. Search results can be stale, inaccurate, or misleading — the same problems any search engine has. Claude doesn't verify the accuracy of search results before incorporating them into responses. For agents that make decisions based on web data (current prices, policy changes, regulatory information), you should validate critical facts through your own APIs rather than trusting web search alone. Rate limits on the web search tool are also not well documented.

Tool search accuracy. The 49% to 74% improvement on MCP evaluations is significant, but those are Anthropic's own benchmarks tested under their conditions. Your tool library's descriptions, naming conventions, and overlap between tools all affect how well tool search works in practice. Poorly described tools won't be found when needed. Ambiguously named tools may be found when they shouldn't be.

The 1M context window isn't free to fill. No pricing premium doesn't mean no cost. A 500K-token context at Sonnet rates costs $1.50 per request in input tokens alone. At 1,000 requests/day, that's $45,000/month just in input costs. The context window is a ceiling, not a target — retrieval and chunking are still the right default for most workloads.
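A guard that fails loudly before a runaway context fills the window is a few lines. The budget threshold here is an example value, not a recommendation; the rate math matches the pricing table earlier:

```typescript
// Guard sketch: estimate per-request input cost and throw past a budget,
// instead of silently filling the 1M window. ratePerM is USD per 1M tokens.
function inputCostUSD(inputTokens: number, ratePerM: number): number {
  return (inputTokens * ratePerM) / 1_000_000;
}

function assertWithinBudget(inputTokens: number, ratePerM: number, maxUSD: number): void {
  const cost = inputCostUSD(inputTokens, ratePerM);
  if (cost > maxUSD) {
    throw new Error(
      `request would cost $${cost.toFixed(2)} in input tokens (budget $${maxUSD})`
    );
  }
}
```

inputCostUSD(500_000, 3) reproduces the $1.50-per-request figure above; wire assertWithinBudget in front of your messages call and alert on the throws.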

Building a Claude 4.6 agent: putting it all together

Here's a complete working agent that combines adaptive thinking, tool use, and structured outputs — the patterns you'd use in a production customer support agent:

typescript
import Anthropic from "@anthropic-ai/sdk";
 
const anthropic = new Anthropic();
 
// Define tools with deferred loading for non-critical ones
const tools: Anthropic.Tool[] = [
  {
    name: "lookup_customer",
    description:
      "Look up customer details by email, phone, or account ID. Returns name, plan, account status, and recent activity.",
    input_schema: {
      type: "object" as const,
      properties: {
        identifier: {
          type: "string",
          description: "Customer email, phone, or account ID",
        },
      },
      required: ["identifier"],
    },
  },
  {
    name: "search_orders",
    description:
      "Search orders by customer ID, order number, or date range. Returns order details including status and tracking.",
    input_schema: {
      type: "object" as const,
      properties: {
        customerId: { type: "string" },
        orderId: { type: "string" },
        status: {
          type: "string",
          enum: [
            "pending",
            "shipped",
            "delivered",
            "returned",
            "cancelled",
          ],
        },
      },
      required: [],
    },
  },
];
 
// Simulated tool execution
async function executeTool(
  name: string,
  input: Record<string, unknown>
): Promise<string> {
  switch (name) {
    case "lookup_customer":
      return JSON.stringify({
        id: "cust_8291",
        name: "Sarah Chen",
        email: "sarah@example.com",
        plan: "business",
        accountStatus: "active",
        memberSince: "2024-08-15",
      });
    case "search_orders":
      return JSON.stringify({
        orders: [
          {
            id: "ORD-77123",
            status: "delivered",
            items: ["Widget Pro X2"],
            deliveredAt: "2026-03-10",
            total: 149.99,
          },
        ],
      });
    default:
      return JSON.stringify({ error: `Unknown tool: ${name}` });
  }
}
 
// Agent loop with adaptive thinking
async function runAgent(userMessage: string) {
  const messages: Anthropic.MessageParam[] = [
    { role: "user", content: userMessage },
  ];
 
  console.log(`\nUser: ${userMessage}\n`);
 
  // Agent loop — handle tool use
  while (true) {
    const response = await anthropic.messages.create({
      model: "claude-sonnet-4-6-20260217",
      max_tokens: 4096,
      thinking: { type: "adaptive", effort: "high" },
      system:
        "You are a customer support agent. Use tools to look up real data before answering. Be specific and helpful.",
      tools,
      messages,
    });
 
    // Check for tool use
    const toolBlocks = response.content.filter(
      (b) => b.type === "tool_use"
    );
 
    if (toolBlocks.length === 0) {
      // No tools — extract text response
      const textBlock = response.content.find(
        (b) => b.type === "text"
      );
      if (textBlock && textBlock.type === "text") {
        console.log(`Agent: ${textBlock.text}`);
      }
      break;
    }
 
    // Execute tools and continue
    messages.push({ role: "assistant", content: response.content });
 
    const toolResults: Anthropic.ToolResultBlockParam[] = [];
    for (const tool of toolBlocks) {
      if (tool.type === "tool_use") {
        console.log(
          `  [Tool] ${tool.name}(${JSON.stringify(tool.input)})`
        );
        const result = await executeTool(
          tool.name,
          tool.input as Record<string, unknown>
        );
        console.log(`  [Result] ${result.substring(0, 100)}...`);
        toolResults.push({
          type: "tool_result",
          tool_use_id: tool.id,
          content: result,
        });
      }
    }
 
    messages.push({ role: "user", content: toolResults });
  }
}
 
// Run it
runAgent(
  "Hi, I'm sarah@example.com. I received order ORD-77123 but the product is damaged. What are my options?"
);

This agent uses Sonnet 4.6 with adaptive thinking — it thinks deeply about the customer's situation (damaged product, needs options) without burning tokens on the simple tool lookups. In production, you'd add error handling, timeouts, and the compaction API for long sessions. You'd also route complex cases to Opus 4.6 using a classifier.
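Of the production additions, retries are the easiest to sketch. This wrapper is generic on purpose (it isn't Anthropic-specific and the backoff constants are examples); wrap the messages.create call in it and tune the attempts to your rate limits:

```typescript
// Production hardening sketch: retry a flaky async call with exponential
// backoff. Wrap your API call: withRetry(() => anthropic.messages.create(...)).
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 250
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        // Exponential backoff: 250ms, 500ms, 1000ms, ...
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}
```

One caution: don't blindly retry 400s. The prefill error from the top of this article is deterministic, and retrying it three times just triples the noise; retry only on timeouts, 429s, and 5xx-class failures.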

What's coming next

Three features from the 4.6 release are in research preview, which means they work but the API will change:

Agent Teams. Multiple subagents that coordinate autonomously. In Claude Code, this means spinning up agents that each handle a different part of a codebase review. For customer-facing agents, this could mean a routing agent, a research agent, and a response agent working in parallel. The API for agent coordination isn't public yet — it's exposed through the Claude Agent SDK (renamed from Claude Code SDK).

Fast mode. Opus 4.6 only. Up to 2.5x faster output token generation at premium pricing. For latency-critical agent applications — real-time voice, live chat — this could make Opus viable where speed previously forced you to use Sonnet.

Data residency controls. The inference_geo parameter lets you specify where model inference runs ("us" or "global"). For agents handling PII, healthcare data, or financial information, this is the compliance primitive you've been asking for.
