Chanl
Agent Architecture

Claude 4.6 broke our production agent in two hours — here's what's worth the migration

A practical developer guide to Claude 4.6 — adaptive thinking, 1M context, compaction API, tool search, and structured outputs. Real code examples in TypeScript and Python for building production AI agents.

Dean Grover, Co-founder
March 15, 2026
20 min read

I upgraded a production agent from Claude Sonnet 4.5 to Claude Opus 4.6 on a Tuesday morning. By Tuesday afternoon, every request was returning 400 errors. Not some requests — all of them. The agent had been prefilling assistant messages to steer responses, and that pattern silently broke in 4.6. No deprecation warning in the 4.5 response. No heads-up in the migration guide I skimmed. Just a hard 400.

I wasn't alone. In February 2026, LiveKit filed GitHub issue #4907: Claude 4.6's prefilling removal immediately broke their entire Claude integration for voice and video agent pipelines. There was no deprecation with a grace period, just a hard 400 error that crashed production agents on day one. If one of the largest real-time communication platforms got caught off guard, you can assume plenty of smaller teams did too.

That's the kind of thing this article exists for. Anthropic released Claude 4.6 with a list of features that sounds transformative — adaptive thinking, a million-token context window, automatic compaction, tool search — and most of it genuinely is. But if you're building AI agents in production, what matters is knowing what breaks, what costs what, and where the new features actually change your code.

Prerequisites and setup

You'll need Node.js 18+ or Python 3.10+, an Anthropic API key, and a terminal. Install the SDK for your language:

bash
# TypeScript
npm install @anthropic-ai/sdk
 
# Python
pip install anthropic

Create a .env file with your API key:

text
ANTHROPIC_API_KEY=sk-ant-...

The model IDs you'll use throughout this article:

| Model | ID | Release Date |
|---|---|---|
| Claude Opus 4.6 | claude-opus-4-6-20260205 | Feb 5, 2026 |
| Claude Sonnet 4.6 | claude-sonnet-4-6-20260217 | Feb 17, 2026 |

If you're new to Claude's tool use API, the tool system deep dive covers the execution loop mechanics.

What changed in Claude 4.6 (the TL;DR)

Claude 4.6 is the largest single-release API surface change since Claude 3 introduced tool use. Nine features shipped across Opus and Sonnet, three things broke, and one pricing decision eliminated the main objection to large context windows.

Here's the full feature matrix — Opus 4.6 vs Sonnet 4.6 vs the 4.5 models they replace:

| Feature | Opus 4.6 | Sonnet 4.6 | Opus 4.5 | Sonnet 4.5 |
|---|---|---|---|---|
| Context window | 1M (GA) | 1M (GA) | 200K | 200K |
| Max output tokens | 128K | 64K | 64K | 16K |
| Adaptive thinking | Yes | Yes | No | No |
| Compaction API | Yes | Yes | No | No |
| Tool search | Yes | Yes | Yes | Yes |
| Structured outputs | GA | GA | Beta | Beta |
| Web search tool | v20260209 | v20260209 | v20250305 | v20250305 |
| Fast mode | Preview | No | No | No |
| Data residency | Yes | Yes | No | No |
| Agent Teams | Preview | No | No | No |

What broke

Three changes will bite you if you upgrade without reading the docs:

  1. Prefilling assistant messages — Returns a 400 error. No fallback, no flag to re-enable.
  2. thinking: {type: "enabled"} with budget_tokens — Deprecated. Still works today, will be removed.
  3. output_format — Moved to output_config.format. The old key still works but logs a deprecation warning.

If you're running agents in production, check your code for all three before upgrading. The prefill one is the silent killer — it worked fine on 4.5.
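The check for all three patterns can be automated before you flip the model ID. Here's a minimal sketch of a preflight function; the `checkRequestFor46` helper and its request shape are my own illustration, not an official lint rule, so adapt the types to however your code builds requests:

```typescript
// Hypothetical preflight check for the three 4.6-breaking patterns.
// The shapes below are simplified stand-ins for your own request builder.
type Message = { role: "user" | "assistant"; content: unknown };

interface RequestBody {
  messages: Message[];
  thinking?: { type: string; budget_tokens?: number };
  output_format?: unknown;
  [key: string]: unknown;
}

function checkRequestFor46(body: RequestBody): string[] {
  const warnings: string[] = [];

  // 1. Trailing assistant message is a prefill: hard 400 on 4.6
  const last = body.messages[body.messages.length - 1];
  if (last?.role === "assistant") {
    warnings.push("prefill: last message has role 'assistant' (400 on 4.6)");
  }

  // 2. Fixed thinking budget: deprecated, will be removed
  if (body.thinking?.type === "enabled" && body.thinking.budget_tokens != null) {
    warnings.push("thinking: replace budget_tokens with {type: 'adaptive', effort}");
  }

  // 3. Top-level output_format: moved to output_config.format
  if (body.output_format !== undefined) {
    warnings.push("output_format: move to output_config.format");
  }

  return warnings;
}
```

Run it over every request your test suite generates and fail CI on any warning; it's a few minutes of work that would have caught my Tuesday-afternoon outage.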

Adaptive thinking: the end of budget_tokens

Adaptive thinking is the single most important change for agent developers. It replaces the guesswork of setting a fixed thinking budget with a system that scales reasoning depth automatically based on what Claude is actually being asked to do.

The old way (deprecated)

Previously, you had to guess how many tokens Claude should spend thinking:

typescript
// OLD — deprecated on 4.6
const response = await anthropic.messages.create({
  model: "claude-opus-4-5-20250129",
  max_tokens: 16000,
  thinking: {
    type: "enabled",
    budget_tokens: 10000,
  },
  messages: [{ role: "user", content: "Look up order #ORD-48291" }],
});

The problem: a simple order lookup doesn't need 10,000 tokens of reasoning. But a complex multi-tool workflow — "compare this customer's purchase history against our return policy and recommend the best resolution" — might need every token. With a fixed budget, you either overspend on simple queries or underthink complex ones.

The new way: effort levels

typescript
import Anthropic from "@anthropic-ai/sdk";
 
const anthropic = new Anthropic();
 
// Simple tool call — low effort
const simpleResponse = await anthropic.messages.create({
  model: "claude-opus-4-6-20260205",
  max_tokens: 16000,
  thinking: {
    type: "adaptive",
    effort: "low",
  },
  messages: [{ role: "user", content: "What time is it in Tokyo?" }],
});
 
// Complex reasoning — high effort (default)
const complexResponse = await anthropic.messages.create({
  model: "claude-opus-4-6-20260205",
  max_tokens: 16000,
  thinking: {
    type: "adaptive",
    effort: "high",
  },
  messages: [
    {
      role: "user",
      content:
        "Review this customer's last 5 interactions, identify the recurring issue, and draft a resolution plan that addresses the root cause.",
    },
  ],
});

The Python equivalent:

python
import anthropic
 
client = anthropic.Anthropic()
 
# Adaptive thinking with effort control
response = client.messages.create(
    model="claude-opus-4-6-20260205",
    max_tokens=16000,
    thinking={
        "type": "adaptive",
        "effort": "high",
    },
    messages=[
        {
            "role": "user",
            "content": "Analyze this support transcript and identify where the agent should have escalated.",
        }
    ],
)
 
# Access thinking content
for block in response.content:
    if block.type == "thinking":
        print(f"Reasoning: {block.thinking}")
    elif block.type == "text":
        print(f"Response: {block.text}")

When to use each effort level

| Effort | Use Case | Agent Example |
|---|---|---|
| low | Simple lookups, classification | "What's this customer's plan tier?" |
| medium | Standard tool selection, single-step tasks | "Cancel this order and send confirmation" |
| high | Multi-step reasoning, policy interpretation | "This customer wants a refund but is outside the window — what are our options?" |
| max | Complex analysis, multi-tool orchestration | "Review all interactions from this week, identify systemic issues, and draft a report" |

For most agent workloads, high (the default) handles the sweet spot: Claude thinks deeply when the query is complex and skips unnecessary reasoning for straightforward requests. Set low for high-throughput, latency-sensitive operations like real-time classification. Use max sparingly: it burns through output tokens, so reserve it for analysis tasks that genuinely need deep reasoning.
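If you route effort per request rather than hardcoding one level, the table above reduces to a small classifier. A sketch, with the caveat that the keyword heuristics here are placeholders of my own, not a recommended production classifier (most teams would use a cheap model call or request metadata instead):

```typescript
// Illustrative effort router. The regex heuristics are assumptions for
// demonstration only; replace with your own signals.
type Effort = "low" | "medium" | "high" | "max";

function selectEffort(query: string, toolCount: number): Effort {
  const q = query.toLowerCase();
  // Short factual lookups stay cheap
  if (q.length < 60 && /\b(what|who|when|status|time)\b/.test(q)) return "low";
  // Report-style, multi-document asks get the deep end
  if (/\b(all interactions|systemic|draft a report)\b/.test(q)) return "max";
  // Policy interpretation and trade-off questions
  if (/\b(options|policy|refund|recommend)\b/.test(q)) return "high";
  // Default: medium when multiple tools are in play, low otherwise
  return toolCount > 1 ? "medium" : "low";
}
```

The returned value drops straight into `thinking: { type: "adaptive", effort: selectEffort(query, tools.length) }`.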

1M tokens at standard pricing

Claude 4.6 ships with a 1-million-token context window — and unlike previous long-context options, there's no premium pricing. You pay the same rate per token whether you're using 10K tokens or 900K.

What this means for agent architecture

The 1M context window changes three architectural patterns:

Full conversation history. Instead of summarizing or truncating old messages, agents can maintain the complete conversation — including all tool calls, results, and reasoning — for sessions that run into the hundreds of turns. This is the difference between an agent that "remembers" what happened ten minutes ago and one that actually has the full record.

RAG with less chunking pressure. With 200K tokens, you had to aggressively chunk and rank documents before injecting them. With 1M, you can include entire documents, full policy manuals, or complete knowledge base sections. The RAG architecture patterns still apply — you still want retrieval over brute-force context stuffing — but the ceiling is dramatically higher.
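With the higher ceiling, "retrieval" can become "rank, then pack whole documents until the budget runs out" instead of fine-grained chunking. A sketch of that packing step; the 4-characters-per-token estimate is a rough heuristic of mine, not a real tokenizer, so treat the budget as approximate:

```typescript
// Sketch: pack ranked documents whole into a token budget.
// approxTokens uses a crude chars/4 estimate, not a real tokenizer.
interface Doc { id: string; text: string; score: number }

const approxTokens = (text: string): number => Math.ceil(text.length / 4);

function packDocs(docs: Doc[], budgetTokens: number): Doc[] {
  const ranked = [...docs].sort((a, b) => b.score - a.score);
  const packed: Doc[] = [];
  let used = 0;
  for (const doc of ranked) {
    const cost = approxTokens(doc.text);
    if (used + cost > budgetTokens) continue; // too big, try smaller docs
    packed.push(doc);
    used += cost;
  }
  return packed;
}
```

Greedy packing by score is deliberately simple; if two documents overlap heavily you'd want deduplication before this step.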

Multi-agent context sharing. When one agent hands off to another, the receiving agent can ingest the full history of the previous conversation without lossy summarization. The customer doesn't have to repeat anything.

Pricing comparison

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context Window |
|---|---|---|---|
| Opus 4.6 | $5 | $25 | 1M |
| Sonnet 4.6 | $3 | $15 | 1M |
| GPT-4o | $2.50 | $10 | 128K |
| Gemini 1.5 Pro | $1.25 | $5 | 2M |

Cost analysis: worked example

Instead of saying "the math works," here's the actual math for a customer support agent:

text
Average session: 15,000 input tokens + 3,000 output tokens
Volume: 1,000 sessions/day, 30 days/month
 
Sonnet 4.6:
  Input:  $3/1M  × 15,000 × 1,000 × 30 = $1,350/month
  Output: $15/1M × 3,000  × 1,000 × 30 = $1,350/month
  Total: $2,700/month
 
Opus 4.6:
  Input:  $5/1M  × 15,000 × 1,000 × 30 = $2,250/month
  Output: $25/1M × 3,000  × 1,000 × 30 = $2,250/month
  Total: $4,500/month
 
With adaptive thinking on Sonnet (est. 40% of sessions skip extended thinking):
  Output savings: ~$540/month
  Effective total: ~$2,160/month

Sonnet at $2,700/month handles 30,000 customer conversations. That's $0.09 per conversation. For most support operations, that's a fraction of what a human agent costs per ticket. The no-premium pricing on the 1M context window is what makes this viable — previously, long-context requests came with a multiplier that broke the economics at high volume.
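The worked example reduces to a small function you can keep next to your capacity planning. The function shape is mine; only the rates come from the pricing table above:

```typescript
// The cost math above as a reusable function. Rates are USD per 1M tokens.
interface Rates { inputPerM: number; outputPerM: number }

function monthlyCost(
  rates: Rates,
  inputTokensPerSession: number,
  outputTokensPerSession: number,
  sessionsPerDay: number,
  days = 30
): number {
  const sessions = sessionsPerDay * days;
  const input = (rates.inputPerM * inputTokensPerSession * sessions) / 1_000_000;
  const output = (rates.outputPerM * outputTokensPerSession * sessions) / 1_000_000;
  return input + output;
}
```

With the Sonnet rates ({ inputPerM: 3, outputPerM: 15 }) and the session profile above, this returns 2700, matching the hand calculation.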

The compaction API: infinite conversations for agents

Even with 1M tokens, long-running agent sessions will eventually hit the ceiling. The compaction API solves this by automatically summarizing older context when you approach the limit — enabling conversations that run indefinitely.

Compaction is server-side and automatic — you enable it and Claude handles the summarization.

How it works

[Sequence diagram: Agent ↔ Claude API with compaction. Turns 1-50 (tools, reasoning, responses) grow the context to 180K tokens; compaction summarizes turns 1-30, leaving 95K (turns 31-51 kept in full). Turns 52-100 push the context back to 190K; compaction summarizes turns 1-60, leaving 85K (compact summary + recent turns).]
Compaction automatically summarizes older context as conversations grow, keeping the active context focused

Code example

Compaction works with the standard messages API — you enable it and the API handles the rest:

typescript
import Anthropic from "@anthropic-ai/sdk";
 
const anthropic = new Anthropic();
 
// Long-running agent conversation with compaction
async function runAgentLoop(
  conversationHistory: Anthropic.MessageParam[]
) {
  const response = await anthropic.messages.create({
    model: "claude-opus-4-6-20260205",
    max_tokens: 8192,
    thinking: { type: "adaptive", effort: "high" },
    // Enable compaction for long-running sessions
    compaction: { enabled: true },
    system:
      "You are a customer support agent with access to order, billing, and account tools.",
    messages: conversationHistory,
    tools: supportTools,
  });
 
  return response;
}

Before compaction, the standard approach was to manually summarize conversations on the client side — writing your own summarization prompts, deciding what to keep, managing the context budget yourself. That pattern still works if you need fine-grained control over what gets preserved. But for most agent workflows, server-side compaction is less code and better results.

If you've built a custom memory system for your agents, compaction complements it. Compaction handles the in-conversation context window. Your persistent memory system handles cross-conversation recall — customer preferences, resolution history, learned facts. They solve different problems.
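For the cases where you do need client-side control, the pattern is small enough to sketch. Everything here is my own illustration: the token estimate is a crude chars/4 heuristic, and `summarize` is a stub that in practice would be its own model call with your retention rules:

```typescript
// Client-side alternative sketch: keep recent turns verbatim, fold the
// rest into one summary message. `summarize` is a stub for your own
// summarization call.
interface Turn { role: "user" | "assistant"; content: string }

const approxTokens = (turns: Turn[]): number =>
  turns.reduce((n, t) => n + Math.ceil(t.content.length / 4), 0);

function compactClientSide(
  turns: Turn[],
  limitTokens: number,
  keepRecent: number,
  summarize: (old: Turn[]) => string
): Turn[] {
  // Under the limit (or too short to split): leave the history alone
  if (approxTokens(turns) <= limitTokens || turns.length <= keepRecent) {
    return turns;
  }
  const old = turns.slice(0, turns.length - keepRecent);
  const recent = turns.slice(-keepRecent);
  return [
    { role: "user", content: `Summary of earlier turns: ${summarize(old)}` },
    ...recent,
  ];
}
```

Because you write `summarize`, you decide what must survive verbatim (order IDs, quoted amounts, audit-relevant statements), which is exactly the control server-side compaction doesn't give you.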

Tool search: 85% fewer tokens for large tool libraries

If your agent has more than a handful of tools, you've felt this problem: every tool definition eats context tokens. An agent with 30 tools might burn 15K-20K tokens on tool definitions alone before the conversation even starts.

Tool search fixes this by letting you defer tool loading. Instead of dumping all 30 tool definitions into the context, you mark most of them as deferred. Claude gets a single "tool search" tool plus your critical, always-needed tools. When Claude needs a deferred tool, it searches dynamically.

The defer_loading pattern

typescript
import Anthropic from "@anthropic-ai/sdk";
 
const anthropic = new Anthropic();
 
// Define tools — most are deferred
const tools: Anthropic.Tool[] = [
  // Always loaded — core tools used in every conversation
  {
    name: "lookup_customer",
    description: "Look up a customer by email, phone, or account ID",
    input_schema: {
      type: "object" as const,
      properties: {
        identifier: {
          type: "string",
          description: "Email, phone number, or account ID",
        },
      },
      required: ["identifier"],
    },
  },
  // Deferred — only loaded when Claude searches for them
  {
    name: "process_refund",
    description:
      "Process a refund for a specific order. Requires order ID and reason.",
    input_schema: {
      type: "object" as const,
      properties: {
        orderId: { type: "string" },
        reason: { type: "string" },
        amount: { type: "number" },
      },
      required: ["orderId", "reason"],
    },
    // @ts-expect-error — defer_loading is a new field
    defer_loading: true,
  },
  {
    name: "schedule_callback",
    description:
      "Schedule a callback for a customer at a specific time.",
    input_schema: {
      type: "object" as const,
      properties: {
        customerId: { type: "string" },
        scheduledTime: { type: "string", format: "date-time" },
        reason: { type: "string" },
      },
      required: ["customerId", "scheduledTime"],
    },
    // @ts-expect-error — defer_loading is a new field
    defer_loading: true,
  },
  // ... 25 more deferred tools
];
 
const response = await anthropic.messages.create({
  model: "claude-opus-4-6-20260205",
  max_tokens: 4096,
  thinking: { type: "adaptive" },
  tools,
  messages: [
    {
      role: "user",
      content: "I need to return order #ORD-77123",
    },
  ],
});

Claude sees lookup_customer (always loaded) and the tool search capability. When the user mentions a return, Claude searches for refund-related tools, discovers process_refund, and uses it — without the other 25+ tool definitions ever entering the context.
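You can estimate what deferral buys you before restructuring anything. This is a back-of-envelope sketch; the chars/4 token estimate and the flat 40-token schema overhead are assumptions of mine, and real savings depend on the tokenizer and your schemas:

```typescript
// Rough estimate of context tokens saved by deferring tools.
// chars/4 and the +40 schema overhead are assumed, not measured.
interface ToolDef { name: string; description: string; defer_loading?: boolean }

const defTokens = (t: ToolDef): number =>
  Math.ceil((t.name.length + t.description.length) / 4) + 40;

function deferredSavings(tools: ToolDef[]): {
  loaded: number;
  deferred: number;
  savedTokens: number;
} {
  let loaded = 0;
  let deferred = 0;
  let savedTokens = 0;
  for (const t of tools) {
    if (t.defer_loading) {
      deferred++;
      savedTokens += defTokens(t); // this definition never enters context up front
    } else {
      loaded++;
    }
  }
  return { loaded, deferred, savedTokens };
}
```

Run it over your real tool list: if the deferred share of tokens is small, tool search isn't worth the indirection for you yet.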

The numbers

Anthropic's internal testing showed accuracy improved with tool search, not just token efficiency:

| Model | Without tool search | With tool search |
|---|---|---|
| Opus 4 | 49% | 74% |
| Opus 4.5 | 79.5% | 88.1% |

Note: Most benchmarks cited in this article come from Anthropic's own documentation. Independent third-party benchmarks for Claude 4.6's agent features are still limited as of March 2026.

Fewer tools in context means less confusion about which tool to pick. For agents managing complex tool libraries — especially those using MCP to expose tools from multiple servers — this is a meaningful architectural improvement.

Structured outputs: finally GA

Structured outputs — getting Claude to return valid JSON matching a specific schema — graduated from beta to GA on Claude 4.6. The API change is small but matters: output_format moved to output_config.format.

Before (beta)

typescript
// OLD — beta header required, output_format at top level
const response = await anthropic.messages.create({
  model: "claude-sonnet-4-5-20250514",
  max_tokens: 1024,
  // Required beta header
  betas: ["structured-outputs-2025-01-24"],
  messages: [{ role: "user", content: "Extract the customer issue" }],
  // Old location
  output_format: {
    type: "json_schema",
    json_schema: {
      name: "customer_issue",
      schema: issueSchema,
    },
  },
});

After (GA)

typescript
// NEW — no beta header, output_config.format
const response = await anthropic.messages.create({
  model: "claude-opus-4-6-20260205",
  max_tokens: 1024,
  messages: [
    {
      role: "user",
      content: "Extract the customer issue from this transcript",
    },
  ],
  output_config: {
    format: {
      type: "json_schema",
      json_schema: {
        name: "customer_issue",
        schema: {
          type: "object",
          properties: {
            category: {
              type: "string",
              enum: [
                "billing",
                "shipping",
                "product",
                "account",
                "other",
              ],
            },
            severity: {
              type: "string",
              enum: ["low", "medium", "high", "critical"],
            },
            summary: { type: "string" },
            actionRequired: { type: "boolean" },
          },
          required: [
            "category",
            "severity",
            "summary",
            "actionRequired",
          ],
        },
      },
    },
  },
});

The Python equivalent:

python
response = client.messages.create(
    model="claude-opus-4-6-20260205",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Extract the customer issue from this transcript"}
    ],
    output_config={
        "format": {
            "type": "json_schema",
            "json_schema": {
                "name": "customer_issue",
                "schema": {
                    "type": "object",
                    "properties": {
                        "category": {
                            "type": "string",
                            "enum": ["billing", "shipping", "product", "account", "other"],
                        },
                        "severity": {
                            "type": "string",
                            "enum": ["low", "medium", "high", "critical"],
                        },
                        "summary": {"type": "string"},
                        "actionRequired": {"type": "boolean"},
                    },
                    "required": ["category", "severity", "summary", "actionRequired"],
                },
            },
        }
    },
)

For agent builders, structured outputs eliminate the "parse and pray" pattern. When your agent needs to return structured tool results to a downstream system — a CRM update, a ticket creation, an analytics event — you get guaranteed schema compliance instead of hoping the JSON is valid.
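Even with schema compliance guaranteed, a narrow runtime guard at the boundary where model output enters your systems is cheap insurance (against your own schema drifting out of sync with the TypeScript types, if nothing else). A sketch for the customer_issue schema above; the guard function itself is my own, not part of the SDK:

```typescript
// Runtime guard for the customer_issue schema, mirroring the JSON Schema
// in the request. Hand-rolled for illustration; a validator library
// generated from the same schema would be the sturdier choice.
interface CustomerIssue {
  category: "billing" | "shipping" | "product" | "account" | "other";
  severity: "low" | "medium" | "high" | "critical";
  summary: string;
  actionRequired: boolean;
}

function parseCustomerIssue(raw: string): CustomerIssue {
  const value = JSON.parse(raw) as Partial<CustomerIssue>;
  const categories: string[] = ["billing", "shipping", "product", "account", "other"];
  const severities: string[] = ["low", "medium", "high", "critical"];
  if (
    typeof value.category !== "string" || !categories.includes(value.category) ||
    typeof value.severity !== "string" || !severities.includes(value.severity) ||
    typeof value.summary !== "string" ||
    typeof value.actionRequired !== "boolean"
  ) {
    throw new Error("response did not match customer_issue schema");
  }
  return value as CustomerIssue;
}
```

Keeping the guard next to the schema means a future field rename fails loudly in one place instead of silently corrupting a CRM record downstream.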

Web search and code execution

Claude 4.6 adds native web search and code execution as first-party tools. Combined with Claude's multimodal capabilities for processing images, PDFs, and documents, this makes Claude 4.6 agents capable of research-driven workflows.

This matters for agents that need real-time data — stock prices, shipping status from carrier APIs, current product availability — that can't be pre-loaded into the agent's knowledge base.

typescript
const response = await anthropic.messages.create({
  model: "claude-opus-4-6-20260205",
  max_tokens: 4096,
  thinking: { type: "adaptive" },
  tools: [
    {
      type: "web_search_20260209",
      name: "web_search",
      // Dynamic filtering enabled by default on 4.6
    },
  ],
  messages: [
    {
      role: "user",
      content:
        "What are the current shipping rates for FedEx Ground from New York to Los Angeles?",
    },
  ],
});

The code execution layer runs for free when paired with web search or web fetch — Claude filters results programmatically before they consume context tokens, improving both accuracy and cost efficiency.

Opus 4.6 vs Sonnet 4.6: which one for your agent?

Sonnet 4.6 dropped twelve days after Opus 4.6, and the benchmarks tell a story most developers don't expect: Sonnet is close enough to Opus that the default choice should be Sonnet.

| Benchmark | Opus 4.6 | Sonnet 4.6 | Gap |
|---|---|---|---|
| SWE-bench Verified | 80.8% | 79.6% | 1.2% |
| OSWorld (GUI automation) | 72.7% | 72.5% | 0.2% |
| Math | n/a | 89% (vs 62% on Sonnet 4.5) | n/a |
| Speed | 20-30 t/s | 40-60 t/s | 2x faster |
| Cost (input/output) | $5 / $25 | $3 / $15 | ~40% cheaper |
| Max output tokens | 128K | 64K | 2x on Opus |

Decision framework

Start with Sonnet 4.6 when:

  • Your agent handles high-volume, latency-sensitive conversations
  • Tool selection and basic reasoning are the primary tasks
  • You need to keep costs predictable at scale
  • 64K output tokens is sufficient (it usually is)

Escalate to Opus 4.6 when:

  • You need deep multi-step reasoning across many documents
  • 128K output tokens matters (long analysis, large code generation)
  • You're using Agent Teams for coordinated multi-agent work
  • Complex policy interpretation or nuanced decision-making

The practical pattern: run Sonnet as your default, route complex requests to Opus. Most agent platforms support model routing — classify the incoming request, pick the model. Simple order lookup? Sonnet. "Review my entire account history and recommend a plan change"? Opus.
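The routing step itself can start as a few lines. The signals below are placeholders of my own; in practice most teams replace them with a small, cheap classifier call:

```typescript
// Route-by-complexity sketch. The regex signals and the docs threshold
// are illustrative assumptions, not a tuned policy.
const OPUS = "claude-opus-4-6-20260205";
const SONNET = "claude-sonnet-4-6-20260217";

function pickModel(query: string, attachedDocs: number): string {
  const deepSignals = [
    /entire account history/i,
    /recommend a plan/i,
    /analy[sz]e/i,
    /audit/i,
  ];
  const needsDepth = deepSignals.some((re) => re.test(query)) || attachedDocs > 3;
  return needsDepth ? OPUS : SONNET;
}
```

The useful property is that misrouting is cheap in one direction: sending a simple query to Opus wastes a little money, while sending a hard query to Sonnet mostly costs a retry, so bias the thresholds toward Sonnet.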

Breaking changes and migration

Three things will break your code if you upgrade to 4.6 model IDs without changes. Here's each one with the fix.

1. Prefilling assistant messages (hard break)

If you've been using assistant message prefilling to guide Claude's output format or behavior, it will not work on 4.6. The API returns a 400 error.

This is more widespread than it sounds. Prefilling was a common workaround for several patterns that now have first-class solutions:

JSON mode workaround. Before structured outputs existed, the standard way to get JSON from Claude was to prefill the opening brace:

typescript
// BREAKS on 4.6 — returns 400
messages: [
  { role: "user", content: "Classify this ticket" },
  { role: "assistant", content: '{"category": "' }, // prefill to force JSON
]

Persona injection. Prefilling was used to force Claude into a specific voice or persona from the first token:

typescript
// BREAKS on 4.6 — returns 400
messages: [
  { role: "user", content: "Help me with my order" },
  { role: "assistant", content: "Hi! I'm Alex from support. " }, // prefill persona
]
 
// FIX: System prompt handles persona
messages: [
  { role: "user", content: "Help me with my order" }
],
system: "You are Alex from support. Always introduce yourself by name."

Format enforcement in streaming. Some implementations prefilled format markers to ensure streaming responses started with the expected structure — a header, a particular greeting, or a structured preamble.

Function calling setup. Before native tool use was mature, prefilling was used to steer Claude toward calling specific functions by starting the response with a function call pattern.

Where to look in your codebase: Search for any role: "assistant" message that appears as the last element in the messages array. That's a prefill. Also check for helper functions that append assistant messages before API calls — these are often buried in utility layers.

Fix for JSON mode: Use structured outputs (now GA):

typescript
// WORKS on 4.6 — structured outputs replace prefilling for JSON
const response = await anthropic.messages.create({
  model: "claude-opus-4-6-20260205",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Classify this ticket" }],
  output_config: {
    format: {
      type: "json_schema",
      json_schema: {
        name: "ticket_classification",
        schema: {
          type: "object",
          properties: {
            category: {
              type: "string",
              enum: ["billing", "shipping", "product", "account"],
            },
          },
          required: ["category"],
        },
      },
    },
  },
});

Fix for persona injection: Move to system prompts. They're more reliable than prefilling ever was, and they work across turns without needing to re-inject.

Fix for format enforcement: Structured outputs handle this natively. If you need free-form text with a specific structure, describe the format in your system prompt — Claude 4.6 follows formatting instructions more reliably than 4.5 did.

2. budget_tokens deprecation (soft break)

Your code won't break today, but it will when Anthropic removes support:

typescript
// DEPRECATED — still works, but switch now
thinking: { type: "enabled", budget_tokens: 10000 }
 
// REPLACEMENT
thinking: { type: "adaptive", effort: "high" }

3. output_format rename (soft break)

typescript
// DEPRECATED
output_format: { type: "json_schema", ... }
 
// REPLACEMENT
output_config: { format: { type: "json_schema", ... } }

Migration checklist

  • Search codebase for prefilled assistant messages (role: assistant with partial content)
  • Replace all assistant prefills with structured outputs or system prompts
  • Replace thinking.type enabled + budget_tokens with adaptive thinking
  • Move output_format to output_config.format
  • Update model IDs to claude-opus-4-6-20260205 or claude-sonnet-4-6-20260217
  • Test all tool definitions still work (no schema changes, but verify)
  • Run full agent test suite against new model before deploying

What can go wrong

Every feature above has edge cases and unknowns that are worth considering before you build your architecture around them.

Autonomous agents with unrestricted tool access. The stakes of getting agent architecture wrong extend beyond bad responses. In December 2025, as reported by the Financial Times in February 2026, Amazon's Kiro AI agent was given a straightforward task to fix a minor issue in AWS Cost Explorer. With operator-level permissions and no mandatory peer review for AI-initiated changes, Kiro autonomously deleted and recreated an AWS production environment, triggering a 13-hour outage. This is the extreme version of every tool-use failure mode in this article — an agent with the right tools, the right permissions, and no behavioral guardrails to prevent catastrophic actions. As Claude 4.6 makes agents more capable, the gap between what an agent can do and what it should do gets more consequential.

Compaction: what gets lost? When compaction summarizes older turns, you lose the verbatim content. For most support conversations, this is fine — the summary preserves intent and key facts. But if your agent needs exact quotes, specific numbers from earlier in the conversation, or auditability of what was said in turn 12, you can't rely on compaction to preserve it. There's no API to inspect what the compacted summary contains or to control what gets prioritized during summarization. If you need that level of control, client-side summarization with your own prompts is still the right approach.

Adaptive thinking: debugging the black box. Adaptive thinking decides how much to reason based on query complexity, but you can't see the heuristic. When Claude under-thinks a complex query (produces a shallow answer with effort: "high"), or over-thinks a simple lookup (burns tokens reasoning about a straightforward classification), your only lever is changing the effort level. There's no way to inspect why Claude chose a particular thinking depth, which makes debugging inconsistent behavior harder than it was with explicit budget_tokens.

Web search: accuracy and freshness. The web search tool is useful but not infallible. Search results can be stale, inaccurate, or misleading — the same problems any search engine has. Claude doesn't verify the accuracy of search results before incorporating them into responses. For agents that make decisions based on web data (current prices, policy changes, regulatory information), you should validate critical facts through your own APIs rather than trusting web search alone. Rate limits on the web search tool are also not well documented.

Tool search accuracy. The 49% to 74% improvement on MCP evaluations is significant, but those are Anthropic's own benchmarks tested under their conditions. Your tool library's descriptions, naming conventions, and overlap between tools all affect how well tool search works in practice. Poorly described tools won't be found when needed. Ambiguously named tools may be found when they shouldn't be.

The 1M context window isn't free to fill. No pricing premium doesn't mean no cost. A 500K-token context at Sonnet rates costs $1.50 per request in input tokens alone. At 1,000 requests/day, that's $45,000/month just in input costs. The context window is a ceiling, not a target — retrieval and chunking are still the right default for most workloads.
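A guard that fails loudly before a runaway context fills the window is a few lines. The budget threshold here is an example value, not a recommendation; the rate math matches the pricing table earlier:

```typescript
// Guard sketch: estimate per-request input cost and throw past a budget,
// instead of silently filling the 1M window. ratePerM is USD per 1M tokens.
function inputCostUSD(inputTokens: number, ratePerM: number): number {
  return (inputTokens * ratePerM) / 1_000_000;
}

function assertWithinBudget(inputTokens: number, ratePerM: number, maxUSD: number): void {
  const cost = inputCostUSD(inputTokens, ratePerM);
  if (cost > maxUSD) {
    throw new Error(
      `request would cost $${cost.toFixed(2)} in input tokens (budget $${maxUSD})`
    );
  }
}
```

inputCostUSD(500_000, 3) reproduces the $1.50-per-request figure above; wire assertWithinBudget in front of your messages call and alert on the throws.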

Building a Claude 4.6 agent: putting it all together

Here's a complete working agent that combines adaptive thinking, tool use, and structured outputs — the patterns you'd use in a production customer support agent:

typescript
import Anthropic from "@anthropic-ai/sdk";
 
const anthropic = new Anthropic();
 
// Define tools with deferred loading for non-critical ones
const tools: Anthropic.Tool[] = [
  {
    name: "lookup_customer",
    description:
      "Look up customer details by email, phone, or account ID. Returns name, plan, account status, and recent activity.",
    input_schema: {
      type: "object" as const,
      properties: {
        identifier: {
          type: "string",
          description: "Customer email, phone, or account ID",
        },
      },
      required: ["identifier"],
    },
  },
  {
    name: "search_orders",
    description:
      "Search orders by customer ID, order number, or date range. Returns order details including status and tracking.",
    input_schema: {
      type: "object" as const,
      properties: {
        customerId: { type: "string" },
        orderId: { type: "string" },
        status: {
          type: "string",
          enum: [
            "pending",
            "shipped",
            "delivered",
            "returned",
            "cancelled",
          ],
        },
      },
      required: [],
    },
  },
];
 
// Simulated tool execution
async function executeTool(
  name: string,
  input: Record<string, unknown>
): Promise<string> {
  switch (name) {
    case "lookup_customer":
      return JSON.stringify({
        id: "cust_8291",
        name: "Sarah Chen",
        email: "sarah@example.com",
        plan: "business",
        accountStatus: "active",
        memberSince: "2024-08-15",
      });
    case "search_orders":
      return JSON.stringify({
        orders: [
          {
            id: "ORD-77123",
            status: "delivered",
            items: ["Widget Pro X2"],
            deliveredAt: "2026-03-10",
            total: 149.99,
          },
        ],
      });
    default:
      return JSON.stringify({ error: `Unknown tool: ${name}` });
  }
}
 
// Agent loop with adaptive thinking
async function runAgent(userMessage: string) {
  const messages: Anthropic.MessageParam[] = [
    { role: "user", content: userMessage },
  ];
 
  console.log(`\nUser: ${userMessage}\n`);
 
  // Agent loop — handle tool use
  while (true) {
    const response = await anthropic.messages.create({
      model: "claude-sonnet-4-6-20260217",
      max_tokens: 4096,
      thinking: { type: "adaptive", effort: "high" },
      system:
        "You are a customer support agent. Use tools to look up real data before answering. Be specific and helpful.",
      tools,
      messages,
    });
 
    // Check for tool use
    const toolBlocks = response.content.filter(
      (b) => b.type === "tool_use"
    );
 
    if (toolBlocks.length === 0) {
      // No tools — extract text response
      const textBlock = response.content.find(
        (b) => b.type === "text"
      );
      if (textBlock && textBlock.type === "text") {
        console.log(`Agent: ${textBlock.text}`);
      }
      break;
    }
 
    // Execute tools and continue
    messages.push({ role: "assistant", content: response.content });
 
    const toolResults: Anthropic.ToolResultBlockParam[] = [];
    for (const tool of toolBlocks) {
      if (tool.type === "tool_use") {
        console.log(
          `  [Tool] ${tool.name}(${JSON.stringify(tool.input)})`
        );
        const result = await executeTool(
          tool.name,
          tool.input as Record<string, unknown>
        );
        console.log(`  [Result] ${result.substring(0, 100)}...`);
        toolResults.push({
          type: "tool_result",
          tool_use_id: tool.id,
          content: result,
        });
      }
    }
 
    messages.push({ role: "user", content: toolResults });
  }
}
 
// Run it
runAgent(
  "Hi, I'm sarah@example.com. I received order ORD-77123 but the product is damaged. What are my options?"
);

This agent uses Sonnet 4.6 with adaptive thinking — it thinks deeply about the customer's situation (damaged product, needs options) without burning tokens on the simple tool lookups. In production, you'd add error handling, timeouts, and the compaction API for long sessions. You'd also route complex cases to Opus 4.6 using a classifier.
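Of the production additions, retries are the easiest to sketch. This wrapper is generic on purpose (it isn't Anthropic-specific and the backoff constants are examples); wrap the messages.create call in it and tune the attempts to your rate limits:

```typescript
// Production hardening sketch: retry a flaky async call with exponential
// backoff. Wrap your API call: withRetry(() => anthropic.messages.create(...)).
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 250
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        // Exponential backoff: 250ms, 500ms, 1000ms, ...
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
      }
    }
  }
  throw lastError;
}
```

One caution: don't blindly retry 400s. The prefill error from the top of this article is deterministic, and retrying it three times just triples the noise; retry only on timeouts, 429s, and 5xx-class failures.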

What's coming next

Three features from the 4.6 release are in research preview, which means they work but the API will change:

Agent Teams. Multiple subagents that coordinate autonomously. In Claude Code, this means spinning up agents that each handle a different part of a codebase review. For customer-facing agents, this could mean a routing agent, a research agent, and a response agent working in parallel. The API for agent coordination isn't public yet — it's exposed through the Claude Agent SDK (renamed from Claude Code SDK).

Fast mode. Opus 4.6 only. Up to 2.5x faster output token generation at premium pricing. For latency-critical agent applications — real-time voice, live chat — this could make Opus viable where speed previously forced you to use Sonnet.

Data residency controls. The inference_geo parameter lets you specify where model inference runs ("us" or "global"). For agents handling PII, healthcare data, or financial information, this is the compliance primitive you've been asking for.
