You've built an MCP server. Tools register, clients connect, the Inspector shows green checkmarks. Now ship it to production and watch everything break.
That's not a dig at the protocol — it's the reality of moving from local stdio transport to a distributed system handling real traffic. Authentication? The early spec didn't have it. Transport reliability? SSE connections drop behind load balancers. Multi-tenancy? Your single-server demo serves one user at a time.
The MCP specification has evolved significantly since those early days. The March 2025 release introduced Streamable HTTP to replace fragile SSE connections and brought in OAuth 2.1 as the auth framework. The June 2025 update formalized MCP servers as OAuth Resource Servers, adding Protected Resource Metadata discovery and mandatory Resource Indicators. And the November 2025 release brought asynchronous Tasks and a matured authorization framework that enterprises actually trust. With over 97 million monthly SDK downloads and backing from Anthropic, OpenAI, Google, and Microsoft, MCP isn't experimental anymore; it's the integration standard.
This article covers the advanced patterns that take you from "it works on my machine" to "it handles 10,000 concurrent sessions across 50 tenants." We'll build working TypeScript for each pattern, because architecture diagrams without code are just decorations.
Prerequisites
This is the sequel to MCP Explained: Build Your First MCP Server in TypeScript and Python. You should be comfortable with MCP's three primitives (tools, resources, prompts), JSON-RPC 2.0, and the basic client-server lifecycle before continuing. If terms like tools/list or initialize don't ring a bell, start there.
You'll also want familiarity with OAuth 2.0 concepts (authorization codes, access tokens, PKCE) and HTTP transport patterns. The AI Agent Tools guide covers tool management patterns that complement what we'll build here.
npm install @modelcontextprotocol/sdk zod express jsonwebtoken
Every code example uses TypeScript and runs standalone. We're building production patterns, not toys.
| Pattern | What You'll Build | When You Need It |
|---|---|---|
| OAuth 2.1 with PKCE | Auth middleware for MCP servers as OAuth Resource Servers | Any remote MCP deployment |
| Streamable HTTP transport | Full transport implementation with session management | Replacing SSE, serverless environments |
| MCP gateways | Proxy layer for routing, auth, and observability | Multi-server architectures |
| Dynamic tool registration | Runtime tool add/remove with client notification | Feature flags, permission-gated tools |
| MCP sampling | Server-initiated LLM completions | Tools that need to reason mid-execution |
| Multi-tenant architecture | Tenant-isolated MCP with per-tenant secrets | SaaS platforms, shared infrastructure |
| Security hardening | Input validation, token scoping, audit logging | Every production deployment |
OAuth 2.1 Authentication: MCP Servers as Resource Servers
MCP's authorization model treats every remote MCP server as an OAuth 2.0 Resource Server — the same role your REST API plays in a standard OAuth flow. Clients authenticate through an Authorization Server, receive scoped access tokens, and present those tokens on every MCP request. The spec mandates OAuth 2.1 with PKCE using the S256 challenge method, no exceptions.
This wasn't always the case. Early MCP deployments ran over stdio with no auth at all, or used static API keys passed as environment variables. The March 2025 spec introduced the OAuth framework, and the June 2025 update formalized the Resource Server model and added Protected Resource Metadata (PRM) discovery, a mechanism where the MCP server advertises which Authorization Server clients should use.
Here's the discovery flow. When a client first connects to a remote MCP server, it doesn't need to know the Authorization Server upfront. It sends an unauthenticated request, the server responds 401 with a WWW-Authenticate header pointing at its Protected Resource Metadata document, the client fetches that document to learn which Authorization Server to use, completes the OAuth flow there, and retries the original request with its new token:
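In code, that discovery document is just a small JSON object. Here's a sketch of building one; the builder function name is hypothetical, and the scope values simply echo the ones used later in this article:

```typescript
// Shape of an OAuth Protected Resource Metadata document (RFC 9728)
interface ProtectedResourceMetadata {
  resource: string;                 // canonical URL of this MCP server
  authorization_servers: string[];  // where clients go to obtain tokens
  scopes_supported?: string[];
  bearer_methods_supported?: string[];
}

// Build the PRM document this MCP server advertises to clients
function buildProtectedResourceMetadata(
  resource: string,
  authorizationServers: string[]
): ProtectedResourceMetadata {
  return {
    resource,
    authorization_servers: authorizationServers,
    scopes_supported: ['mcp:tools', 'mcp:read'],
    bearer_methods_supported: ['header'], // tokens only via the Authorization header
  };
}

// Served from the well-known path, e.g. with Express:
// app.get('/.well-known/oauth-protected-resource', (_req, res) =>
//   res.json(buildProtectedResourceMetadata(
//     'https://mcp.example.com',
//     ['https://auth.example.com']
//   ))
// );
```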
The critical security addition here is Resource Indicators (RFC 8707). When the client requests a token, it includes the MCP server's URL as the resource parameter. The Authorization Server embeds this as the token's audience claim. The MCP server then validates that the token was specifically issued for it — preventing a compromised server from replaying tokens against other services.
Here's what token validation looks like on the server side. This middleware sits in front of your MCP request handler and rejects anything without a properly scoped token:
import { IncomingMessage } from 'http';
import jwt from 'jsonwebtoken';
interface McpTokenClaims {
sub: string;
aud: string | string[];
scope: string;
iss: string;
exp: number;
tenant_id?: string;
}
const MCP_SERVER_RESOURCE = 'https://mcp.example.com';
// Verification key for the Authorization Server's signatures, e.g. loaded
// from its JWKS endpoint or from configuration (assumed here)
const publicKey = process.env.AUTH_SERVER_PUBLIC_KEY ?? '';
function validateMcpToken(req: IncomingMessage): McpTokenClaims {
const authHeader = req.headers.authorization;
if (!authHeader?.startsWith('Bearer ')) {
throw new Error('Missing Bearer token');
}
const token = authHeader.slice(7);
const claims = jwt.verify(token, publicKey, {
algorithms: ['RS256'],
issuer: 'https://auth.example.com',
}) as McpTokenClaims;
// RFC 8707: verify the token was issued for THIS server
const audiences = Array.isArray(claims.aud) ? claims.aud : [claims.aud];
if (!audiences.includes(MCP_SERVER_RESOURCE)) {
throw new Error(
`Token audience mismatch: expected ${MCP_SERVER_RESOURCE}, got ${audiences.join(', ')}`
);
}
// Verify required scopes for MCP operations
const scopes = claims.scope?.split(' ') || [];
if (!scopes.includes('mcp:tools') && !scopes.includes('mcp:read')) {
throw new Error('Insufficient scope: requires mcp:tools or mcp:read');
}
return claims;
}
On the client side, PKCE protects the authorization code exchange. The client generates a random code_verifier, hashes it with SHA-256 to create the code_challenge, sends the challenge during authorization, and presents the original verifier when exchanging the code for a token. This prevents intercepted authorization codes from being used by a different party:
import { randomBytes, createHash } from 'crypto';
function generatePkceChallenge() {
// Generate a cryptographically random verifier (43-128 chars)
const verifier = randomBytes(32)
.toString('base64url')
.slice(0, 64);
// S256: SHA-256 hash of verifier, base64url-encoded
const challenge = createHash('sha256')
.update(verifier)
.digest('base64url');
return { verifier, challenge, method: 'S256' as const };
}
// During authorization request
async function startMcpAuth(authServerUrl: string, mcpServerUrl: string) {
const { verifier, challenge, method } = generatePkceChallenge();
const authUrl = new URL(`${authServerUrl}/authorize`);
authUrl.searchParams.set('response_type', 'code');
authUrl.searchParams.set('client_id', CLIENT_ID);
authUrl.searchParams.set('redirect_uri', REDIRECT_URI);
authUrl.searchParams.set('scope', 'mcp:tools mcp:read');
authUrl.searchParams.set('code_challenge', challenge);
authUrl.searchParams.set('code_challenge_method', method);
// RFC 8707: bind the token to this specific MCP server
authUrl.searchParams.set('resource', mcpServerUrl);
// Store verifier for the token exchange step
return { authUrl: authUrl.toString(), verifier };
}
Two things to note. First, the resource parameter ensures the resulting token is audience-bound — it can only be used against mcpServerUrl. Second, the spec requires HTTPS for all authorization endpoints. There's no development-mode exception for production deployments.
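The final step, exchanging the code plus verifier for a token, isn't shown above. A sketch, assuming a standard OAuth 2.1 token endpoint; the function names are illustrative:

```typescript
// Build the form body for the authorization-code exchange. The verifier
// proves this client started the flow (PKCE); the resource parameter keeps
// the resulting token audience-bound (RFC 8707).
function buildTokenExchangeBody(params: {
  code: string;
  verifier: string;
  clientId: string;
  redirectUri: string;
  mcpServerUrl: string;
}): URLSearchParams {
  return new URLSearchParams({
    grant_type: 'authorization_code',
    code: params.code,
    code_verifier: params.verifier, // the original verifier, not the challenge
    client_id: params.clientId,
    redirect_uri: params.redirectUri,
    resource: params.mcpServerUrl,  // RFC 8707 resource indicator
  });
}

// POST the body to the Authorization Server's token endpoint
async function exchangeCodeForToken(
  authServerUrl: string,
  body: URLSearchParams
): Promise<{ access_token: string; expires_in: number }> {
  const resp = await fetch(`${authServerUrl}/token`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body,
  });
  if (!resp.ok) throw new Error(`Token exchange failed: ${resp.status}`);
  return resp.json();
}
```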
Streamable HTTP: The Transport That Replaced SSE
Streamable HTTP is the production transport for remote MCP servers, replacing the deprecated HTTP+SSE transport from the 2024-11-05 spec. It solves three problems that made SSE unreliable at scale: no stream resumption, mandatory long-lived connections, and unidirectional message delivery. Every new MCP server should use Streamable HTTP unless you're building a local tool that runs over stdio.
The core idea is simple: the server exposes a single HTTP endpoint that accepts POST requests (for client-to-server messages) and GET requests (for opening an optional SSE stream for server-initiated messages). Most interactions are plain request-response — the client POSTs a JSON-RPC message, the server responds with JSON. When the server needs to stream multiple messages (progress updates, notifications), it can upgrade a single response to SSE.
Session management is critical. After initialization, the server returns an Mcp-Session-Id header — a cryptographically secure, globally unique identifier. The client includes this header on every subsequent request, and the server uses it to route messages to the correct session state.
Here's a Streamable HTTP server implementation. This handles both POST (standard request-response) and GET (SSE stream for notifications), with proper session tracking:
import express from 'express';
import { randomUUID } from 'crypto';
interface McpSession {
id: string;
tenantId: string;
sseResponse?: express.Response;
tools: Map<string, ToolDefinition>;
createdAt: Date;
}
const sessions = new Map<string, McpSession>();
const app = express();
app.use(express.json());
// POST /mcp — handles all client-to-server JSON-RPC messages
app.post('/mcp', async (req, res) => {
const message = req.body;
// Initialize: create session, return capabilities
if (message.method === 'initialize') {
const sessionId = randomUUID();
const tenantId = extractTenantFromToken(req);
const session: McpSession = {
id: sessionId,
tenantId,
tools: loadToolsForTenant(tenantId),
createdAt: new Date(),
};
sessions.set(sessionId, session);
res.setHeader('Mcp-Session-Id', sessionId);
return res.json({
jsonrpc: '2.0',
id: message.id,
result: {
protocolVersion: '2025-11-25',
capabilities: {
tools: { listChanged: true },
sampling: {},
},
serverInfo: { name: 'acme-mcp', version: '2.1.0' },
},
});
}
// All other requests require a valid session
const sessionId = req.headers['mcp-session-id'] as string;
const session = sessions.get(sessionId);
if (!session) {
return res.status(404).json({
jsonrpc: '2.0',
id: message.id,
error: { code: -32001, message: 'Invalid or expired session' },
});
}
// Route to handler based on method
if (message.method === 'tools/list') {
return res.json({
jsonrpc: '2.0',
id: message.id,
result: { tools: Array.from(session.tools.values()) },
});
}
if (message.method === 'tools/call') {
const tool = session.tools.get(message.params.name);
if (!tool) {
return res.status(404).json({
jsonrpc: '2.0',
id: message.id,
error: { code: -32602, message: `Unknown tool: ${message.params.name}` },
});
}
// For long-running tools, upgrade to SSE
if (tool.longRunning) {
res.setHeader('Content-Type', 'text/event-stream');
res.setHeader('Cache-Control', 'no-cache');
res.setHeader('Connection', 'keep-alive');
// Stream progress updates
await executeWithProgress(tool, message.params.arguments, (progress) => {
res.write(`data: ${JSON.stringify({
jsonrpc: '2.0',
method: 'notifications/progress',
params: { progressToken: message.id, progress: progress.percent, total: 100 },
})}\n\n`);
});
// Send final result and close
res.write(`data: ${JSON.stringify({
jsonrpc: '2.0',
id: message.id,
result: { content: [{ type: 'text', text: 'Completed' }] },
})}\n\n`);
return res.end();
}
// Standard synchronous tool execution
const result = await executeTool(tool, message.params.arguments);
return res.json({
jsonrpc: '2.0',
id: message.id,
result: { content: [{ type: 'text', text: JSON.stringify(result) }] },
});
}
});
// GET /mcp — opens SSE stream for server-initiated messages
app.get('/mcp', (req, res) => {
const sessionId = req.headers['mcp-session-id'] as string;
const session = sessions.get(sessionId);
if (!session) return res.status(404).end();
res.setHeader('Content-Type', 'text/event-stream');
res.setHeader('Cache-Control', 'no-cache');
res.setHeader('Connection', 'keep-alive');
// Store the response for server-initiated pushes
session.sseResponse = res;
req.on('close', () => {
session.sseResponse = undefined;
});
});
app.listen(3002, () => console.log('MCP Streamable HTTP server on :3002'));
Why does this matter over SSE? Three reasons. First, stateless tools don't need a persistent connection — a POST returns a JSON response and the connection closes. Infrastructure (load balancers, CDNs, serverless functions) handles this natively. Second, the SSE upgrade is opt-in per request, not per session. A server can respond with JSON for fast tools and SSE for slow ones. Third, session IDs make horizontal scaling straightforward — you can route by session to a sticky backend or store session state in Redis.
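Sticky routing by session ID can be as simple as a stable hash over the Mcp-Session-Id header. A sketch; the backend pool and the choice of SHA-256 are illustrative:

```typescript
import { createHash } from 'crypto';

// Map a session ID deterministically to one backend so every request in a
// session lands on the instance holding that session's state.
function pickBackend(sessionId: string, backends: string[]): string {
  const digest = createHash('sha256').update(sessionId).digest();
  // Interpret the first 4 bytes as an unsigned int, then bucket by pool size
  const bucket = digest.readUInt32BE(0) % backends.length;
  return backends[bucket];
}

// Example: route a session to the same upstream on every request
// const upstream = pickBackend(sessionId, ['http://mcp-1:3002', 'http://mcp-2:3002']);
```

Plain modulo remaps most sessions when the pool resizes; a consistent-hashing scheme handles backend changes more gracefully.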
The TypeScript SDK (version 1.10.0+) supports Streamable HTTP out of the box via StreamableHTTPServerTransport. If you're using the SDK, you get session management, content negotiation, and SSE upgrade for free.
MCP Gateways: The Control Plane for Tool Traffic
An MCP gateway is a reverse proxy purpose-built for MCP traffic — session-aware, JSON-RPC-native, and capable of transforming tool schemas in flight. Think of it as the layer between your AI agents and your tool infrastructure that handles auth, routing, observability, and access control in one place.
Standard API gateways (Kong, Envoy, AWS API Gateway) handle REST traffic well. But MCP is bidirectional JSON-RPC with session state, which makes standard request-response proxying insufficient. An MCP gateway understands the protocol: it can inspect tools/list responses, enforce tool-level permissions, route different tools to different backends, and inject tenant context — all transparently to the client.
Here's a gateway implementation that handles routing, auth validation, and tool-level access control. This sits between clients and your MCP server fleet:
import express from 'express';
interface GatewayRoute {
toolPrefix: string;
upstream: string;
requiredScopes: string[];
}
const routes: GatewayRoute[] = [
{ toolPrefix: 'crm_', upstream: 'http://mcp-crm:3002', requiredScopes: ['crm:read'] },
{ toolPrefix: 'payment_', upstream: 'http://mcp-payments:3003', requiredScopes: ['payments:write'] },
{ toolPrefix: 'internal_', upstream: 'http://mcp-internal:3004', requiredScopes: ['admin'] },
];
const gateway = express();
gateway.use(express.json());
gateway.post('/mcp', async (req, res) => {
const message = req.body;
// Validate OAuth token on every request
const claims = validateMcpToken(req);
// tools/call: route based on tool name prefix
if (message.method === 'tools/call') {
const toolName = message.params?.name;
const route = routes.find((r) => toolName?.startsWith(r.toolPrefix));
if (!route) {
return res.status(404).json({
jsonrpc: '2.0',
id: message.id,
error: { code: -32602, message: `No route for tool: ${toolName}` },
});
}
// Check tool-level scopes
const tokenScopes = claims.scope.split(' ');
const hasAccess = route.requiredScopes.every((s) => tokenScopes.includes(s));
if (!hasAccess) {
logAudit('tool_access_denied', { tool: toolName, tenant: claims.tenant_id });
return res.status(403).json({
jsonrpc: '2.0',
id: message.id,
error: { code: -32603, message: `Insufficient scope for ${toolName}` },
});
}
// Forward to upstream MCP server
logAudit('tool_call', { tool: toolName, tenant: claims.tenant_id });
const upstream = await fetch(`${route.upstream}/mcp`, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'X-Tenant-Id': claims.tenant_id || '',
'X-Gateway-Token': createInternalToken(claims),
},
body: JSON.stringify(message),
});
const result = await upstream.json();
return res.json(result);
}
// tools/list: aggregate from all upstreams the tenant has access to
if (message.method === 'tools/list') {
const tokenScopes = claims.scope.split(' ');
const accessible = routes.filter((r) =>
r.requiredScopes.every((s) => tokenScopes.includes(s))
);
const allTools = await Promise.all(
accessible.map(async (route) => {
const resp = await fetch(`${route.upstream}/mcp`, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'X-Tenant-Id': claims.tenant_id || '',
},
body: JSON.stringify({ jsonrpc: '2.0', id: 'list', method: 'tools/list', params: {} }),
});
const data = await resp.json();
return data.result?.tools || [];
})
);
return res.json({
jsonrpc: '2.0',
id: message.id,
result: { tools: allTools.flat() },
});
}
});
The gateway gives you a single chokepoint for every agent-tool interaction. Need to rate-limit a specific tenant? Add it here. Need an audit trail of every tool invocation? It's already in the proxy. Need to swap a backend MCP server without touching client code? Change the route.
Production MCP gateways like Peta MCP Suite and IBM's ContextForge add capabilities beyond basic proxying: intelligent caching (skip re-executing deterministic tools), cost tracking per tenant, and canary deployments where you route a percentage of traffic to a new tool version. If you're managing MCP servers at scale, the gateway becomes essential infrastructure rather than an optional nicety.
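Canary routing like that can be implemented with deterministic session bucketing, so a given session always sees the same tool version. A sketch; the function and parameter names are illustrative:

```typescript
import { createHash } from 'crypto';

// Assign a session to the canary or stable version of a tool. Hashing the
// session ID (instead of sampling randomly per request) keeps a session's
// behavior consistent for its whole lifetime.
function pickToolVersion(
  sessionId: string,
  toolName: string,
  canaryPercent: number // 0-100
): 'canary' | 'stable' {
  const digest = createHash('sha256').update(`${toolName}:${sessionId}`).digest();
  const bucket = digest.readUInt32BE(0) % 100; // bucket in 0-99
  return bucket < canaryPercent ? 'canary' : 'stable';
}
```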
Dynamic Tool Registration: Tools That Appear and Disappear at Runtime
MCP servers don't have to declare a static set of tools at startup. The protocol supports runtime tool registration — adding, removing, and modifying tools while sessions are active — through the notifications/tools/list_changed notification. When a server sends this notification, connected clients call tools/list to refresh their tool inventory without reconnecting.
This isn't just a convenience feature. It enables patterns that static tool sets can't support:
- Permission-gated tools: A user authenticates, and tools they're authorized for appear. Their session expires, and sensitive tools disappear.
- Feature flags: Roll out a new tool to 5% of sessions, monitor error rates, then gradually increase exposure.
- Context-dependent availability: Show database tools only when a database connection is healthy. Hide them during maintenance.
- Progressive disclosure: Start with basic tools, reveal advanced ones after the agent demonstrates competence (yes, really — some teams gate tools behind successful execution of simpler ones).
Here's a dynamic tool registry that supports runtime registration and notifies all connected clients when the tool set changes:
import { EventEmitter } from 'events';
import { z } from 'zod';
interface DynamicTool {
name: string;
description: string;
inputSchema: z.ZodType;
handler: (args: unknown) => Promise<unknown>;
requiredPermissions: string[];
featureFlag?: string;
healthCheck?: () => Promise<boolean>;
}
class DynamicToolRegistry extends EventEmitter {
private tools = new Map<string, DynamicTool>();
private toolHealth = new Map<string, boolean>();
register(tool: DynamicTool): void {
this.tools.set(tool.name, tool);
this.toolHealth.set(tool.name, true);
this.emit('tools_changed');
console.log(`Tool registered: ${tool.name}`);
}
unregister(toolName: string): void {
if (this.tools.delete(toolName)) {
this.toolHealth.delete(toolName);
this.emit('tools_changed');
console.log(`Tool unregistered: ${toolName}`);
}
}
// Get tools visible to a specific session based on permissions and health
getVisibleTools(
userPermissions: string[],
activeFeatureFlags: string[]
): DynamicTool[] {
return Array.from(this.tools.values()).filter((tool) => {
// Permission check
const hasPermission = tool.requiredPermissions.every((p) =>
userPermissions.includes(p)
);
if (!hasPermission) return false;
// Feature flag check
if (tool.featureFlag && !activeFeatureFlags.includes(tool.featureFlag)) {
return false;
}
// Health check
if (this.toolHealth.get(tool.name) === false) return false;
return true;
});
}
// Periodic health monitoring
async runHealthChecks(): Promise<void> {
for (const [name, tool] of this.tools) {
if (tool.healthCheck) {
try {
const healthy = await tool.healthCheck();
const wasHealthy = this.toolHealth.get(name);
this.toolHealth.set(name, healthy);
// Only notify if health status changed
if (healthy !== wasHealthy) {
this.emit('tools_changed');
console.log(`Tool ${name} health changed: ${healthy}`);
}
} catch {
this.toolHealth.set(name, false);
this.emit('tools_changed');
}
}
}
}
}
// Wire it up to session notifications
const registry = new DynamicToolRegistry();
registry.on('tools_changed', () => {
// Notify all active sessions
for (const session of sessions.values()) {
if (session.sseResponse) {
session.sseResponse.write(`data: ${JSON.stringify({
jsonrpc: '2.0',
method: 'notifications/tools/list_changed',
})}\n\n`);
}
}
});
// Register a tool at runtime
registry.register({
name: 'stripe_refund',
description: 'Process a refund for a Stripe charge',
inputSchema: z.object({
chargeId: z.string().describe('Stripe charge ID'),
amount: z.number().optional().describe('Partial refund amount in cents'),
reason: z.enum(['duplicate', 'fraudulent', 'requested_by_customer']),
}),
handler: async (args) => { /* Stripe API call */ },
requiredPermissions: ['payments:refund'],
featureFlag: 'stripe-refund-v2',
healthCheck: async () => {
const resp = await fetch('https://api.stripe.com/v1/balance', {
headers: { Authorization: `Bearer ${process.env.STRIPE_KEY}` },
});
return resp.ok;
},
});
// Health checks every 30 seconds
setInterval(() => registry.runHealthChecks(), 30_000);
The health check pattern deserves emphasis. When an external service goes down, tools that depend on it should disappear from the agent's context rather than fail at execution time. An agent that doesn't know a tool exists won't try to use it. An agent that sees a tool and can't use it wastes tokens, confuses users, and generates error logs.
MCP Sampling: When the Server Needs to Think
Sampling inverts the normal direction of LLM inference. Instead of the client sending prompts to the server, the MCP server sends a sampling/createMessage request back to the client, asking it to run an LLM completion. The client mediates the request — optionally showing it to the user for approval — then returns the LLM's response to the server. The server never gets direct access to the model.
Why would a tool need LLM access? Consider a data analysis tool that queries a database, gets 500 rows, and needs to summarize them before returning results. Or a content moderation tool that checks whether generated text violates policy. Or a workflow orchestrator that needs to classify an intermediate result before deciding which step to execute next. These all require reasoning during tool execution, not just before or after.
Here's a tool that uses sampling to classify and summarize query results. The server requests an LLM completion mid-execution, uses the response to structure its output, and returns the final result:
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { z } from 'zod';
const server = new McpServer({
name: 'analytics-server',
version: '1.0.0',
});
// Note: sampling is a CLIENT capability. The client advertises it during
// initialize, and the server should check for it before sending
// sampling/createMessage requests.
server.tool(
'analyze_support_tickets',
'Analyze recent support tickets and return categorized insights',
{
timeRange: z.enum(['24h', '7d', '30d']).describe('Time range for analysis'),
category: z.string().optional().describe('Filter by ticket category'),
},
async ({ timeRange, category }, { sendRequest }) => {
// Step 1: Query the database
const tickets = await queryTickets({ timeRange, category });
if (tickets.length === 0) {
return {
content: [{ type: 'text', text: 'No tickets found for the specified criteria.' }],
};
}
// Step 2: Use sampling to classify and summarize the tickets
// The server asks the CLIENT to run this through an LLM
const samplingResult = await sendRequest({
method: 'sampling/createMessage',
params: {
messages: [
{
role: 'user',
content: {
type: 'text',
text: `Analyze these ${tickets.length} support tickets and provide:
1. Top 3 issue categories with ticket counts
2. Sentiment breakdown (positive/neutral/negative)
3. Average resolution time per category
4. Any emerging patterns or spikes
Raw ticket data:
${JSON.stringify(tickets.slice(0, 50), null, 2)}
${tickets.length > 50 ? `\n... and ${tickets.length - 50} more tickets` : ''}`,
},
},
],
systemPrompt: 'You are a support analytics assistant. Return structured analysis in markdown format. Be specific with numbers. Do not speculate beyond the data provided.',
maxTokens: 1000,
modelPreferences: {
hints: [{ name: 'claude-sonnet-4-20250514' }],
intelligencePriority: 0.7,
speedPriority: 0.3,
},
},
});
// Step 3: Combine raw stats with LLM analysis
const stats = computeBasicStats(tickets);
return {
content: [
{
type: 'text',
text: `## Support Ticket Analysis (${timeRange})\n\n**Total tickets:** ${tickets.length}\n**Avg response time:** ${stats.avgResponseTime}h\n\n${samplingResult.content.text}`,
},
],
};
}
);
Sampling has a deliberate constraint: the client controls everything. It can modify the prompt before sending it to the model, filter the response before returning it to the server, or reject the request entirely. This human-in-the-loop design means the server can't use sampling to exfiltrate data through crafted prompts — at least not without the user or client noticing.
The modelPreferences field lets the server suggest — not mandate — which model to use. The intelligencePriority and speedPriority values (0 to 1) hint at the tradeoff, but the client makes the final decision. If your tool needs fast classification, set speed high. If it needs nuanced analysis, favor intelligence. The client maps these preferences to whatever models it has available.
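One convenient pattern is a pair of presets for the two common cases. These numbers are illustrative conventions, not values from the spec:

```typescript
// Local helper type mirroring the priority hints discussed above
interface PreferenceHints {
  intelligencePriority: number; // 0-1, how much capability matters
  speedPriority: number;        // 0-1, how much latency matters
}

// Presets for the two common cases: quick classification vs. deep analysis
function preferencesFor(need: 'classification' | 'analysis'): PreferenceHints {
  return need === 'classification'
    ? { intelligencePriority: 0.2, speedPriority: 0.9 }
    : { intelligencePriority: 0.8, speedPriority: 0.2 };
}
```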
One practical concern: sampling adds latency. Every sampling/createMessage round trip includes client-side validation, LLM inference, and possibly user approval. For tools that need sub-second responses, sampling isn't the right pattern. Use it for analytical tools, content generation, and multi-step workflows where the additional latency is worth the reasoning capability.
Multi-Tenant MCP Architecture
Multi-tenancy in MCP means multiple organizations sharing infrastructure while maintaining strict isolation of data, tools, secrets, and access controls. Every production SaaS that exposes MCP servers faces this challenge — you can't spin up a dedicated MCP cluster for each customer, but you also can't let Tenant A's database credentials leak into Tenant B's tool execution.
The architecture has three isolation layers: authentication (who is this?), authorization (what can they access?), and runtime isolation (how do we prevent cross-contamination?).
The key design decision is whether to use shared servers with tenant context injection or dedicated server instances per tenant. Shared servers are cheaper but require careful isolation in application code. Dedicated instances are safer but more expensive and harder to manage at scale. Most production systems use a hybrid: shared servers for read-only tools, dedicated instances for tools that execute write operations or handle sensitive data.
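That hybrid split can be encoded as a routing rule. A sketch; the metadata fields and hostname scheme are assumptions for illustration:

```typescript
// Assumed per-tool metadata used only for placement decisions
interface ToolRoutingInfo {
  name: string;
  performsWrites: boolean;
  handlesSensitiveData: boolean;
}

// Hybrid placement: read-only tools share a server pool; writes and
// sensitive data go to a dedicated per-tenant instance.
function pickDeployment(tool: ToolRoutingInfo, tenantId: string): string {
  if (tool.performsWrites || tool.handlesSensitiveData) {
    return `http://mcp-${tenantId}.internal:3002`; // dedicated instance (hypothetical naming)
  }
  return 'http://mcp-shared.internal:3002'; // shared pool
}
```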
Here's a tenant-aware secret resolver. Secrets are never embedded in tool definitions — they're resolved at execution time based on the authenticated tenant:
interface TenantSecretStore {
resolve(tenantId: string, secretName: string): Promise<string | null>;
listAvailable(tenantId: string): Promise<string[]>;
}
class VaultSecretStore implements TenantSecretStore {
async resolve(tenantId: string, secretName: string): Promise<string | null> {
// Secrets are namespaced by tenant in the vault
const path = `mcp/tenants/${tenantId}/secrets/${secretName}`;
try {
const response = await fetch(`${VAULT_URL}/v1/${path}`, {
headers: { 'X-Vault-Token': VAULT_TOKEN },
});
if (!response.ok) return null;
const data = await response.json();
return data.data?.value || null;
} catch {
return null;
}
}
async listAvailable(tenantId: string): Promise<string[]> {
const path = `mcp/tenants/${tenantId}/secrets`;
const response = await fetch(`${VAULT_URL}/v1/${path}?list=true`, {
headers: { 'X-Vault-Token': VAULT_TOKEN },
});
const data = await response.json();
return data.data?.keys || [];
}
}
// In tool execution, secrets resolve based on the authenticated tenant
// Assumes DynamicTool is extended with an optional requiredSecrets list and
// a handler that accepts an execution context as its second argument
type SecretAwareTool = DynamicTool & {
requiredSecrets?: string[];
handler: (args: unknown, ctx: { secrets: Record<string, string>; tenantId: string }) => Promise<unknown>;
};
async function executeToolWithTenantSecrets(
tool: SecretAwareTool,
args: unknown,
tenantId: string,
secretStore: TenantSecretStore
): Promise<unknown> {
// Resolve all secrets this tool needs for this tenant
const resolvedSecrets: Record<string, string> = {};
for (const secretName of tool.requiredSecrets || []) {
const value = await secretStore.resolve(tenantId, secretName);
if (!value) {
throw new Error(
`Secret '${secretName}' not configured for tenant ${tenantId}`
);
}
resolvedSecrets[secretName] = value;
}
// Execute with tenant-scoped secrets injected
return tool.handler(args, { secrets: resolvedSecrets, tenantId });
}
Per-tenant rate limiting prevents a single noisy tenant from degrading service for others. Track tool invocations by tenant and enforce both request-per-second and daily quota limits:
interface RateLimitConfig {
requestsPerSecond: number;
dailyQuota: number;
}
const tenantLimits = new Map<string, RateLimitConfig>();
const tenantUsage = new Map<string, { second: number; daily: number; lastSecondReset: number; lastDayReset: number }>();
function checkRateLimit(tenantId: string): { allowed: boolean; retryAfter?: number } {
const limits = tenantLimits.get(tenantId) || { requestsPerSecond: 10, dailyQuota: 10000 };
const now = Date.now();
const usage = tenantUsage.get(tenantId) || { second: 0, daily: 0, lastSecondReset: now, lastDayReset: now };
// Reset the per-second counter
if (now - usage.lastSecondReset > 1000) {
usage.second = 0;
usage.lastSecondReset = now;
}
// Reset the daily counter every 24 hours
if (now - usage.lastDayReset > 86_400_000) {
usage.daily = 0;
usage.lastDayReset = now;
}
// Check per-second limit
if (usage.second >= limits.requestsPerSecond) {
return { allowed: false, retryAfter: 1 };
}
// Check daily quota
if (usage.daily >= limits.dailyQuota) {
return { allowed: false, retryAfter: 86400 };
}
usage.second++;
usage.daily++;
tenantUsage.set(tenantId, usage);
return { allowed: true };
}
If you're building a platform where agents use tools and memory across multiple customer workspaces, multi-tenant isolation isn't optional — it's table stakes. The alternative is a security incident that affects every customer simultaneously.
Security Hardening for Production MCP
The 2025 Astrix security report found that 43% of early MCP servers had command injection vulnerabilities, and over half relied on insecure static secrets rather than proper OAuth flows. Shipping an MCP server to production without security hardening is asking for an incident. Here's the defense-in-depth approach that addresses the real attack surface.
Input Validation
Every tool parameter must be validated against its JSON Schema before execution. The MCP spec defines input schemas using JSON Schema, and Zod gives you runtime validation in TypeScript. Never trust that the client (or the LLM) sent valid data:
import { z } from 'zod';
// Define strict schemas — not permissive "any" objects
const refundSchema = z.object({
chargeId: z.string()
.regex(/^ch_[a-zA-Z0-9]{24}$/, 'Invalid Stripe charge ID format'),
amount: z.number()
.int()
.positive()
.max(999999, 'Refund amount exceeds maximum')
.optional(),
reason: z.enum(['duplicate', 'fraudulent', 'requested_by_customer']),
});
// Validate BEFORE any external call
function validateToolInput<T>(schema: z.ZodType<T>, input: unknown): T {
const result = schema.safeParse(input);
if (!result.success) {
const errors = result.error.issues
.map((i) => `${i.path.join('.')}: ${i.message}`)
.join('; ');
throw new Error(`Input validation failed: ${errors}`);
}
return result.data;
}
Prompt Injection Defense
Tool descriptions are part of the LLM's context, which makes them a vector for prompt injection. A malicious MCP server (or a compromised tool registry) can inject instructions through tool descriptions that override the agent's behavior. Validate tool descriptions from external sources and sanitize before including them in context:
function sanitizeToolDescription(description: string): string {
  // Strip anything that looks like prompt injection
  const suspicious = [
    /ignore previous instructions/i,
    /you are now/i,
    /system:\s/i,
    /forget everything/i,
    /<\/?system>/i,
  ];
  for (const pattern of suspicious) {
    if (pattern.test(description)) {
      console.warn(`Suspicious tool description blocked: ${description.slice(0, 100)}`);
      return 'Tool description unavailable — flagged for review.';
    }
  }
  // Truncate excessively long descriptions
  if (description.length > 500) {
    return description.slice(0, 497) + '...';
  }
  return description;
}

Audit Logging
Every tool invocation should produce a structured audit log entry. This isn't just for compliance — it's how you debug "why did the agent transfer $50,000 to an unknown account?" at 2 AM:
interface AuditEntry {
  timestamp: string;
  sessionId: string;
  tenantId: string;
  userId: string;
  toolName: string;
  inputHash: string; // SHA-256 of input, not the raw input (may contain PII)
  result: 'success' | 'error' | 'denied';
  durationMs: number;
  errorMessage?: string;
}

function logAuditEntry(entry: AuditEntry): void {
  // Structured JSON to stdout — your log aggregator picks it up
  console.log(JSON.stringify({
    ...entry,
    level: entry.result === 'error' ? 'error' : 'info',
    service: 'mcp-server',
    type: 'tool_audit',
  }));
}

Token Scope Enforcement
Don't just validate that a token exists — verify that its scopes match the specific operation. A token with mcp:read shouldn't be able to call tools that write data:
const toolScopeMap: Record<string, string[]> = {
  'crm_search_contacts': ['crm:read'],
  'crm_update_contact': ['crm:read', 'crm:write'],
  'payment_refund': ['payments:write', 'payments:refund'],
  'analytics_query': ['analytics:read'],
};

function enforceToolScopes(toolName: string, tokenScopes: string[]): void {
  const required = toolScopeMap[toolName];
  if (!required) {
    throw new Error(`No scope mapping for tool: ${toolName}`);
  }
  const missing = required.filter((s) => !tokenScopes.includes(s));
  if (missing.length > 0) {
    throw new Error(
      `Missing required scopes for ${toolName}: ${missing.join(', ')}`
    );
  }
}

Container and Network Isolation
Even with input validation and token scoping, defense-in-depth means assuming your application code has bugs. Run MCP servers in isolated containers with read-only filesystems, non-root users, and strict resource limits. Network policies should prevent MCP servers from reaching anything except the specific services they need — a CRM tool server shouldn't be able to connect to your payment database, even if a code injection makes it try.
For high-security environments, consider running tool execution in ephemeral sandboxes (Firecracker microVMs or gVisor containers) that are destroyed after each invocation. The overhead is measurable — expect 50-200ms of cold-start latency — but the isolation guarantee is worth it for tools that process untrusted input or execute user-provided code.
These layers stack. A request must pass token validation, then audience verification, then scope enforcement, then input validation, then container-level isolation, then execution with tenant-scoped secrets — and every step gets logged. Skip any one layer and you've created an attack surface.
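As a sketch, the layers above can be composed into a single guard chain that runs before any tool executes. The check implementations below are stubs standing in for the token, scope, and input validators shown earlier in this section; names like `runChecks` and `requireScope` are illustrative, not SDK API.

```typescript
// Each layer is a check that throws on failure; the first failure aborts
// the request before the tool body ever runs.
interface RequestContext {
  tokenScopes: string[];
  toolName: string;
  input: unknown;
}

type Check = (ctx: RequestContext) => void;

function runChecks(ctx: RequestContext, checks: Check[]): void {
  for (const check of checks) {
    check(ctx);
  }
}

// Stub layer implementations (real ones were defined earlier)
const requireScope = (scope: string): Check => (ctx) => {
  if (!ctx.tokenScopes.includes(scope)) {
    throw new Error(`Missing scope: ${scope}`);
  }
};

const requireInputShape = (predicate: (input: unknown) => boolean): Check => (ctx) => {
  if (!predicate(ctx.input)) {
    throw new Error('Input validation failed');
  }
};

// Usage: every layer must pass before execution proceeds
const ctx: RequestContext = {
  tokenScopes: ['crm:read'],
  toolName: 'crm_search_contacts',
  input: { query: 'acme' },
};
runChecks(ctx, [
  requireScope('crm:read'),
  requireInputShape((i) => typeof i === 'object' && i !== null),
]);
console.log('all layers passed');
```

Audit logging wraps the whole chain: a thrown check produces a `denied` entry, a successful run produces a `success` entry.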
For teams building agent infrastructure with monitoring and observability, the audit log becomes your primary debugging tool. When an agent does something unexpected, the audit trail tells you exactly which tools were called, with what inputs, and in what order.
The Spec Is Moving Fast: What's Next
The November 2025 specification (2025-11-25) represents the biggest evolution since MCP launched. Beyond the OAuth and transport improvements we've already covered, three additions are worth tracking.
Tasks turn any MCP request into an asynchronous operation. A server can return a task handle instead of waiting for completion, and the client polls for progress and results via tasks/get and tasks/result. This unlocks workloads that would otherwise time out — document processing, batch analytics, multi-step approval workflows. Tasks support states like working, input_required, completed, failed, and cancelled, giving clients fine-grained control over long-running operations.
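A client-side polling loop for Tasks might look like the sketch below. The `getTask` parameter stands in for a client call to tasks/get; since Tasks are still experimental, the real SDK surface may differ, so the wrapper is kept generic.

```typescript
// Task states as described in the 2025-11-25 spec
type TaskStatus = 'working' | 'input_required' | 'completed' | 'failed' | 'cancelled';

interface TaskSnapshot {
  taskId: string;
  status: TaskStatus;
}

// Poll until the task leaves the 'working' state, then hand the snapshot
// back to the caller (which fetches the payload via tasks/result on
// 'completed', or prompts the user on 'input_required').
async function pollUntilDone(
  getTask: (taskId: string) => Promise<TaskSnapshot>,
  taskId: string,
  intervalMs = 1000,
  maxAttempts = 60,
): Promise<TaskSnapshot> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const snapshot = await getTask(taskId);
    if (snapshot.status !== 'working') return snapshot;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Task ${taskId} did not finish within ${maxAttempts} polls`);
}
```

In production you would add exponential backoff and honor any polling hints the server returns, rather than a fixed interval.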
Elicitation lets an MCP server pause mid-execution and request structured input from the user through the client. Instead of the tool failing because it needs a clarification ("Which subscription do you want to cancel — Pro or Enterprise?"), it sends an ElicitationRequest with a schema describing what it needs. The client renders a form, the user responds, and execution continues. The spec explicitly prohibits using elicitation for sensitive data like credentials — that's what URL-mode elicitation is for, redirecting the user to a secure external page.
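For the subscription example above, the request the server sends might be shaped like this. The `elicitation/create` method, `message`, and `requestedSchema` fields come from the spec; the TypeScript types are hand-rolled here for clarity rather than imported from the SDK.

```typescript
// Illustrative shape of an elicitation request. The requestedSchema is a
// restricted JSON Schema object describing the structured input the server
// needs before it can continue.
interface ElicitationRequest {
  method: 'elicitation/create';
  params: {
    message: string; // shown to the user by the client
    requestedSchema: {
      type: 'object';
      properties: Record<string, { type: string; enum?: string[]; description?: string }>;
      required?: string[];
    };
  };
}

const clarification: ElicitationRequest = {
  method: 'elicitation/create',
  params: {
    message: 'Which subscription do you want to cancel?',
    requestedSchema: {
      type: 'object',
      properties: {
        plan: { type: 'string', enum: ['Pro', 'Enterprise'] },
      },
      required: ['plan'],
    },
  },
};
```

The client renders this as a form (here, a two-option picker) and returns the user's answer so the tool call can resume.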
Structured tool outputs let tool definitions declare their output schema, not just their input schema. Clients and LLMs can now know what shape the response will have before calling the tool, enabling better planning, type-safe tool chains, and reduced hallucination when the model processes results.
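A server can enforce its own declared output schema at runtime, so drift never reaches clients that planned around it. The sketch below mirrors the JSON Schema a tool would advertise as its output schema; the validation is hand-rolled to keep the example dependency-free, though in practice you would reuse Zod as in the earlier examples.

```typescript
// What the tool would advertise in its definition (JSON Schema form)
const weatherOutputSchema = {
  type: 'object',
  properties: {
    temperatureC: { type: 'number' },
    conditions: { type: 'string', enum: ['clear', 'cloudy', 'rain', 'snow'] },
  },
  required: ['temperatureC', 'conditions'],
} as const;

interface WeatherOutput {
  temperatureC: number;
  conditions: 'clear' | 'cloudy' | 'rain' | 'snow';
}

// Fail loudly on a malformed response: clients and LLMs plan around the
// declared schema, so silent drift breaks downstream tool chains.
function assertWeatherOutput(raw: unknown): WeatherOutput {
  const obj = raw as Record<string, unknown>;
  if (typeof obj?.temperatureC !== 'number') {
    throw new Error('temperatureC must be a number');
  }
  const allowed = ['clear', 'cloudy', 'rain', 'snow'];
  if (typeof obj.conditions !== 'string' || !allowed.includes(obj.conditions)) {
    throw new Error('conditions must be one of ' + allowed.join(', '));
  }
  return obj as unknown as WeatherOutput;
}
```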
Structured outputs also have a direct impact on agent evaluation. When a tool declares its output schema upfront, you can write deterministic assertions against the response shape instead of relying on fuzzy text matching. Combined with Tasks (which give you discrete status transitions to test) and sampling (which introduces an LLM call you can mock), these primitives make MCP tool chains far more testable than the fire-and-forget function calls they replace.
These capabilities are experimental in the current spec, meaning the API surface may change. But the direction is clear: MCP is evolving from a synchronous tool-calling protocol into a full-fledged agent-infrastructure standard. If you're building tools that need async processing, user interaction, or structured data pipelines, start experimenting now.
Putting It All Together: A Production Checklist
If you've been building MCP servers with stdio transport and no auth, here's the path to production readiness. Not every project needs every pattern — but you should make a conscious decision about each one.
- OAuth 2.1 with PKCE for all remote MCP servers
- Streamable HTTP transport (replace SSE)
- Token audience validation via Resource Indicators (RFC 8707)
- Input validation with Zod schemas on every tool parameter
- Audit logging for all tool invocations
- Session management with cryptographic session IDs
- Per-tenant secret resolution (never embed secrets in tool configs)
- Rate limiting per tenant and per tool
- Health checks that remove unhealthy tools from client context
- Gateway layer for routing, auth, and observability
- Dynamic tool registration with list_changed notifications
- Prompt injection defense for tool descriptions from external sources
Start with auth and transport — they're the foundation everything else builds on. Add the gateway when you have more than three MCP servers or more than one tenant. Add sampling when your tools need to reason. Add dynamic registration when your tool set is no longer static.
The protocol's maturity curve is steep. A year ago, MCP was a local development protocol with no auth and stdio-only transport. Today, it has OAuth 2.1, Streamable HTTP, asynchronous tasks, sampling, elicitation, and structured outputs — with major enterprises running it in production. Teams building agent tool infrastructure that integrates with the protocol are already seeing the compounding benefits: one integration point, universal client compatibility, and a security model that auditors actually understand.
The basics get you connected. These patterns keep you running.
Frequently Asked Questions
How do I migrate an existing SSE-based MCP server to Streamable HTTP?
Replace your SSE transport with StreamableHTTPServerTransport from the TypeScript SDK (v1.10.0+). The main change is that your server now handles POST requests for individual JSON-RPC messages instead of maintaining a persistent SSE connection. Existing tools don't need modification — only the transport layer changes. Keep your SSE endpoint running temporarily for clients that haven't upgraded.
Can I use MCP sampling without human approval for every request?
Yes. The spec says clients may show sampling requests to users for approval, not that they must. In automated pipelines, the client can auto-approve sampling requests that match certain criteria (trusted servers, specific tools, low-risk operations). But you should log every sampling request for audit purposes regardless.
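A minimal auto-approval policy might look like the sketch below. The criteria (a trusted-server allowlist and a token budget) are examples of what "certain criteria" could mean in practice; the spec leaves approval policy entirely to the client, and the names here are illustrative.

```typescript
// Hypothetical client-side policy: auto-approve sampling requests only
// from known servers and only within a bounded generation budget.
interface SamplingDecisionInput {
  serverId: string;
  maxTokens: number;
}

const TRUSTED_SERVERS = new Set(['internal-crm', 'internal-analytics']);

function shouldAutoApprove(req: SamplingDecisionInput): boolean {
  if (!TRUSTED_SERVERS.has(req.serverId)) return false; // unknown server: ask the user
  if (req.maxTokens > 2000) return false; // large generations get human review
  return true;
}
```

Whatever the decision, log it: the audit trail should show which sampling requests ran without a human in the loop.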
What's the performance overhead of running an MCP gateway?
Minimal for a properly implemented gateway. The JSON-RPC message parsing adds microseconds. The real overhead is network hops — each proxied request adds one round trip between the client and the upstream server. For most tool invocations that involve external API calls (databases, SaaS APIs), the gateway latency is noise compared to the tool execution time. Use connection pooling and keep-alive to minimize TCP overhead.
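In Node, connection reuse toward upstream MCP servers is a one-line change: share a keep-alive agent across proxied requests. The pool sizes below are illustrative, not recommendations.

```typescript
import http from 'node:http';

// A shared keep-alive agent for the gateway's upstream connections, so each
// proxied request reuses pooled TCP sockets instead of paying a fresh
// handshake per tool invocation.
const upstreamAgent = new http.Agent({
  keepAlive: true,
  maxSockets: 128,    // concurrent sockets per upstream host (illustrative)
  maxFreeSockets: 16, // idle sockets kept warm for reuse (illustrative)
});
```

Pass `upstreamAgent` as the `agent` option on outgoing requests to the upstream servers; use `https.Agent` for TLS upstreams.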
Do I need separate OAuth scopes for each tool?
Not necessarily. Group tools by capability domain (read, write, admin) and assign scopes at that level. Fine-grained per-tool scopes become unmanageable past 20-30 tools. The scope hierarchy mcp:read < mcp:tools < mcp:admin covers most use cases, with domain-specific scopes (crm:read, payments:write) for tools that access sensitive systems.
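The hierarchy check can be sketched as a rank lookup: higher scopes imply lower ones, while domain-specific scopes require an exact match. The ranking table below encodes the mcp:read < mcp:tools < mcp:admin ordering described above; the function name is illustrative.

```typescript
// Hierarchical scopes, lowest privilege first
const SCOPE_RANK: Record<string, number> = {
  'mcp:read': 0,
  'mcp:tools': 1,
  'mcp:admin': 2,
};

function satisfiesScope(granted: string[], required: string): boolean {
  const need = SCOPE_RANK[required];
  if (need === undefined) {
    // Domain-specific scopes (crm:read, payments:write) need an exact match
    return granted.includes(required);
  }
  // Any granted scope at or above the required rank satisfies it
  return granted.some((s) => (SCOPE_RANK[s] ?? -1) >= need);
}
```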
How do MCP Tasks differ from just making a synchronous tool call that takes a long time?
Tasks give the client control over long-running operations. Instead of blocking on a single HTTP request that might time out, the client gets a task handle immediately and can check progress, cancel the operation, or handle input_required states where the task needs additional information. This is essential for operations that take minutes (batch processing, model training) rather than seconds.
References
- MCP Authorization Specification (2025-11-25)
- MCP Transports Specification — Streamable HTTP
- MCP Sampling Specification
- MCP Tasks Specification (2025-11-25)
- MCP Elicitation Specification
- Auth0 — MCP Spec Updates from June 2025
- WorkOS — MCP 2025-11-25 Spec Update: Async Tasks, Better OAuth
- Auth0 — Why MCP's Move Away from SSE Simplifies Security
- fka.dev — Why MCP Deprecated SSE and Went with Streamable HTTP
- Stytch — MCP Authentication and Authorization Implementation Guide
- Astrix Security — State of MCP Server Security 2025
- Stack Overflow — Authentication and Authorization in MCP
- Speakeasy — Dynamic Tool Discovery in MCP
- GitHub Blog — Building Smarter Interactions with MCP Elicitation
- Palo Alto Unit 42 — New Prompt Injection Attack Vectors Through MCP Sampling
- Composio — MCP Gateways Guide for AI Agent Architecture 2026
- Zuplo — The State of MCP: Adoption, Security & Production Readiness
- Oso — Authorization for MCP: OAuth 2.1, PRMs, and Best Practices