You've built an MCP server. Tools register, clients connect, the Inspector shows green checkmarks. Now ship it to production and watch everything break.
That's not a dig at the protocol — it's the reality of moving from local stdio transport to a distributed system handling real traffic. Authentication? The early spec didn't have it. Transport reliability? SSE connections drop behind load balancers. Multi-tenancy? Your single-server demo serves one user at a time.
The MCP specification has evolved significantly since those early days. The March 2025 release introduced Streamable HTTP to replace fragile SSE connections and brought in OAuth 2.1 as the auth framework. The June 2025 update formalized MCP servers as OAuth Resource Servers, adding Protected Resource Metadata discovery and mandatory Resource Indicators. And the November 2025 release brought asynchronous Tasks and a matured authorization framework that enterprises actually trust. With over 97 million monthly SDK downloads and backing from Anthropic, OpenAI, Google, and Microsoft, MCP isn't experimental anymore; it's the integration standard.
This article covers the advanced patterns that take you from "it works on my machine" to "it handles 10,000 concurrent sessions across 50 tenants." We'll build working TypeScript for each pattern, because architecture diagrams without code are just decorations.
Prerequisites
This is the sequel to MCP Explained: Build Your First MCP Server in TypeScript and Python. You should be comfortable with MCP's three primitives (tools, resources, prompts), JSON-RPC 2.0, and the basic client-server lifecycle before continuing. If terms like tools/list or initialize don't ring a bell, start there.
You'll also want familiarity with OAuth 2.0 concepts (authorization codes, access tokens, PKCE) and HTTP transport patterns. The AI Agent Tools guide covers tool management patterns that complement what we'll build here.
npm install @modelcontextprotocol/sdk zod express jsonwebtoken
Every code example uses TypeScript and runs standalone. We're building production patterns, not toys.
| Pattern | What You'll Build | When You Need It |
|---|---|---|
| OAuth 2.1 with PKCE | Auth middleware for MCP servers as OAuth Resource Servers | Any remote MCP deployment |
| Streamable HTTP transport | Full transport implementation with session management | Replacing SSE, serverless environments |
| MCP gateways | Proxy layer for routing, auth, and observability | Multi-server architectures |
| Dynamic tool registration | Runtime tool add/remove with client notification | Feature flags, permission-gated tools |
| MCP sampling | Server-initiated LLM completions | Tools that need to reason mid-execution |
| Multi-tenant architecture | Tenant-isolated MCP with per-tenant secrets | SaaS platforms, shared infrastructure |
| Security hardening | Input validation, token scoping, audit logging | Every production deployment |
OAuth 2.1 Authentication: MCP Servers as Resource Servers
MCP's authorization model treats every remote MCP server as an OAuth 2.0 Resource Server — the same role your REST API plays in a standard OAuth flow. Clients authenticate through an Authorization Server, receive scoped access tokens, and present those tokens on every MCP request. The spec mandates OAuth 2.1 with PKCE using the S256 challenge method, no exceptions.
This wasn't always the case. Early MCP deployments ran over stdio with no auth at all, or used static API keys passed as environment variables. The March 2025 spec introduced the OAuth framework, and the June 2025 update formalized the Resource Server model and added Protected Resource Metadata (PRM) discovery, a mechanism where the MCP server advertises which Authorization Server clients should use.
Here's the discovery flow. When a client first connects to a remote MCP server, it doesn't need to know the Authorization Server upfront. It sends an unauthenticated request, the server responds 401 with a WWW-Authenticate header pointing at its Protected Resource Metadata document, the client fetches that document to learn which Authorization Server to use, completes the OAuth flow there, and retries the original request with its new token:
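In code, that discovery document is just a small JSON object. Here's a sketch of building one; the builder function name is hypothetical, and the scope values simply echo the ones used later in this article:

```typescript
// Shape of an OAuth Protected Resource Metadata document (RFC 9728)
interface ProtectedResourceMetadata {
  resource: string;                 // canonical URL of this MCP server
  authorization_servers: string[];  // where clients go to obtain tokens
  scopes_supported?: string[];
  bearer_methods_supported?: string[];
}

// Build the PRM document this MCP server advertises to clients
function buildProtectedResourceMetadata(
  resource: string,
  authorizationServers: string[]
): ProtectedResourceMetadata {
  return {
    resource,
    authorization_servers: authorizationServers,
    scopes_supported: ['mcp:tools', 'mcp:read'],
    bearer_methods_supported: ['header'], // tokens only via the Authorization header
  };
}

// Served from the well-known path, e.g. with Express:
// app.get('/.well-known/oauth-protected-resource', (_req, res) =>
//   res.json(buildProtectedResourceMetadata(
//     'https://mcp.example.com',
//     ['https://auth.example.com']
//   ))
// );
```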
The critical security addition here is Resource Indicators (RFC 8707). When the client requests a token, it includes the MCP server's URL as the resource parameter. The Authorization Server embeds this as the token's audience claim. The MCP server then validates that the token was specifically issued for it — preventing a compromised server from replaying tokens against other services.
Here's what token validation looks like on the server side. This middleware sits in front of your MCP request handler and rejects anything without a properly scoped token:
import { IncomingMessage } from 'http';
import jwt from 'jsonwebtoken';
interface McpTokenClaims {
sub: string;
aud: string | string[];
scope: string;
iss: string;
exp: number;
tenant_id?: string;
}
const MCP_SERVER_RESOURCE = 'https://mcp.example.com';
// Verification key for the Authorization Server's signatures, e.g. loaded
// from its JWKS endpoint or from configuration (assumed here)
const publicKey = process.env.AUTH_SERVER_PUBLIC_KEY ?? '';
function validateMcpToken(req: IncomingMessage): McpTokenClaims {
const authHeader = req.headers.authorization;
if (!authHeader?.startsWith('Bearer ')) {
throw new Error('Missing Bearer token');
}
const token = authHeader.slice(7);
const claims = jwt.verify(token, publicKey, {
algorithms: ['RS256'],
issuer: 'https://auth.example.com',
}) as McpTokenClaims;
// RFC 8707: verify the token was issued for THIS server
const audiences = Array.isArray(claims.aud) ? claims.aud : [claims.aud];
if (!audiences.includes(MCP_SERVER_RESOURCE)) {
throw new Error(
`Token audience mismatch: expected ${MCP_SERVER_RESOURCE}, got ${audiences.join(', ')}`
);
}
// Verify required scopes for MCP operations
const scopes = claims.scope?.split(' ') || [];
if (!scopes.includes('mcp:tools') && !scopes.includes('mcp:read')) {
throw new Error('Insufficient scope: requires mcp:tools or mcp:read');
}
return claims;
}
On the client side, PKCE protects the authorization code exchange. The client generates a random code_verifier, hashes it with SHA-256 to create the code_challenge, sends the challenge during authorization, and presents the original verifier when exchanging the code for a token. This prevents intercepted authorization codes from being used by a different party:
import { randomBytes, createHash } from 'crypto';
function generatePkceChallenge() {
// Generate a cryptographically random verifier (43-128 chars)
const verifier = randomBytes(32)
.toString('base64url')
.slice(0, 64);
// S256: SHA-256 hash of verifier, base64url-encoded
const challenge = createHash('sha256')
.update(verifier)
.digest('base64url');
return { verifier, challenge, method: 'S256' as const };
}
// During authorization request
async function startMcpAuth(authServerUrl: string, mcpServerUrl: string) {
const { verifier, challenge, method } = generatePkceChallenge();
const authUrl = new URL(`${authServerUrl}/authorize`);
authUrl.searchParams.set('response_type', 'code');
authUrl.searchParams.set('client_id', CLIENT_ID);
authUrl.searchParams.set('redirect_uri', REDIRECT_URI);
authUrl.searchParams.set('scope', 'mcp:tools mcp:read');
authUrl.searchParams.set('code_challenge', challenge);
authUrl.searchParams.set('code_challenge_method', method);
// RFC 8707: bind the token to this specific MCP server
authUrl.searchParams.set('resource', mcpServerUrl);
// Store verifier for the token exchange step
return { authUrl: authUrl.toString(), verifier };
}
Two things to note. First, the resource parameter ensures the resulting token is audience-bound — it can only be used against mcpServerUrl. Second, the spec requires HTTPS for all authorization endpoints. There's no development-mode exception for production deployments.
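The final step, exchanging the code plus verifier for a token, isn't shown above. A sketch, assuming a standard OAuth 2.1 token endpoint; the function names are illustrative:

```typescript
// Build the form body for the authorization-code exchange. The verifier
// proves this client started the flow (PKCE); the resource parameter keeps
// the resulting token audience-bound (RFC 8707).
function buildTokenExchangeBody(params: {
  code: string;
  verifier: string;
  clientId: string;
  redirectUri: string;
  mcpServerUrl: string;
}): URLSearchParams {
  return new URLSearchParams({
    grant_type: 'authorization_code',
    code: params.code,
    code_verifier: params.verifier, // the original verifier, not the challenge
    client_id: params.clientId,
    redirect_uri: params.redirectUri,
    resource: params.mcpServerUrl,  // RFC 8707 resource indicator
  });
}

// POST the body to the Authorization Server's token endpoint
async function exchangeCodeForToken(
  authServerUrl: string,
  body: URLSearchParams
): Promise<{ access_token: string; expires_in: number }> {
  const resp = await fetch(`${authServerUrl}/token`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body,
  });
  if (!resp.ok) throw new Error(`Token exchange failed: ${resp.status}`);
  return resp.json();
}
```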
Streamable HTTP: The Transport That Replaced SSE
Streamable HTTP is the production transport for remote MCP servers, replacing the deprecated HTTP+SSE transport from the 2024-11-05 spec. It solves three problems that made SSE unreliable at scale: no stream resumption, mandatory long-lived connections, and unidirectional message delivery. Every new MCP server should use Streamable HTTP unless you're building a local tool that runs over stdio.
The core idea is simple: the server exposes a single HTTP endpoint that accepts POST requests (for client-to-server messages) and GET requests (for opening an optional SSE stream for server-initiated messages). Most interactions are plain request-response — the client POSTs a JSON-RPC message, the server responds with JSON. When the server needs to stream multiple messages (progress updates, notifications), it can upgrade a single response to SSE.
Session management is critical. After initialization, the server returns an Mcp-Session-Id header — a cryptographically secure, globally unique identifier. The client includes this header on every subsequent request, and the server uses it to route messages to the correct session state.
Here's a Streamable HTTP server implementation. This handles both POST (standard request-response) and GET (SSE stream for notifications), with proper session tracking:
import express from 'express';
import { randomUUID } from 'crypto';
interface McpSession {
id: string;
tenantId: string;
sseResponse?: express.Response;
tools: Map<string, ToolDefinition>;
createdAt: Date;
}
const sessions = new Map<string, McpSession>();
const app = express();
app.use(express.json());
// POST /mcp — handles all client-to-server JSON-RPC messages
app.post('/mcp', async (req, res) => {
const message = req.body;
// Initialize: create session, return capabilities
if (message.method === 'initialize') {
const sessionId = randomUUID();
const tenantId = extractTenantFromToken(req);
const session: McpSession = {
id: sessionId,
tenantId,
tools: loadToolsForTenant(tenantId),
createdAt: new Date(),
};
sessions.set(sessionId, session);
res.setHeader('Mcp-Session-Id', sessionId);
return res.json({
jsonrpc: '2.0',
id: message.id,
result: {
protocolVersion: '2025-11-25',
capabilities: {
tools: { listChanged: true },
sampling: {},
},
serverInfo: { name: 'acme-mcp', version: '2.1.0' },
},
});
}
// All other requests require a valid session
const sessionId = req.headers['mcp-session-id'] as string;
const session = sessions.get(sessionId);
if (!session) {
return res.status(404).json({
jsonrpc: '2.0',
id: message.id,
error: { code: -32001, message: 'Invalid or expired session' },
});
}
// Route to handler based on method
if (message.method === 'tools/list') {
return res.json({
jsonrpc: '2.0',
id: message.id,
result: { tools: Array.from(session.tools.values()) },
});
}
if (message.method === 'tools/call') {
const tool = session.tools.get(message.params.name);
if (!tool) {
return res.status(404).json({
jsonrpc: '2.0',
id: message.id,
error: { code: -32602, message: `Unknown tool: ${message.params.name}` },
});
}
// For long-running tools, upgrade to SSE
if (tool.longRunning) {
res.setHeader('Content-Type', 'text/event-stream');
res.setHeader('Cache-Control', 'no-cache');
res.setHeader('Connection', 'keep-alive');
// Stream progress updates
await executeWithProgress(tool, message.params.arguments, (progress) => {
res.write(`data: ${JSON.stringify({
jsonrpc: '2.0',
method: 'notifications/progress',
params: { progressToken: message.id, progress: progress.percent, total: 100 },
})}\n\n`);
});
// Send final result and close
res.write(`data: ${JSON.stringify({
jsonrpc: '2.0',
id: message.id,
result: { content: [{ type: 'text', text: 'Completed' }] },
})}\n\n`);
return res.end();
}
// Standard synchronous tool execution
const result = await executeTool(tool, message.params.arguments);
return res.json({
jsonrpc: '2.0',
id: message.id,
result: { content: [{ type: 'text', text: JSON.stringify(result) }] },
});
}
});
// GET /mcp — opens SSE stream for server-initiated messages
app.get('/mcp', (req, res) => {
const sessionId = req.headers['mcp-session-id'] as string;
const session = sessions.get(sessionId);
if (!session) return res.status(404).end();
res.setHeader('Content-Type', 'text/event-stream');
res.setHeader('Cache-Control', 'no-cache');
res.setHeader('Connection', 'keep-alive');
// Store the response for server-initiated pushes
session.sseResponse = res;
req.on('close', () => {
session.sseResponse = undefined;
});
});
app.listen(3002, () => console.log('MCP Streamable HTTP server on :3002'));
Why does this matter over SSE? Three reasons. First, stateless tools don't need a persistent connection — a POST returns a JSON response and the connection closes. Infrastructure (load balancers, CDNs, serverless functions) handles this natively. Second, the SSE upgrade is opt-in per request, not per session. A server can respond with JSON for fast tools and SSE for slow ones. Third, session IDs make horizontal scaling straightforward — you can route by session to a sticky backend or store session state in Redis.
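Sticky routing by session ID can be as simple as a stable hash over the Mcp-Session-Id header. A sketch; the backend pool and the choice of SHA-256 are illustrative:

```typescript
import { createHash } from 'crypto';

// Map a session ID deterministically to one backend so every request in a
// session lands on the instance holding that session's state.
function pickBackend(sessionId: string, backends: string[]): string {
  const digest = createHash('sha256').update(sessionId).digest();
  // Interpret the first 4 bytes as an unsigned int, then bucket by pool size
  const bucket = digest.readUInt32BE(0) % backends.length;
  return backends[bucket];
}

// Example: route a session to the same upstream on every request
// const upstream = pickBackend(sessionId, ['http://mcp-1:3002', 'http://mcp-2:3002']);
```

Plain modulo remaps most sessions when the pool resizes; a consistent-hashing scheme handles backend changes more gracefully.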
The TypeScript SDK (version 1.10.0+) supports Streamable HTTP out of the box via StreamableHTTPServerTransport. If you're using the SDK, you get session management, content negotiation, and SSE upgrade for free.
MCP Gateways: The Control Plane for Tool Traffic
An MCP gateway is a reverse proxy purpose-built for MCP traffic — session-aware, JSON-RPC-native, and capable of transforming tool schemas in flight. Think of it as the layer between your AI agents and your tool infrastructure that handles auth, routing, observability, and access control in one place.
Standard API gateways (Kong, Envoy, AWS API Gateway) handle REST traffic well. But MCP is bidirectional JSON-RPC with session state, which makes standard request-response proxying insufficient. An MCP gateway understands the protocol: it can inspect tools/list responses, enforce tool-level permissions, route different tools to different backends, and inject tenant context — all transparently to the client.
Here's a gateway implementation that handles routing, auth validation, and tool-level access control. This sits between clients and your MCP server fleet:
import express from 'express';
interface GatewayRoute {
toolPrefix: string;
upstream: string;
requiredScopes: string[];
}
const routes: GatewayRoute[] = [
{ toolPrefix: 'crm_', upstream: 'http://mcp-crm:3002', requiredScopes: ['crm:read'] },
{ toolPrefix: 'payment_', upstream: 'http://mcp-payments:3003', requiredScopes: ['payments:write'] },
{ toolPrefix: 'internal_', upstream: 'http://mcp-internal:3004', requiredScopes: ['admin'] },
];
const gateway = express();
gateway.use(express.json());
gateway.post('/mcp', async (req, res) => {
const message = req.body;
// Validate OAuth token on every request
const claims = validateMcpToken(req);
// tools/call: route based on tool name prefix
if (message.method === 'tools/call') {
const toolName = message.params?.name;
const route = routes.find((r) => toolName?.startsWith(r.toolPrefix));
if (!route) {
return res.status(404).json({
jsonrpc: '2.0',
id: message.id,
error: { code: -32602, message: `No route for tool: ${toolName}` },
});
}
// Check tool-level scopes
const tokenScopes = claims.scope.split(' ');
const hasAccess = route.requiredScopes.every((s) => tokenScopes.includes(s));
if (!hasAccess) {
logAudit('tool_access_denied', { tool: toolName, tenant: claims.tenant_id });
return res.status(403).json({
jsonrpc: '2.0',
id: message.id,
error: { code: -32603, message: `Insufficient scope for ${toolName}` },
});
}
// Forward to upstream MCP server
logAudit('tool_call', { tool: toolName, tenant: claims.tenant_id });
const upstream = await fetch(`${route.upstream}/mcp`, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'X-Tenant-Id': claims.tenant_id || '',
'X-Gateway-Token': createInternalToken(claims),
},
body: JSON.stringify(message),
});
const result = await upstream.json();
return res.json(result);
}
// tools/list: aggregate from all upstreams the tenant has access to
if (message.method === 'tools/list') {
const tokenScopes = claims.scope.split(' ');
const accessible = routes.filter((r) =>
r.requiredScopes.every((s) => tokenScopes.includes(s))
);
const allTools = await Promise.all(
accessible.map(async (route) => {
const resp = await fetch(`${route.upstream}/mcp`, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'X-Tenant-Id': claims.tenant_id || '',
},
body: JSON.stringify({ jsonrpc: '2.0', id: 'list', method: 'tools/list', params: {} }),
});
const data = await resp.json();
return data.result?.tools || [];
})
);
return res.json({
jsonrpc: '2.0',
id: message.id,
result: { tools: allTools.flat() },
});
}
});
The gateway gives you a single chokepoint for every agent-tool interaction. Need to rate-limit a specific tenant? Add it here. Need an audit trail of every tool invocation? It's already in the proxy. Need to swap a backend MCP server without touching client code? Change the route.
Production MCP gateways like Peta MCP Suite and IBM's ContextForge add capabilities beyond basic proxying: intelligent caching (skip re-executing deterministic tools), cost tracking per tenant, and canary deployments where you route a percentage of traffic to a new tool version. If you're managing MCP servers at scale, the gateway becomes essential infrastructure rather than an optional nicety.
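Canary routing like that can be implemented with deterministic session bucketing, so a given session always sees the same tool version. A sketch; the function and parameter names are illustrative:

```typescript
import { createHash } from 'crypto';

// Assign a session to the canary or stable version of a tool. Hashing the
// session ID (instead of sampling randomly per request) keeps a session's
// behavior consistent for its whole lifetime.
function pickToolVersion(
  sessionId: string,
  toolName: string,
  canaryPercent: number // 0-100
): 'canary' | 'stable' {
  const digest = createHash('sha256').update(`${toolName}:${sessionId}`).digest();
  const bucket = digest.readUInt32BE(0) % 100; // bucket in 0-99
  return bucket < canaryPercent ? 'canary' : 'stable';
}
```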
Dynamic Tool Registration: Tools That Appear and Disappear at Runtime
MCP servers don't have to declare a static set of tools at startup. The protocol supports runtime tool registration — adding, removing, and modifying tools while sessions are active — through the notifications/tools/list_changed notification. When a server sends this notification, connected clients call tools/list to refresh their tool inventory without reconnecting.
This isn't just a convenience feature. It enables patterns that static tool sets can't support:
- Permission-gated tools: A user authenticates, and tools they're authorized for appear. Their session expires, and sensitive tools disappear.
- Feature flags: Roll out a new tool to 5% of sessions, monitor error rates, then gradually increase exposure.
- Context-dependent availability: Show database tools only when a database connection is healthy. Hide them during maintenance.
- Progressive disclosure: Start with basic tools, reveal advanced ones after the agent demonstrates competence (yes, really — some teams gate tools behind successful execution of simpler ones).
Here's a dynamic tool registry that supports runtime registration and notifies all connected clients when the tool set changes:
import { EventEmitter } from 'events';
import { z } from 'zod';
interface DynamicTool {
name: string;
description: string;
inputSchema: z.ZodType;
handler: (args: unknown) => Promise<unknown>;
requiredPermissions: string[];
featureFlag?: string;
healthCheck?: () => Promise<boolean>;
}
class DynamicToolRegistry extends EventEmitter {
private tools = new Map<string, DynamicTool>();
private toolHealth = new Map<string, boolean>();
register(tool: DynamicTool): void {
this.tools.set(tool.name, tool);
this.toolHealth.set(tool.name, true);
this.emit('tools_changed');
console.log(`Tool registered: ${tool.name}`);
}
unregister(toolName: string): void {
if (this.tools.delete(toolName)) {
this.toolHealth.delete(toolName);
this.emit('tools_changed');
console.log(`Tool unregistered: ${toolName}`);
}
}
// Get tools visible to a specific session based on permissions and health
getVisibleTools(
userPermissions: string[],
activeFeatureFlags: string[]
): DynamicTool[] {
return Array.from(this.tools.values()).filter((tool) => {
// Permission check
const hasPermission = tool.requiredPermissions.every((p) =>
userPermissions.includes(p)
);
if (!hasPermission) return false;
// Feature flag check
if (tool.featureFlag && !activeFeatureFlags.includes(tool.featureFlag)) {
return false;
}
// Health check
if (this.toolHealth.get(tool.name) === false) return false;
return true;
});
}
// Periodic health monitoring
async runHealthChecks(): Promise<void> {
for (const [name, tool] of this.tools) {
if (tool.healthCheck) {
try {
const healthy = await tool.healthCheck();
const wasHealthy = this.toolHealth.get(name);
this.toolHealth.set(name, healthy);
// Only notify if health status changed
if (healthy !== wasHealthy) {
this.emit('tools_changed');
console.log(`Tool ${name} health changed: ${healthy}`);
}
} catch {
this.toolHealth.set(name, false);
this.emit('tools_changed');
}
}
}
}
}
// Wire it up to session notifications
const registry = new DynamicToolRegistry();
registry.on('tools_changed', () => {
// Notify all active sessions
for (const session of sessions.values()) {
if (session.sseResponse) {
session.sseResponse.write(`data: ${JSON.stringify({
jsonrpc: '2.0',
method: 'notifications/tools/list_changed',
})}\n\n`);
}
}
});
// Register a tool at runtime
registry.register({
name: 'stripe_refund',
description: 'Process a refund for a Stripe charge',
inputSchema: z.object({
chargeId: z.string().describe('Stripe charge ID'),
amount: z.number().optional().describe('Partial refund amount in cents'),
reason: z.enum(['duplicate', 'fraudulent', 'requested_by_customer']),
}),
handler: async (args) => { /* Stripe API call */ },
requiredPermissions: ['payments:refund'],
featureFlag: 'stripe-refund-v2',
healthCheck: async () => {
const resp = await fetch('https://api.stripe.com/v1/balance', {
headers: { Authorization: `Bearer ${process.env.STRIPE_KEY}` },
});
return resp.ok;
},
});
// Health checks every 30 seconds
setInterval(() => registry.runHealthChecks(), 30_000);
The health check pattern deserves emphasis. When an external service goes down, tools that depend on it should disappear from the agent's context rather than fail at execution time. An agent that doesn't know a tool exists won't try to use it. An agent that sees a tool and can't use it wastes tokens, confuses users, and generates error logs.
MCP Sampling: When the Server Needs to Think
Sampling inverts the normal direction of LLM inference. Instead of the client sending prompts to the server, the MCP server sends a sampling/createMessage request back to the client, asking it to run an LLM completion. The client mediates the request — optionally showing it to the user for approval — then returns the LLM's response to the server. The server never gets direct access to the model.
Why would a tool need LLM access? Consider a data analysis tool that queries a database, gets 500 rows, and needs to summarize them before returning results. Or a content moderation tool that checks whether generated text violates policy. Or a workflow orchestrator that needs to classify an intermediate result before deciding which step to execute next. These all require reasoning during tool execution, not just before or after.
Here's a tool that uses sampling to classify and summarize query results. The server requests an LLM completion mid-execution, uses the response to structure its output, and returns the final result:
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { z } from 'zod';
const server = new McpServer({
name: 'analytics-server',
version: '1.0.0',
});
// Note: sampling is a CLIENT capability. The client advertises it during
// initialize, and the server should check for it before sending
// sampling/createMessage requests.
server.tool(
'analyze_support_tickets',
'Analyze recent support tickets and return categorized insights',
{
timeRange: z.enum(['24h', '7d', '30d']).describe('Time range for analysis'),
category: z.string().optional().describe('Filter by ticket category'),
},
async ({ timeRange, category }, { sendRequest }) => {
// Step 1: Query the database
const tickets = await queryTickets({ timeRange, category });
if (tickets.length === 0) {
return {
content: [{ type: 'text', text: 'No tickets found for the specified criteria.' }],
};
}
// Step 2: Use sampling to classify and summarize the tickets
// The server asks the CLIENT to run this through an LLM
const samplingResult = await sendRequest({
method: 'sampling/createMessage',
params: {
messages: [
{
role: 'user',
content: {
type: 'text',
text: `Analyze these ${tickets.length} support tickets and provide:
1. Top 3 issue categories with ticket counts
2. Sentiment breakdown (positive/neutral/negative)
3. Average resolution time per category
4. Any emerging patterns or spikes
Raw ticket data:
${JSON.stringify(tickets.slice(0, 50), null, 2)}
${tickets.length > 50 ? `\n... and ${tickets.length - 50} more tickets` : ''}`,
},
},
],
systemPrompt: 'You are a support analytics assistant. Return structured analysis in markdown format. Be specific with numbers. Do not speculate beyond the data provided.',
maxTokens: 1000,
modelPreferences: {
hints: [{ name: 'claude-sonnet-4-20250514' }],
intelligencePriority: 0.7,
speedPriority: 0.3,
},
},
});
// Step 3: Combine raw stats with LLM analysis
const stats = computeBasicStats(tickets);
return {
content: [
{
type: 'text',
text: `## Support Ticket Analysis (${timeRange})\n\n**Total tickets:** ${tickets.length}\n**Avg response time:** ${stats.avgResponseTime}h\n\n${samplingResult.content.text}`,
},
],
};
}
);
Sampling has a deliberate constraint: the client controls everything. It can modify the prompt before sending it to the model, filter the response before returning it to the server, or reject the request entirely. This human-in-the-loop design means the server can't use sampling to exfiltrate data through crafted prompts — at least not without the user or client noticing.
The modelPreferences field lets the server suggest — not mandate — which model to use. The intelligencePriority and speedPriority values (0 to 1) hint at the tradeoff, but the client makes the final decision. If your tool needs fast classification, set speed high. If it needs nuanced analysis, favor intelligence. The client maps these preferences to whatever models it has available.
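One convenient pattern is a pair of presets for the two common cases. These numbers are illustrative conventions, not values from the spec:

```typescript
// Local helper type mirroring the priority hints discussed above
interface PreferenceHints {
  intelligencePriority: number; // 0-1, how much capability matters
  speedPriority: number;        // 0-1, how much latency matters
}

// Presets for the two common cases: quick classification vs. deep analysis
function preferencesFor(need: 'classification' | 'analysis'): PreferenceHints {
  return need === 'classification'
    ? { intelligencePriority: 0.2, speedPriority: 0.9 }
    : { intelligencePriority: 0.8, speedPriority: 0.2 };
}
```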
One practical concern: sampling adds latency. Every sampling/createMessage round trip includes client-side validation, LLM inference, and possibly user approval. For tools that need sub-second responses, sampling isn't the right pattern. Use it for analytical tools, content generation, and multi-step workflows where the additional latency is worth the reasoning capability.
Multi-Tenant MCP Architecture
Multi-tenancy in MCP means multiple organizations sharing infrastructure while maintaining strict isolation of data, tools, secrets, and access controls. Every production SaaS that exposes MCP servers faces this challenge — you can't spin up a dedicated MCP cluster for each customer, but you also can't let Tenant A's database credentials leak into Tenant B's tool execution.
The architecture has three isolation layers: authentication (who is this?), authorization (what can they access?), and runtime isolation (how do we prevent cross-contamination?).
The key design decision is whether to use shared servers with tenant context injection or dedicated server instances per tenant. Shared servers are cheaper but require careful isolation in application code. Dedicated instances are safer but more expensive and harder to manage at scale. Most production systems use a hybrid: shared servers for read-only tools, dedicated instances for tools that execute write operations or handle sensitive data.
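That hybrid split can be encoded as a routing rule. A sketch; the metadata fields and hostname scheme are assumptions for illustration:

```typescript
// Assumed per-tool metadata used only for placement decisions
interface ToolRoutingInfo {
  name: string;
  performsWrites: boolean;
  handlesSensitiveData: boolean;
}

// Hybrid placement: read-only tools share a server pool; writes and
// sensitive data go to a dedicated per-tenant instance.
function pickDeployment(tool: ToolRoutingInfo, tenantId: string): string {
  if (tool.performsWrites || tool.handlesSensitiveData) {
    return `http://mcp-${tenantId}.internal:3002`; // dedicated instance (hypothetical naming)
  }
  return 'http://mcp-shared.internal:3002'; // shared pool
}
```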
Here's a tenant-aware secret resolver. Secrets are never embedded in tool definitions — they're resolved at execution time based on the authenticated tenant:
interface TenantSecretStore {
resolve(tenantId: string, secretName: string): Promise<string | null>;
listAvailable(tenantId: string): Promise<string[]>;
}
class VaultSecretStore implements TenantSecretStore {
async resolve(tenantId: string, secretName: string): Promise<string | null> {
// Secrets are namespaced by tenant in the vault
const path = `mcp/tenants/${tenantId}/secrets/${secretName}`;
try {
const response = await fetch(`${VAULT_URL}/v1/${path}`, {
headers: { 'X-Vault-Token': VAULT_TOKEN },
});
if (!response.ok) return null;
const data = await response.json();
return data.data?.value || null;
} catch {
return null;
}
}
async listAvailable(tenantId: string): Promise<string[]> {
const path = `mcp/tenants/${tenantId}/secrets`;
const response = await fetch(`${VAULT_URL}/v1/${path}?list=true`, {
headers: { 'X-Vault-Token': VAULT_TOKEN },
});
const data = await response.json();
return data.data?.keys || [];
}
}
// In tool execution, secrets resolve based on the authenticated tenant
// Assumes DynamicTool is extended with an optional requiredSecrets list and
// a handler that accepts an execution context as its second argument
type SecretAwareTool = DynamicTool & {
requiredSecrets?: string[];
handler: (args: unknown, ctx: { secrets: Record<string, string>; tenantId: string }) => Promise<unknown>;
};
async function executeToolWithTenantSecrets(
tool: SecretAwareTool,
args: unknown,
tenantId: string,
secretStore: TenantSecretStore
): Promise<unknown> {
// Resolve all secrets this tool needs for this tenant
const resolvedSecrets: Record<string, string> = {};
for (const secretName of tool.requiredSecrets || []) {
const value = await secretStore.resolve(tenantId, secretName);
if (!value) {
throw new Error(
`Secret '${secretName}' not configured for tenant ${tenantId}`
);
}
resolvedSecrets[secretName] = value;
}
// Execute with tenant-scoped secrets injected
return tool.handler(args, { secrets: resolvedSecrets, tenantId });
}
Per-tenant rate limiting prevents a single noisy tenant from degrading service for others. Track tool invocations by tenant and enforce both request-per-second and daily quota limits:
interface RateLimitConfig {
requestsPerSecond: number;
dailyQuota: number;
}
const tenantLimits = new Map<string, RateLimitConfig>();
const tenantUsage = new Map<string, { second: number; daily: number; lastSecondReset: number; lastDayReset: number }>();
function checkRateLimit(tenantId: string): { allowed: boolean; retryAfter?: number } {
const limits = tenantLimits.get(tenantId) || { requestsPerSecond: 10, dailyQuota: 10000 };
const now = Date.now();
const usage = tenantUsage.get(tenantId) || { second: 0, daily: 0, lastSecondReset: now, lastDayReset: now };
// Reset the per-second counter
if (now - usage.lastSecondReset > 1000) {
usage.second = 0;
usage.lastSecondReset = now;
}
// Reset the daily counter every 24 hours
if (now - usage.lastDayReset > 86_400_000) {
usage.daily = 0;
usage.lastDayReset = now;
}
// Check per-second limit
if (usage.second >= limits.requestsPerSecond) {
return { allowed: false, retryAfter: 1 };
}
// Check daily quota
if (usage.daily >= limits.dailyQuota) {
return { allowed: false, retryAfter: 86400 };
}
usage.second++;
usage.daily++;
tenantUsage.set(tenantId, usage);
return { allowed: true };
}
If you're building a platform where agents use tools and memory across multiple customer workspaces, multi-tenant isolation isn't optional — it's table stakes. The alternative is a security incident that affects every customer simultaneously.
Security Hardening for Production MCP
The 2025 Astrix security report found that 43% of early MCP servers had command injection vulnerabilities, and over half relied on insecure static secrets rather than proper OAuth flows. Shipping an MCP server to production without security hardening is asking for an incident. Here's the defense-in-depth approach that addresses the real attack surface.
Input Validation
Every tool parameter must be validated against its JSON Schema before execution. The MCP spec defines input schemas using JSON Schema, and Zod gives you runtime validation in TypeScript. Never trust that the client (or the LLM) sent valid data:
import { z } from 'zod';
// Define strict schemas — not permissive "any" objects
const refundSchema = z.object({
chargeId: z.string()
.regex(/^ch_[a-zA-Z0-9]{24}$/, 'Invalid Stripe charge ID format'),
amount: z.number()
.int()
.positive()
.max(999999, 'Refund amount exceeds maximum')
.optional(),
reason: z.enum(['duplicate', 'fraudulent', 'requested_by_customer']),
});
// Validate BEFORE any external call
function validateToolInput<T>(schema: z.ZodType<T>, input: unknown): T {
const result = schema.safeParse(input);
if (!result.success) {
const errors = result.error.issues
.map((i) => `${i.path.join('.')}: ${i.message}`)
.join('; ');
throw new Error(`Input validation failed: ${errors}`);
}
return result.data;
}
Prompt Injection Defense
Tool descriptions are part of the LLM's context, which makes them a vector for prompt injection. A malicious MCP server (or a compromised tool registry) can inject instructions through tool descriptions that override the agent's behavior. Validate tool descriptions from external sources and sanitize before including them in context:
function sanitizeToolDescription(description: string): string {
  // Strip anything that looks like prompt injection
  const suspicious = [
    /ignore previous instructions/i,
    /you are now/i,
    /system:\s/i,
    /forget everything/i,
    /<\/?system>/i,
  ];
  for (const pattern of suspicious) {
    if (pattern.test(description)) {
      console.warn(`Suspicious tool description blocked: ${description.slice(0, 100)}`);
      return 'Tool description unavailable — flagged for review.';
    }
  }
  // Truncate excessively long descriptions
  if (description.length > 500) {
    return description.slice(0, 497) + '...';
  }
  return description;
}

Audit Logging
Every tool invocation should produce a structured audit log entry. This isn't just for compliance — it's how you debug "why did the agent transfer $50,000 to an unknown account?" at 2 AM:
interface AuditEntry {
  timestamp: string;
  sessionId: string;
  tenantId: string;
  userId: string;
  toolName: string;
  inputHash: string; // SHA-256 of input, not the raw input (may contain PII)
  result: 'success' | 'error' | 'denied';
  durationMs: number;
  errorMessage?: string;
}

function logAuditEntry(entry: AuditEntry): void {
  // Structured JSON to stdout — your log aggregator picks it up
  console.log(JSON.stringify({
    ...entry,
    level: entry.result === 'error' ? 'error' : 'info',
    service: 'mcp-server',
    type: 'tool_audit',
  }));
}

Token Scope Enforcement
Don't just validate that a token exists — verify that its scopes match the specific operation. A token with mcp:read shouldn't be able to call tools that write data:
const toolScopeMap: Record<string, string[]> = {
  'crm_search_contacts': ['crm:read'],
  'crm_update_contact': ['crm:read', 'crm:write'],
  'payment_refund': ['payments:write', 'payments:refund'],
  'analytics_query': ['analytics:read'],
};

function enforceToolScopes(toolName: string, tokenScopes: string[]): void {
  const required = toolScopeMap[toolName];
  if (!required) {
    throw new Error(`No scope mapping for tool: ${toolName}`);
  }
  const missing = required.filter((s) => !tokenScopes.includes(s));
  if (missing.length > 0) {
    throw new Error(
      `Missing required scopes for ${toolName}: ${missing.join(', ')}`
    );
  }
}

Container and Network Isolation
Even with input validation and token scoping, defense-in-depth means assuming your application code has bugs. Run MCP servers in isolated containers with read-only filesystems, non-root users, and strict resource limits. Network policies should prevent MCP servers from reaching anything except the specific services they need — a CRM tool server shouldn't be able to connect to your payment database, even if a code injection makes it try.
For high-security environments, consider running tool execution in ephemeral sandboxes (Firecracker microVMs or gVisor containers) that are destroyed after each invocation. The overhead is measurable — expect 50-200ms of cold-start latency — but the isolation guarantee is worth it for tools that process untrusted input or execute user-provided code.
These layers stack. A request must pass token validation, then audience verification, then scope enforcement, then input validation, then container-level isolation, then execution with tenant-scoped secrets — and every step gets logged. Skip any one layer and you've created an attack surface.
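As a sketch, the layers above can be composed into a single guard chain that runs before any tool executes. The check implementations below are stubs standing in for the token, scope, and input validators shown earlier in this section; names like `runChecks` and `requireScope` are illustrative, not SDK API.

```typescript
// Each layer is a check that throws on failure; the first failure aborts
// the request before the tool body ever runs.
interface RequestContext {
  tokenScopes: string[];
  toolName: string;
  input: unknown;
}

type Check = (ctx: RequestContext) => void;

function runChecks(ctx: RequestContext, checks: Check[]): void {
  for (const check of checks) {
    check(ctx);
  }
}

// Stub layer implementations (real ones were defined earlier)
const requireScope = (scope: string): Check => (ctx) => {
  if (!ctx.tokenScopes.includes(scope)) {
    throw new Error(`Missing scope: ${scope}`);
  }
};

const requireInputShape = (predicate: (input: unknown) => boolean): Check => (ctx) => {
  if (!predicate(ctx.input)) {
    throw new Error('Input validation failed');
  }
};

// Usage: every layer must pass before execution proceeds
const ctx: RequestContext = {
  tokenScopes: ['crm:read'],
  toolName: 'crm_search_contacts',
  input: { query: 'acme' },
};
runChecks(ctx, [
  requireScope('crm:read'),
  requireInputShape((i) => typeof i === 'object' && i !== null),
]);
console.log('all layers passed');
```

Audit logging wraps the whole chain: a thrown check produces a `denied` entry, a successful run produces a `success` entry.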
For teams building agent infrastructure with monitoring and observability, the audit log becomes your primary debugging tool. When an agent does something unexpected, the audit trail tells you exactly which tools were called, with what inputs, and in what order.
The Spec Is Moving Fast: What's Next
The November 2025 specification (2025-11-25) represents the biggest evolution since MCP launched. Beyond the OAuth and transport improvements we've already covered, three additions are worth tracking.
Tasks turn any MCP request into an asynchronous operation. A server can return a task handle instead of waiting for completion, and the client polls for progress and results via tasks/get and tasks/result. This unlocks workloads that would otherwise time out — document processing, batch analytics, multi-step approval workflows. Tasks support states like working, input_required, completed, failed, and cancelled, giving clients fine-grained control over long-running operations.
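A client-side polling loop for Tasks might look like the sketch below. The `getTask` parameter stands in for a client call to tasks/get; since Tasks are still experimental, the real SDK surface may differ, so the wrapper is kept generic.

```typescript
// Task states as described in the 2025-11-25 spec
type TaskStatus = 'working' | 'input_required' | 'completed' | 'failed' | 'cancelled';

interface TaskSnapshot {
  taskId: string;
  status: TaskStatus;
}

// Poll until the task leaves the 'working' state, then hand the snapshot
// back to the caller (which fetches the payload via tasks/result on
// 'completed', or prompts the user on 'input_required').
async function pollUntilDone(
  getTask: (taskId: string) => Promise<TaskSnapshot>,
  taskId: string,
  intervalMs = 1000,
  maxAttempts = 60,
): Promise<TaskSnapshot> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const snapshot = await getTask(taskId);
    if (snapshot.status !== 'working') return snapshot;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Task ${taskId} did not finish within ${maxAttempts} polls`);
}
```

In production you would add exponential backoff and honor any polling hints the server returns, rather than a fixed interval.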
Elicitation lets an MCP server pause mid-execution and request structured input from the user through the client. Instead of the tool failing because it needs a clarification ("Which subscription do you want to cancel — Pro or Enterprise?"), it sends an ElicitationRequest with a schema describing what it needs. The client renders a form, the user responds, and execution continues. The spec explicitly prohibits using elicitation for sensitive data like credentials — that's what URL-mode elicitation is for, redirecting the user to a secure external page.
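For the subscription example above, the request the server sends might be shaped like this. The `elicitation/create` method, `message`, and `requestedSchema` fields come from the spec; the TypeScript types are hand-rolled here for clarity rather than imported from the SDK.

```typescript
// Illustrative shape of an elicitation request. The requestedSchema is a
// restricted JSON Schema object describing the structured input the server
// needs before it can continue.
interface ElicitationRequest {
  method: 'elicitation/create';
  params: {
    message: string; // shown to the user by the client
    requestedSchema: {
      type: 'object';
      properties: Record<string, { type: string; enum?: string[]; description?: string }>;
      required?: string[];
    };
  };
}

const clarification: ElicitationRequest = {
  method: 'elicitation/create',
  params: {
    message: 'Which subscription do you want to cancel?',
    requestedSchema: {
      type: 'object',
      properties: {
        plan: { type: 'string', enum: ['Pro', 'Enterprise'] },
      },
      required: ['plan'],
    },
  },
};
```

The client renders this as a form (here, a two-option picker) and returns the user's answer so the tool call can resume.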
Structured tool outputs let tool definitions declare their output schema, not just their input schema. Clients and LLMs can now know what shape the response will have before calling the tool, enabling better planning, type-safe tool chains, and reduced hallucination when the model processes results.
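A server can enforce its own declared output schema at runtime, so drift never reaches clients that planned around it. The sketch below mirrors the JSON Schema a tool would advertise as its output schema; the validation is hand-rolled to keep the example dependency-free, though in practice you would reuse Zod as in the earlier examples.

```typescript
// What the tool would advertise in its definition (JSON Schema form)
const weatherOutputSchema = {
  type: 'object',
  properties: {
    temperatureC: { type: 'number' },
    conditions: { type: 'string', enum: ['clear', 'cloudy', 'rain', 'snow'] },
  },
  required: ['temperatureC', 'conditions'],
} as const;

interface WeatherOutput {
  temperatureC: number;
  conditions: 'clear' | 'cloudy' | 'rain' | 'snow';
}

// Fail loudly on a malformed response: clients and LLMs plan around the
// declared schema, so silent drift breaks downstream tool chains.
function assertWeatherOutput(raw: unknown): WeatherOutput {
  const obj = raw as Record<string, unknown>;
  if (typeof obj?.temperatureC !== 'number') {
    throw new Error('temperatureC must be a number');
  }
  const allowed = ['clear', 'cloudy', 'rain', 'snow'];
  if (typeof obj.conditions !== 'string' || !allowed.includes(obj.conditions)) {
    throw new Error('conditions must be one of ' + allowed.join(', '));
  }
  return obj as unknown as WeatherOutput;
}
```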
Structured outputs also have a direct impact on agent evaluation. When a tool declares its output schema upfront, you can write deterministic assertions against the response shape instead of relying on fuzzy text matching. Combined with Tasks (which give you discrete status transitions to test) and sampling (which introduces an LLM call you can mock), these primitives make MCP tool chains far more testable than the fire-and-forget function calls they replace.
These capabilities are experimental in the current spec, meaning the API surface may change. But the direction is clear: MCP is evolving from a synchronous tool-calling protocol into a full-fledged agent-infrastructure standard. If you're building tools that need async processing, user interaction, or structured data pipelines, start experimenting now.
Putting It All Together: A Production Checklist
If you've been building MCP servers with stdio transport and no auth, here's the path to production readiness. Not every project needs every pattern — but you should make a conscious decision about each one.
- OAuth 2.1 with PKCE for all remote MCP servers
- Streamable HTTP transport (replace SSE)
- Token audience validation via Resource Indicators (RFC 8707)
- Input validation with Zod schemas on every tool parameter
- Audit logging for all tool invocations
- Session management with cryptographic session IDs
- Per-tenant secret resolution (never embed secrets in tool configs)
- Rate limiting per tenant and per tool
- Health checks that remove unhealthy tools from client context
- Gateway layer for routing, auth, and observability
- Dynamic tool registration with list_changed notifications
- Prompt injection defense for tool descriptions from external sources
Start with auth and transport — they're the foundation everything else builds on. Add the gateway when you have more than three MCP servers or more than one tenant. Add sampling when your tools need to reason. Add dynamic registration when your tool set is no longer static.
The protocol's maturity curve is steep. A year ago, MCP was a local development protocol with no auth and stdio-only transport. Today, it has OAuth 2.1, Streamable HTTP, asynchronous tasks, sampling, elicitation, and structured outputs — with major enterprises running it in production. Teams building agent tool infrastructure that integrates with the protocol are already seeing the compounding benefits: one integration point, universal client compatibility, and a security model that auditors actually understand.
The basics get you connected. These patterns keep you running.
Frequently Asked Questions
How do I migrate an existing SSE-based MCP server to Streamable HTTP?
Replace your SSE transport with StreamableHTTPServerTransport from the TypeScript SDK (v1.10.0+). The main change is that your server now handles POST requests for individual JSON-RPC messages instead of maintaining a persistent SSE connection. Existing tools don't need modification — only the transport layer changes. Keep your SSE endpoint running temporarily for clients that haven't upgraded.
Can I use MCP sampling without human approval for every request?
Yes. The spec says clients may show sampling requests to users for approval, not that they must. In automated pipelines, the client can auto-approve sampling requests that match certain criteria (trusted servers, specific tools, low-risk operations). But you should log every sampling request for audit purposes regardless.
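A minimal auto-approval policy might look like the sketch below. The criteria (a trusted-server allowlist and a token budget) are examples of what "certain criteria" could mean in practice; the spec leaves approval policy entirely to the client, and the names here are illustrative.

```typescript
// Hypothetical client-side policy: auto-approve sampling requests only
// from known servers and only within a bounded generation budget.
interface SamplingDecisionInput {
  serverId: string;
  maxTokens: number;
}

const TRUSTED_SERVERS = new Set(['internal-crm', 'internal-analytics']);

function shouldAutoApprove(req: SamplingDecisionInput): boolean {
  if (!TRUSTED_SERVERS.has(req.serverId)) return false; // unknown server: ask the user
  if (req.maxTokens > 2000) return false; // large generations get human review
  return true;
}
```

Whatever the decision, log it: the audit trail should show which sampling requests ran without a human in the loop.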
What's the performance overhead of running an MCP gateway?
Minimal for a properly implemented gateway. The JSON-RPC message parsing adds microseconds. The real overhead is network hops — each proxied request adds one round trip between the client and the upstream server. For most tool invocations that involve external API calls (databases, SaaS APIs), the gateway latency is noise compared to the tool execution time. Use connection pooling and keep-alive to minimize TCP overhead.
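In Node, connection reuse toward upstream MCP servers is a one-line change: share a keep-alive agent across proxied requests. The pool sizes below are illustrative, not recommendations.

```typescript
import http from 'node:http';

// A shared keep-alive agent for the gateway's upstream connections, so each
// proxied request reuses pooled TCP sockets instead of paying a fresh
// handshake per tool invocation.
const upstreamAgent = new http.Agent({
  keepAlive: true,
  maxSockets: 128,    // concurrent sockets per upstream host (illustrative)
  maxFreeSockets: 16, // idle sockets kept warm for reuse (illustrative)
});
```

Pass `upstreamAgent` as the `agent` option on outgoing requests to the upstream servers; use `https.Agent` for TLS upstreams.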
Do I need separate OAuth scopes for each tool?
Not necessarily. Group tools by capability domain (read, write, admin) and assign scopes at that level. Fine-grained per-tool scopes become unmanageable past 20-30 tools. The scope hierarchy mcp:read < mcp:tools < mcp:admin covers most use cases, with domain-specific scopes (crm:read, payments:write) for tools that access sensitive systems.
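The hierarchy check can be sketched as a rank lookup: higher scopes imply lower ones, while domain-specific scopes require an exact match. The ranking table below encodes the mcp:read < mcp:tools < mcp:admin ordering described above; the function name is illustrative.

```typescript
// Hierarchical scopes, lowest privilege first
const SCOPE_RANK: Record<string, number> = {
  'mcp:read': 0,
  'mcp:tools': 1,
  'mcp:admin': 2,
};

function satisfiesScope(granted: string[], required: string): boolean {
  const need = SCOPE_RANK[required];
  if (need === undefined) {
    // Domain-specific scopes (crm:read, payments:write) need an exact match
    return granted.includes(required);
  }
  // Any granted scope at or above the required rank satisfies it
  return granted.some((s) => (SCOPE_RANK[s] ?? -1) >= need);
}
```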
How do MCP Tasks differ from just making a synchronous tool call that takes a long time?
Tasks give the client control over long-running operations. Instead of blocking on a single HTTP request that might time out, the client gets a task handle immediately and can check progress, cancel the operation, or handle input_required states where the task needs additional information. This is essential for operations that take minutes (batch processing, model training) rather than seconds.
References
- MCP Authorization Specification (2025-11-25)
- MCP Transports Specification — Streamable HTTP
- MCP Sampling Specification
- MCP Tasks Specification (2025-11-25)
- MCP Elicitation Specification
- Auth0 — MCP Spec Updates from June 2025
- WorkOS — MCP 2025-11-25 Spec Update: Async Tasks, Better OAuth
- Auth0 — Why MCP's Move Away from SSE Simplifies Security
- fka.dev — Why MCP Deprecated SSE and Went with Streamable HTTP
- Stytch — MCP Authentication and Authorization Implementation Guide
- Astrix Security — State of MCP Server Security 2025
- Stack Overflow — Authentication and Authorization in MCP
- Speakeasy — Dynamic Tool Discovery in MCP
- GitHub Blog — Building Smarter Interactions with MCP Elicitation
- Palo Alto Unit 42 — New Prompt Injection Attack Vectors Through MCP Sampling
- Composio — MCP Gateways Guide for AI Agent Architecture 2026
- Zuplo — The State of MCP: Adoption, Security & Production Readiness
- Oso — Authorization for MCP: OAuth 2.1, PRMs, and Best Practices