Last week I asked Claude to check the weather in three cities and book the cheapest flight. Two API calls, zero code on my part. The model picked the right functions, filled in the arguments, compared the results, and booked the $247 option to Lisbon. That's function calling.
Before OpenAI shipped this primitive in June 2023, LLMs could only generate text. They could describe how to check the weather. They could write code that checks the weather. But they couldn't actually check the weather. Function calling changed that. It gave models a structured way to say: "I need to run this function with these arguments, and I'll use the result to finish my response."
This article builds a multi-tool agent from scratch. We start with the simplest possible tool call, then layer on parallel execution, multi-provider support, validation, and production hardening. Every code block runs. Every example ships in both TypeScript and Python.
| What You'll Build | What You'll Learn |
|---|---|
| Single tool call (weather lookup) | The request/response loop that powers all tool use |
| Same agent on three providers | OpenAI, Anthropic, and Google schema differences |
| Parallel tool execution | Concurrent calls for independent operations |
| Validated tool calls | Zod and Pydantic schemas for reliable arguments |
| Multi-tool research agent | Full working agent with search, read, and summarize |
What Is Function Calling?
The name is misleading. The model doesn't call anything.
Function calling is a structured output mode where the model returns JSON describing a function it wants your code to execute, instead of generating a text response. It never runs code. It never touches your database. It outputs a function name and arguments, and your application decides what to do with them.
Think of it as a request: the model says "I need the result of get_weather({ city: 'Paris' })," and your code executes that function, then sends the result back. The model uses that result to form its final answer.
Here's the full loop:
1. You send the user message plus your tool definitions.
2. The model responds with a function call (a name and JSON arguments) instead of text.
3. Your code executes the function and sends the result back.
4. The model uses the result to generate its final text response.
This loop is the core of every AI agent. Whether you're building a customer service bot that looks up orders, a coding assistant that reads files, or a research agent that searches the web, it all reduces to this same cycle: model requests a function, your code runs it, model gets the result.
Let's see the simplest possible implementation. This example defines one tool (get the current weather) and handles the model's function call request.
import OpenAI from "openai";
const client = new OpenAI();
// Define the tool
const tools: OpenAI.Responses.Tool[] = [
{
type: "function",
name: "get_weather",
description: "Get current weather for a city",
parameters: {
type: "object",
properties: {
city: { type: "string", description: "City name" },
},
required: ["city"],
additionalProperties: false,
},
strict: true,
},
];
// Your actual function
function getWeather(city: string) {
const data: Record<string, { temp: number; condition: string }> = {
paris: { temp: 18, condition: "cloudy" },
tokyo: { temp: 22, condition: "sunny" },
london: { temp: 14, condition: "rainy" },
};
return data[city.toLowerCase()] ?? { temp: 20, condition: "unknown" };
}
// Send request with tools
const response = await client.responses.create({
model: "gpt-4.1-nano",
input: [{ role: "user", content: "What's the weather in Paris?" }],
tools,
});
// Handle the tool call
const toolCall = response.output.find((o) => o.type === "function_call");
if (toolCall && toolCall.type === "function_call") {
const args = JSON.parse(toolCall.arguments);
const result = getWeather(args.city);
// Send the result back to the model
const finalResponse = await client.responses.create({
model: "gpt-4.1-nano",
input: [
{ role: "user", content: "What's the weather in Paris?" },
toolCall,
{
type: "function_call_output",
call_id: toolCall.call_id,
output: JSON.stringify(result),
},
],
tools,
});
// finalResponse.output_text: "It's 18°C and cloudy in Paris."
console.log(finalResponse.output_text);
}
from openai import OpenAI
client = OpenAI()
# Define the tool
tools = [
{
"type": "function",
"name": "get_weather",
"description": "Get current weather for a city",
"parameters": {
"type": "object",
"properties": {
"city": {"type": "string", "description": "City name"},
},
"required": ["city"],
"additionalProperties": False,
},
"strict": True,
}
]
# Your actual function
def get_weather(city: str) -> dict:
data = {
"paris": {"temp": 18, "condition": "cloudy"},
"tokyo": {"temp": 22, "condition": "sunny"},
"london": {"temp": 14, "condition": "rainy"},
}
return data.get(city.lower(), {"temp": 20, "condition": "unknown"})
# Send request with tools
response = client.responses.create(
model="gpt-4.1-nano",
input=[{"role": "user", "content": "What's the weather in Paris?"}],
tools=tools,
)
# Handle the tool call
tool_call = next(
(o for o in response.output if o.type == "function_call"), None
)
if tool_call:
import json
args = json.loads(tool_call.arguments)
result = get_weather(args["city"])
# Send the result back to the model
final_response = client.responses.create(
model="gpt-4.1-nano",
input=[
{"role": "user", "content": "What's the weather in Paris?"},
tool_call,
{
"type": "function_call_output",
"call_id": tool_call.call_id,
"output": json.dumps(result),
},
],
tools=tools,
)
# "It's 18°C and cloudy in Paris."
print(final_response.output_text)
Notice the round trip: you send the message with tool definitions, the model responds with a function call instead of text, you execute the function and send the result back, and the model generates its final text response. This is the fundamental pattern. Everything else is built on top of it.
How Does the Model See Your Tools?
Here's a detail that surprises most developers: the model never sees your function implementations. When you pass tool definitions to an API, the provider injects them into the model's context as structured schema descriptions. The model sees JSON Schema definitions that tell it what tools exist, what arguments they accept, and what they do.
Here's what the model's actual context looks like when you register a weather tool. This isn't pseudocode. It's a simplified version of what gets injected before your first user message.
// What you send to the API:
const tools = [
{
type: "function",
name: "get_weather",
description: "Get current weather for a city",
parameters: {
type: "object",
properties: {
city: { type: "string", description: "City name" },
},
required: ["city"],
additionalProperties: false,
},
strict: true,
},
];
// What the model sees in its context (conceptually):
// System: You have access to the following tools:
//
// get_weather: Get current weather for a city
// Parameters: { city: string (required) }
//
// When you need to use a tool, respond with a function_call
// containing the tool name and arguments as JSON.
This has a direct cost implication. Every tool definition consumes tokens from your context window. A tool with 5 parameters and detailed descriptions might use 100-200 tokens. Register 20 tools and you've spent 2,000-4,000 tokens before the user says anything.
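To see that overhead concretely, here's a rough sketch that estimates tool-definition cost using the common ~4-characters-per-token heuristic. This is an approximation I'm introducing for illustration; a real tokenizer (e.g. tiktoken) will give somewhat different counts.

```python
import json

def estimate_tool_tokens(tools: list[dict]) -> int:
    """Rough token estimate for a tools array, using the common
    ~4 characters-per-token heuristic. A real tokenizer will
    give somewhat different counts."""
    return len(json.dumps(tools)) // 4

weather_tool = {
    "type": "function",
    "name": "get_weather",
    "description": "Get current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name"}},
        "required": ["city"],
        "additionalProperties": False,
    },
}

print(estimate_tool_tokens([weather_tool]))       # a few dozen tokens for one small tool
print(estimate_tool_tokens([weather_tool] * 20))  # grows roughly 20x with 20 registered tools
```

Running an estimate like this against your real tool registry is a quick way to spot when definitions are eating a meaningful slice of your context budget.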
The trade-off matters. More tools give the model more capabilities, but each one adds token overhead and increases the chance of the model selecting the wrong tool. If you're building agents with dozens of tools, you'll want strategies for dynamic tool loading, where you only inject the relevant tools based on the conversation context. Anthropic's tool search feature and the broader MCP protocol both address this problem. The fragmentation across providers is exactly why standards like MCP matter. Our article on why every provider invented their own tool format digs into this further.
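As a minimal sketch of dynamic tool loading, the filter below injects only the tools whose keywords match the current message. The tool names and keyword lists here are hypothetical, and production systems typically use embeddings or a dedicated tool-search step rather than substring matching.

```python
# Hypothetical keyword index: which words suggest which tool is relevant.
TOOL_KEYWORDS = {
    "get_weather": ["weather", "temperature", "forecast"],
    "search_flights": ["flight", "fly", "airline"],
    "search_hotels": ["hotel", "stay", "room"],
}

def select_tools(message: str, all_tools: list[dict]) -> list[dict]:
    """Return only the tool definitions relevant to this message."""
    lowered = message.lower()
    relevant = {
        name
        for name, keywords in TOOL_KEYWORDS.items()
        if any(kw in lowered for kw in keywords)
    }
    return [t for t in all_tools if t["name"] in relevant]

all_tools = [{"type": "function", "name": n} for n in TOOL_KEYWORDS]
picked = select_tools("Find me a flight and a hotel in Lisbon", all_tools)
print([t["name"] for t in picked])  # ['search_flights', 'search_hotels']
```

Even this crude filter cuts the injected tool count, and the saved tokens compound across every turn of a long conversation.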
There's also the distinction between function calling and JSON mode. JSON mode forces the model to output valid JSON, but doesn't constrain the shape. Function calling constrains both: the output must be valid JSON that matches a specific schema. OpenAI's strict: true mode takes this further by guaranteeing schema compliance at the model level, not just validating after the fact.
# JSON mode: model outputs any valid JSON
response = client.responses.create(
model="gpt-4.1-nano",
input=[{"role": "user", "content": "Describe the weather as JSON"}],
text={"format": {"type": "json_object"}},
)
# Could return: {"weather": "nice"} or {"temp": 72, "unit": "F"} - any shape
# Function calling: model outputs JSON matching YOUR schema
response = client.responses.create(
model="gpt-4.1-nano",
input=[{"role": "user", "content": "What's the weather in Paris?"}],
tools=tools, # Constrained to get_weather({ city: string })
)
# Always returns: function_call with { "city": "Paris" }
This constraint is what makes function calling useful for agents. Without it, you'd need to parse arbitrary JSON and hope it matches what your code expects. With strict schemas, you get guarantees.
How Do OpenAI, Anthropic, and Google Differ?
All three major providers support function calling. The core concept is identical: define tools, model requests calls, your code executes them. The syntax is where they diverge, and the divergence goes deeper than you'd expect.
Here's a comparison:
| Feature | OpenAI | Anthropic | Google |
|---|---|---|---|
| API | Responses API | Messages API | GenerativeAI SDK |
| Tool definition | tools[] with JSON Schema | tools[] with input_schema | FunctionDeclaration[] in config |
| Call format | function_call in output | tool_use content block | functionCall in response parts |
| Result format | function_call_output | tool_result content block | functionResponse in parts |
| Parallel calls | Yes (parallel_tool_calls) | Yes (multiple tool_use blocks) | Yes (multiple functionCall parts) |
| Strict mode | strict: true | Not available | Not available |
| Tool choice | tool_choice: "auto" / "required" / { name } | tool_choice: { type: "auto" / "any" / "tool" } | function_calling_config: { mode } |
Let's build the same agent across all three. Two tools: a calculator and a weather lookup. Same functionality, three different syntaxes.
OpenAI (Responses API)
OpenAI uses the Responses API with a tools array. Each tool has a type, name, description, and parameters object following JSON Schema. The model returns function_call items in the output array.
import OpenAI from "openai";
const client = new OpenAI();
const tools: OpenAI.Responses.Tool[] = [
{
type: "function",
name: "calculate",
description: "Evaluate a math expression",
parameters: {
type: "object",
properties: {
expression: { type: "string", description: "Math expression like '2 + 2'" },
},
required: ["expression"],
additionalProperties: false,
},
strict: true,
},
{
type: "function",
name: "get_weather",
description: "Get current weather for a city",
parameters: {
type: "object",
properties: {
city: { type: "string", description: "City name" },
},
required: ["city"],
additionalProperties: false,
},
strict: true,
},
];
function handleToolCall(name: string, args: Record<string, string>): string {
if (name === "calculate") {
try {
// Warning: evaluating model-supplied expressions is unsafe in production. Use a math parser.
return JSON.stringify({ result: Function(`return ${args.expression}`)() });
} catch {
return JSON.stringify({ error: "Invalid expression" });
}
}
if (name === "get_weather") {
return JSON.stringify({ temp: 18, condition: "cloudy", city: args.city });
}
return JSON.stringify({ error: "Unknown tool" });
}
async function chat(userMessage: string) {
let input: OpenAI.Responses.ResponseInput = [
{ role: "user", content: userMessage },
];
// Loop until the model produces text (not tool calls)
while (true) {
const response = await client.responses.create({
model: "gpt-4.1-nano",
input,
tools,
});
const toolCalls = response.output.filter((o) => o.type === "function_call");
if (toolCalls.length === 0) {
return response.output_text;
}
// Execute all tool calls and append results
for (const call of toolCalls) {
if (call.type === "function_call") {
const args = JSON.parse(call.arguments);
const result = handleToolCall(call.name, args);
input = [
...input,
call,
{ type: "function_call_output", call_id: call.call_id, output: result },
];
}
}
}
}
console.log(await chat("What's 247 * 38, and what's the weather in Tokyo?"));
import json
from openai import OpenAI
client = OpenAI()
tools = [
{
"type": "function",
"name": "calculate",
"description": "Evaluate a math expression",
"parameters": {
"type": "object",
"properties": {
"expression": {
"type": "string",
"description": "Math expression like '2 + 2'",
},
},
"required": ["expression"],
"additionalProperties": False,
},
"strict": True,
},
{
"type": "function",
"name": "get_weather",
"description": "Get current weather for a city",
"parameters": {
"type": "object",
"properties": {
"city": {"type": "string", "description": "City name"},
},
"required": ["city"],
"additionalProperties": False,
},
"strict": True,
},
]
def handle_tool_call(name: str, args: dict) -> str:
if name == "calculate":
try:
# Warning: eval is unsafe in production. Use a safe math parser.
result = eval(args["expression"])
return json.dumps({"result": result})
except Exception:
return json.dumps({"error": "Invalid expression"})
if name == "get_weather":
return json.dumps({"temp": 18, "condition": "cloudy", "city": args["city"]})
return json.dumps({"error": "Unknown tool"})
def chat(user_message: str) -> str:
input_messages = [{"role": "user", "content": user_message}]
while True:
response = client.responses.create(
model="gpt-4.1-nano",
input=input_messages,
tools=tools,
)
tool_calls = [o for o in response.output if o.type == "function_call"]
if not tool_calls:
return response.output_text
for call in tool_calls:
args = json.loads(call.arguments)
result = handle_tool_call(call.name, args)
input_messages.append(call)
input_messages.append(
{
"type": "function_call_output",
"call_id": call.call_id,
"output": result,
}
)
print(chat("What's 247 * 38, and what's the weather in Tokyo?"))
Anthropic (Messages API)
Anthropic's approach is structurally different. Tools are defined with an input_schema field instead of parameters. The model returns tool_use content blocks, and you send results back as tool_result blocks with the matching tool_use_id.
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
const tools: Anthropic.Messages.Tool[] = [
{
name: "calculate",
description: "Evaluate a math expression",
input_schema: {
type: "object" as const,
properties: {
expression: { type: "string", description: "Math expression like '2 + 2'" },
},
required: ["expression"],
},
},
{
name: "get_weather",
description: "Get current weather for a city",
input_schema: {
type: "object" as const,
properties: {
city: { type: "string", description: "City name" },
},
required: ["city"],
},
},
];
function handleToolCall(name: string, input: Record<string, string>): string {
if (name === "calculate") {
try {
return JSON.stringify({ result: Function(`return ${input.expression}`)() });
} catch {
return JSON.stringify({ error: "Invalid expression" });
}
}
if (name === "get_weather") {
return JSON.stringify({ temp: 18, condition: "cloudy", city: input.city });
}
return JSON.stringify({ error: "Unknown tool" });
}
async function chat(userMessage: string) {
const messages: Anthropic.Messages.MessageParam[] = [
{ role: "user", content: userMessage },
];
while (true) {
const response = await client.messages.create({
model: "claude-sonnet-4-5",
max_tokens: 1024,
tools,
messages,
});
// Check if the model wants to use tools
if (response.stop_reason !== "tool_use") {
const textBlock = response.content.find((b) => b.type === "text");
return textBlock?.type === "text" ? textBlock.text : "";
}
// Process each tool_use block
const toolResults: Anthropic.Messages.ToolResultBlockParam[] = [];
for (const block of response.content) {
if (block.type === "tool_use") {
const result = handleToolCall(block.name, block.input as Record<string, string>);
toolResults.push({
type: "tool_result",
tool_use_id: block.id,
content: result,
});
}
}
// Add the assistant's response and tool results to the conversation
messages.push({ role: "assistant", content: response.content });
messages.push({ role: "user", content: toolResults });
}
}
console.log(await chat("What's 247 * 38, and what's the weather in Tokyo?"));
import json
import anthropic
client = anthropic.Anthropic()
tools = [
{
"name": "calculate",
"description": "Evaluate a math expression",
"input_schema": {
"type": "object",
"properties": {
"expression": {
"type": "string",
"description": "Math expression like '2 + 2'",
},
},
"required": ["expression"],
},
},
{
"name": "get_weather",
"description": "Get current weather for a city",
"input_schema": {
"type": "object",
"properties": {
"city": {"type": "string", "description": "City name"},
},
"required": ["city"],
},
},
]
def handle_tool_call(name: str, tool_input: dict) -> str:
if name == "calculate":
try:
# Warning: eval is unsafe in production. Use a safe math parser.
result = eval(tool_input["expression"])
return json.dumps({"result": result})
except Exception:
return json.dumps({"error": "Invalid expression"})
if name == "get_weather":
return json.dumps(
{"temp": 18, "condition": "cloudy", "city": tool_input["city"]}
)
return json.dumps({"error": "Unknown tool"})
def chat(user_message: str) -> str:
messages = [{"role": "user", "content": user_message}]
while True:
response = client.messages.create(
model="claude-sonnet-4-5",
max_tokens=1024,
tools=tools,
messages=messages,
)
if response.stop_reason != "tool_use":
text_block = next(
(b for b in response.content if b.type == "text"), None
)
return text_block.text if text_block else ""
tool_results = []
for block in response.content:
if block.type == "tool_use":
result = handle_tool_call(block.name, block.input)
tool_results.append(
{
"type": "tool_result",
"tool_use_id": block.id,
"content": result,
}
)
messages.append({"role": "assistant", "content": response.content})
messages.append({"role": "user", "content": tool_results})
print(chat("What's 247 * 38, and what's the weather in Tokyo?"))
Google Gemini (GenerativeAI SDK)
Google takes a different structural approach. Tools are defined as FunctionDeclaration objects and passed in the model configuration rather than per-request. Function calls appear as parts in the model response, and results are sent back as FunctionResponse parts.
import { GoogleGenAI, Type } from "@google/genai";
const ai = new GoogleGenAI({ apiKey: process.env.GOOGLE_API_KEY! });
const tools = [
{
functionDeclarations: [
{
name: "calculate",
description: "Evaluate a math expression",
parameters: {
type: Type.OBJECT,
properties: {
expression: { type: Type.STRING, description: "Math expression" },
},
required: ["expression"],
},
},
{
name: "get_weather",
description: "Get current weather for a city",
parameters: {
type: Type.OBJECT,
properties: {
city: { type: Type.STRING, description: "City name" },
},
required: ["city"],
},
},
],
},
];
function handleToolCall(name: string, args: Record<string, string>) {
if (name === "calculate") {
try {
return { result: Function(`return ${args.expression}`)() };
} catch {
return { error: "Invalid expression" };
}
}
if (name === "get_weather") {
return { temp: 18, condition: "cloudy", city: args.city };
}
return { error: "Unknown tool" };
}
async function chat(userMessage: string) {
const chatSession = ai.chats.create({
model: "gemini-2.5-flash",
config: { tools },
});
let response = await chatSession.sendMessage({ message: userMessage });
// Loop while the model returns function calls
while (response.functionCalls && response.functionCalls.length > 0) {
const functionResponses = response.functionCalls.map((call) => ({
name: call.name,
response: handleToolCall(call.name, call.args as Record<string, string>),
}));
response = await chatSession.sendMessage({
message: functionResponses.map((fr) => ({
functionResponse: fr,
})),
});
}
return response.text;
}
console.log(await chat("What's 247 * 38, and what's the weather in Tokyo?"));
from google import genai
from google.genai import types
client = genai.Client(api_key="YOUR_API_KEY")
# Define tools as Python functions with type hints
def calculate(expression: str) -> dict:
"""Evaluate a math expression."""
try:
# Warning: eval is unsafe in production. Use a safe math parser.
result = eval(expression)
return {"result": result}
except Exception:
return {"error": "Invalid expression"}
def get_weather(city: str) -> dict:
"""Get current weather for a city."""
return {"temp": 18, "condition": "cloudy", "city": city}
# Gemini Python SDK auto-generates schemas from function signatures
tools = [calculate, get_weather]
chat = client.chats.create(
model="gemini-2.5-flash",
config=types.GenerateContentConfig(tools=tools),
)
# With automatic function calling (enabled by default),
# the SDK handles the entire loop for you
response = chat.send_message("What's 247 * 38, and what's the weather in Tokyo?")
print(response.text)
The Python Gemini SDK deserves special attention. It can auto-generate JSON Schema from Python function signatures and docstrings, and it handles the tool call loop automatically by default. You pass real Python functions as tools, and the SDK executes them when the model requests it. That's significantly less boilerplate than OpenAI or Anthropic.
When Does the Model Call Multiple Tools at Once?
Parallel tool calling happens when the model determines that multiple independent operations can run at the same time. Instead of calling one tool, waiting for the result, then calling the next, the model emits all the calls in a single response. Your code executes them concurrently and returns all results at once.
This matters for performance. If a user asks "Find flights from NYC to London and hotels in London for next week," those are two independent lookups. Sequential execution means the user waits for both API calls one after another. Parallel execution cuts that wait roughly in half.
All three major providers support parallel calls. OpenAI does it by default (you can disable it with parallel_tool_calls: false). Anthropic returns multiple tool_use blocks in a single response. Google returns multiple functionCall parts.
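When a batch of parallel calls arrives, your executor should actually run them concurrently rather than awaiting each one in turn. Here's a minimal sketch with asyncio; the handler names and sleep delays are simulated stand-ins for real API calls.

```python
import asyncio
import time

async def search_flights(**kwargs) -> dict:
    await asyncio.sleep(0.2)  # stands in for a real flight-search API call
    return {"flights": ["UA 100"]}

async def search_hotels(**kwargs) -> dict:
    await asyncio.sleep(0.2)  # stands in for a real hotel-search API call
    return {"hotels": ["The Grand"]}

HANDLERS = {"search_flights": search_flights, "search_hotels": search_hotels}

async def run_batch(calls: list[dict]) -> list[dict]:
    """Execute every tool call in the batch concurrently."""
    tasks = [HANDLERS[call["name"]](**call["args"]) for call in calls]
    return await asyncio.gather(*tasks)  # results come back in call order

calls = [
    {"name": "search_flights", "args": {"origin": "NYC", "dest": "London"}},
    {"name": "search_hotels", "args": {"city": "London"}},
]
start = time.perf_counter()
results = asyncio.run(run_batch(calls))
elapsed = time.perf_counter() - start
print(results)  # both results arrive after ~0.2s total, not ~0.4s
```

`asyncio.gather` preserves order, so matching each result back to its `call_id` stays straightforward.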
The tool_choice parameter controls how aggressively the model uses tools. Here are the options across providers:
| Behavior | OpenAI | Anthropic | Google |
|---|---|---|---|
| Model decides | tool_choice: "auto" | tool_choice: { type: "auto" } | mode: "AUTO" |
| Must use a tool | tool_choice: "required" | tool_choice: { type: "any" } | mode: "ANY" |
| Use specific tool | tool_choice: { type: "function", name: "x" } | tool_choice: { type: "tool", name: "x" } | mode: "ANY", allowed: ["x"] |
| No tools | tool_choice: "none" | Remove tools | mode: "NONE" |
Let's build a travel planner that searches flights and hotels in parallel, then calculates the total budget sequentially (because it depends on the results of the first two calls).
import OpenAI from "openai";
const client = new OpenAI();
const tools: OpenAI.Responses.Tool[] = [
{
type: "function",
name: "search_flights",
description: "Search for flights between two cities",
parameters: {
type: "object",
properties: {
from: { type: "string", description: "Departure city" },
to: { type: "string", description: "Arrival city" },
date: { type: "string", description: "Travel date (YYYY-MM-DD)" },
},
required: ["from", "to", "date"],
additionalProperties: false,
},
strict: true,
},
{
type: "function",
name: "search_hotels",
description: "Search for hotels in a city",
parameters: {
type: "object",
properties: {
city: { type: "string", description: "City name" },
checkin: { type: "string", description: "Check-in date (YYYY-MM-DD)" },
nights: { type: "number", description: "Number of nights" },
},
required: ["city", "checkin", "nights"],
additionalProperties: false,
},
strict: true,
},
{
type: "function",
name: "calculate_budget",
description: "Calculate total trip budget from flight and hotel costs",
parameters: {
type: "object",
properties: {
flight_cost: { type: "number", description: "Round-trip flight cost" },
hotel_cost: { type: "number", description: "Total hotel cost" },
daily_expenses: { type: "number", description: "Estimated daily expenses" },
days: { type: "number", description: "Number of days" },
},
required: ["flight_cost", "hotel_cost", "daily_expenses", "days"],
additionalProperties: false,
},
strict: true,
},
];
// Simulated tool implementations
function executeTool(name: string, args: Record<string, unknown>): string {
switch (name) {
case "search_flights":
return JSON.stringify({
flights: [
{ airline: "United", price: 450, duration: "7h 30m" },
{ airline: "British Airways", price: 520, duration: "7h 00m" },
],
});
case "search_hotels":
return JSON.stringify({
hotels: [
{ name: "The Grand", price_per_night: 180, rating: 4.5 },
{ name: "City View Inn", price_per_night: 120, rating: 4.2 },
],
});
case "calculate_budget": {
const total =
(args.flight_cost as number) +
(args.hotel_cost as number) +
(args.daily_expenses as number) * (args.days as number);
return JSON.stringify({ total_budget: total, currency: "USD" });
}
default:
return JSON.stringify({ error: "Unknown tool" });
}
}
async function planTrip(query: string) {
let input: OpenAI.Responses.ResponseInput = [
{ role: "user", content: query },
];
while (true) {
const response = await client.responses.create({
model: "gpt-4.1-nano",
input,
tools,
});
const calls = response.output.filter((o) => o.type === "function_call");
if (calls.length === 0) {
return response.output_text;
}
// Execute all calls (parallel calls arrive in the same batch)
for (const call of calls) {
if (call.type === "function_call") {
const args = JSON.parse(call.arguments);
const result = executeTool(call.name, args);
input = [
...input,
call,
{ type: "function_call_output", call_id: call.call_id, output: result },
];
}
}
}
}
const plan = await planTrip(
"Plan a 5-night trip from NYC to London on 2026-06-15. " +
"Find flights and hotels, then calculate my total budget with $100/day expenses."
);
console.log(plan);
import json
from openai import OpenAI
client = OpenAI()
tools = [
{
"type": "function",
"name": "search_flights",
"description": "Search for flights between two cities",
"parameters": {
"type": "object",
"properties": {
"from": {"type": "string", "description": "Departure city"},
"to": {"type": "string", "description": "Arrival city"},
"date": {"type": "string", "description": "Travel date (YYYY-MM-DD)"},
},
"required": ["from", "to", "date"],
"additionalProperties": False,
},
"strict": True,
},
{
"type": "function",
"name": "search_hotels",
"description": "Search for hotels in a city",
"parameters": {
"type": "object",
"properties": {
"city": {"type": "string", "description": "City name"},
"checkin": {"type": "string", "description": "Check-in date"},
"nights": {"type": "number", "description": "Number of nights"},
},
"required": ["city", "checkin", "nights"],
"additionalProperties": False,
},
"strict": True,
},
{
"type": "function",
"name": "calculate_budget",
"description": "Calculate total trip budget from flight and hotel costs",
"parameters": {
"type": "object",
"properties": {
"flight_cost": {"type": "number", "description": "Flight cost"},
"hotel_cost": {"type": "number", "description": "Hotel cost"},
"daily_expenses": {"type": "number", "description": "Daily expenses"},
"days": {"type": "number", "description": "Number of days"},
},
"required": ["flight_cost", "hotel_cost", "daily_expenses", "days"],
"additionalProperties": False,
},
"strict": True,
},
]
def execute_tool(name: str, args: dict) -> str:
if name == "search_flights":
return json.dumps(
{
"flights": [
{"airline": "United", "price": 450, "duration": "7h 30m"},
{"airline": "British Airways", "price": 520, "duration": "7h"},
],
}
)
if name == "search_hotels":
return json.dumps(
{
"hotels": [
{"name": "The Grand", "price_per_night": 180, "rating": 4.5},
{"name": "City View Inn", "price_per_night": 120, "rating": 4.2},
],
}
)
if name == "calculate_budget":
total = (
args["flight_cost"]
+ args["hotel_cost"]
+ args["daily_expenses"] * args["days"]
)
return json.dumps({"total_budget": total, "currency": "USD"})
return json.dumps({"error": "Unknown tool"})
def plan_trip(query: str) -> str:
input_messages = [{"role": "user", "content": query}]
while True:
response = client.responses.create(
model="gpt-4.1-nano",
input=input_messages,
tools=tools,
)
calls = [o for o in response.output if o.type == "function_call"]
if not calls:
return response.output_text
for call in calls:
args = json.loads(call.arguments)
result = execute_tool(call.name, args)
input_messages.append(call)
input_messages.append(
{
"type": "function_call_output",
"call_id": call.call_id,
"output": result,
}
)
plan = plan_trip(
"Plan a 5-night trip from NYC to London on 2026-06-15. "
"Find flights and hotels, then calculate my total budget with $100/day expenses."
)
print(plan)
The model typically handles this in two rounds. In round one, it calls search_flights and search_hotels in parallel (both appear in the same response). In round two, after receiving those results, it calls calculate_budget with the prices from the search results. Two tool-call rounds instead of three, because the parallel calls collapse into one.
How Do You Make Tool Calls Reliable?
Without validation, tool calls can arrive with malformed arguments, missing required fields, or hallucinated function names. Strict mode helps, but it's not available on every provider. You need validation on your side, and you need retry logic for when things go wrong.
The strategy is straightforward: validate arguments before execution, return structured errors when validation fails, and let the model self-correct. Here's how to do it with Zod in TypeScript and Pydantic in Python.
import { z } from "zod";
import OpenAI from "openai";
// Define schemas with Zod
const WeatherArgs = z.object({
city: z.string().min(1, "City name is required"),
units: z.enum(["celsius", "fahrenheit"]).default("celsius"),
});
const SearchArgs = z.object({
query: z.string().min(3, "Query must be at least 3 characters"),
max_results: z.number().int().min(1).max(20).default(5),
});
// Registry: maps tool names to their schemas and handlers
const toolRegistry = {
get_weather: {
schema: WeatherArgs,
handler: (args: z.infer<typeof WeatherArgs>) => ({
temp: 18,
condition: "cloudy",
city: args.city,
units: args.units,
}),
},
search: {
schema: SearchArgs,
handler: (args: z.infer<typeof SearchArgs>) => ({
results: [`Result 1 for "${args.query}"`, `Result 2 for "${args.query}"`],
total: args.max_results,
}),
},
} as const;
type ToolName = keyof typeof toolRegistry;
function executeToolCall(name: string, rawArgs: string): string {
// Guard against hallucinated tool names
if (!(name in toolRegistry)) {
return JSON.stringify({
error: `Unknown tool: "${name}". Available tools: ${Object.keys(toolRegistry).join(", ")}`,
});
}
const tool = toolRegistry[name as ToolName];
// Validate arguments with Zod
const parsed = tool.schema.safeParse(JSON.parse(rawArgs));
if (!parsed.success) {
return JSON.stringify({
error: "Invalid arguments",
details: parsed.error.issues.map((i) => ({
field: i.path.join("."),
message: i.message,
})),
});
}
// Execute with validated args
try {
const result = tool.handler(parsed.data as any);
return JSON.stringify(result);
} catch (err) {
return JSON.stringify({
error: `Execution failed: ${err instanceof Error ? err.message : "Unknown error"}`,
});
}
}
from pydantic import BaseModel, Field
import json
# Define schemas with Pydantic
class WeatherArgs(BaseModel):
city: str = Field(min_length=1, description="City name")
units: str = Field(default="celsius", pattern="^(celsius|fahrenheit)$")
class SearchArgs(BaseModel):
query: str = Field(min_length=3, description="Search query")
max_results: int = Field(default=5, ge=1, le=20)
# Registry: maps tool names to schemas and handlers
TOOL_REGISTRY = {
"get_weather": {
"schema": WeatherArgs,
"handler": lambda args: {
"temp": 18,
"condition": "cloudy",
"city": args.city,
"units": args.units,
},
},
"search": {
"schema": SearchArgs,
"handler": lambda args: {
"results": [
f'Result 1 for "{args.query}"',
f'Result 2 for "{args.query}"',
],
"total": args.max_results,
},
},
}
def execute_tool_call(name: str, raw_args: str) -> str:
# Guard against hallucinated tool names
if name not in TOOL_REGISTRY:
return json.dumps(
{
"error": f'Unknown tool: "{name}". '
f'Available: {", ".join(TOOL_REGISTRY.keys())}',
}
)
tool = TOOL_REGISTRY[name]
# Validate arguments with Pydantic
try:
parsed = tool["schema"].model_validate_json(raw_args)
except Exception as e:
return json.dumps({"error": "Invalid arguments", "details": str(e)})
# Execute with validated args
try:
result = tool["handler"](parsed)
return json.dumps(result)
except Exception as e:
        return json.dumps({"error": f"Execution failed: {str(e)}"})

The key insight: return errors as structured tool results, not as exceptions. When the model receives { "error": "Unknown tool: search_web. Available: search, get_weather" }, it can self-correct and retry with the right tool name. If you throw an exception instead, the entire loop breaks.
For retries, limit the loop to a maximum number of iterations (3-5 is common). If the model hasn't produced a final text response after that many rounds of tool calls, something is wrong, and you should bail out with an error message rather than burning tokens in an infinite loop.
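The bail-out pattern looks like this in miniature — a sketch with a stubbed `fake_model` standing in for the real API call (all names here are hypothetical):

```python
# Bounded tool-call loop: exit with an error after MAX_ROUNDS instead of
# looping forever. `fake_model` stands in for a real API call; it requests
# a tool twice, then returns a final text answer.
MAX_ROUNDS = 5

def fake_model(history: list) -> dict:
    tool_rounds = sum(1 for m in history if m.get("type") == "tool_result")
    if tool_rounds < 2:
        return {"type": "tool_call", "name": "get_weather", "args": {"city": "Paris"}}
    return {"type": "text", "content": "It is 18 degrees in Paris."}

def run_agent(question: str) -> str:
    history: list = [{"role": "user", "content": question}]
    for _ in range(MAX_ROUNDS):
        reply = fake_model(history)
        if reply["type"] == "text":
            return reply["content"]  # model finished: exit the loop
        # Execute the requested tool and append the result to history
        history.append({"type": "tool_result", "output": '{"temp": 18}'})
    # Ran out of rounds without a final answer: fail loudly, don't spin
    return "Error: agent exceeded maximum tool-call rounds."

print(run_agent("Weather in Paris?"))
```

The cap is cheap insurance: a well-behaved model never hits it, and a confused one stops costing you tokens after five rounds instead of five hundred.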
How Do You Build a Multi-Tool Research Agent?
A research agent needs to search for information, read specific pages, summarize what it finds, and save notes for later. That's four tools working together in a multi-turn loop. The model decides which tool to call and when, chaining results across multiple rounds until it has enough information to answer the question.
This example builds a fully working research agent in about 80 lines. The tool implementations are simulated, so you only need an OpenAI API key to run it; swapping in real search and fetch APIs is straightforward.
import OpenAI from "openai";
const client = new OpenAI();
// In-memory note storage
const notes: { title: string; content: string }[] = [];
const tools: OpenAI.Responses.Tool[] = [
{
type: "function",
name: "web_search",
description: "Search the web for information. Returns titles and URLs.",
parameters: {
type: "object",
properties: {
query: { type: "string", description: "Search query" },
},
required: ["query"],
additionalProperties: false,
},
strict: true,
},
{
type: "function",
name: "read_url",
description: "Read the text content of a web page.",
parameters: {
type: "object",
properties: {
url: { type: "string", description: "URL to read" },
},
required: ["url"],
additionalProperties: false,
},
strict: true,
},
{
type: "function",
name: "save_note",
description: "Save a research note with a title and content.",
parameters: {
type: "object",
properties: {
title: { type: "string", description: "Note title" },
content: { type: "string", description: "Note content (markdown)" },
},
required: ["title", "content"],
additionalProperties: false,
},
strict: true,
},
{
type: "function",
name: "list_notes",
description: "List all saved research notes.",
parameters: {
type: "object",
properties: {},
required: [],
additionalProperties: false,
},
strict: true,
},
];
function executeTool(name: string, args: Record<string, string>): string {
switch (name) {
case "web_search":
// Simulated search results
return JSON.stringify({
results: [
{
title: `Understanding ${args.query}`,
url: `https://example.com/article-1`,
snippet: `A practical guide to ${args.query} covering key concepts.`,
},
{
title: `${args.query}: Best Practices 2026`,
url: `https://example.com/article-2`,
snippet: `Industry best practices and patterns for ${args.query}.`,
},
],
});
case "read_url":
// Simulated page content
return JSON.stringify({
title: "Article Title",
content: `This article covers the topic in depth. Key points: 1) The concept originated in 2023. 2) Adoption grew 300% in 2025. 3) Current best practices include validation, retry logic, and monitoring. The technology enables AI models to interact with external systems through structured requests.`,
word_count: 47,
});
case "save_note":
notes.push({ title: args.title, content: args.content });
return JSON.stringify({ saved: true, total_notes: notes.length });
case "list_notes":
return JSON.stringify({ notes, count: notes.length });
default:
return JSON.stringify({ error: `Unknown tool: ${name}` });
}
}
async function research(question: string, maxRounds = 10): Promise<string> {
const input: OpenAI.Responses.ResponseInput = [
{
role: "system",
content:
"You are a research assistant. Search the web, read articles, " +
"save important findings as notes, and synthesize a final answer. " +
"Be thorough: search, read at least 2 sources, save key findings.",
},
{ role: "user", content: question },
];
for (let round = 0; round < maxRounds; round++) {
const response = await client.responses.create({
model: "gpt-4.1-nano",
input,
tools,
});
const calls = response.output.filter((o) => o.type === "function_call");
if (calls.length === 0) {
return response.output_text;
}
for (const call of calls) {
if (call.type === "function_call") {
const args = JSON.parse(call.arguments);
const result = executeTool(call.name, args);
input.push(call);
input.push({
type: "function_call_output",
call_id: call.call_id,
output: result,
});
}
}
}
return "Research incomplete: max rounds reached.";
}
const answer = await research("What is function calling in AI and why does it matter?");
console.log(answer);
console.log("\nSaved notes:", notes);

The same agent in Python:

import json
from openai import OpenAI
client = OpenAI()
# In-memory note storage
notes: list[dict] = []
tools = [
{
"type": "function",
"name": "web_search",
"description": "Search the web for information. Returns titles and URLs.",
"parameters": {
"type": "object",
"properties": {
"query": {"type": "string", "description": "Search query"},
},
"required": ["query"],
"additionalProperties": False,
},
"strict": True,
},
{
"type": "function",
"name": "read_url",
"description": "Read the text content of a web page.",
"parameters": {
"type": "object",
"properties": {
"url": {"type": "string", "description": "URL to read"},
},
"required": ["url"],
"additionalProperties": False,
},
"strict": True,
},
{
"type": "function",
"name": "save_note",
"description": "Save a research note with a title and content.",
"parameters": {
"type": "object",
"properties": {
"title": {"type": "string", "description": "Note title"},
"content": {"type": "string", "description": "Note content"},
},
"required": ["title", "content"],
"additionalProperties": False,
},
"strict": True,
},
{
"type": "function",
"name": "list_notes",
"description": "List all saved research notes.",
"parameters": {
"type": "object",
"properties": {},
"required": [],
"additionalProperties": False,
},
"strict": True,
},
]
def execute_tool(name: str, args: dict) -> str:
if name == "web_search":
return json.dumps(
{
"results": [
{
"title": f"Understanding {args['query']}",
"url": "https://example.com/article-1",
"snippet": f"A practical guide to {args['query']}.",
},
{
"title": f"{args['query']}: Best Practices 2026",
"url": "https://example.com/article-2",
"snippet": f"Industry best practices for {args['query']}.",
},
],
}
)
if name == "read_url":
return json.dumps(
{
"title": "Article Title",
"content": (
"Key points: 1) The concept originated in 2023. "
"2) Adoption grew 300% in 2025. 3) Best practices include "
"validation, retry logic, and monitoring."
),
"word_count": 30,
}
)
if name == "save_note":
notes.append({"title": args["title"], "content": args["content"]})
return json.dumps({"saved": True, "total_notes": len(notes)})
if name == "list_notes":
return json.dumps({"notes": notes, "count": len(notes)})
return json.dumps({"error": f"Unknown tool: {name}"})
def research(question: str, max_rounds: int = 10) -> str:
input_messages = [
{
"role": "system",
"content": (
"You are a research assistant. Search the web, read articles, "
"save important findings as notes, and synthesize a final answer. "
"Be thorough: search, read at least 2 sources, save key findings."
),
},
{"role": "user", "content": question},
]
for _ in range(max_rounds):
response = client.responses.create(
model="gpt-4.1-nano",
input=input_messages,
tools=tools,
)
calls = [o for o in response.output if o.type == "function_call"]
if not calls:
return response.output_text
for call in calls:
args = json.loads(call.arguments)
result = execute_tool(call.name, args)
input_messages.append(call)
input_messages.append(
{
"type": "function_call_output",
"call_id": call.call_id,
"output": result,
}
)
return "Research incomplete: max rounds reached."
answer = research("What is function calling in AI and why does it matter?")
print(answer)
print("\nSaved notes:", notes)

The agent typically runs through 3-5 rounds: search for the topic, read two articles, save notes from each, then synthesize a final answer. Each round adds to the conversation history, so the model has full context of what it's already found. This is the same loop pattern from the first example, just with more tools and a system prompt that guides the agent's behavior.
If you're building agents with persistent memory across sessions, you'd replace the in-memory notes array with a database-backed storage system. The agent could then recall research from previous conversations, building knowledge over time. There's a deeper problem here though: most agents have 50 tools and zero memory, which makes them powerful but forgetful. Our guide on building a RAG pipeline covers how to make that retrieval layer fast and accurate, and our article on why RAG quality depends on chunking, not the model explains the retrieval pitfalls you'll hit first.
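A minimal sketch of that swap, using Python's built-in sqlite3 (the class name and schema are hypothetical; ":memory:" keeps the demo self-contained, while a file path would persist notes across sessions):

```python
import sqlite3

# Hypothetical drop-in replacement for the in-memory `notes` list:
# notes are stored in SQLite instead of a Python list, so they survive
# process restarts when backed by a file.
class NoteStore:
    def __init__(self, path: str = ":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS notes (title TEXT, content TEXT)"
        )

    def save(self, title: str, content: str) -> int:
        # Mirrors the save_note tool: returns the new total note count
        self.conn.execute("INSERT INTO notes VALUES (?, ?)", (title, content))
        self.conn.commit()
        return self.conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]

    def list_all(self) -> list[dict]:
        # Mirrors the list_notes tool
        rows = self.conn.execute("SELECT title, content FROM notes").fetchall()
        return [{"title": t, "content": c} for t, c in rows]

store = NoteStore()
store.save("Function calling", "Structured tool requests; shipped in 2023.")
print(store.list_all())
```

The tool handlers change from list operations to `store.save(...)` and `store.list_all()`; the tool schemas and the agent loop stay identical.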
What Breaks in Production?
The most common production failures are rate limits from external APIs, tool execution timeouts, runaway token costs from deep tool-call loops, and missing security boundaries. Function calling works well in demos; production is where these edge cases surface. Here's how to handle each one.
Rate limiting and timeouts. External APIs have rate limits. Your tool might call a third-party service that returns a 429 status, or a database query that takes 30 seconds. Set timeouts on every tool execution (5-10 seconds is reasonable for most APIs) and return structured error messages when they fire.
async function executeWithTimeout(
fn: () => Promise<unknown>,
timeoutMs: number = 5000
): Promise<string> {
try {
const result = await Promise.race([
fn(),
new Promise((_, reject) =>
setTimeout(() => reject(new Error("Tool execution timed out")), timeoutMs)
),
]);
return JSON.stringify(result);
} catch (err) {
return JSON.stringify({
error: err instanceof Error ? err.message : "Execution failed",
retry: true,
});
}
}

And in Python:

import asyncio
import json
async def execute_with_timeout(fn, timeout_seconds: float = 5.0) -> str:
try:
result = await asyncio.wait_for(fn(), timeout=timeout_seconds)
return json.dumps(result)
except asyncio.TimeoutError:
return json.dumps({"error": "Tool execution timed out", "retry": True})
except Exception as e:
        return json.dumps({"error": str(e), "retry": True})

Cost tracking. Each tool call round-trip costs tokens. A research agent that runs 8 rounds with GPT-4 can cost $0.15-0.50 per query. Track token usage per conversation and set hard limits. The model's context window grows with each round because you're appending tool calls and results to the message history.
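A hard limit can be enforced with a small budget tracker — a sketch with hypothetical names; in practice the per-round numbers come from the usage field on each API response:

```python
# Per-conversation token budget (hypothetical class). The agent loop calls
# record() after every round and bails out once `exceeded` flips to True.
class TokenBudget:
    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.used = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        # In a real loop these values come from response.usage
        self.used += input_tokens + output_tokens

    @property
    def exceeded(self) -> bool:
        return self.used >= self.max_tokens

budget = TokenBudget(max_tokens=50_000)
budget.record(input_tokens=12_000, output_tokens=800)    # round 1
budget.record(input_tokens=14_500, output_tokens=1_200)  # round 2: history grew
print(budget.used, budget.exceeded)
```

Note how input tokens grow round over round: the whole history is resent each time, so late rounds cost more than early ones. That's why a round cap alone isn't enough — a token cap catches conversations that are technically within the round limit but pathologically long.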
Security: whitelist your tools. Never let the model call arbitrary functions. The tool registry pattern from the validation section handles this: if the model hallucinates a tool name that isn't in your registry, it gets an error, not code execution. In production, also validate that tool arguments don't contain injection attacks (SQL injection in a database query tool, path traversal in a file read tool).
// Allowlist: only these tools can be executed
const ALLOWED_TOOLS = new Set(["get_weather", "search", "calculate"]);
function safeguardedExecution(name: string, args: string): string {
if (!ALLOWED_TOOLS.has(name)) {
return JSON.stringify({ error: `Tool "${name}" is not permitted` });
}
// ... validate args and execute
return executeToolCall(name, args);
}

Monitoring and observability. Log every tool call: what was requested, what arguments the model chose, how long execution took, and what result came back. When an agent misbehaves, these logs are how you diagnose it. Scorecards and automated evaluation frameworks let you score tool call accuracy across hundreds of conversations rather than debugging one at a time. Our guide on evaluating AI agents covers the measurement side in depth.
Testing with scenarios. Before shipping a tool-using agent, run it through simulated scenarios that exercise each tool with realistic inputs. Does it handle the "no results found" case? What happens when two tools return conflicting data? Does it recover gracefully when a tool times out? These are the questions that scenario testing answers before real users hit them.
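Here's a sketch of what such scenario tests look like, using plain assertions against a hypothetical stubbed search tool (in practice you'd run these under pytest):

```python
import json

# Scenario tests for a tool-using agent. `search_tool` is a hypothetical
# stub whose `simulate` parameter forces the edge cases we care about:
# the "no results found" shape and a timed-out, retryable error shape.
def search_tool(query: str, simulate: str = "ok") -> str:
    if simulate == "empty":
        return json.dumps({"results": [], "total": 0})
    if simulate == "timeout":
        return json.dumps({"error": "Tool execution timed out", "retry": True})
    return json.dumps({"results": [f'Result for "{query}"'], "total": 1})

# Scenario 1: "no results found" returns a valid, empty payload — not an error.
empty = json.loads(search_tool("obscure topic", simulate="empty"))
assert empty["results"] == [] and "error" not in empty

# Scenario 2: a timeout surfaces as a structured, retryable error the model
# can read, rather than an exception that kills the loop.
timed_out = json.loads(search_tool("anything", simulate="timeout"))
assert timed_out["retry"] is True

print("All scenarios passed")
```

The point is to pin down the tool result shapes before the model ever sees them: if every edge case produces well-formed JSON the model can reason about, the agent's failure modes become recoverable instead of fatal.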
Connection to MCP
Function calling is the model-level primitive: how a single LLM requests tool execution in a single conversation. But what happens when you have 50 tools across 10 different services? What if tools need to be discovered at runtime rather than hardcoded in your tool definitions?
That's where the Model Context Protocol (MCP) comes in. MCP is the transport and discovery layer that sits above function calling. An MCP server exposes tools through a standardized protocol, and any MCP-compatible client can discover and call them without knowing the tool definitions in advance.
Think of it this way: function calling is how the model talks to your code. MCP is how your code discovers what tools exist. They're complementary, not competing. Every MCP tool call ultimately becomes a function call at the model level.
The Primitive That Makes Agents Possible
Function calling is the dividing line between chatbots and agents. Before it existed, LLMs could only generate text. They could describe actions but never take them.
What makes this powerful isn't any single tool call. It's the loop. A model that can call one tool is useful. A model that can call multiple tools in sequence, reason about intermediate results, and decide what to do next is an agent. Every production agent system, from customer support bots to coding assistants to research tools, runs on this same cycle: define tools, send schemas, handle calls, return results, repeat.
The pattern is identical across OpenAI, Anthropic, and Google, even when the API syntax isn't. And it connects to everything else in the modern agent stack: MCP for tool discovery, RAG for knowledge retrieval, embeddings for similarity search, evals for quality measurement, and tool management for keeping it all organized.
The model generates the request. Your infrastructure makes it real.
Co-founder
Building the platform for AI agents at Chanl — tools, testing, and observability for customer experience.