How AI Models Are Learning to Do Instead of Just Say
Function calling allows large language models (LLMs) to interact with external tools, APIs, and functions based on the user's input. Instead of only generating text, the LLM can determine that a specific action needs to be taken and request that an external function perform it.
This lets users interact with complex systems using natural language while the LLM handles the underlying function executions, so the model can focus on solving practical problems rather than just producing text.
For example, if a user asks for the weather, the model can call a weather API to get real-time data instead of returning a generic response. It might even remind you to take an umbrella if there's a chance of rain.
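To make this concrete, a function is described to the model with a name, a description, and a JSON schema for its parameters. The snippet below is a minimal sketch of such a definition: the get_weather name and its city parameter are hypothetical, but the shape matches the tool definition used later in this post.
# Hypothetical tool definition; the model sees this schema and can ask for it to be called
weather_tool = {
    'type': 'function',
    'function': {
        'name': 'get_weather',
        'description': 'Get the current weather and rain probability for a city',
        'parameters': {
            'type': 'object',
            'required': ['city'],
            'properties': {
                'city': {
                    'type': 'string',
                    'description': 'Name of the city to look up'
                }
            }
        }
    }
}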
Function calling process explained
Let’s break down how function calling works inside an LLM; a simple end-to-end sketch in Python follows the list.
- User Query: The process starts when a user asks a question or requests an action (e.g., "What leads are in my CRM?" or "Check if product X is in stock").
- LLM Processing: The LLM analyzes the query and recognizes it needs external data or an action to fulfill the request. For example:
  - If the user asks about leads in their CRM, the LLM identifies the need to fetch live data.
  - If the user wants inventory details, it triggers a database lookup.
- Function Call Decision: The LLM decides to execute a function call, which can be either:
  - API Calls: Connects to external services (e.g., a CRM API to retrieve real-time opportunities from Salesforce).
  - Custom Functions: Accesses internal tools or databases (e.g., Inventory Access to check stock levels).
- Data Retrieval: The function fetches the required data (e.g., leads from the Salesforce API, product availability from a warehouse database).
- Data Integration: The retrieved data is sent back to the LLM, which processes it and generates a contextual, accurate response for the user.
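The whole loop can be sketched in plain Python. Everything below is illustrative: decide_tool_call() stands in for the model's reasoning and get_crm_leads() for a real CRM API, but the dispatch pattern (a registry of callable functions keyed by name) is the same one used in the Ollama example later in this post.
# Illustrative sketch of the function-calling loop; the "model" is stubbed out

def get_crm_leads(stage: str) -> str:
    # Hypothetical stand-in for a real CRM API call
    return f"3 open leads in stage '{stage}'"

available_functions = {'get_crm_leads': get_crm_leads}

def decide_tool_call(query: str):
    # Stand-in for the LLM deciding whether it needs a tool (steps 2 and 3)
    if 'lead' in query.lower():
        return {'name': 'get_crm_leads', 'arguments': {'stage': 'open'}}
    return None

def answer(query: str) -> str:
    call = decide_tool_call(query)
    if call is None:
        return "Answered directly by the model."
    func = available_functions[call['name']]
    result = func(**call['arguments'])                    # step 4: data retrieval
    return f"Answer grounded in tool output: {result}"    # step 5: data integration

print(answer("What leads are in my CRM?"))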
Function calling use cases and how it improves performance
By being able to call functions, the LLM isn't limited to text generation. It can perform actions like retrieving live data or interacting with other software, which makes the model more dynamic and useful in real-world applications. For example:
- Providing up-to-date information: If the model can pull in up-to-date information through function calls, it can provide more accurate answers. For instance, answering questions about current events without function calling might lead to outdated info, but with access to a news API, the answers stay current.
- Automation of repetitive tasks: Function calling can automate repetitive tasks. For example, if a user wants to schedule a meeting, the LLM can call a calendar API to add the event automatically. This saves time and reduces the need for manual input.
- Connecting with other services: LLMs can become part of a larger ecosystem, connecting with databases, CRMs, or other enterprise systems. This makes them more versatile in professional environments.
- Handling complex workflows: Instead of just answering a single question, the LLM can orchestrate multiple function calls to solve a multi-step problem, such as planning a trip by checking flight availability, booking a hotel, and renting a car through different APIs (see the sketch after this list).
- Updating capabilities without retraining: As new functions or APIs become available, the LLM can be given access to them without retraining the entire model. This keeps the system up to date with minimal effort.
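For the multi-step case, the same pattern simply repeats: each tool result is fed back before the next call is made. The sketch below is purely illustrative; check_flights and book_hotel are hypothetical stand-ins for real travel APIs, and the scripted list of calls stands in for what a model would emit turn by turn.
# Illustrative multi-step orchestration: execute several tool calls in sequence,
# feeding each result back before moving on. All functions here are hypothetical.

def check_flights(destination: str) -> str:
    return f"Flights to {destination}: available"

def book_hotel(city: str) -> str:
    return f"Hotel in {city}: reserved"

available_functions = {'check_flights': check_flights, 'book_hotel': book_hotel}

# Stand-in for the tool calls a model might emit while planning a trip
scripted_tool_calls = [
    {'name': 'check_flights', 'arguments': {'destination': 'Lisbon'}},
    {'name': 'book_hotel', 'arguments': {'city': 'Lisbon'}},
]

results = []
for call in scripted_tool_calls:
    func = available_functions[call['name']]
    results.append(func(**call['arguments']))

print(" | ".join(results))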
Some examples of function calling
You might have already seen or experienced function calling if you've used any GPTs from the ChatGPT marketplace. These GPTs execute custom functions, which lets people create specialized tools like to-do list builders, prompt enhancers, app connectors, and Q&A bots. The built-in 'Tasks' feature in ChatGPT uses this too: it can set reminders by triggering functions at specific times.
Claude’s Model Context Protocol (MCP) does something similar. With Claude 3.5 Sonnet, the model can activate tools like Brave Search for web results, tap into a graph-based memory system, or link to other apps. Both systems show how AI now uses these "function calls" to connect its core intelligence to real-world tools.
AI Models that support function calling
Note: Function calling is sometimes referred to as tool calling; the two terms mean the same thing.
Here's an overview of some major companies and their models that support this feature:
- OpenAI: GPT-4o
- Meta: Llama 3.3
- Google: Gemini 2.0 Flash Experimental
- Anthropic: Claude
- Cohere: Command R+
- Mistral: Mistral Large, Mistral Small, Codestral, Ministral 8B, Ministral 3B, Pixtral 12B, Pixtral Large, Mistral Nemo
While there are many more, these models are easily accessible via API, and some are open source and can be run locally via Ollama.
Function Calling in Action — Let’s create an AI Search Tool using Ollama
We’re building a search tool where Llama 3.2 acts as a decision-maker – it first analyzes whether a query requires real-time web data. If yes, it triggers the web_search tool automatically via Ollama’s tool-calling API. This mimics how Perplexity balances AI reasoning with live data. What we will need:
- Ollama: Hosts the Llama 3.2 model locally.
- Python 3.11+: The example relies on async/await patterns, so a recent Python version is recommended
- SearchAPI: Free tier supports 100 requests/day (https://www.searchapi.io/)
Let’s start by defining the function:
Note: Ollama refers to this as tool calling, which is the same as function calling.
This is going to be our tool definition. Its parameters describe the input the model has to supply (the search query); the results themselves come back from Google via the SearchAPI.
# Define our search tool
search_tool = {
    'type': 'function',
    'function': {
        'name': 'web_search',
        'description': 'Search the web for current information on a topic',
        'parameters': {
            'type': 'object',
            'required': ['query'],
            'properties': {
                'query': {
                    'type': 'string',
                    'description': 'The search query to look up'
                }
            }
        }
    }
}
This definition will be used in our main.py file, where we make a call to Ollama and let the model decide whether to call the tool or not.
Our project structure will look something like this:
project_folder/
├── .env
├── search_tool.py
└── main.py
File Explanation:
- .env: Isolates API keys from code
- search_tool.py: Separates search logic for reusability
- main.py: Focuses on orchestration (model ↔ tool interaction)
The required packages can be installed with:
pip install ollama python-dotenv requests
Main Workflow (main.py)
- AsyncClient: Handles concurrent tool calls without blocking
- Tool Response Handling:
- Checks for tool_calls in response
- Uses messages array to maintain conversation state
- Error Propagation: Returns auth/search errors directly to users
The code for main.py looks like this:
import asyncio
from ollama import AsyncClient
from search_tool import web_search, extract_content


async def process_query(query: str) -> str:
    client = AsyncClient()

    # Define our search tool
    search_tool = {
        'type': 'function',
        'function': {
            'name': 'web_search',
            'description': 'Search the web for current information on a topic',
            'parameters': {
                'type': 'object',
                'required': ['query'],
                'properties': {
                    'query': {
                        'type': 'string',
                        'description': 'The search query to look up'
                    }
                }
            }
        }
    }

    # First, let Ollama decide if it needs to search
    response = await client.chat(
        'llama3.2',
        messages=[{
            'role': 'user',
            'content': f'Answer this question: {query}'
        }],
        tools=[search_tool]
    )

    # Initialize available functions
    available_functions = {
        'web_search': web_search
    }

    # Check if Ollama wants to use the search tool
    if response.message.tool_calls:
        print("Searching the web...")
        for tool in response.message.tool_calls:
            if function_to_call := available_functions.get(tool.function.name):
                # Call the search function
                search_results = function_to_call(**tool.function.arguments)

                if "error" in search_results:
                    # search_tool.py reports auth problems in its error text
                    if "authentication" in search_results["error"].lower():
                        return "Authentication failed. Please check your API key."
                    return f"Search error: {search_results['error']}"

                # Extract relevant content
                content = extract_content(search_results)
                if not content:
                    return "No relevant information found."

                # Add the search results to the conversation
                messages = [
                    {'role': 'user', 'content': query},
                    response.message,
                    {
                        'role': 'tool',
                        'name': tool.function.name,
                        'content': content
                    }
                ]

                # Get final response from Ollama with the search results
                final_response = await client.chat(
                    'llama3.2',
                    messages=messages
                )
                return final_response.message.content

    # If no tool calls, return the direct response
    return response.message.content


async def main():
    question = input("What would you like to know? ")
    print("\nProcessing your question...")
    answer = await process_query(question)
    print("\nAnswer:")
    print(answer)


if __name__ == "__main__":
    asyncio.run(main())
Now let’s create our search_tool.py, where we define the function that makes the API call and returns the results. The code is based on the response schema documented by SearchAPI (https://www.searchapi.io/docs/google).
Search Implementation (search_tool.py)
- API Error Handling: Catches 401/429 explicitly
- Content Extraction:
- Prioritizes answer_box (featured snippets)
- Limits to 4 results to avoid token overflow
The code for search_tool.py looks like this:
import os
import requests
from typing import Dict, Any
from dotenv import load_dotenv


# Search the web using SearchAPI
def web_search(query: str) -> Dict[Any, Any]:
    load_dotenv()
    api_key = os.getenv('SEARCH_API_KEY')

    if not api_key:
        return {"error": "API key not found in environment variables"}

    url = "https://www.searchapi.io/api/v1/search"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    params = {
        "engine": "google_news",
        "q": query,
        "num": 5
    }

    try:
        response = requests.get(url, headers=headers, params=params)

        if response.status_code == 401:
            return {"error": "Invalid API key or authentication failed"}
        elif response.status_code == 429:
            return {"error": "Rate limit exceeded"}

        response.raise_for_status()
        return response.json()
    except requests.exceptions.RequestException as e:
        error_msg = f"Error fetching search results: {e}"
        # Note: a Response with a 4xx/5xx status is falsy, so compare against None
        if hasattr(e, 'response') and e.response is not None:
            try:
                error_details = e.response.json()
                error_msg = f"{error_msg} - {error_details.get('message', '')}"
            except ValueError:
                pass
        return {"error": error_msg}


# Extract relevant content from search results since a lot of data is returned by the API
def extract_content(search_results: dict) -> str:
    content = []

    if "organic_results" in search_results:
        for result in search_results["organic_results"][:4]:  # Take the top 4 results
            if "snippet" in result:
                content.append(result["snippet"])

    # Prioritize the answer box (featured snippet) if one is present
    if "answer_box" in search_results and search_results["answer_box"]:
        if "answer" in search_results["answer_box"]:
            content.insert(0, search_results["answer_box"]["answer"])

    return "\n\n".join(content)
Create a .env file and store your API key in it. You can get your SearchAPI key at https://www.searchapi.io/.
SEARCH_API_KEY=ABCD123
Then start the tool with:
python main.py
You can then ask any question, and the model will decide whether or not to use the tool.
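If you would rather call the flow from another script than through the interactive prompt, a minimal sketch:
import asyncio
from main import process_query

# Run a single query programmatically instead of via input()
print(asyncio.run(process_query("What is the latest news about Llama models?")))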
You can find the full code on GitHub here.
Conclusion
Function calling lets AI models do more than just talk—they can now trigger actions, like pulling live CRM data or checking inventory, by connecting to tools and APIs. Instead of generic answers, they solve real problems by bridging natural language with software and real-world data. This turns chatbots into dynamic assistants that automate tasks, fetch updates, and interact with other systems, reshaping how we use AI for everyday needs.
Are you currently using function calling in your AI agents and applications? Feel free to get in touch with our team.
References & Tools:
- https://github.com/ollama/ollama-python
- https://www.searchapi.io/
- https://ollama.com/library/llama3.2
- https://www.searchapi.io/docs/google