---
title: "Scaling MCP Tools with Anthropic's Defer Loading"
img: https://s3.us-east-2.amazonaws.com/unified-article-images/scaling_mcp_tools_with_anthropic_defer_loading-icon.png
date: 2025-12-25T18:46:00.000Z
tag: Guides
description: "Learn how to use Anthropic's `defer_loading` tool search features with Unified's MCP server to efficiently manage hundreds of tools while maintaining high..."
url: "https://docs.unified.to/guides/scaling_mcp_tools_with_anthropic_defer_loading"
---

# Scaling MCP Tools with Anthropic's Defer Loading
------
_December 25, 2025_

Learn how to use Anthropic's `defer_loading` tool search features with Unified's MCP server to efficiently manage hundreds of tools while maintaining high accuracy and context efficiency.


When building AI applications with MCP servers, including the [Unified MCP server](https://docs.unified.to/mcp/overview), you quickly encounter a critical challenge: most LLMs struggle with large numbers of tools.


While Unified can provide thousands of tools across different integrations, traditional approaches hit two key limitations:

- **Context window bloat**: Tool definitions consume massive portions of your context (50 tools ≈ 10-20K tokens)
- **Tool selection degradation**: An LLM's ability to correctly select tools degrades significantly beyond 30-50 tools

Anthropic's new **defer_loading** feature addresses both problems by dynamically discovering and loading tools on demand instead of loading all tool definitions upfront.


## The Problem: Too Many Tools


The Unified MCP server can expose tools from any connected integration: CRM, ATS, HRIS, ticketing, storage, and more. A single connection might offer 50+ tools, and with multiple integrations, you could easily have 200+ available tools. Overall, the Unified MCP server currently supports more than 22,000 tools.


Problems with the traditional approach:

- Loading 200 tool definitions uses 40,000-80,000 tokens
- The LLM API struggles to select the correct tool from such a large set
- You waste context on tools that won't be used in that conversation
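
The arithmetic behind these figures can be sketched as follows. The per-tool cost of 200-400 tokens is an assumption inferred from the estimate above (50 tools ≈ 10-20K tokens), not a measured value, and `estimateToolTokens` is a hypothetical helper.

```typescript
// Rough per-tool token cost implied by "50 tools ≈ 10-20K tokens" (an assumption, not a measurement).
const TOKENS_PER_TOOL = { low: 200, high: 400 };

// Hypothetical helper: estimate the context consumed by loading every tool definition upfront.
function estimateToolTokens(toolCount: number): { low: number; high: number } {
    return {
        low: toolCount * TOKENS_PER_TOOL.low,
        high: toolCount * TOKENS_PER_TOOL.high,
    };
}

const estimate = estimateToolTokens(200);
console.log(`200 tools ≈ ${estimate.low}-${estimate.high} tokens`);
// prints "200 tools ≈ 40000-80000 tokens"
```

Actual costs depend on how long each tool's description and input schema are, so treat the range as an order-of-magnitude guide.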

## The Solution: Anthropic's Defer Loading Tool Search


Anthropic's `defer_loading` feature works with two tool search variants:


### Tool Search Variants


**1. Regex Tool Search** (`tool_search_tool_regex_20251119`)

- Claude constructs regex patterns to search for tools
- Best for exact matches and pattern-based discovery
- Fast and efficient for well-named tools

**2. BM25 Tool Search** (`tool_search_tool_bm25_20251119`)

- Claude uses natural language queries to search
- Better for semantic understanding
- More flexible for varied naming conventions
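
To build intuition for the regex variant, here is a purely local stand-in that filters tool names with a pattern. It is not Anthropic's implementation; `searchToolsByRegex` is a hypothetical helper, and every tool name except `list_crm_contacts` is invented for illustration.

```typescript
// Hypothetical tool names; only list_crm_contacts appears in this guide.
const toolNames = [
    "list_crm_contacts",
    "get_crm_contact",
    "list_crm_deals",
    "list_hris_employees",
    "create_ticketing_ticket",
];

// Local stand-in for regex tool search: return the names that match the pattern.
function searchToolsByRegex(pattern: string, names: string[]): string[] {
    const regex = new RegExp(pattern);
    return names.filter((name) => regex.test(name));
}

// Matches list_crm_contacts and list_crm_deals
console.log(searchToolsByRegex("^list_crm_", toolNames));
```

Consistent, predictable tool names make this kind of pattern matching effective, which is why the regex variant suits well-named tools.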

### How It Works

1. You include a tool search tool in your tools list
2. You provide all tool definitions with `defer_loading: true`
3. Claude sees only the tool search tool initially
4. When Claude needs additional tools, it searches dynamically
5. The API returns the 3-5 most relevant tools
6. These are automatically expanded into full definitions
7. Claude selects and invokes the appropriate tool
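
Steps 1 and 2 can be sketched for plain (non-MCP) tool definitions as follows. This is a minimal sketch: `buildDeferredTools` and the `ToolDefinition` type are hypothetical, and the `name` value used for the BM25 search tool is an assumption modeled on the regex variant shown later in this guide.

```typescript
type ToolDefinition = {
    name: string;
    description: string;
    input_schema: Record<string, unknown>;
    defer_loading?: boolean;
};

type SearchVariant = "regex" | "bm25";

// Hypothetical helper: put a tool search tool first, then mark every definition as deferred.
function buildDeferredTools(variant: SearchVariant, definitions: ToolDefinition[]) {
    const searchTool =
        variant === "regex"
            ? { type: "tool_search_tool_regex_20251119", name: "tool_search_tool_regex" }
            : { type: "tool_search_tool_bm25_20251119", name: "tool_search_tool_bm25" }; // name is assumed
    return [searchTool, ...definitions.map((definition) => ({ ...definition, defer_loading: true }))];
}

const tools = buildDeferredTools("regex", [
    {
        name: "list_crm_contacts",
        description: "List contacts from the connected CRM",
        input_schema: { type: "object", properties: {} },
    },
]);
console.log(JSON.stringify(tools, null, 2));
```

The remaining steps (search, expansion, and tool selection) happen on the API side and need no client code.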

## Implementation


To use the new `defer_loading` tool option, follow [Anthropic's MCP integration instructions](https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool#mcp-integration) when calling Anthropic's `/v1/messages` API:

- Add `mcp-client-2025-11-20` to the `anthropic-beta` header, alongside `advanced-tool-use-2025-11-20`:
`anthropic-beta: advanced-tool-use-2025-11-20,mcp-client-2025-11-20`
- Add the Unified MCP server's URL as usual

```json
"mcp_servers": [
  {
    "type": "url",
    "name": "unified-salesforce-server",
    "url": "https://mcp-api.unified.to?token=x&connection=y"
  }
],
```

- Then add a `tools` array that configures which tools to defer:

```json
"tools": [
  {
    "type": "tool_search_tool_regex_20251119",
    "name": "tool_search_tool_regex"
  },
  {
    "type": "mcp_toolset",
    "mcp_server_name": "unified-salesforce-server",
    "default_config": {
      "defer_loading": true
    },
    "configs": {
      "list_crm_contacts": {
        "defer_loading": false
      }
    }
  }
],
```

- Use `default_config.defer_loading: true` to defer all tools by default
- Use `configs.{tool_id}.defer_loading` (`true` or `false`) to override deferral for an individual tool
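
Put together, the header, `mcp_servers`, and `tools` fragments form a single request. The sketch below assembles it as a plain object so it can be inspected before sending; `buildDeferredMcpRequest` is a hypothetical helper, and the model, `max_tokens`, and prompt are illustrative choices.

```typescript
const ANTHROPIC_MESSAGES_URL = "https://api.anthropic.com/v1/messages";

// Hypothetical helper: assemble the URL and fetch options for a deferred-loading MCP request.
function buildDeferredMcpRequest(apiKey: string, mcpUrl: string, prompt: string) {
    const serverName = "unified-salesforce-server";
    return {
        url: ANTHROPIC_MESSAGES_URL,
        init: {
            method: "POST",
            headers: {
                "content-type": "application/json",
                "x-api-key": apiKey,
                "anthropic-version": "2023-06-01",
                "anthropic-beta": "advanced-tool-use-2025-11-20,mcp-client-2025-11-20",
            },
            body: JSON.stringify({
                model: "claude-sonnet-4-5-20250929",
                max_tokens: 4096,
                messages: [{ role: "user", content: prompt }],
                mcp_servers: [{ type: "url", name: serverName, url: mcpUrl }],
                tools: [
                    { type: "tool_search_tool_regex_20251119", name: "tool_search_tool_regex" },
                    {
                        type: "mcp_toolset",
                        mcp_server_name: serverName,
                        default_config: { defer_loading: true },
                        // Keep one frequently used tool loaded upfront.
                        configs: { list_crm_contacts: { defer_loading: false } },
                    },
                ],
            }),
        },
    };
}

// Usage (performs a real API call, so it is left commented out):
// const { url, init } = buildDeferredMcpRequest(apiKey, "https://mcp-api.unified.to?token=x&connection=y", "List my CRM contacts");
// const response = await fetch(url, init);
```

The `mcp_server_name` in the toolset must match the `name` of the entry in `mcp_servers`, which is why the sketch shares one `serverName` constant.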

Alternatively, call the Unified MCP `/tools` endpoint with the parameters `?type=anthropic&defer_tools=all`, then pass the resulting `tools` array to Anthropic's API.
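
A minimal sketch of that call is shown below. It assumes the endpoint lives at `https://mcp-api.unified.to/tools`, accepts the same `token` and `connection` query parameters as the MCP server URL above, and returns a JSON array of tools; both helper names are hypothetical.

```typescript
// Hypothetical helper: build the /tools URL. The host, path, and auth-by-query-string are assumptions.
function buildUnifiedToolsUrl(token: string, connection: string, deferTools = "all"): string {
    const url = new URL("https://mcp-api.unified.to/tools");
    url.searchParams.set("token", token);
    url.searchParams.set("connection", connection);
    url.searchParams.set("type", "anthropic");
    url.searchParams.set("defer_tools", deferTools);
    return url.toString();
}

// Hypothetical helper: fetch the tool list, assuming the response body is a JSON array.
async function fetchDeferredTools(token: string, connection: string): Promise<unknown[]> {
    const response = await fetch(buildUnifiedToolsUrl(token, connection));
    if (!response.ok) {
        throw new Error(`Unified /tools request failed: ${response.status}`);
    }
    return (await response.json()) as unknown[];
}

console.log(buildUnifiedToolsUrl("x", "y"));
```

Check the MCP Server Options documentation for the exact parameter names and response shape before relying on this.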


## Best Practices


### 1. Keep Core Tools Non-Deferred


If you have 3-5 tools that are frequently used, keep them as non-deferred.
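
One way to express this is to generate the `mcp_toolset` entry from a list of core tool names. This is a sketch only: `buildToolsetConfig` is a hypothetical helper, and `get_crm_contact` is an invented tool name.

```typescript
// Hypothetical helper: defer everything by default, but keep the named core tools loaded upfront.
function buildToolsetConfig(serverName: string, coreTools: string[]) {
    const configs: Record<string, { defer_loading: boolean }> = {};
    for (const toolName of coreTools) {
        configs[toolName] = { defer_loading: false };
    }
    return {
        type: "mcp_toolset",
        mcp_server_name: serverName,
        default_config: { defer_loading: true },
        configs,
    };
}

const toolset = buildToolsetConfig("unified-salesforce-server", ["list_crm_contacts", "get_crm_contact"]);
console.log(JSON.stringify(toolset, null, 2));
```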


### 2. Use Permissions to Scope Tools


Reduce the tool catalog size by requesting only the permissions you need:


```typescript
// Instead of all tools across all categories
permissions: 'crm_contact_read,crm_contact_write,crm_deal_read'

// This is better than loading 200+ tools from all categories
```
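
As a sketch of where that value might go, the helper below appends a comma-separated `permissions` query parameter to the MCP server URL. Passing permissions this way is an assumption based on the URL format shown earlier, and `buildScopedMcpUrl` is a hypothetical helper.

```typescript
// Hypothetical helper: build a Unified MCP server URL limited to the given permissions.
// Assumes `permissions` is accepted as a comma-separated query parameter.
function buildScopedMcpUrl(token: string, connection: string, permissions: string[]): string {
    const url = new URL("https://mcp-api.unified.to");
    url.searchParams.set("token", token);
    url.searchParams.set("connection", connection);
    url.searchParams.set("permissions", permissions.join(","));
    return url.toString();
}

console.log(buildScopedMcpUrl("x", "y", ["crm_contact_read", "crm_contact_write", "crm_deal_read"]));
```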


### 3. Restrict Tools


Use the Unified MCP `tools` parameter to restrict which tools are returned to the LLM API.


This differs from the `defer_tools` parameter: restricted tools are never sent to the LLM API at all, whereas deferred tools are known to the LLM API but their definitions are not processed until they are needed.
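
The difference can be modeled locally with two small functions. This is an illustration of the two behaviors, not Unified's server-side implementation; both helper names and the sample catalog are hypothetical.

```typescript
type Tool = { name: string; defer_loading?: boolean };

// Restricting: tools outside the allow-list never reach the LLM API.
function restrictTools(catalog: Tool[], allowed: string[]): Tool[] {
    return catalog.filter((tool) => allowed.includes(tool.name));
}

// Deferring: every tool is still sent, but flagged so its definition loads only when searched for.
function deferTools(catalog: Tool[]): Tool[] {
    return catalog.map((tool) => ({ ...tool, defer_loading: true }));
}

const catalog: Tool[] = [{ name: "list_crm_contacts" }, { name: "list_crm_deals" }, { name: "remove_crm_contact" }];

console.log(restrictTools(catalog, ["list_crm_contacts"]).length); // 1 tool remains
console.log(deferTools(catalog).length); // all 3 tools remain, each deferred
```

Restriction is the right tool when a capability should never be available; deferral is for capabilities that should stay discoverable without costing context upfront.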


### 4. Monitor Token Usage


Track your token consumption to understand the benefits:


```typescript
console.log(`Input tokens: ${response.usage.input_tokens}`);
console.log(`Output tokens: ${response.usage.output_tokens}`);
console.log(`Tool search requests: ${response.usage.server_tool_use?.tool_search_requests}`);
```
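
For multi-turn conversations, it can help to total these numbers across responses. The sketch below uses only the usage fields logged above; `summarizeUsage` and the sample values are hypothetical.

```typescript
// Minimal shape of the usage object, limited to the fields logged above.
type Usage = {
    input_tokens: number;
    output_tokens: number;
    server_tool_use?: { tool_search_requests?: number };
};

// Hypothetical helper: sum usage across the responses of one conversation.
function summarizeUsage(usages: Usage[]) {
    return usages.reduce(
        (total, usage) => ({
            inputTokens: total.inputTokens + usage.input_tokens,
            outputTokens: total.outputTokens + usage.output_tokens,
            toolSearchRequests: total.toolSearchRequests + (usage.server_tool_use?.tool_search_requests ?? 0),
        }),
        { inputTokens: 0, outputTokens: 0, toolSearchRequests: 0 },
    );
}

// Illustrative values only, not measurements.
const summary = summarizeUsage([
    { input_tokens: 1200, output_tokens: 300, server_tool_use: { tool_search_requests: 1 } },
    { input_tokens: 900, output_tokens: 150 },
]);
console.log(summary);
```

Comparing these totals with and without `defer_loading` on the same prompts gives a direct measure of the savings for your workload.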


### 5. Combine with Prompt Caching


Use [prompt caching](https://docs.unified.to/guides/building_ai_applications_with_unified_and_langbase) with `defer_loading` for multi-turn conversations:


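```typescript
messages.push({
    role: "user",
    // cache_control is set on a content block rather than on the message itself
    content: [{
        type: "text",
        text: "Now find their recent deals",
        cache_control: { type: "ephemeral" }
    }]
});
```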


## Tool Search Limits


Be aware of these limits:

- **Maximum tools**: 10,000 tools in your catalog
- **Search results**: Returns 3-5 most relevant tools per search
- **Pattern length**: Maximum 200 characters for regex patterns
- **Model support**: Claude Sonnet 4.5+ and Opus 4.5+ only
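
The two numeric limits can be checked before a request is sent. This is a minimal sketch using the values listed above; `checkToolSearchLimits` is a hypothetical helper, and the pattern check only applies if you generate or validate regex patterns yourself.

```typescript
const MAX_CATALOG_TOOLS = 10_000;
const MAX_REGEX_PATTERN_LENGTH = 200;

// Hypothetical helper: return a list of limit violations (empty when everything is within limits).
function checkToolSearchLimits(toolCount: number, regexPattern?: string): string[] {
    const problems: string[] = [];
    if (toolCount > MAX_CATALOG_TOOLS) {
        problems.push(`Catalog has ${toolCount} tools; the maximum is ${MAX_CATALOG_TOOLS}.`);
    }
    if (regexPattern !== undefined && regexPattern.length > MAX_REGEX_PATTERN_LENGTH) {
        problems.push(`Pattern is ${regexPattern.length} characters; the maximum is ${MAX_REGEX_PATTERN_LENGTH}.`);
    }
    return problems;
}

console.log(checkToolSearchLimits(22_000, "^list_crm_"));
```

If the catalog exceeds the limit, scoping by permissions or restricting tools brings it back under the cap.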

## When to Use Defer Loading


**Good use cases:**

- 20+ tools available from Unified connections
- Multiple integrations (CRM + ATS + HRIS + Storage)
- Building multi-tenant applications where each tenant has different integrations
- Context window is getting tight with tool definitions
- Tool selection accuracy is degrading

**When traditional tool calling might be better:**

- Fewer than 10 tools total
- All tools are frequently used in every request
- Very focused single-integration use case
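
These rules of thumb can be captured in a small decision helper. The thresholds come from the lists above; the "either" result for 10-19 tools is an assumption, since that range is not covered, and `recommendToolStrategy` is a hypothetical name.

```typescript
type Recommendation = "defer_loading" | "traditional" | "either";

// Hypothetical helper: apply the rules of thumb from the lists above.
function recommendToolStrategy(toolCount: number, allToolsUsedEveryRequest: boolean): Recommendation {
    if (toolCount < 10 || allToolsUsedEveryRequest) {
        return "traditional";
    }
    if (toolCount >= 20) {
        return "defer_loading";
    }
    return "either"; // 10-19 tools: no firm rule, so measure both approaches
}

console.log(recommendToolStrategy(150, false)); // prints "defer_loading"
```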

## Real-World Example: Multi-Integration Assistant


Here's a practical example of a customer support assistant that accesses multiple integrations:


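```typescript
import Anthropic from "@anthropic-ai/sdk";

// Reads ANTHROPIC_API_KEY from the environment
const anthropic = new Anthropic();

// fetchUnifiedTools is an application-level helper (not shown here) that calls the
// Unified MCP /tools endpoint for one connection and returns its tool list.
declare function fetchUnifiedTools(
    connectionId: string,
    permissions: string,
    options: { type: string; defer_tools: string }
): Promise<any[]>;

async function createSupportAssistant(crm_connection_id: string, hris_connection_id: string, zendesk_connection_id: string) {
    // Fetch tools from multiple Unified connections, deferring each category's list tools.
    // The list_ticketing_ and list_hris_ prefixes assume the same naming scheme as list_crm_.
    const crmTools = await fetchUnifiedTools(crm_connection_id, 'crm_contact_read,crm_deal_read', { type: 'anthropic', defer_tools: 'list_crm_' });
    const ticketingTools = await fetchUnifiedTools(zendesk_connection_id, 'ticketing_ticket_read,ticketing_ticket_write', { type: 'anthropic', defer_tools: 'list_ticketing_' });
    const hrisTools = await fetchUnifiedTools(hris_connection_id, 'hris_employee_read', { type: 'anthropic', defer_tools: 'list_hris_' });

    // Combine the tools from all three connections
    const tools = [
        ...crmTools,
        ...ticketingTools,
        ...hrisTools
    ];

    // Total: 150+ tools, with all list tools deferred
    return await anthropic.beta.messages.create({
        model: "claude-sonnet-4-5-20250929",
        betas: ["advanced-tool-use-2025-11-20"],
        max_tokens: 4096,
        messages: [{
            role: "user",
            content: "Customer John Doe from Acme Corp called about ticket #12345. Show me his account info, open tickets, and any recent deals."
        }],
        tools: tools
    });
}
```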


## Resources

- [Unified MCP Server Overview](https://docs.unified.to/mcp/overview)
- [MCP Installation & Usage](https://docs.unified.to/mcp/installation)
- [MCP Server Options](https://docs.unified.to/mcp/server-options)
- [Anthropic Tool Search Documentation](https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool)
- [Anthropic MCP Integration Guide](https://platform.claude.com/docs/en/agents-and-tools/tool-use/tool-search-tool#mcp-integration)

## TLDR


Anthropic's `defer_loading` feature is a game-changer for building AI applications with Unified's MCP server. Because the LLM API loads tools dynamically on demand, you can:

- **Scale to hundreds of tools** across multiple integrations
- **Reduce context usage** by 80-90%
- **Improve tool selection accuracy** significantly
- **Build more capable AI assistants** that access diverse data sources

Start by adding the tool search tool and marking your Unified MCP tools as deferred. Monitor your token usage and tool selection accuracy to see the immediate benefits.


The combination of Unified's extensive integration network and Anthropic's intelligent tool search opens up possibilities for building truly comprehensive AI agents that can work across your entire SaaS ecosystem.