MCP Tool Definition Lazy Loading Middleware to Stop Context Window Bloat

dev tool • real project • multiple requests

MCP servers burn 55,000+ tokens on tool definitions before an AI agent processes a single user message. One team reported 72% of their 200K context window consumed by three MCP servers. Developers building with AI agents need middleware that dynamically loads only the tool definitions relevant to the current task.
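The reported figures are internally consistent: the 143,000 tokens of tool definitions cited in the sources against a 200K window is roughly the 72% one team reported. A quick sanity check:

```python
# Sanity check on the reported figures: tool-definition tokens vs. window size.
window_tokens = 200_000
tool_def_tokens = 143_000  # figure quoted from the Apideck post in the sources

fraction = tool_def_tokens / window_tokens
print(f"{fraction:.1%}")  # → 71.5%, i.e. roughly the reported 72%
```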

builder note

Don't try to fix the MCP spec. Build a proxy that intercepts MCP tool registration, clusters tools by capability, and injects only the relevant cluster once the agent's intent is classified. The Scalekit benchmark data showing 4-32x token savings for CLI over MCP gives you a clear ROI story.

landscape (3 existing solutions)

No middleware exists that sits between MCP servers and LLM clients to dynamically load/unload tool schemas based on task context. The protocol itself has no lazy loading spec. Current workarounds are either abandoning MCP for CLI or manually pruning tool lists.

Apideck CLI: Replaces MCP with CLI entirely rather than fixing MCP; requires the agent framework to support shell execution. Not middleware.
MCP Protocol (manual pruning): The protocol lacks built-in lazy loading or tool grouping, so developers must manually audit and collapse tools, which is tedious and fragile.
Perplexity Agent API: Handles tool execution internally but locks you into Perplexity's ecosystem. Not a general middleware layer.
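For contrast, the manual-pruning workaround amounts to a static, hand-audited allowlist applied to whatever the server advertises. A minimal sketch (tool names hypothetical) of what developers do today, and why it is fragile: the list never adapts to the task, and every new upstream tool requires a re-audit.

```python
# Hypothetical sketch of today's manual-pruning workaround: a static
# allowlist filters the full tool list an MCP server advertises.
ALLOWED = {"jira_create_issue", "gh_open_pr"}  # hand-audited, updated manually

def prune(tool_schemas: list[dict]) -> list[dict]:
    """Keep only allowlisted tools; everything else never reaches the model."""
    return [t for t in tool_schemas if t["name"] in ALLOWED]

advertised = [
    {"name": "jira_create_issue", "description": "Create a Jira issue"},
    {"name": "jira_bulk_export", "description": "Export all issues"},
    {"name": "gh_open_pr", "description": "Open a pull request"},
]
print([t["name"] for t in prune(advertised)])  # → ['jira_create_issue', 'gh_open_pr']
```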

sources (3)

other https://www.apideck.com/blog/mcp-server-eating-context-windo... "143,000 of 200,000 tokens burned on tool definitions alone" 2026-03-16
other https://dev.to/allentcm/why-i-switched-from-mcp-to-cli-3ifb "Atlassian MCP consumed 40-50% of the context window before a single useful thing" 2026-04-01
other https://www.junia.ai/blog/mcp-context-window-problem "tool bloat hurts AI agent performance" 2026-03-25
MCP • AI agents • context window • LLM tooling • developer infrastructure