When MCP (Model Context Protocol) came out, it solved a critical problem around connectivity. It became the “USB-C for AI,” making it simple to connect agents with tools, APIs, and data sources, and that is why adoption has been so strong across the ecosystem.
The gap, however, is in how MCP servers typically expose everything at once. Raw API surfaces, large schemas, and dozens of parameters are all passed directly into the model prompt, which inflates context, consumes unnecessary tokens, increases cost, and reduces accuracy.
This is where “tool masking” becomes important. Instead of presenting the full API surface to the model, we can shape it for the specific task at hand. If the agent only needs a stock price, a simple mask that provides just that is sufficient. If it needs revenue, another mask can expose only revenue. The underlying handler remains the same, but the agent-facing surface is streamlined and optimized for the job.
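To make the idea concrete, here is a minimal sketch using the FastMCP helper from the official MCP Python SDK. The `fetch_financials` handler, the stubbed payload, and the field names are illustrative assumptions, not a real API; the point is the shape of the pattern: one broad handler, several narrow agent-facing tools.

```python
# A minimal sketch of tool masking with the MCP Python SDK.
# fetch_financials() and its fields are hypothetical stand-ins for a broad
# financial-data API; only the narrow masks are registered as tools.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("financials")

def fetch_financials(ticker: str) -> dict:
    """Single underlying handler: returns the full payload from the data source.
    Stubbed here for illustration; in practice this would call the real API."""
    return {
        "ticker": ticker,
        "price": 101.25,
        "revenue": 383_290_000_000,
        # ...dozens of other fields the model never needs to see
    }

@mcp.tool()
def get_stock_price(ticker: str) -> float:
    """Return only the latest stock price for a ticker."""
    return fetch_financials(ticker)["price"]

@mcp.tool()
def get_revenue(ticker: str) -> float:
    """Return only the most recent annual revenue for a ticker."""
    return fetch_financials(ticker)["revenue"]

if __name__ == "__main__":
    mcp.run()
```

Each mask surfaces a single-parameter tool with a one-line description, so the prompt carries only what the task needs, while the shared handler keeps the full API call in one place. Swapping the stub for the real data source changes nothing in the agent-facing schema.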
The benefits are clear. Smaller prompts lead to faster responses and lower costs, while accuracy improves because the model is not distracted by irrelevant options.
At enterprise scale, where millions of tokens are processed every minute, these optimizations quickly add up to significant gains.
I believe tool masking is more than just technical hygiene; it is essentially prompt engineering at the tool boundary. While MCP has already addressed connectivity, the real challenge now is execution quality. The way forward is not to expose one large surface for everything but to design multiple masks from the same handler, each carefully tuned for a specific use case.
That is how we move towards building smaller, smarter, and more reliable agents.
Are you overloading your AI agents with too many tools?