If you're running ToolPiper alongside Claude Code, you have access to 147 MCP tools without lifting a finger. Browser automation, OCR, scrape, audio, pose, RAG, GitHub, Reddit, Google Search Console: the whole catalog is one MCP server connection away. That's a lot of capability. It's also a lot of tool definitions sitting in your context window, every turn, whether you use them or not.
This piece walks through how we keep that cheap. Two of the three layers you can configure yourself. The third runs automatically and is the part of ModelPiper we're most proud of.
Why an MCP catalog costs tokens
Every MCP tool a client connects to has a name, a description, and a JSON Schema for its parameters. Claude Code reads all of that at session start so it knows what's available. With 147 tools the naive worst case is several thousand tokens of tool catalog burned on every single turn before you've even typed a question.
Multiply that by a long debugging session and the math gets painful, especially on cloud models priced per million input tokens. The catalog isn't doing work. It's just reminding the model that the work could be done. Reducing what the model sees per turn is the whole game.
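To make the multiplication concrete, here is a back-of-the-envelope sketch. Every input is an illustrative assumption (the catalog size, the turn count, the price), not a measurement from ToolPiper or a quote from any provider, and it ignores any prompt caching or discounts a provider might apply.

```python
def catalog_overhead_tokens(catalog_tokens: int, turns: int) -> int:
    """Input tokens spent re-sending the same tool catalog across a session."""
    return catalog_tokens * turns


def catalog_overhead_cost(catalog_tokens: int, turns: int,
                          usd_per_million_input: float) -> float:
    """Dollar cost of that overhead at a flat per-million-token input price."""
    total = catalog_overhead_tokens(catalog_tokens, turns)
    return total * usd_per_million_input / 1_000_000


# Assumed inputs: a 5,000-token catalog, a 200-turn session, $3.00 per million.
tokens = catalog_overhead_tokens(catalog_tokens=5_000, turns=200)
cost = catalog_overhead_cost(5_000, 200, usd_per_million_input=3.00)
print(f"{tokens:,} tokens of catalog overhead, about ${cost:.2f}")
```

Swap in your own numbers. The shape of the result is the point: the overhead grows linearly with both catalog size and session length.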
Layer 1: Claude Code already does part of this for you
Claude Code uses deferred schema loading for MCP servers. At connection time, only the tool names and short descriptions enter the context. The full JSON schemas (parameter shapes, enums, nested objects) are not loaded until the model asks for one specifically through an internal tool-search step.
You can see this directly. Run Claude Code with ToolPiper connected and look at the MCP tools listing. You'll see something like:
Their schemas are NOT loaded. Use ToolSearch with query "select:<name>..." to load tool schemas before calling them.
That's the cost-control mechanism doing its job. The 147 names and short blurbs cost a few thousand tokens. The full schemas, which are the bulk of the weight, only cost tokens for the tools the model actually decides to invoke.
This is great. It's also not enough on its own, because the catalog list itself still scales with the number of tools you've enabled. The next two layers are where the real savings come from.
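A toy model of the deferred-loading idea makes the trade-off easier to see. This is a sketch of the pattern only, not Claude Code's implementation. The `Tool` and `DeferredCatalog` names and the two sample tools are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class Tool:
    name: str
    summary: str   # short description shown in the listing
    schema: dict   # full JSON Schema for the tool's parameters


class DeferredCatalog:
    """Toy model: list names cheaply up front, load schemas on request."""

    def __init__(self, tools):
        self._tools = {t.name: t for t in tools}
        self.loaded = set()  # names whose schemas have entered context

    def listing(self) -> str:
        # Names and short descriptions only. This grows with the tool count.
        return "\n".join(f"{t.name}: {t.summary}" for t in self._tools.values())

    def load_schema(self, name: str) -> str:
        # The heavy part, paid only for tools the model asks about.
        self.loaded.add(name)
        return json.dumps(self._tools[name].schema)


catalog = DeferredCatalog([
    Tool("scrape", "Fetch a page and return its content",
         {"type": "object", "properties": {"url": {"type": "string"}},
          "required": ["url"]}),
    Tool("transcribe", "Turn audio into text",
         {"type": "object", "properties": {"path": {"type": "string"}},
          "required": ["path"]}),
])

up_front = catalog.listing()               # present on every turn
on_demand = catalog.load_schema("scrape")  # present only once requested
print(up_front)
print(sorted(catalog.loaded))
```

Note that `listing()` still emits one line per tool, which is why the listing cost keeps growing with the number of enabled tools even when no schema is loaded.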
Layer 2: User-chosen permissions
This is the layer most people skip and shouldn't. ToolPiper has a permission settings panel where every MCP tool can be denied individually. Denied tools never get reported to Claude Code. They don't show up in the deferred list. They don't cost a single token.
[Screenshot: ToolPiper Settings, Tool Permissions panel, showing the per-tool toggle list]
The right strategy is honest self-assessment. Will you ever ask Claude Code to scrape Reddit during a coding session? Probably not. Run a pose-detection benchmark? Also probably not. Narrate a video? Definitely not. Disable those, and the catalog Claude Code sees shrinks by whatever you trimmed.
A practical baseline for a developer using Claude Code:
- Keep: file ops, git, shell, scrape, http_request, search_code, browser_*, screenshot, transcribe, the things you'd actually invoke during a coding session.
- Disable for now: video_*, pose_*, voice_clone, reddit_*, hn_*, gsc_*, queue_*, anything outreach-related. Flip them back on for a session when you actually need them.
[Screenshot: A trimmed permission list showing roughly 40 tools enabled instead of 147]
ToolPiper picks up the change without a restart and Claude Code reflects it on the next reconnect. Five minutes of setup, savings on every turn after.
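If you want to reason about a deny list before clicking through the panel, the baseline above can be written down as patterns. This is a sketch for thinking it through, not a ToolPiper API. The panel toggles tools individually, the glob patterns are just the shorthand used in the list above, and the concrete tool names in `catalog` are made up for illustration.

```python
from fnmatch import fnmatchcase

# Shorthand from the "Disable for now" baseline, treated as glob patterns.
DENY_PATTERNS = ["video_*", "pose_*", "voice_clone",
                 "reddit_*", "hn_*", "gsc_*", "queue_*"]


def visible_tools(all_tools, deny_patterns):
    """Return the tools that would still be reported after applying the deny list."""
    return [name for name in all_tools
            if not any(fnmatchcase(name, pattern) for pattern in deny_patterns)]


# Hypothetical tool names, chosen only to exercise the patterns.
catalog = ["scrape", "http_request", "search_code", "browser_open",
           "screenshot", "transcribe", "video_trim", "pose_detect",
           "voice_clone", "reddit_search", "gsc_query"]

enabled = visible_tools(catalog, DENY_PATTERNS)
print(f"{len(enabled)} of {len(catalog)} tools stay visible: {enabled}")
```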
Layer 3: ToolGate
This is where the engineering shows up.
ToolGate is ModelPiper's on-device gating engine. When a tool call goes through ToolPiper, ToolGate runs first and decides which subset of the enabled catalog to actually ship to the model for this specific query.
It composes three things in order:
1. Permissions pre-filter. The deny list from Layer 2 is enforced before anything else runs. Denied tools cannot ship under any circumstance. This is a hard guarantee, not a heuristic. If you turned a tool off, the model never sees it, period.
2. Tier-aware budgeting. The amount of tool catalog ToolGate is willing to ship is a function of your model's context window. A small local model with a tight context gets a lean tool budget and a small set of highly relevant tools. A long-context cloud model gets more headroom. The boundaries are set so that tools never starve the actual conversation of room. We also clamp against the physical context size of the loaded model, not the configured one, so a model with a smaller KV cache than its advertised window can't get blown out by a tool catalog that doesn't fit.
3. Semantic relevance. Inside whatever budget the tier allows, ToolGate selects the tools most relevant to the current query. It runs entirely on-device using a retrieval model optimized for the Apple Neural Engine. No prompt data leaves your machine. The selection happens in single-digit milliseconds.
The combined effect: out of 147 enabled tools, a typical query ships 5 to 10 to the model. The other 130-plus stay in the catalog, ready to be selected if a future query calls for them, but they don't cost tokens on the turns where they aren't relevant.
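The order of those three steps can be sketched in a few lines. To be clear, this is not ToolGate's code. The relevance scorer here is a crude word-overlap count standing in for the on-device retrieval model, the 10% budget share is an invented number, and the tool names, descriptions, and token costs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    description: str
    tokens: int  # assumed cost of shipping this tool's definition


def tool_budget(configured_ctx: int, physical_ctx: int, share: float = 0.1) -> int:
    """Step 2: budget is a share of the smaller of the two context sizes."""
    return int(min(configured_ctx, physical_ctx) * share)


def relevance(query: str, tool: Tool) -> int:
    """Step 3 stand-in: count query words that appear in the name or description."""
    words = set(query.lower().split())
    text = set((tool.name.replace("_", " ") + " " + tool.description).lower().split())
    return len(words & text)


def gate(query, tools, denied, configured_ctx, physical_ctx):
    allowed = [t for t in tools if t.name not in denied]   # step 1: hard filter
    budget = tool_budget(configured_ctx, physical_ctx)      # step 2: size the budget
    ranked = sorted(allowed, key=lambda t: relevance(query, t), reverse=True)
    shipped, spent = [], 0
    for tool in ranked:                                     # step 3: fill by relevance
        if relevance(query, tool) == 0:
            break  # everything after this point is irrelevant to the query
        if spent + tool.tokens <= budget:
            shipped.append(tool)
            spent += tool.tokens
    return shipped


tools = [
    Tool("scrape", "fetch a url and extract page content", 300),
    Tool("write_file", "write extracted examples to a file", 200),
    Tool("reddit_search", "search reddit posts by url or keyword", 300),
    Tool("pose_detect", "detect body pose in an image", 400),
]
shipped = gate("scrape this url and pull out the api examples", tools,
               denied={"reddit_search"}, configured_ctx=32_768, physical_ctx=8_192)
print([t.name for t in shipped])
```

Two design points carry over from the description above even in this toy version: the deny list runs before any scoring, so a denied tool can never win on relevance, and the budget is clamped to the smaller of the configured and physical context sizes.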
[Screenshot: ToolPiper logs viewer showing a toolgate.recommend log line with tier, tools_returned, total_tokens]
We don't surface the internals because the value is the result, not the recipe. What matters is that it's measurable. Every tool selection emits a log event with the tier, the budget, the count of tools returned, and the total tokens shipped. You can watch it run.
Schema engineering, not just schema selection
Selecting fewer tools is half the savings. The other half is making each shipped tool cost less.
Every MCP tool in ToolPiper is built with a compact schema mode. When ToolGate ships a tool under a tight tier, it strips schemas to the minimum the model needs to call them correctly: required parameters, type info, enum choices, and a one-line description. Optional documentation, examples, verbose enums, and edge-case fields are dropped. The tool still works. The model still picks the right arguments. The wire cost drops.
This is invisible to you and invisible to Claude Code. You get a reduced tool footprint without losing tool capability.
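For a feel of what compaction means in practice, here is a minimal sketch that reduces a tool definition to required parameters, types, enum choices, and a one-line description. It is one reading of the description above, not ToolPiper's compact mode. The `name`, `description`, and `inputSchema` fields follow the MCP tool-definition shape, while the sample `scrape` definition is invented.

```python
import json


def compact_tool(tool: dict) -> dict:
    """Keep required params with type and enum, plus a one-line description."""
    schema = tool.get("inputSchema", {})
    required = list(schema.get("required", []))
    props = schema.get("properties", {})
    slim_props = {
        name: {k: props[name][k] for k in ("type", "enum") if k in props.get(name, {})}
        for name in required
    }
    lines = tool.get("description", "").strip().splitlines()
    return {
        "name": tool["name"],
        "description": lines[0] if lines else "",
        "inputSchema": {"type": "object", "properties": slim_props,
                        "required": required},
    }


# Invented example definition, for illustration only.
full = {
    "name": "scrape",
    "description": "Fetch a URL and return its content.\nSupports retries and custom headers.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "url": {"type": "string", "description": "Page to fetch.",
                    "examples": ["https://example.com"]},
            "format": {"type": "string", "enum": ["markdown", "html"],
                       "default": "markdown"},
            "timeout_ms": {"type": "integer", "description": "Optional timeout."},
        },
        "required": ["url", "format"],
    },
}

slim = compact_tool(full)
print(len(json.dumps(full)), "->", len(json.dumps(slim)), "characters")
```

The optional `timeout_ms` field, the examples, and the second description line are gone, but a caller can still build a valid `url` and `format` pair from what remains.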
What a real Claude Code turn looks like
Putting all three layers together, here's what happens when you ask Claude Code to do something:
- Claude Code already has 147 deferred names in context. Cost: a few thousand tokens.
- You ask: "scrape this URL and pull out the API examples."
- ToolGate runs on-device. Permissions filter applies. Tier is computed from the loaded model's physical context. Semantic match against the catalog returns the relevant subset, probably scrape, read_file, write_file, and a handful of supporting tools.
- Schemas are compacted to the minimum needed.
- Claude Code receives the tool definitions for that turn, runs them, and responds.
The catalog you don't use that turn isn't in the prompt. The schemas you don't need at full fidelity aren't in the prompt. The tools you've denied weren't in the catalog to begin with.
How to measure it on your machine
If you have ToolPiper running, open the logs viewer and watch for toolgate.recommend entries while you work. Each line tells you exactly how many tools shipped, the tier they were sized for, and the token count.
On a healthy setup against a moderately sized local model, most queries return 5 to 10 tools and total token counts in the low thousands. If those numbers look high to you, work the layers in order. First check your ToolPiper permission list and trim what you don't use. Then look at your context-window setting. A larger configured context gives ToolGate more room to ship tools, which is great for capability but costs you on every turn. Pick a context size that matches what your work actually needs.
[Screenshot: Log viewer filtered to toolgate.* events showing several real recommendation lines]
Why we built it this way
The honest tension in MCP is that more tools means more capability and also more weight. The naive answer is "install fewer servers." Our answer is to gate intelligently, on-device, per turn, with the user holding the master switch.
If you're using Claude Code daily and ToolPiper is sitting next to it, you're already getting Layer 1 from Anthropic and Layer 3 from us automatically. The work that's left to you is Layer 2: open Settings, look at the permission list, and turn off the tools you'll never use. Everything else takes care of itself.
[Screenshot: Final view of the permission settings with a clean trimmed list]
Try it
ToolPiper is a free download from modelpiper.com/download. Install it, connect Claude Code to its MCP server, and the three-layer setup is running by default. The permission list is yours to tune.
