The Model Context Protocol (MCP) is becoming the standard interface between AI assistants and external tools. Most MCP servers are single-purpose: one for browser automation, one for database access, one for file management. ToolPiper takes a different approach — it ships 41 MCP tools spanning LLM inference, text-to-speech, speech-to-text, embeddings, OCR, vision, pose estimation, browser automation, RAG, and real-time stream processing. All from a single native macOS application, written entirely in Swift.
This paper describes the architecture decisions behind building a unified MCP server at this scale, the two-transport design that makes it work with every AI client, and the tool design principles we developed along the way.
## Why One Server Instead of Many
The local AI MCP ecosystem is fragmented. A developer who wants local LLM inference, speech-to-text, and browser automation currently needs three separate MCP servers — typically three Node.js processes, three sets of dependencies, three configuration entries. The largest competitor in the space has 139 GitHub stars. No existing server covers more than two of the capability categories ToolPiper provides.
Unification isn't just a convenience play. It enables workflows that fragmented servers can't support. A single MCP tool call can transcribe an audio file, pass the text to a local LLM for summarization, and speak the summary — all through one server process that shares model state, memory budgets, and authentication. There's no inter-process serialization, no conflicting port allocations, no version drift between components.
## Two Transports: stdio and Streamable HTTP
MCP defines two transport mechanisms: stdio (the original, universally supported) and Streamable HTTP (the newer spec from March 2025). We implement both, sharing a single source of truth for tool definitions and handler logic.
### stdio: The Universal Fallback
The stdio transport is a separate Swift executable (`toolpiper-mcp`) bundled inside the ToolPiper app at `Contents/MacOS/toolpiper-mcp`. It's a thin JSON-RPC bridge: it reads MCP requests from stdin, translates them to HTTP calls against ToolPiper's REST API on `localhost:9998`, and writes responses to stdout.
```
AI Client (Claude Code, Cursor, etc.)
  ↓ JSON-RPC (stdio)
toolpiper-mcp (Swift CLI)
  ↓ HTTP (localhost:9998)
ToolPiper.app (running, all backends available)
```

The CLI uses the official Swift MCP SDK (v0.11.0). Authentication is automatic — the CLI reads a session key from `~/Library/Application Support/ToolPiper/.session-key` (written by ToolPiper on launch, 0600 permissions) and caches it for 30 seconds using an `OSAllocatedUnfairLock` for thread safety. Zero user configuration is required.
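The key-caching logic can be sketched in a few lines of Foundation-only Swift. This is an illustration of the described behavior, not the actual ToolPiper source; the type and property names are invented:

```swift
import Foundation
import os

// Sketch of the CLI's session-key cache (illustrative names). The key
// file is written by the app on launch with 0600 permissions; the CLI
// re-reads it at most once every 30 seconds.
struct SessionKeyCache {
    private struct State {
        var key: String?
        var fetchedAt: Date = .distantPast
    }
    private let state = OSAllocatedUnfairLock(initialState: State())
    private let keyURL = FileManager.default
        .homeDirectoryForCurrentUser
        .appendingPathComponent("Library/Application Support/ToolPiper/.session-key")

    func currentKey() -> String? {
        state.withLock { s in
            // Fresh enough: skip the file read on this request.
            if let key = s.key, Date().timeIntervalSince(s.fetchedAt) < 30 {
                return key
            }
            s.key = (try? String(contentsOf: keyURL, encoding: .utf8))?
                .trimmingCharacters(in: .whitespacesAndNewlines)
            s.fetchedAt = Date()
            return s.key
        }
    }
}
```

`OSAllocatedUnfairLock` makes the read-or-refresh path a single critical section, so concurrent JSON-RPC requests never race on the cached key.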
On every app launch, ToolPiper installs a symlink at `~/.toolpiper/mcp` pointing to the running binary. This handles app relocations and updates transparently. Users configure their AI client once:
```
claude mcp add toolpiper -- ~/.toolpiper/mcp
```

### Streamable HTTP: Zero Extra Processes
For clients that support the newer HTTP transport, ToolPiper serves the MCP protocol directly — no CLI middleman, no extra process:
```
AI Client
  ↓ Streamable HTTP (POST localhost:9998/mcp)
ToolPiper.app (MCP server built-in)
  ↓
Backends (llama.cpp, FluidAudio, MLX Audio, Apple Intelligence, Vision, etc.)
```

`MCPRoutes.swift` implements the full JSON-RPC protocol over HTTP. Sessions are tracked via `Mcp-Session-Id` headers with a 30-minute reaper. The HTTP transport uses an in-process loopback client (`MCPLoopbackClient`) that calls the same handler functions as the stdio bridge — ensuring identical behavior across both transports.
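Under this transport, a tool call is a single POST. The framing below follows the MCP Streamable HTTP spec; the session id and tool arguments are illustrative:

```http
POST /mcp HTTP/1.1
Host: localhost:9998
Content-Type: application/json
Accept: application/json, text/event-stream
Mcp-Session-Id: <session-id>

{"jsonrpc": "2.0", "id": 7, "method": "tools/call",
 "params": {"name": "chat", "arguments": {"prompt": "Summarize the transcript"}}}
```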
## Single Source of Truth: Shared Definitions and Handlers
With two transports that must behave identically, we needed a strict separation between tool definitions, handler logic, and transport mechanics. The solution is two shared files compiled by both targets:
- `MCPToolDefinitions.swift` — an enum with static properties defining all 41 tools and 4 resources. Each definition includes the tool name, description, JSON Schema for inputs, and MCP annotations (`readOnlyHint`, `idempotentHint`, `openWorldHint`). The stdio target converts these to MCP SDK `Tool` types; the HTTP target converts them to `[String: Any]` dictionaries.
- `MCPToolHandlers.swift` — an enum with all 41 tool handler functions and 4 resource handlers. Each handler takes parsed input parameters and an `MCPTransportClient` protocol conformance (which abstracts the difference between HTTP loopback and stdio-to-HTTP bridging). The handler code is identical regardless of transport.
This architecture means adding a new tool takes two steps: define it in `MCPToolDefinitions`, then implement the handler in `MCPToolHandlers`. Both transports pick it up automatically.
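In sketch form, the pattern looks like this. Everything here is a simplified illustration with invented names and an invented endpoint path; the real files carry full JSON Schemas and the `MCPTransportClient` abstraction:

```swift
import Foundation

// Illustrative sketch of the shared-definition pattern.
enum ToolDefinitions {
    static let status = (
        name: "status",
        description: "Report ToolPiper server health and loaded backends.",
        readOnlyHint: true, idempotentHint: true, openWorldHint: false
    )
}

// Both transports conform: the HTTP target via an in-process loopback,
// the stdio bridge via HTTP calls to localhost:9998.
protocol TransportClient {
    func get(_ path: String) async throws -> Data
}

enum ToolHandlers {
    // One handler body serves both transports unchanged.
    static func status(client: TransportClient) async throws -> String {
        let data = try await client.get("/api/status")  // hypothetical path
        return String(decoding: data, as: UTF8.self)    // semantic text out
    }
}
```

Because the handlers only see the `TransportClient` protocol, neither target can drift from the other without a compile error.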
## Tool Design: 105 Endpoints to 41 Tools
ToolPiper's REST API has 105 HTTP endpoints. Exposing all of them as MCP tools would overwhelm any AI agent. We curated them down to 41 high-value tools organized in six tiers:
| Tier | Category | Tools | Examples |
|---|---|---|---|
| 1 | Core AI | 8 | chat, transcribe, speak, embed, ocr, models, load_model, status |
| 2 | Advanced AI | 5 | analyze_image, analyze_text, rag_query, rag_collections, image_upscale |
| 3 | Browser | 14 | browser_snapshot, browser_action, browser_assert, browser_console, browser_record, browser_eval, browser_network, browser_storage, browser_performance, browser_coverage, browser_intercept, browser_webauthn, browser_autofill, browser_manage |
| 4 | PiperTest | 6 | test_list, test_get, test_save, test_delete, test_run, test_export |
| 5 | Pose & Stream | 4 | pose_detect, pose_formats, stream_start, stream_stop |
| 6 | Scrape & Detect | 4 | scrape, browser_detect, video_upscale, benchmark_upscale |
The guiding principle: a good REST API is not a good MCP server. REST endpoints are fine-grained by design — separate calls for connect, navigate, click, type. MCP tools should be coarse-grained, wrapping entire flows that an AI agent would naturally perform as a single step.
For example, the browser_action tool accepts a selector and an action type (click, fill, select, hover, scroll, etc.) in a single call. The REST API underneath involves selector resolution, element location, input dispatch, and AX tree diffing — but the AI sees one tool with one clear purpose.
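A `tools/call` payload for this tool might look like the following. The selector-plus-action shape is from the design above; the specific argument names and values are illustrative:

```json
{
  "name": "browser_action",
  "arguments": {
    "action": "fill",
    "selector": "#search-input",
    "value": "local llm inference"
  }
}
```

One call, one purpose: the agent never sees the connect/locate/dispatch/diff sequence happening underneath.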
## Output Formatting: Semantic Text, Not JSON
Every MCP tool returns semantic plain text, not raw JSON. This follows the pattern established by Playwright MCP and reflects a key insight: AI models process natural language better than nested JSON structures.
- Confirmations are terse: "Done.", "Enabled.", "Deleted." — not `{"success": true}`
- Accessibility trees render naturally: real newlines and indentation, not JSON-escaped `\n`
- Wrappers are unwrapped: tools that call endpoints returning `{"text": "..."}` extract the inner value
- Diffs use visual prefixes: `+` for added, `-` for removed, `~` for changed AX nodes
- Structured data falls back to pretty JSON: model lists and storage dumps use indented, multi-line JSON when the data is inherently structured
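The unwrap-and-fallback rules above can be sketched as one Foundation function. This is an illustration of the policy, not the actual ToolPiper formatter:

```swift
import Foundation

// Convert a raw HTTP response body into the semantic text an AI sees.
func semanticText(from body: Data) -> String {
    // Rule: unwrap {"text": "..."} wrappers so the model gets the inner
    // string with real newlines, not JSON-escaped \n sequences.
    if let obj = try? JSONSerialization.jsonObject(with: body) as? [String: Any],
       let text = obj["text"] as? String {
        return text
    }
    // Rule: inherently structured data falls back to pretty-printed JSON.
    if let obj = try? JSONSerialization.jsonObject(with: body),
       let pretty = try? JSONSerialization.data(
           withJSONObject: obj, options: [.prettyPrinted, .sortedKeys]) {
        return String(decoding: pretty, as: UTF8.self)
    }
    // Anything else passes through as-is.
    return String(decoding: body, as: UTF8.self)
}
```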
## Why Swift-Native
Most MCP servers are written in TypeScript/Node.js. We chose Swift for several reasons:
- No runtime dependency. Users don't need Node.js, npm, or npx installed. The MCP binary is a compiled executable inside the app bundle.
- Shared types with the host app. ToolPiper is a Swift macOS app. The MCP server shares its type definitions, JSON codable structs, and build system. When we add a field to a model config, both the app and the MCP server see it immediately — there's no TypeScript↔Swift translation layer to maintain.
- Performance. The stdio bridge handles JSON-RPC parsing, HTTP bridging, and session key caching with no garbage collection pauses and predictable memory usage. For the HTTP transport, tool execution is in-process with zero serialization overhead.
- Official SDK support. The Swift MCP SDK is production-ready and used by companies like MacPaw and Loopwork AI. Swift is a first-class citizen in the MCP ecosystem.
## Error Messages as Marketing
Every error the MCP server produces is an opportunity. When ToolPiper isn't running and the CLI can't connect, the error message reads:
```
ToolPiper is not running. Download it at modelpiper.com
```
This is deliberate. The MCP server is the top of the adoption funnel — developers discover it through MCP directories, configure it in their AI client, and encounter ToolPiper when they first try to use it. The error IS the install prompt. All MCP tools are free tier to maximize this funnel. Pro upsell happens through the app UI, not MCP gating.
## Annotations on Everything
MCP tool annotations help AI clients make better decisions about tool usage. We set three annotations on every tool:
- `readOnlyHint` — true for query tools (chat, transcribe, status, snapshot), false for mutation tools (browser_action, test_save, load_model)
- `openWorldHint: false` — on all tools, because everything runs locally. The AI knows these tools don't access external services or the internet (except browser tools, which interact with whatever page is open)
- `idempotentHint: true` — on tools like `load_model`, where calling twice with the same input produces the same result
## Resources: Read-Only State
In addition to tools, the MCP server exposes 4 resources for AI clients that support the MCP resources spec:
- status — server health, loaded backends, connected browser state
- models — available model list with download status and capabilities
- backends — inference backend states (llama.cpp, FluidAudio, MLX Audio, Apple Intelligence)
- tests — saved PiperTest sessions for the testing feature
Resources are read-only snapshots. They complement tools by giving AI clients ambient context about system state without requiring explicit tool calls.
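A `resources/read` request for the status resource might look like this. The method name is from the MCP spec; the `toolpiper://` URI scheme is an assumption, not confirmed by the source:

```json
{"jsonrpc": "2.0", "id": 3, "method": "resources/read",
 "params": {"uri": "toolpiper://status"}}
```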
## What We Learned
Building 41 MCP tools taught us several things that aren't obvious from the MCP specification alone:
- Fewer tools are better. We started with plans for 60+ tools mirroring the REST API. Agents performed worse with more tools — they spent tokens reasoning about which tool to use instead of using the right one. Curation down to 41 improved task completion rates.
- Descriptions matter more than schemas. The natural-language tool description drives AI tool selection more than the JSON Schema does. We write descriptions that say when to use the tool, not just what it does. "Run a prompt through a local LLM on this machine via ToolPiper" tells the AI this is for local inference, not cloud API calls.
- Plain text output outperforms JSON. When tools returned raw JSON, AI models spent tokens parsing and explaining the structure. Plain text responses let the model incorporate results directly into its reasoning.
- The shared-definition pattern scales. Two files (`MCPToolDefinitions` + `MCPToolHandlers`) compiled by two targets have caused no maintenance burden through 41 tools. Adding a tool takes 10 minutes, and both transports work immediately.
ToolPiper's MCP server is available today. Install ToolPiper from modelpiper.com and configure your AI client with `claude mcp add toolpiper -- ~/.toolpiper/mcp`.