ModelPiper is a suite of seven apps that turns a Mac into a complete local AI workstation. The core, ToolPiper, bundles nine inference backends behind one localhost:9998 server and over 300 MCP tools - no cloud, no Docker, no Python, no API keys. It's macOS-only because the capabilities that make it good are Apple Silicon-specific. You can install the app and verify every claim yourself.
Your Mac has a chip that was designed to run neural networks. A dedicated Neural Engine, a GPU that shares memory with the CPU, hardware-accelerated matrix math on every core. Apple built the silicon. Then they left the software to someone else.
I spent a year building it.
What was wrong with local AI on Mac?
A year ago, my local AI setup looked like this. Ollama for inference. A separate Whisper.cpp install for transcription. A Python script wrapping a TTS model. Docker running Open WebUI for a chat interface. A shell script piping audio between them. Five tools, three programming languages, two package managers, and one Docker VM eating 4GB of RAM.
It worked the way duct tape works. Every update broke something. The transcription version expected a different Python than the TTS server. Docker Desktop pushed a release that changed its networking. Ollama changed an API response format and the shell script choked. I was spending more time maintaining the stack than using it.
The problem was clear. Local AI on Mac was a parts bin, not a product. Every capability existed somewhere. Nobody had assembled them into something you could install and use.
What's in the suite?
Seven apps in one monorepo. Four native macOS apps in Swift, one Angular web app, one browser extension, one IDE extension. They share a Swift package and a TypeScript layer, talk to each other over localhost HTTP, and collectively cover everything I was doing with that five-tool stack - plus things the stack couldn't do at all.
ToolPiper is the engine. It bundles nine AI backends: llama.cpp for language models on the Metal GPU, FluidAudio for speech-to-text and text-to-speech on the Neural Engine, Apple Vision for OCR and pose estimation, MLX Audio for high-quality TTS, Apple NL Embedding for zero-setup vector search, and a custom CoreML model (PiperSR) for image and video upscale on the Neural Engine. It runs an HTTP server on port 9998 that exposes all of these as REST APIs, plus an MCP server with over 300 tools. One command gives Claude Code access to local inference, browser automation, desktop control, OCR, document search, and video editing.
ModelPiper is the visual interface. An Angular web app that connects to ToolPiper: chat, a visual pipeline builder where you drag blocks and connect models into multi-step workflows, image tools, a test authoring suite, an agent workspace, and API docs. It's how you interact with everything ToolPiper does.
VisionPiper is screen capture. A menu bar app that captures movable regions, records video, converts to GIF, and streams frames over WebSocket. Built because every screen capture tool I tried either couldn't stream or couldn't record a movable region.
AudioPiper is audio capture. A multi-source mixer that records from mics, system audio, and individual apps via Core Audio Taps. It exists because I needed to pipe audio from a browser tab into a transcription model, and macOS doesn't do per-app capture without a kernel extension.
VisionPiper, AudioPiper, and system control fold the rest in. ToolPiper carries 26 action domains (around 142 actions) that AI agents can execute: window management, display control, Bluetooth, calendar, contacts, Finder, Spotlight, input simulation. Push-to-talk dictation on the right Option key and voice commands on the right Command key route through speech-to-text, then a language model, then action execution. That's the piece that turns a model into an operating system interface.
MediaPiper is a browser extension for Chrome, Firefox, and Safari. Full-size image hover previews with CDN pattern detection, because I kept opening images in new tabs to see them at full resolution hundreds of times a day.
CodePiper is a VS Code extension. A fork of Continue.dev (Apache 2.0) that connects to ToolPiper as the backend - local AI coding assistance without sending your code to a cloud API.
Why a suite instead of one app?
The honest answer: because the capabilities don't fit in one process.
Audio capture needs a background daemon with real-time Core Audio access. Screen capture needs a menu bar app with screen recording permission. System actions need an accessibility-privileged process. Browser integration needs an extension. IDE integration needs a different extension. The inference engine needs to manage GPU memory and model lifecycle.
macOS has strong process boundaries and permission scopes. An app that requested accessibility, screen recording, microphone, contacts, calendar, and automation all at once would be a security review nightmare, and Apple would reject it from the App Store. Separate apps, each with the minimum permissions for its function, is how macOS is designed to work.
They share a common Swift package (PiperKit) for networking, authentication, logging, and UI components, and they discover each other over localhost. From your perspective, ToolPiper is the one app you install. The others are specialized tools you add when you need them.
Why Swift instead of Electron?
Every app is native. ToolPiper uses about 50MB of resident memory before it loads a model. Electron would have been faster to build and would have cost 200MB or more of RAM that models need. On an 8GB Mac, that difference decides whether a 7B model fits or swaps to disk. Compare that to Docker Desktop sitting at 2-4GB before you've run anything.
Why bundle llama.cpp instead of wrapping Ollama?
ToolPiper doesn't just connect to Ollama. It bundles llama.cpp directly. Same engine, same GGUF models, same Metal GPU speed - within a few percent of standalone Ollama on identical models. But it also connects to Ollama as a backend if you already have models there.
The point was never to replace Ollama. The point is that local AI shouldn't require installing a separate model server, configuring CORS, and hoping the versions stay compatible. It should be: install the app, a model downloads, you're chatting.
Why build on MCP?
The Model Context Protocol turned out to be the most important architectural decision in the whole project. Over 300 tools exposed over stdio and HTTP mean Claude Code, Cursor, Windsurf, or any MCP client gets access to everything ToolPiper can do: browser automation over Chrome DevTools Protocol, OCR, document search, image upscale, video editing, pose estimation, system actions. One protocol, every backend.
Building ToolPiper as an MCP server made it immediately useful to the developer tools ecosystem without writing a separate integration for each tool. That was worth more than building a great chat UI.
What does it look like in practice?
A voice chat pipeline is three blocks in the visual builder: microphone, then speech-to-text (Parakeet v3 on the Neural Engine), then a language model, then text-to-speech (PocketTTS on the Neural Engine), then the speaker. Round-trip latency is about 1.5 seconds with a 3B model on an M2 Max. Not fast enough to replace a phone call. Fast enough to brainstorm while cooking.
A document Q&A pipeline: drop a folder of PDFs. OCR extracts text, Apple NL Embedding generates vectors with zero setup, and an HNSW index with BM25 hybrid retrieval handles the search. Ask questions in natural language, get answers with source citations. All on-device, works on a plane.
A test session in PiperTest: describe what you want to test in a chat. The AI inspects your live app through Chrome DevTools Protocol, navigates pages, fills forms, clicks buttons, and records the interaction as structured steps. When selectors break, it re-inspects the accessibility tree and fixes them. Export to Playwright or Cypress when you're ready for CI.
A voice command: hold the right Command key and say "turn down the brightness and open Safari to my email." Speech-to-text transcribes it, the model parses intent against 142 action definitions, and the router dispatches a brightness change and an app launch. A notification confirms. No typing.
How does this compare to Ollama?
Ollama is good infrastructure. A Go binary that serves models over a REST API, works on every platform, and has broad third-party support. But it's a model runner, not a model platform.
I wrote a full comparison and tested every Ollama frontend on Mac. The short version: if you need cross-platform, server deployment, or Docker orchestration, Ollama wins. If you're on a Mac and want voice, vision, pipelines, resource monitoring, MCP tools, or anything beyond text chat, that's the gap this suite fills. ToolPiper connects to Ollama as a backend, so they coexist without friction.
What are the limitations?
It's macOS only. No Linux, no Windows. The capabilities that make it good - Neural Engine, Metal, Core Audio Taps, IOKit, Apple Vision - are Apple-specific. A cross-platform version would be a different, lesser product.
It's a lot of surface area. Seven apps, over 300 MCP tools, nine inference backends, a pipeline builder, a test suite, an agent workspace. The basic chat flow is three clicks, but the full platform takes time to explore.
It's newer than the alternatives. Open WebUI has a larger community. LM Studio has broader platform support. Ollama has deeper third-party integration. The documentation and the community are still growing.
Local models are not cloud models. A 7B model on your Mac doesn't match a frontier cloud model on hard reasoning. Voice chat has a 1.5-second round-trip, not the sub-second response of a cloud voice mode. Local AI is good enough for a wide range of tasks, runs with complete privacy, and costs nothing per query. It is not a blanket replacement for cloud AI.
I built these tools because I wanted them to exist. A year in, they do. Download ToolPiper at modelpiper.com. The free tier covers chat, transcription, basic pipelines, and the full model runner with no account required. Pro is $10/month for everything else.
