Article · 2026-06-11 · by Ben Racicot

Why I Built a Suite of AI Tools for macOS

TL;DR

Local AI on Mac existed as a parts bin - Ollama for inference, Whisper.cpp for transcription, a Python TTS server, Docker for a chat UI, a shell script holding it together. Every update broke something. I spent a year assembling those capabilities into one native suite: ToolPiper bundles nine inference backends behind a localhost server and 300+ MCP tools. It's seven apps because macOS permission scoping favors small processes with minimal permissions over one process holding screen recording, accessibility, microphone, and contacts access at once. Native Swift, not Electron. Bundled llama.cpp, not just an Ollama wrapper. macOS-only because the capabilities that make it good are Apple Silicon-specific.


ModelPiper is a suite of seven apps that turns a Mac into a complete local AI workstation. The core, ToolPiper, bundles nine inference backends behind one localhost:9998 server and over 300 MCP tools - no cloud, no Docker, no Python, no API keys. It's macOS-only because the capabilities that make it good are Apple Silicon-specific. You can install the app and verify every claim yourself.

Your Mac has a chip that was designed to run neural networks. A dedicated Neural Engine, a GPU that shares memory with the CPU, hardware-accelerated matrix math built into the CPU. Apple built the silicon. Then they left the software to someone else.

I spent a year building it.

What was wrong with local AI on Mac?

A year ago, my local AI setup looked like this. Ollama for inference. A separate Whisper.cpp install for transcription. A Python script wrapping a TTS model. Docker running Open WebUI for a chat interface. A shell script piping audio between them. Five tools, three programming languages, two package managers, and one Docker VM eating 4GB of RAM.

It worked the way duct tape works. Every update broke something. The transcription setup expected a different Python version than the TTS server. Docker Desktop pushed a release that changed its networking. Ollama changed an API response format and the shell script choked. I was spending more time maintaining the stack than using it.

The problem was clear. Local AI on Mac was a parts bin, not a product. Every capability existed somewhere. Nobody had assembled them into something you could install and use.

What's in the suite?

Seven apps in one monorepo. Four native macOS apps in Swift, one Angular web app, one browser extension, one IDE extension. They share a Swift package and a TypeScript layer, talk to each other over localhost HTTP, and collectively cover everything I was doing with that five-tool stack - plus things the stack couldn't do at all.

ToolPiper is the engine. It bundles nine AI backends, among them llama.cpp for language models on the Metal GPU, FluidAudio for speech-to-text and text-to-speech on the Neural Engine, Apple Vision for OCR and pose estimation, MLX Audio for high-quality TTS, Apple NL Embedding for zero-setup vector search, and a custom CoreML model (PiperSR) for image and video upscale on the Neural Engine. It runs an HTTP server on port 9998 that exposes all of these as REST APIs, plus an MCP server with over 300 tools. One command gives Claude Code access to local inference, browser automation, desktop control, OCR, document search, and video editing.
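One way to picture "nine backends behind one server" is as a capability registry. The sketch below uses only the six backends this article names. `Capability`, `BACKENDS`, and `backendsFor` are hypothetical illustrations, not ToolPiper's internal API. The compute unit is filled in only where the article states it.

```typescript
// Hypothetical capability names for illustration; not ToolPiper's real identifiers.
type Capability =
  | "chat"
  | "speech-to-text"
  | "text-to-speech"
  | "ocr"
  | "pose-estimation"
  | "embedding"
  | "upscale";

interface Backend {
  name: string;
  computeUnit: string; // "not stated" where this article does not say
  capabilities: Capability[];
}

// The six backends named in this article (ToolPiper bundles nine in total).
const BACKENDS: Backend[] = [
  { name: "llama.cpp", computeUnit: "Metal GPU", capabilities: ["chat"] },
  { name: "FluidAudio", computeUnit: "Neural Engine", capabilities: ["speech-to-text", "text-to-speech"] },
  { name: "Apple Vision", computeUnit: "not stated", capabilities: ["ocr", "pose-estimation"] },
  { name: "MLX Audio", computeUnit: "not stated", capabilities: ["text-to-speech"] },
  { name: "Apple NL Embedding", computeUnit: "not stated", capabilities: ["embedding"] },
  { name: "PiperSR (CoreML)", computeUnit: "Neural Engine", capabilities: ["upscale"] },
];

// Every backend that can serve a capability, in registry order.
function backendsFor(capability: Capability): string[] {
  return BACKENDS.filter((b) => b.capabilities.indexOf(capability) !== -1).map((b) => b.name);
}
```

A server in front of a registry like this routes each request by capability rather than by backend. For example, `backendsFor("text-to-speech")` returns both FluidAudio and MLX Audio, so the choice between them stays a routing decision.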

ModelPiper is the visual interface. An Angular web app that connects to ToolPiper: chat, a visual pipeline builder where you drag blocks and connect models into multi-step workflows, image tools, a test authoring suite, an agent workspace, and API docs. It's how you interact with everything ToolPiper does.

VisionPiper is screen capture. A menu bar app that captures movable regions, records video, converts to GIF, and streams frames over WebSocket. Built because every screen capture tool I tried either couldn't stream or couldn't record a movable region.

AudioPiper is audio capture. A multi-source mixer that records from mics, system audio, and individual apps via Core Audio Taps. It exists because I needed to pipe audio from a browser tab into a transcription model, and macOS has no simple built-in way to capture a single app's audio; Core Audio Taps are the route that avoids a kernel extension.

System control rounds out the suite. ToolPiper carries 26 action domains (around 142 actions) that AI agents can execute: window management, display control, Bluetooth, calendar, contacts, Finder, Spotlight, input simulation. Push-to-talk dictation on the right Option key and voice commands on the right Command key route through speech-to-text, then a language model, then action execution. That's the piece that turns a model into an operating system interface.

MediaPiper is a browser extension for Chrome, Firefox, and Safari. Full-size image hover previews with CDN pattern detection, because I kept opening images in new tabs to see them at full resolution hundreds of times a day.

CodePiper is a VS Code extension. A fork of Continue.dev (Apache 2.0) that connects to ToolPiper as the backend - local AI coding assistance without sending your code to a cloud API.

Why a suite instead of one app?

The honest answer: because the capabilities don't fit in one process.

Audio capture needs a background daemon with real-time Core Audio access. Screen capture needs a menu bar app with screen recording permission. System actions need an accessibility-privileged process. Browser integration needs an extension. IDE integration needs a different extension. The inference engine needs to manage GPU memory and model lifecycle.

macOS has strong process boundaries and permission scopes. An app that requested accessibility, screen recording, microphone, contacts, calendar, and automation all at once would be a security review nightmare, and Apple would reject it from the App Store. Separate apps, each with the minimum permissions for its function, is how macOS is designed to work.

They share a common Swift package (PiperKit) for networking, authentication, logging, and UI components, and they discover each other over localhost. From your perspective, ToolPiper is the one app you install. The others are specialized tools you add when you need them.
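Discovery over localhost can be as simple as probing known ports and keeping whichever apps answer. Below is a minimal sketch under stated assumptions. Port 9998 is ToolPiper's documented port. The other two ports and the `/health` path are hypothetical placeholders, and `discoverApps` is a hypothetical helper. The probe function is injected, so the logic can be exercised without a network.

```typescript
// 9998 is ToolPiper's documented port. The other ports are hypothetical placeholders.
const CANDIDATE_PORTS: Record<string, number> = {
  ToolPiper: 9998,
  VisionPiper: 9997, // hypothetical
  AudioPiper: 9996, // hypothetical
};

// Resolves true when something healthy answers at the URL.
type Probe = (url: string) => Promise<boolean>;

// Probes every candidate in parallel and returns the names of the apps that answered.
async function discoverApps(probe: Probe, healthPath = "/health"): Promise<string[]> {
  const names = Object.keys(CANDIDATE_PORTS);
  const answered = await Promise.all(
    names.map(async (name) => {
      const url = `http://localhost:${CANDIDATE_PORTS[name]}${healthPath}`;
      try {
        return await probe(url);
      } catch {
        return false; // connection refused or timed out: treat as not running
      }
    }),
  );
  return names.filter((_, i) => answered[i]);
}

// A real probe in Node 18+ or a browser could be:
// const httpProbe: Probe = async (url) =>
//   (await fetch(url, { signal: AbortSignal.timeout(300) })).ok;
```

Treating any error as "not running" keeps discovery cheap. An app that isn't installed costs one refused connection, not a crash.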

Why Swift instead of Electron?

Every app is native. ToolPiper uses about 50MB of resident memory before it loads a model. Electron would have been faster to build and would have cost 200MB or more of RAM that models need. On an 8GB Mac, that difference decides whether a 7B model fits or swaps to disk. Compare that to Docker Desktop sitting at 2-4GB before you've run anything.
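The 8GB argument is easy to check with back-of-envelope arithmetic, sketched below. Every input is an assumption, not a measurement:

- 4.5 bits per weight, which is what llama.cpp's Q4_0 format works out to once its per-block scales are counted.
- A Mistral-7B-like shape: 32 layers, 8 KV heads, head dimension 128, with an fp16 KV cache over 4,096 tokens.
- 3 GiB reserved for macOS and everyday apps.

```typescript
const GiB = 1024 * 1024 * 1024;

// Quantized weights: parameter count x bits per weight / 8 bits per byte.
function weightBytes(params: number, bitsPerWeight: number): number {
  return (params * bitsPerWeight) / 8;
}

// KV cache: keys + values (x2), for every layer, token, KV head, and head dimension.
function kvCacheBytes(
  layers: number,
  contextTokens: number,
  kvHeads: number,
  headDim: number,
  bytesPerElem: number,
): number {
  return 2 * layers * contextTokens * kvHeads * headDim * bytesPerElem;
}

// What is left of RAM after the OS reserve, the loaded model, and the host app.
function headroomBytes(ramBytes: number, reservedBytes: number, modelBytes: number, appBytes: number): number {
  return ramBytes - reservedBytes - modelBytes - appBytes;
}

// Assumed model: 7e9 parameters at 4.5 bits/weight, Mistral-7B-like shape, fp16 cache.
const weights = weightBytes(7e9, 4.5); // 3,937,500,000 bytes
const kvCache = kvCacheBytes(32, 4096, 8, 128, 2); // 536,870,912 bytes
const model = weights + kvCache;

const ram = 8 * GiB; // an "8GB" Mac
const reserved = 3 * GiB; // assumed: macOS plus everyday apps

const nativeHeadroom = headroomBytes(ram, reserved, model, 50e6); // ~50MB native app
const electronHeadroom = headroomBytes(ram, reserved, model, 200e6); // ~200MB Electron app
```

With these numbers, about 0.83 GiB remains once the model is loaded. The extra ~150MB of an Electron shell takes roughly a sixth of that. A longer context, a less aggressive quantization, or a heavier OS reserve shrinks the margin to zero.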

Why bundle llama.cpp instead of wrapping Ollama?

ToolPiper doesn't just connect to Ollama. It bundles llama.cpp directly. Same engine, same GGUF models, same Metal GPU speed - within a few percent of standalone Ollama on identical models. But it also connects to Ollama as a backend if you already have models there.

The point was never to replace Ollama. The point is that local AI shouldn't require installing a separate model server, configuring CORS, and hoping the versions stay compatible. It should be: install the app, a model downloads, you're chatting.
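"Same GGUF models" is literal. A GGUF file describes itself, and any llama.cpp-based runtime reads the same header. The sketch below is an illustration of the format only, not ToolPiper code, and `parseGgufHeader` is a hypothetical helper name. It parses the fixed-size header as laid out in GGUF version 2 and later:

- the 4-byte `GGUF` magic,
- a little-endian uint32 version,
- little-endian uint64 tensor and metadata key/value counts.

```typescript
interface GgufHeader {
  version: number;
  tensorCount: number;
  metadataKvCount: number;
}

// Reads an unsigned 64-bit little-endian integer as a number (exact below 2^53),
// combining two 32-bit halves to avoid BigInt.
function readU64(view: DataView, offset: number): number {
  const low = view.getUint32(offset, true);
  const high = view.getUint32(offset + 4, true);
  return high * 0x100000000 + low;
}

function parseGgufHeader(bytes: Uint8Array): GgufHeader {
  if (bytes.length < 24) throw new Error("too short for a GGUF header");
  const magic = String.fromCharCode(bytes[0], bytes[1], bytes[2], bytes[3]);
  if (magic !== "GGUF") throw new Error(`not a GGUF file (magic "${magic}")`);
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const version = view.getUint32(4, true);
  if (version < 2) throw new Error(`GGUF version ${version} is not handled by this sketch`);
  return {
    version,
    tensorCount: readU64(view, 8),
    metadataKvCount: readU64(view, 16),
  };
}
```

In Node you would read only the first 24 bytes of a model file, for example with `fs.openSync` and `fs.readSync`, rather than loading a multi-gigabyte file into memory.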

Why build on MCP?

The Model Context Protocol turned out to be the most important architectural decision in the whole project. Over 300 tools exposed over stdio and HTTP mean Claude Code, Cursor, Windsurf, or any MCP client gets access to everything ToolPiper can do: browser automation over Chrome DevTools Protocol, OCR, document search, image upscale, video editing, pose estimation, system actions. One protocol, every backend.

Building ToolPiper as an MCP server made it immediately useful to the developer tools ecosystem without writing a separate integration for each tool. That was worth more than building a great chat UI.
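For readers who haven't looked inside MCP: it is JSON-RPC 2.0, and on the stdio transport each message is a single line of JSON. The minimal sketch below builds the two requests a client uses to discover and invoke tools, `tools/list` and `tools/call`, and extracts the text parts of a call result. Two assumptions to note:

- The tool name `ocr_image` and its `path` argument are hypothetical, not a real ToolPiper tool.
- A real session begins with an `initialize` request and a `notifications/initialized` notification, which are omitted here.

```typescript
// JSON-RPC 2.0 request shape used by MCP.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

let nextId = 1;

function request(method: string, params?: Record<string, unknown>): JsonRpcRequest {
  const req: JsonRpcRequest = { jsonrpc: "2.0", id: nextId++, method };
  if (params !== undefined) req.params = params;
  return req;
}

// On the stdio transport, each message is one line of JSON terminated by "\n".
function frame(message: JsonRpcRequest): string {
  return JSON.stringify(message) + "\n";
}

// A tools/call result carries a content array; collect its text parts.
interface ToolResult {
  content: Array<{ type: string; text?: string }>;
  isError?: boolean;
}

function resultText(result: ToolResult): string {
  return result.content
    .filter((part) => part.type === "text" && typeof part.text === "string")
    .map((part) => part.text as string)
    .join("\n");
}

// Discover what the server offers, then call one tool.
// "ocr_image" and its "path" argument are hypothetical.
const listTools = request("tools/list");
const callOcr = request("tools/call", { name: "ocr_image", arguments: { path: "/tmp/receipt.png" } });
```

Because every client speaks this same shape, one server implementation serves Claude Code, Cursor, and Windsurf alike.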

What does it look like in practice?

A voice chat pipeline in the visual builder runs from the microphone through three model blocks to the speaker: speech-to-text (Parakeet v3 on the Neural Engine), then a language model, then text-to-speech (PocketTTS on the Neural Engine). Round-trip latency is about 1.5 seconds with a 3B model on an M2 Max. Not fast enough to replace a phone call. Fast enough to brainstorm while cooking.
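A voice turn is a chain of asynchronous stages, and timing each stage shows where a round trip's latency goes. This is a minimal sketch. `VoiceBackends` and `runVoiceTurn` are hypothetical names, and the stage functions are placeholders you would wire to real models, such as the speech-to-text and text-to-speech models named above. A production pipeline may stream partial results between stages; this sketch runs them strictly in sequence for clarity.

```typescript
type Samples = Float32Array; // PCM audio samples

// Placeholders for real backends; the names and signatures here are hypothetical.
interface VoiceBackends {
  transcribe: (audio: Samples) => Promise<string>; // speech-to-text block
  respond: (prompt: string) => Promise<string>; // language model block
  speak: (text: string) => Promise<Samples>; // text-to-speech block
}

interface VoiceTurn {
  transcript: string;
  reply: string;
  audio: Samples;
  timingsMs: Record<"stt" | "llm" | "tts", number>;
}

// Runs one microphone-to-speaker turn in sequence and times each block.
async function runVoiceTurn(
  mic: Samples,
  backends: VoiceBackends,
  now: () => number = Date.now,
): Promise<VoiceTurn> {
  const t0 = now();
  const transcript = await backends.transcribe(mic);
  const t1 = now();
  const reply = await backends.respond(transcript);
  const t2 = now();
  const audio = await backends.speak(reply);
  const t3 = now();
  return { transcript, reply, audio, timingsMs: { stt: t1 - t0, llm: t2 - t1, tts: t3 - t2 } };
}
```

The clock is injectable so the timing logic can be tested deterministically. In a live pipeline the per-stage numbers tell you whether to shrink the language model or the speech models first.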

A document Q&A pipeline: drop a folder of PDFs. OCR extracts text, Apple NL Embedding generates vectors with zero setup, and an HNSW index with BM25 hybrid retrieval handles the search. Ask questions in natural language, get answers with source citations. All on-device, works on a plane. Local RAG over your files is one of the three Pro features at $10/month. The OCR that feeds it is free.
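Hybrid retrieval runs two rankings over the same documents and merges them: a lexical one (BM25) and a semantic one (vector similarity). The article names an HNSW index with BM25 but not how the two rankings are combined. This sketch makes three assumptions:

- The rankings are combined with reciprocal rank fusion (k = 60), a common choice.
- Exact cosine similarity stands in for HNSW, which approximates the same nearest-neighbor ranking at scale.
- BM25 uses the usual k1 = 1.2, b = 0.75 defaults with a Lucene-style IDF.

Vectors are supplied by the caller from any embedding model.

```typescript
interface Doc {
  id: string;
  text: string;
  vector: number[]; // embedding supplied by the caller
}

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Okapi BM25 with k1 = 1.2, b = 0.75 and a Lucene-style IDF.
function bm25Scores(query: string, docs: Doc[], k1 = 1.2, b = 0.75): Record<string, number> {
  const scores: Record<string, number> = {};
  if (docs.length === 0) return scores;
  const tokens = docs.map((d) => tokenize(d.text));
  const avgLen = tokens.reduce((sum, t) => sum + t.length, 0) / docs.length;
  const terms = tokenize(query).filter((t, i, all) => all.indexOf(t) === i); // unique query terms
  docs.forEach((doc, i) => {
    let score = 0;
    for (const term of terms) {
      const df = tokens.filter((t) => t.indexOf(term) !== -1).length; // docs containing the term
      if (df === 0) continue;
      const idf = Math.log(1 + (docs.length - df + 0.5) / (df + 0.5));
      const tf = tokens[i].filter((t) => t === term).length;
      score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * tokens[i].length) / avgLen));
    }
    scores[doc.id] = score;
  });
  return scores;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Ids sorted by descending score.
function rank(scores: Record<string, number>): string[] {
  return Object.keys(scores).sort((x, y) => scores[y] - scores[x]);
}

// Reciprocal rank fusion: each ranking contributes 1 / (k + rank), ranks starting at 1.
function rrf(rankings: string[][], k = 60): string[] {
  const fused: Record<string, number> = {};
  for (const ranking of rankings) {
    ranking.forEach((id, i) => {
      fused[id] = (fused[id] || 0) + 1 / (k + i + 1);
    });
  }
  return rank(fused);
}

function hybridSearch(query: string, queryVector: number[], docs: Doc[], topK = 3): string[] {
  const lexical = bm25Scores(query, docs);
  const lexicalRanking = rank(lexical).filter((id) => lexical[id] > 0); // only docs that matched a term
  const semantic: Record<string, number> = {};
  for (const d of docs) semantic[d.id] = cosine(queryVector, d.vector);
  return rrf([lexicalRanking, rank(semantic)]).slice(0, topK);
}
```

Rank fusion sidesteps the fact that BM25 scores and cosine similarities live on different scales. Only positions matter, so exact keyword hits and paraphrased matches can both surface.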

A test session in PiperTest: describe what you want to test in a chat. The AI inspects your live app through Chrome DevTools Protocol, navigates pages, fills forms, clicks buttons, and records the interaction as structured steps. When selectors break, it re-inspects the accessibility tree and fixes them. Export to Playwright or Cypress when you're ready for CI. The browser primitives underneath are free: snapshots, the Chrome DevTools Protocol, the accessibility tree, and browser automation. PiperTest itself is Max at $49/month, and that is what buys the test format, the runner, the self-healing, and the export.
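Exporting recorded steps to Playwright is a straightforward code-generation step, sketched below. The `Step` format and the helper names are hypothetical; PiperTest's real schema isn't described here. The generated code uses real Playwright calls: `page.goto`, `page.getByRole(role, { name })`, `click`, `fill`, and `expect(...).toBeVisible()`. Locating elements by role and accessible name, which is what the accessibility tree exposes, also shows why re-inspecting that tree can repair a selector after a CSS or markup change.

```typescript
// Hypothetical recorded-step format.
type Step =
  | { kind: "goto"; url: string }
  | { kind: "click"; role: string; name: string }
  | { kind: "fill"; role: string; name: string; value: string }
  | { kind: "expectText"; text: string };

// JSON.stringify yields a correctly escaped JavaScript string literal.
const lit = (s: string): string => JSON.stringify(s);

// Role + accessible name locator, mirroring what the accessibility tree exposes.
function locator(role: string, name: string): string {
  return `page.getByRole(${lit(role)}, { name: ${lit(name)} })`;
}

function stepToPlaywright(step: Step): string {
  switch (step.kind) {
    case "goto":
      return `await page.goto(${lit(step.url)});`;
    case "click":
      return `await ${locator(step.role, step.name)}.click();`;
    case "fill":
      return `await ${locator(step.role, step.name)}.fill(${lit(step.value)});`;
    case "expectText":
      return `await expect(page.getByText(${lit(step.text)})).toBeVisible();`;
  }
}

// Emits a complete @playwright/test file for one recorded session.
function exportPlaywright(title: string, steps: Step[]): string {
  const body = steps.map((s) => "  " + stepToPlaywright(s)).join("\n");
  return [
    `import { test, expect } from "@playwright/test";`,
    ``,
    `test(${lit(title)}, async ({ page }) => {`,
    body,
    `});`,
    ``,
  ].join("\n");
}
```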

A voice command: hold the right Command key and say "turn down the brightness and open Safari to my email." Speech-to-text transcribes it, the model parses intent against 142 action definitions, and the router dispatches a brightness change and an app launch. A notification confirms. No typing.
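The key safety property of that flow is that the model only proposes actions and the router decides what runs. The minimal sketch below checks each proposed action against registered definitions before dispatching it. Three assumptions to note:

- The model emits a JSON array of `{ action, args }` objects.
- The action names `display.setBrightness` and `apps.open` are hypothetical.
- `ActionRouter` and `parseModelOutput` are illustrative names, not ToolPiper's API. Handlers here return confirmation strings instead of calling system APIs.

```typescript
// A hypothetical action definition; the router only dispatches actions it knows about.
interface ActionDef {
  domain: string;
  name: string;
  requiredArgs: string[];
  run: (args: Record<string, unknown>) => string; // returns a short confirmation
}

interface ParsedAction {
  action: string; // "domain.name", as proposed by the language model
  args: Record<string, unknown>;
}

// Parses the model's raw text into proposed actions (assumed JSON-array format).
function parseModelOutput(raw: string): ParsedAction[] {
  const data: unknown = JSON.parse(raw);
  if (!Array.isArray(data)) throw new Error("expected a JSON array of actions");
  return data.map((item: any) => ({ action: String(item.action), args: item.args ?? {} }));
}

class ActionRouter {
  private defs: Record<string, ActionDef> = {};

  register(def: ActionDef): void {
    this.defs[`${def.domain}.${def.name}`] = def;
  }

  // Validates each proposed action; unknown actions or missing arguments are rejected, not run.
  dispatch(actions: ParsedAction[]): string[] {
    return actions.map((a) => {
      const def = this.defs[a.action];
      if (!def) return `rejected: unknown action ${a.action}`;
      const missing = def.requiredArgs.filter((k) => !(k in a.args));
      if (missing.length > 0) return `rejected: ${a.action} missing ${missing.join(", ")}`;
      return def.run(a.args);
    });
  }
}
```

For the spoken example above, the model might emit a brightness change and an app launch, and the router runs both. A hallucinated action name is rejected instead of executed.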

How does this compare to Ollama?

Ollama is good infrastructure. A Go binary that serves models over a REST API, works on every platform, and has broad third-party support. But it's a model runner, not a model platform.

I wrote a full comparison and tested every Ollama frontend on Mac. The short version: if you need cross-platform, server deployment, or Docker orchestration, Ollama wins. If you're on a Mac and want voice, vision, pipelines, resource monitoring, MCP tools, or anything beyond text chat, that's the gap this suite fills. ToolPiper connects to Ollama as a backend, so they coexist without friction.

What are the limitations?

It's macOS only. No Linux, no Windows. The capabilities that make it good - Neural Engine, Metal, Core Audio Taps, IOKit, Apple Vision - are Apple-specific. A cross-platform version would be a different, lesser product.

It's a lot of surface area. Seven apps, over 300 MCP tools, nine inference backends, a pipeline builder, a test suite, an agent workspace. The basic chat flow is three clicks, but the full platform takes time to explore.

It's newer than the alternatives. Open WebUI has a larger community. LM Studio has broader platform support. Ollama has deeper third-party integration. The documentation and the community are still growing.

Local models are not cloud models. A 7B model on your Mac doesn't match a frontier cloud model on hard reasoning. Voice chat has a 1.5-second round-trip, not the sub-second response of a cloud voice mode. Local AI is good enough for a wide range of tasks, runs with complete privacy, and costs nothing per query. It is not a blanket replacement for cloud AI.

I built these tools because I wanted them to exist. A year in, they do. Download ToolPiper at modelpiper.com. The free tier covers 330 of ToolPiper's 387 tools with no account and no caps: chat, all nine local inference backends, all speech (transcription, text-to-speech, voice cloning, push-to-talk dictation, voice chat), all the Apple Intelligence and Vision tools, full browser automation, the developer API and tokens, the MCP server, system control, the visual pipeline builder, and the full model runner with unlimited downloads. Pro is $10/month for exactly three things: local RAG over your files, web scraping with YouTube transcripts, and a cloud API proxy for your own keys. Studio is $29/month for image and video upscaling, the video pipeline, pose detection, and the outreach toolkit. Max is $49/month for CodePiper, PiperTest, and API discovery and replay.

The Seven Apps and What Each One Does

App	Platform	Role	Built On
ToolPiper	macOS (Swift)	Inference engine, HTTP + MCP server	llama.cpp, FluidAudio, Apple Vision, MLX, CoreML
ModelPiper	Web (Angular)	Chat, visual pipeline builder, test suite	ToolPiper REST API
VisionPiper	macOS (Swift)	Screen capture, recording, GIF, frame streaming	ScreenCaptureKit, WebSocket
AudioPiper	macOS (Swift)	Multi-source audio mixer and recorder	Core Audio Taps
MediaPiper	Browser extension	Full-size image hover previews	Chrome MV3, Firefox MV2, Safari
CodePiper	VS Code extension	Local AI coding assistant	Continue.dev fork (Apache 2.0)

Frequently Asked Questions

Why is ModelPiper seven separate apps instead of one?

macOS permission scoping. A single app that requested accessibility, screen recording, microphone, contacts, calendar, and automation permissions all at once would fail App Store review and present a large security surface. Separate apps, each with the minimum permissions for its function, is how macOS is designed to work. They share a common Swift package for networking, auth, and logging, and discover each other over localhost. ToolPiper is the one app you install - the others are specialized tools you add when you need them.

Does ToolPiper replace Ollama?

It can, but it doesn't have to. ToolPiper bundles llama.cpp directly, so it runs the same GGUF models at Metal GPU speeds within a few percent of standalone Ollama, without a separate server, CORS configuration, or Docker. It also connects to Ollama as a backend if you already have models there. The two coexist without friction - most people who try both end up using ToolPiper's interface with their existing Ollama models.

Why is ModelPiper macOS only?

The capabilities that make it good are Apple Silicon-specific: the Neural Engine for speech-to-text, text-to-speech, and image upscale; Metal for language model inference; Core Audio Taps for per-app audio capture; IOKit for GPU monitoring; Apple Vision for OCR and pose estimation. These are not portable APIs. A cross-platform version would be a different, lesser product.

What does the free tier include?

330 of ToolPiper's 387 tools, with no account and no caps. That covers chat and all nine local inference backends, all speech (transcription, text-to-speech, voice cloning, push-to-talk dictation, voice chat), all the Apple Intelligence, Vision, OCR, and NaturalLanguage tools, full browser automation over the Chrome DevTools Protocol and the accessibility tree, the developer API and tokens, system control and actions, clipboard and snippets, the visual pipeline builder, the MCP server, and the full local model runner with unlimited downloads and multi-model loading. The HTTP API on localhost:9998 is free too. Pro is $10/month for exactly three things: local RAG over your files, web scraping with YouTube transcripts, and a cloud API proxy for your own keys. Image and video upscaling, the video pipeline, pose detection, and the outreach toolkit are Studio at $29/month. CodePiper, PiperTest, and API discovery and replay are Max at $49/month.

Why native Swift instead of Electron?

Memory. ToolPiper uses about 50MB of resident memory before loading a model. An Electron build would cost 200MB or more before any model loads. On an 8GB Mac, that difference decides whether a 7B model fits in unified memory or swaps to disk and grinds to single-digit tokens per second.

Tags: Local AI, macOS, Apple Silicon, MCP, Swift, Privacy
