ToolPiper bundles local inference, 104 MCP tools, browser automation, voice AI, RAG, and video upscale into one native macOS app. No cloud, no Docker, no Python.
Nine inference backends on Apple Silicon — llama.cpp on Metal GPU, Apple Intelligence on Neural Engine, FluidAudio for speech, MLX Audio for voice synthesis, Apple Vision for OCR and pose detection. One install replaces Ollama + Open WebUI + Playwright MCP + a filesystem MCP + LangChain.
Run any GGUF model. Qwen 3.5, Llama 3.2, DeepSeek R1. 30+ tok/s on M2 Air. Flash attention, speculative decoding, and Jinja templates out of the box.
Neural Engine inference. Summarization, rewriting, Smart Reply. Zero GPU impact — runs on dedicated ML hardware alongside your other models.
Whisper-class transcription at 210x realtime. PocketTTS with zero GPU impact. Both run on the Neural Engine, leaving Metal GPU free for LLM inference.
Soprano, Orpheus, voice cloning via Qwen3 TTS on Metal GPU. High-quality voice synthesis with support for multiple speakers and styles.
On-device text extraction from images and PDFs. No model download needed — uses the Vision framework built into macOS.
Real-time RAM monitoring via proc_pid_rusage. Automatic model eviction under memory pressure. Your Mac stays responsive without manual intervention.
`chat`, `transcribe`, `speak`, `embed`, `ocr`, `analyze_image`, `analyze_text`, `load_model`. All inference runs on your Mac's Neural Engine and Metal GPU.
AX-native selectors, self-healing, 7 assertion types, visual recording, network interception, code coverage, WebAuthn, and autofill testing.
Visual test format with self-healing selectors. Record, run, heal, and export tests as Playwright or Cypress code for your CI pipeline.
26 action domains via ActionPiper integration. Windows, audio, display, network, Bluetooth, Dock, Spaces, Finder, calendar, contacts, and more.
AI-driven screenplay-to-MP4 pipeline. Generate screenplays, rehearse, record, render, narrate, and edit — all through MCP tools.
Local embeddings, vector search with HNSW index, web scraping in 7 formats with 16-framework detection. Build knowledge bases without cloud APIs.
Drop-in replacement for OpenAI SDK. Change base_url to localhost:9998, that's it. Streaming and non-streaming. Works with LangChain, LlamaIndex, Continue.dev, Aider, and anything that accepts a custom OpenAI base URL.
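The base_url swap can be sketched with only the Python standard library, no SDK required. The port (9998) is from the docs above; the `/v1` path prefix and the model name are assumptions, so substitute whatever model you have loaded:

```python
import json
import urllib.request

# Build an OpenAI-style chat completion request against ToolPiper's
# local server. Port 9998 is documented; /v1 and the model name are
# illustrative assumptions.
def chat_request(prompt: str,
                 base_url: str = "http://localhost:9998/v1",
                 model: str = "qwen3.5") -> urllib.request.Request:
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }).encode()
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = chat_request("Summarize MCP in one sentence.")
# urllib.request.urlopen(req) sends it once ToolPiper is running.
print(req.full_url)  # http://localhost:9998/v1/chat/completions
```

With the official OpenAI SDK, the equivalent is passing the same base URL through the client's `base_url` parameter; nothing else in your code changes.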
Local vector embeddings for RAG pipelines. Apple NL embedding (zero-setup, 512-dim) or llama.cpp embedding models. Content-addressed cache for repeat queries.
Text-to-speech synthesis. Three engines (FluidAudio, MLX Audio, PocketTTS), eight voices, all on-device. Same API format as OpenAI's TTS endpoint.
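A minimal sketch of an OpenAI-format speech request, assuming the standard `/v1/audio/speech` path and illustrative model and voice names (check ToolPiper's docs for the exact values):

```python
import json
import urllib.request

# OpenAI-shaped TTS request against the local server. The path follows
# OpenAI's /v1/audio/speech convention; "pocket-tts" and the voice name
# are placeholders, not confirmed ToolPiper identifiers.
def speech_request(text: str,
                   voice: str = "default",
                   base_url: str = "http://localhost:9998/v1") -> urllib.request.Request:
    body = json.dumps({
        "model": "pocket-tts",
        "input": text,
        "voice": voice,
    }).encode()
    return urllib.request.Request(
        f"{base_url}/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# urllib.request.urlopen(...) would return raw audio bytes to write to a file.
req = speech_request("Hello from the Neural Engine.")
print(req.full_url)
```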
Speech-to-text transcription at 210x realtime on Neural Engine. Parakeet V3 model. Same API format as OpenAI's Whisper endpoint.
Route cloud requests through ToolPiper with Keychain key injection. Your API keys never appear in code or .env files. One base URL handles both local and cloud models.
`tp_dev_<64hex>` tokens for team sharing and CI pipelines. SHA-256 hashed, stored in macOS Keychain. Works as `api_key` in any OpenAI SDK.
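The documented token shape lends itself to a quick sanity check before wiring one into CI. The regex below just encodes the `tp_dev_<64hex>` format from above; the example token is fabricated:

```python
import re

# Developer tokens are tp_dev_ followed by 64 lowercase hex characters,
# per the docs. This is only a client-side format check; the server does
# the real validation against the SHA-256 hash in the Keychain.
TOKEN_RE = re.compile(r"^tp_dev_[0-9a-f]{64}$")

def is_dev_token(token: str) -> bool:
    return TOKEN_RE.fullmatch(token) is not None

example = "tp_dev_" + "ab" * 32   # fabricated 64-hex example, not a real credential
print(is_dev_token(example))                    # True
print(is_dev_token("sk-not-a-toolpiper-token")) # False

# In any OpenAI SDK the token then goes in as the api_key argument.
```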
Query Chrome's real accessibility tree via CDP's Accessibility.queryAXTree. Not a DOM simulation, not injected JavaScript. The actual computed AX tree that screen readers consume.
3 modes — passive, fuzzy AX match, AI-assisted. Broken selectors repair in 5-15ms with zero external calls. Free, not a paid add-on.
Browse your app normally. Every interaction becomes an AX-enriched test step with element metadata, page context, and a mutation diff showing what changed.
Deterministic, idiomatic code for CI. AX selectors map to each framework's native format: `role:button:Sign In` becomes `page.getByRole('button', { name: 'Sign In' })`.
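The mapping above can be illustrated with a toy converter. This is only a sketch of the idea; the real exporter handles many more roles and both target frameworks:

```python
# Illustrative translation of a role:button:Sign In style AX selector
# into Playwright's getByRole call. Assumes the three-part
# kind:role:name format shown in the docs.
def ax_to_playwright(selector: str) -> str:
    kind, role, name = selector.split(":", 2)
    if kind != "role":
        raise ValueError(f"unsupported selector kind: {kind}")
    return f"page.getByRole('{role}', {{ name: '{name}' }})"

print(ax_to_playwright("role:button:Sign In"))
# page.getByRole('button', { name: 'Sign In' })
```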
Parakeet V3 at 210x realtime on Neural Engine. Whisper-class accuracy with zero GPU impact. Transcribe meetings, lectures, and voice memos locally.
Three engines, eight voices, all on-device. FluidAudio on Neural Engine for low-latency. MLX Audio on Metal GPU for studio-quality synthesis.
Clone any voice from a 10-second sample via Qwen3 TTS on Metal GPU. Create custom voices for narration, accessibility, or creative projects.
PiperSR at 44.4 FPS on Apple Neural Engine. Double-buffered ANE+Metal pipeline. Real-time 2x upscale from 360p to 720p.
Right Option = dictate anywhere (STT → paste). Right Command = voice command (STT → LLM with tool definitions → ActionRouter). System-wide hotkeys.
CDP-based scraper using a real browser. 16-framework detection, readiness-aware extraction, 7 output formats: markdown, text, readability, AX tree, HTML, links, screenshot.
| Feature | ToolPiper | Ollama | LM Studio | Open WebUI |
|---|---|---|---|---|
| Setup | One app from the Mac App Store | Homebrew + CLI | DMG download | Docker + Ollama required |
| Inference engine | llama.cpp + 8 other backends | llama.cpp only | llama.cpp only | No engine (proxies to Ollama) |
| MCP tools | 104 tools, stdio + HTTP transports | None | None | None |
| Browser automation | 14 CDP tools with AX selectors | None | None | None |
| Voice AI | STT + TTS + voice cloning | None | None | None |
| Resource monitoring | Real-time RAM, automatic eviction | None (reads RAM once at startup) | Removed in v4.0 | None (its most-requested missing feature) |
| API compatibility | OpenAI-compatible (chat, embed, speech, transcribe) | OpenAI-compatible (chat, embed) | OpenAI-compatible (chat) | Web UI only |
| Cloud proxy | Keychain key injection, one base URL | None | None | None |
| Testing | PiperTest with self-healing + Playwright/Cypress export | None | None | None |
| Price | Free / $9.99 Pro | Free | Free | Free |
Download from the Mac App Store. Launch. A starter model downloads automatically.
Open ModelPiper in your browser, or run `claude mcp add toolpiper -- ~/.toolpiper/mcp` for MCP.
Chat, transcribe, automate browsers, run tests, build agents — all on localhost.
All inference runs on your Mac's GPU and Neural Engine. No data leaves your machine.
llama.cpp, Apple Intelligence, FluidAudio, MLX Audio, Apple Vision OCR, CoreML upscale — coordinated by one process.
OpenAI-compatible API, Model Context Protocol, GGUF models, Playwright/Cypress export. No proprietary lock-in.
Start free. Upgrade if you want Pro features.
Free: $0, forever.
Pro: $9.99/month.
No. ToolPiper includes the same llama.cpp engine that Ollama uses, plus eight additional backends. It runs the same GGUF models at the same speed. If you already use Ollama, ModelPiper can connect to it as an external provider — but ToolPiper replaces the entire Ollama + Open WebUI stack with zero configuration.
Any GGUF model from HuggingFace (135,000+ available). ToolPiper also ships curated presets tested on Apple Silicon: Qwen 3.5, Llama 3.2, DeepSeek R1, Phi-4, Gemma 3, and more. Each preset shows exact RAM usage so you never download something your Mac can't run.
Yes. One command: `claude mcp add toolpiper -- ~/.toolpiper/mcp`. Restart Claude Code and all 104 tools are available. Also works with Cursor, Windsurf, and any MCP-compatible client via stdio or Streamable HTTP transport.
8GB minimum for small models (0.8B-3B). 16GB recommended for the mainstream sweet spot (7B-8B models alongside normal app usage). 32GB opens up 14B models and multi-model workflows. ToolPiper's resource intelligence monitors RAM in real-time and automatically evicts models under memory pressure.
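The tiers above follow from a rough sizing rule of thumb (an assumption for illustration, not ToolPiper's exact accounting): a Q4-quantized GGUF needs about 0.6 bytes per parameter, plus a GB or two of KV cache and runtime overhead:

```python
# Back-of-envelope RAM estimate for a Q4-quantized GGUF model.
# 0.6 bytes/param and 1.5 GB overhead are ballpark assumptions, not
# measured ToolPiper figures.
def approx_ram_gb(params_billion: float,
                  bytes_per_param: float = 0.6,
                  overhead_gb: float = 1.5) -> float:
    return params_billion * bytes_per_param + overhead_gb

for b in (3, 8, 14):
    print(f"{b}B model: ~{approx_ram_gb(b):.1f} GB")
```

The estimates line up with the guidance above: a 3B model fits easily in 8GB, a 7B-8B model wants 16GB once you leave room for other apps, and 14B models push you toward 32GB.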
Yes. All 104 MCP tools, all inference backends, browser automation, PiperTest, self-healing, assertions, Playwright/Cypress export, voice AI, RAG, and the OpenAI-compatible API are included in the free tier. ToolPiper Pro ($9.99/month) adds cloud API proxy, developer tokens, Video Creator, and priority model downloads.
Yes, completely. All inference runs on your Mac's hardware. Once models are downloaded, everything works without an internet connection — chat, transcribe, speak, embed, OCR, browser automation, testing, and desktop control. The only features that require network access are cloud API proxy and the social/research tools (GitHub, Hacker News, Reddit).
One app. Nine backends. 104 tools. Zero configuration.