Free on Mac App Store

Your Mac is an AI platform.

ToolPiper bundles local inference, 104 MCP tools, browser automation, voice AI, RAG, and video upscaling into one native macOS app. No cloud, no Docker, no Python.

Nine inference backends on Apple Silicon, including llama.cpp on the Metal GPU, Apple Intelligence on the Neural Engine, FluidAudio for speech, MLX Audio for voice synthesis, and Apple Vision for OCR and pose detection. One install replaces Ollama + Open WebUI + Playwright MCP + a filesystem MCP + LangChain.

9 Backends

Local AI Inference

llama.cpp on Metal GPU

Run any GGUF model. Qwen 3.5, Llama 3.2, DeepSeek R1. 30+ tok/s on M2 Air. Flash attention, speculative decoding, and Jinja templates out of the box.

Apple Intelligence

Neural Engine inference. Summarization, rewriting, Smart Reply. Zero GPU impact — runs on dedicated ML hardware alongside your other models.

FluidAudio STT/TTS

Whisper-class transcription at 210x realtime. PocketTTS with zero GPU impact. Both run on the Neural Engine, leaving Metal GPU free for LLM inference.

MLX Audio TTS

Soprano, Orpheus, voice cloning via Qwen3 TTS on Metal GPU. High-quality voice synthesis with support for multiple speakers and styles.

Apple Vision OCR

On-device text extraction from images and PDFs. No model download needed — uses the Vision framework built into macOS.

Resource Intelligence

Real-time RAM monitoring via proc_pid_rusage. Automatic model eviction under memory pressure. Your Mac stays responsive without manual intervention.

Works with Claude Code, Cursor, Windsurf

104 MCP Tools

Core AI (8 tools)

chat, transcribe, speak, embed, ocr, analyze_image, analyze_text, load_model. All inference runs on your Mac's Neural Engine and Metal GPU.

Browser Automation (14 tools)

AX-native selectors, self-healing, 7 assertion types, visual recording, network interception, code coverage, WebAuthn, and autofill testing.

PiperTest (6 tools)

Visual test format with self-healing selectors. Record, run, heal, and export tests as Playwright or Cypress code for your CI pipeline.

Desktop Control (29 tools)

26 action domains via ActionPiper integration. Windows, audio, display, network, Bluetooth, Dock, Spaces, Finder, calendar, contacts, and more.

Video Creator (12 tools)

AI-driven screenplay-to-MP4 pipeline. Generate screenplays, rehearse, record, render, narrate, and edit — all through MCP tools.

RAG & Knowledge (5 tools)

Local embeddings, vector search with HNSW index, web scraping in 7 formats with 16-framework detection. Build knowledge bases without cloud APIs.

Drop-in Replacement

OpenAI-Compatible API

/v1/chat/completions

Drop-in replacement for the OpenAI SDK. Change base_url to localhost:9998 and you're done. Streaming and non-streaming. Works with LangChain, LlamaIndex, Continue.dev, Aider, and anything that accepts a custom OpenAI base URL.
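A minimal sketch of the request shape, using only the standard library. The model name is an illustrative placeholder (use whichever model ToolPiper has loaded), and the network call is shown commented out since it requires ToolPiper running locally:

```python
import json

BASE_URL = "http://localhost:9998/v1"  # ToolPiper's local OpenAI-compatible endpoint

def chat_request(prompt: str, model: str = "llama-3.2") -> dict:
    """Build an OpenAI-style chat completion body.

    The model name is illustrative; use whichever model
    ToolPiper has loaded.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

body = chat_request("Summarize this README in one sentence.")
print(json.dumps(body, indent=2))

# Sending it (requires ToolPiper running locally):
# from urllib import request
# req = request.Request(f"{BASE_URL}/chat/completions",
#                       data=json.dumps(body).encode(),
#                       headers={"Content-Type": "application/json"})
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The same body works against any OpenAI-compatible server, which is what makes the base-URL swap sufficient.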

/v1/embeddings

Local vector embeddings for RAG pipelines. Apple NL embedding (zero-setup, 512-dim) or llama.cpp embedding models. Content-addressed cache for repeat queries.
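To show where the embeddings go once you have them, here is a plain cosine-similarity ranking step; the vectors below are mock 4-dimensional stand-ins for the real response (512 floats per vector with the Apple NL backend, per the figures above):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity — the usual ranking metric over embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# In practice each vector comes back from POST /v1/embeddings as
# response["data"][i]["embedding"]. Mock vectors stand in here:
doc = [0.1, 0.9, 0.0, 0.2]
query = [0.2, 0.8, 0.1, 0.1]
print(round(cosine(doc, query), 3))
```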

/v1/audio/speech

Text-to-speech synthesis. Three engines (FluidAudio, MLX Audio, PocketTTS), eight voices, all on-device. Same API format as OpenAI's TTS endpoint.
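A sketch of the request body in OpenAI's TTS format, which the page says this endpoint matches. The model and voice names below are placeholders from OpenAI's API, not ToolPiper's eight on-device voices:

```python
import json

def speech_request(text: str, voice: str = "alloy") -> dict:
    """OpenAI-style /v1/audio/speech body. The voice and model names
    are illustrative placeholders — substitute one of ToolPiper's
    on-device voices."""
    return {"model": "tts-1", "input": text, "voice": voice}

body = speech_request("Build finished in 42 seconds.")
print(json.dumps(body))
# POST this to http://localhost:9998/v1/audio/speech and write the
# binary response to an audio file (requires ToolPiper running).
```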

/v1/audio/transcriptions

Speech-to-text transcription at 210x realtime on Neural Engine. Parakeet V3 model. Same API format as OpenAI's Whisper endpoint.

Cloud API Proxy

Route cloud requests through ToolPiper with Keychain key injection. Your API keys never appear in code or .env files. One base URL handles both local and cloud models.

Developer Tokens

tp_dev_<64hex> tokens for team sharing and CI pipelines. SHA-256 hashed, stored in macOS Keychain. Works as api_key in any OpenAI SDK.
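A developer token rides in the standard OpenAI Authorization header, so any SDK that takes an api_key works unchanged. A sketch (the token value below is a fake placeholder, not a real credential):

```python
# tp_dev_<64hex>: the "tp_dev_" prefix followed by 64 hex characters.
# This value is a fake placeholder for illustration.
token = "tp_dev_" + "ab" * 32

headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json",
}

# Any OpenAI SDK accepts it the same way, e.g.
# OpenAI(base_url="http://localhost:9998/v1", api_key=token)
assert len(token) == len("tp_dev_") + 64
```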

AX-Native

Browser Automation & Testing

AX-Native Selectors

Query Chrome's real accessibility tree via CDP's Accessibility.queryAXTree. Not a DOM simulation, not injected JavaScript. The actual computed AX tree that screen readers consume.

Self-Healing

3 modes — passive, fuzzy AX match, AI-assisted. Broken selectors repair in 5-15ms with zero external calls. Free, not a paid add-on.

Visual Recording

Browse your app normally. Every interaction becomes an AX-enriched test step with element metadata, page context, and a mutation diff showing what changed.

Export to Playwright/Cypress

Deterministic, idiomatic code for CI. AX selectors map to each framework's native format: role:button:Sign In becomes page.getByRole('button', { name: 'Sign In' }).
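To make the mapping concrete, here is a toy sketch of the role:&lt;role&gt;:&lt;name&gt; translation for the Playwright target — a simplified illustration, not ToolPiper's actual exporter code:

```python
def ax_to_playwright(selector: str) -> str:
    """Translate a role:<role>:<name> AX selector into a Playwright
    getByRole call. Toy version of the exporter's mapping."""
    kind, role, name = selector.split(":", 2)
    assert kind == "role", "only role selectors handled in this sketch"
    return f"page.getByRole('{role}', {{ name: '{name}' }})"

print(ax_to_playwright("role:button:Sign In"))
```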

On-Device

Voice AI & Media

Speech-to-Text

Parakeet V3 at 210x realtime on Neural Engine. Whisper-class accuracy with zero GPU impact. Transcribe meetings, lectures, and voice memos locally.

Text-to-Speech

Three engines, eight voices, all on-device. FluidAudio on the Neural Engine for low latency. MLX Audio on the Metal GPU for studio-quality synthesis.

Voice Cloning

Clone any voice from a 10-second sample via Qwen3 TTS on Metal GPU. Create custom voices for narration, accessibility, or creative projects.

Video Upscale

PiperSR at 44.4 FPS on Apple Neural Engine. Double-buffered ANE+Metal pipeline. Real-time 2x upscale from 360p to 720p.

Push-to-Talk

Right Option = dictate anywhere (STT → paste). Right Command = voice command (STT → LLM with tool definitions → ActionRouter). System-wide hotkeys.

Web Scraping

CDP-based scraper using a real browser. 16-framework detection, readiness-aware extraction, 7 output formats: markdown, text, readability, AX tree, HTML, links, screenshot.

ToolPiper vs. the Alternatives

Feature | ToolPiper | Ollama | LM Studio | Open WebUI
Setup | One app from the Mac App Store | Homebrew + CLI | DMG download | Docker + Ollama required
Inference engine | llama.cpp + 8 other backends | llama.cpp only | llama.cpp only | No engine (proxies to Ollama)
MCP tools | 104 tools, stdio + HTTP transports | None | None | None
Browser automation | 14 CDP tools with AX selectors | None | None | None
Voice AI | STT + TTS + voice cloning | None | None | None
Resource monitoring | Real-time RAM, automatic eviction | None (reads RAM once at startup) | Removed in v4.0 | None (most-requested feature)
API compatibility | OpenAI-compatible (chat, embed, speech, transcribe) | OpenAI-compatible (chat, embed) | OpenAI-compatible (chat) | Web UI only
Cloud proxy | Keychain key injection, one base URL | None | None | None
Testing | PiperTest with self-healing + Playwright/Cypress export | None | None | None
Price | Free / $9.99 Pro | Free | Free | Free

How It Works

1

Install

Download from the Mac App Store. Launch. A starter model downloads automatically.

2

Connect

Open ModelPiper in your browser, or run claude mcp add toolpiper -- ~/.toolpiper/mcp for MCP.

3

Build

Chat, transcribe, automate browsers, run tests, build agents — all on localhost.

Everything Stays Local

All inference runs on your Mac's GPU and Neural Engine. No data leaves your machine.

9 Backends, 1 App

llama.cpp, Apple Intelligence, FluidAudio, MLX Audio, Apple Vision OCR, CoreML upscale — coordinated by one process.

Open Standards

OpenAI-compatible API, Model Context Protocol, GGUF models, Playwright/Cypress export. No proprietary lock-in.

Simple Pricing

Start free. Upgrade if you want Pro features.

Free
$0

forever

  • All 104 MCP tools
  • Local inference (all backends)
  • Browser automation & PiperTest
  • OpenAI-compatible API
  • Voice AI (STT + TTS)
  • RAG & embeddings
Pro
$9.99

/month

  • Everything in Free
  • Cloud API proxy with Keychain keys
  • Developer tokens
  • Video Creator pipeline
  • Priority model downloads
Download ToolPiper

Frequently Asked Questions

Do I need Ollama?

No. ToolPiper includes the same llama.cpp engine that Ollama uses, plus eight additional backends. It runs the same GGUF models at the same speed. If you already use Ollama, ModelPiper can connect to it as an external provider — but ToolPiper replaces the entire Ollama + Open WebUI stack with zero configuration.

What models can I run?

Any GGUF model from HuggingFace (135,000+ available). ToolPiper also ships curated presets tested on Apple Silicon: Qwen 3.5, Llama 3.2, DeepSeek R1, Phi-4, Gemma 3, and more. Each preset shows exact RAM usage so you never download something your Mac can't run.

Does it work with Claude Code?

Yes. One command: claude mcp add toolpiper -- ~/.toolpiper/mcp. Restart Claude Code and all 104 tools are available. Also works with Cursor, Windsurf, and any MCP-compatible client via stdio or Streamable HTTP transport.

How much RAM do I need?

8GB minimum for small models (0.8B-3B). 16GB recommended for the mainstream sweet spot (7B-8B models alongside normal app usage). 32GB opens up 14B models and multi-model workflows. ToolPiper's resource intelligence monitors RAM in real time and automatically evicts models under memory pressure.
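For a rough sanity check before downloading, a back-of-envelope estimate: weights take roughly params × bits-per-weight ÷ 8 bytes, plus some allowance for KV cache and runtime buffers. The 4.5 bits/weight and overhead constants below are assumptions for typical Q4-class quantization, not ToolPiper's measured figures (its presets show exact RAM usage):

```python
def model_ram_gb(params_b: float, bits_per_weight: float = 4.5,
                 overhead_gb: float = 1.5) -> float:
    """Rough RAM estimate for a quantized GGUF model: weight bytes
    plus a flat allowance for KV cache and runtime buffers.
    Constants are assumptions, not measured values."""
    weights_gb = params_b * bits_per_weight / 8  # params in billions -> GB
    return round(weights_gb + overhead_gb, 1)

for size in (3, 7, 14):
    print(f"{size}B @ 4.5 bpw ~ {model_ram_gb(size)} GB")
```

The estimates line up with the guidance above: a 3B model fits comfortably in 8GB, 7B-8B models suit 16GB, and 14B models want 32GB once you leave headroom for other apps.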

Is it really free?

Yes. All 104 MCP tools, all inference backends, browser automation, PiperTest, self-healing, assertions, Playwright/Cypress export, voice AI, RAG, and the OpenAI-compatible API are included in the free tier. ToolPiper Pro ($9.99/month) adds cloud API proxy, developer tokens, Video Creator, and priority model downloads.

Does it work offline?

Yes, completely. All inference runs on your Mac's hardware. Once models are downloaded, everything works without an internet connection — chat, transcribe, speak, embed, OCR, browser automation, testing, and desktop control. The only features that require network access are cloud API proxy and the social/research tools (GitHub, Hacker News, Reddit).

Replace your entire local AI stack.

One app. Nine backends. 104 tools. Zero configuration.