Free Direct Download

Your Mac is an AI platform.

ToolPiper bundles local inference, over 300 MCP tools, browser automation, voice AI, RAG, and video upscale into one native macOS app. No cloud, no Docker, no Python.

Nine inference backends on Apple Silicon — llama.cpp on Metal GPU, Apple Intelligence on Neural Engine, FluidAudio for speech, MLX Audio for voice synthesis, Apple Vision for OCR and pose detection. One install replaces Ollama + Open WebUI + Playwright MCP + a filesystem MCP + LangChain.

9 Backends

Local AI Inference

llama.cpp on Metal GPU

Run any GGUF model. Qwen 3.5, Llama 3.2, DeepSeek R1. 30+ tok/s on M2 Air. Flash attention, speculative decoding, and Jinja templates out of the box.

Apple Intelligence

Neural Engine inference. Summarization, rewriting, Smart Reply. Zero GPU impact — runs on dedicated ML hardware alongside your other models.

FluidAudio STT/TTS

Whisper-class transcription at 210x realtime. PocketTTS with zero GPU impact. Both run on the Neural Engine, leaving Metal GPU free for LLM inference.

MLX Audio TTS

Soprano, Orpheus, voice cloning via Qwen3 TTS on Metal GPU. High-quality voice synthesis with support for multiple speakers and styles.

Apple Vision OCR

On-device text extraction from images and PDFs. No model download needed — uses the Vision framework built into macOS.

Resource Intelligence

Real-time RAM monitoring via proc_pid_rusage. Automatic model eviction under memory pressure. Your Mac stays responsive without manual intervention.

Works with Claude Code, Cursor, Windsurf

300+ MCP Tools

Core AI (31 tools)

chat, audio_transcribe, audio_speak, audio_voice_clone, text_embed, vision_ocr, model_load/list/search/download, voice_chat memory, endpoint management. All inference on Neural Engine and Metal GPU.

System Control (162 tools)

26 native macOS domains, all in-process via ActionRouter. Windows, displays, audio devices, network, Bluetooth, Dock, Spaces, Finder, calendar, contacts, focus, clipboard history, notifications, processes, reminders, timers.

Browser & Web (28 tools)

AX-native selectors, self-healing, 7 assertion types, visual recording, network interception, coverage, WebAuthn, autofill, plus web_scrape, http_request, youtube_transcript, and web_api_discover.

Video Creator (17 tools)

AI-driven screenplay-to-MP4 pipeline. screenplay, rehearse, record, render, narrate, plus full timeline edit (composition, narration, screenplay, timeline, export), import_media, and clip library.

Filesystem & Git (18 tools)

file_read, file_write, file_create, file_delete, file_list_directory, file_pick_directory, code_search, workspace_search, plus git_status, git_diff, git_commit, git_log, git_push, git_checkout.

Outreach & Social (15 tools)

github_repo_list, github_activity, github_compare, hn_search, hn_trending, reddit_search, reddit_post, gsc_analytics, gsc_inspect, queue_publish, queue_add, queue_list. Build your distribution loop.

Analysis & RAG (11 tools)

rag_ingest, rag_query, rag_collection_list with local embeddings + HNSW vector index. image_analyze, text_analyze, image_upscale, video_upscale, image_transform, pdf_extract, qr_generate.

Capture & Motion (8 tools)

vision_screenshot, vision_color_pick, audio_record, plus 60fps pose streaming: pose_detect, pose_format_list, pose_stream_start, vision_stream_start/stop.

Testing & Integrations (14 tools)

PiperTest (test_save, test_run, test_list, test_get, test_export, test_delete) with self-healing selectors. Plus OAuth (4), Sieve content filters (4) for end-to-end workflows.

Drop-in Replacement

OpenAI-Compatible API

/v1/chat/completions

Drop-in replacement for OpenAI SDK. Change base_url to localhost:9998, that's it. Streaming and non-streaming. Works with LangChain, LlamaIndex, Continue.dev, Aider, and anything that accepts a custom OpenAI base URL.

/v1/embeddings

Local vector embeddings for RAG pipelines. Apple NL embedding (zero-setup, 512-dim) or llama.cpp embedding models. Content-addressed cache for repeat queries.

/v1/audio/speech

Text-to-speech synthesis. Three engines (FluidAudio, MLX Audio, PocketTTS), eight voices, all on-device. Same API format as OpenAI's TTS endpoint.

/v1/audio/transcriptions

Speech-to-text transcription at 210x realtime on Neural Engine. Parakeet V3 model. Same API format as OpenAI's Whisper endpoint.

Cloud API Proxy

Route cloud requests through ToolPiper with Keychain key injection. Your API keys never appear in code or .env files. One base URL handles both local and cloud models.

Developer Tokens

tp_<64hex> Bearer tokens for team sharing and CI pipelines. SHA-256 hashed, stored in macOS Keychain. Works as api_key in any OpenAI SDK.

AX-Native

Browser Automation & Testing

AX-Native Selectors

Query Chrome's real accessibility tree via CDP's Accessibility.queryAXTree. Not a DOM simulation, not injected JavaScript. The actual computed AX tree that screen readers consume.

Self-Healing

3 modes — passive, fuzzy AX match, AI-assisted. Broken selectors repair in 5-15ms with zero external calls. Free, not a paid add-on.

Visual Recording

Browse your app normally. Every interaction becomes an AX-enriched test step with element metadata, page context, and a mutation diff showing what changed.

Export to Playwright/Cypress

Deterministic, idiomatic code for CI. AX selectors map to each framework's native format: role:button:Sign In becomes page.getByRole('button', { name: 'Sign In' }).

On-Device

Voice AI & Media

Speech-to-Text

Parakeet V3 at 210x realtime on Neural Engine. Whisper-class accuracy with zero GPU impact. Transcribe meetings, lectures, and voice memos locally.

Text-to-Speech

Three engines, eight voices, all on-device. FluidAudio on Neural Engine for low-latency. MLX Audio on Metal GPU for studio-quality synthesis.

Voice Cloning

Clone any voice from a 10-second sample via Qwen3 TTS on Metal GPU. Create custom voices for narration, accessibility, or creative projects.

Video Upscale

PiperSR at 44.4 FPS on Apple Neural Engine. Double-buffered ANE+Metal pipeline. Real-time 2x upscale from 360p to 720p.

Push-to-Talk

Right Option = dictate anywhere (STT to paste). Right Command = voice command (STT to LLM with tool definitions to ActionRouter). System-wide hotkeys.

Web Scraping

CDP-based scraper using a real browser. 16-framework detection, readiness-aware extraction, 7 output formats: markdown, text, readability, AX tree, HTML, links, screenshot.

ToolPiper vs The Alternatives

ToolPiperOllamaLM StudioOpen WebUI
SetupSigned DMG, one drag to ApplicationsHomebrew + CLIDMG downloadDocker + Ollama required
Inference enginellama.cpp + 8 other backendsllama.cpp onlyllama.cpp onlyNo engine (proxies to Ollama)
MCP toolsover 300 tools, stdio + HTTP transportsNoneNoneNone
Browser automation14 CDP tools with AX selectorsNoneNoneNone
Voice AISTT + TTS + voice cloningNoneNoneNone
Resource monitoringReal-time RAM, automatic evictionNone (reads RAM once at startup)Removed in v4.0None (most-requested feature)
API compatibilityOpenAI-compatible (chat, embed, speech, transcribe)OpenAI-compatible (chat, embed)OpenAI-compatible (chat)Web UI only
Cloud proxyKeychain key injection, one base URLNoneNoneNone
TestingPiperTest with self-healing + Playwright/Cypress exportNoneNoneNone
PriceFree; Pro $10/mo, Studio $29/mo, Max $49/moFreeFreeFree

How It Works

1

Install

Download the signed DMG from modelpiper.com/download. Drag to Applications. Launch. A starter model downloads automatically.

2

Connect

Open ModelPiper in your browser, or run claude mcp add toolpiper -- ~/.toolpiper/mcp for MCP.

3

Build

Chat, transcribe, automate browsers, run tests, build agents — all on localhost.

Everything Stays Local

All inference runs on your Mac's GPU and Neural Engine. No data leaves your machine.

9 Backends, 1 App

llama.cpp, Apple Intelligence, FluidAudio, MLX Audio, Apple Vision OCR, CoreML upscale — coordinated by one process.

Open Standards

OpenAI-compatible API, Model Context Protocol, GGUF models, Playwright/Cypress export. No proprietary lock-in.

Simple Pricing

Start free. Upgrade for native engines, Studio creator tools, or Max developer tools.

Free
$0

forever

  • Local chat with bundled model
  • Connect Ollama, OpenAI, Anthropic
  • Visual pipeline builder
  • OpenAI-compatible API
  • Free companion apps (Vision, Audio, Media)
Download ToolPiper
Pro
$10

/month

  • Everything in Free
  • All 9 inference backends
  • Apple Intelligence integration
  • Full voice suite (STT + TTS)
  • Local RAG over your files
  • All 300+ MCP tools
  • Cloud API proxy with Keychain keys
  • Unlimited model downloads
Get Pro
Coming soon
$29

/month

  • Everything in Pro
  • Image upscaling (ANE-native)
  • Video upscaling (60fps real-time)
  • Video editing pipeline
  • Pose detection (60fps streaming)
  • Outreach toolkit
Coming soon
$49

/month

  • Everything in Studio
  • CodePiper (IDE AI extension)
  • PiperTest (self-healing browser tests)
  • Full browser automation (CDP + AX)
  • API discovery toolkit
  • Priority support

Frequently Asked Questions

Do I need Ollama?

No. ToolPiper includes the same llama.cpp engine that Ollama uses, plus eight additional backends. It runs the same GGUF models at the same speed. If you already use Ollama, ModelPiper can connect to it as an external provider — but ToolPiper replaces the entire Ollama + Open WebUI stack with zero configuration.

What models can I run?

Any GGUF model from HuggingFace (135,000+ available). ToolPiper also ships curated presets tested on Apple Silicon: Qwen 3.5, Llama 3.2, DeepSeek R1, Phi-4, Gemma 3, and more. Each preset shows exact RAM usage so you never download something your Mac can't run.

Does it work with Claude Code?

Yes. One command: claude mcp add toolpiper -- ~/.toolpiper/mcp. Restart Claude Code and all 300+ tools are available. Also works with Cursor, Windsurf, and any MCP-compatible client via stdio or Streamable HTTP transport.

How much RAM do I need?

8GB minimum for small models (0.8B-3B). 16GB recommended for the mainstream sweet spot (7B-8B models alongside normal app usage). 32GB opens up 14B models and multi-model workflows. ToolPiper's resource intelligence monitors RAM in real-time and automatically evicts models under memory pressure.

Is it really free?

The free tier includes local chat with a bundled model, the visual pipeline builder, the OpenAI-compatible API, and connections to Ollama / OpenAI / Anthropic via your own keys. Pro ($10/mo) unlocks all 9 inference backends, Apple Intelligence, the full voice suite (STT + TTS), local RAG, all 300+ MCP tools, the cloud API proxy, and unlimited model downloads. Studio ($29/mo, coming soon) adds image and video upscaling, video editing, pose detection, and the outreach toolkit. Max ($49/mo, coming soon) adds CodePiper, PiperTest with self-healing, full browser automation, API discovery, and priority support.

Does it work offline?

Yes, completely. All inference runs on your Mac's hardware. Once models are downloaded, everything works without an internet connection — chat, transcribe, speak, embed, OCR, browser automation, testing, and desktop control. The only features that require network access are cloud API proxy and the social/research tools (GitHub, Hacker News, Reddit).

Replace your entire local AI stack.

One app. Nine backends. Over 300 tools. Zero configuration.