Free Direct Download

Everything Ollama does, free. Then it does the rest of your Mac.

Model downloads, the native llama.cpp engine, multi-model, a local OpenAI-compatible API — free, no account, no caps, no terminal. Then 300+ MCP tools, browser automation, voice, vision, and pipelines on top.

Powered by llama.cpp: ToolPiper embeds upstream llama-server directly — not a fork — currently build b9533, with the exact version shown in the About panel. Models are standard GGUF files. One install replaces Ollama + Open WebUI + Playwright MCP + a filesystem MCP + LangChain.

Download for Mac Set up MCP

By Ben Racicot, Founder & Lead Engineer— Updated 2026-06-09

9 Backends

Local AI Inference

llama.cpp on Metal GPU

Run any GGUF model. Qwen 3.5, Llama 3.2, DeepSeek R1. 30+ tok/s on M2 Air. Flash attention, speculative decoding, and Jinja templates out of the box.

Apple Intelligence

Neural Engine inference. Summarization, rewriting, Smart Reply. Zero GPU impact — runs on dedicated ML hardware alongside your other models.

FluidAudio STT/TTS

Whisper-class transcription at 210x realtime. PocketTTS with zero GPU impact. Both run on the Neural Engine, leaving Metal GPU free for LLM inference.

MLX Audio TTS

Soprano, Orpheus, voice cloning via Qwen3 TTS on Metal GPU. High-quality voice synthesis with support for multiple speakers and styles.

Apple Vision OCR

On-device text extraction from images and PDFs. No model download needed — uses the Vision framework built into macOS.

Resource Intelligence

Real-time RAM monitoring via proc_pid_rusage. Automatic model eviction under memory pressure. Your Mac stays responsive without manual intervention.

Works with Claude Code, Cursor, Windsurf

300+ MCP Tools

Core AI (31 tools)

chat, audio_transcribe, audio_speak, audio_voice_clone, text_embed, vision_ocr, model_load/list/search/download, voice_chat memory, endpoint management. All inference on Neural Engine and Metal GPU.

System Control (162 tools)

26 native macOS domains, all in-process via ActionRouter. Windows, displays, audio devices, network, Bluetooth, Dock, Spaces, Finder, calendar, contacts, focus, clipboard history, notifications, processes, reminders, timers.

Browser & Web (28 tools)

AX-native selectors, self-healing, 7 assertion types, visual recording, network interception, coverage, WebAuthn, autofill, plus web_scrape, http_request, youtube_transcript, and web_api_discover.

Video Creator (17 tools)

AI-driven screenplay-to-MP4 pipeline. screenplay, rehearse, record, render, narrate, plus full timeline edit (composition, narration, screenplay, timeline, export), import_media, and clip library.

Filesystem & Git (18 tools)

file_read, file_write, file_create, file_delete, file_list_directory, file_pick_directory, code_search, workspace_search, plus git_status, git_diff, git_commit, git_log, git_push, git_checkout.

Outreach & Social (15 tools)

github_repo_list, github_activity, github_compare, hn_search, hn_trending, reddit_search, reddit_post, gsc_analytics, gsc_inspect, queue_publish, queue_add, queue_list. Build your distribution loop.

Analysis & RAG (11 tools)

rag_ingest, rag_query, rag_collection_list with local embeddings + HNSW vector index. image_analyze, text_analyze, image_upscale, video_upscale, image_transform, pdf_extract, qr_generate.

Capture & Motion (8 tools)

vision_screenshot, vision_color_pick, audio_record, plus 60fps pose streaming: pose_detect, pose_format_list, pose_stream_start, vision_stream_start/stop.

Testing & Integrations (14 tools)

PiperTest (test_save, test_run, test_list, test_get, test_export, test_delete) with self-healing selectors. Plus OAuth (4), Sieve content filters (4) for end-to-end workflows.

See all 300+ tools

Drop-in Replacement

OpenAI-Compatible API

/v1/chat/completions

Drop-in replacement for OpenAI SDK. Change base_url to localhost:9998, that's it. Streaming and non-streaming. Works with LangChain, LlamaIndex, Continue.dev, Aider, and anything that accepts a custom OpenAI base URL.

/v1/embeddings

Local vector embeddings for RAG pipelines. Apple NL embedding (zero-setup, 512-dim) or llama.cpp embedding models. Content-addressed cache for repeat queries.

/v1/audio/speech

Text-to-speech synthesis. Three engines (FluidAudio, MLX Audio, PocketTTS), eight voices, all on-device. Same API format as OpenAI's TTS endpoint.

/v1/audio/transcriptions

Speech-to-text transcription at 210x realtime on Neural Engine. Parakeet V3 model. Same API format as OpenAI's Whisper endpoint.

Cloud API Proxy

Route cloud requests through ToolPiper with Keychain key injection. Your API keys never appear in code or .env files. One base URL handles both local and cloud models.

Developer Tokens

tp_<64hex> Bearer tokens for team sharing and CI pipelines. SHA-256 hashed, stored in macOS Keychain. Works as api_key in any OpenAI SDK.

Set up the local API

AX-Native

Browser Automation & Testing

AX-Native Selectors

Query Chrome's real accessibility tree via CDP's Accessibility.queryAXTree. Not a DOM simulation, not injected JavaScript. The actual computed AX tree that screen readers consume.

Self-Healing

3 modes — passive, fuzzy AX match, AI-assisted. Broken selectors repair in 5-15ms with zero external calls. Free, not a paid add-on.

Visual Recording

Browse your app normally. Every interaction becomes an AX-enriched test step with element metadata, page context, and a mutation diff showing what changed.

Export to Playwright/Cypress

Deterministic, idiomatic code for CI. AX selectors map to each framework's native format: role:button:Sign In becomes page.getByRole('button', { name: 'Sign In' }).

Learn about PiperTest

On-Device

Voice AI & Media

Speech-to-Text

Parakeet V3 at 210x realtime on Neural Engine. Whisper-class accuracy with zero GPU impact. Transcribe meetings, lectures, and voice memos locally.

Text-to-Speech

Three engines, eight voices, all on-device. FluidAudio on Neural Engine for low-latency. MLX Audio on Metal GPU for studio-quality synthesis.

Voice Cloning

Clone any voice from a 10-second sample via Qwen3 TTS on Metal GPU. Create custom voices for narration, accessibility, or creative projects.

Video Upscale

PiperSR at 44.4 FPS on Apple Neural Engine. Double-buffered ANE+Metal pipeline. Real-time 2x upscale from 360p to 720p.

Push-to-Talk

Right Option = dictate anywhere (STT to paste). Right Command = voice command (STT to LLM with tool definitions to ActionRouter). System-wide hotkeys.

Web Scraping

CDP-based scraper using a real browser. 16-framework detection, readiness-aware extraction, 7 output formats: markdown, text, readability, AX tree, HTML, links, screenshot.

ToolPiper vs The Alternatives

	ToolPiper	Ollama	LM Studio	Open WebUI
Setup	Signed DMG, one drag to Applications	Homebrew + CLI	DMG download	Docker + Ollama required
Inference engine	llama.cpp + 8 other backends	llama.cpp only	llama.cpp only	No engine (proxies to Ollama)
MCP tools	over 300 tools, stdio + HTTP transports	None	None	None
Browser automation	14 CDP tools with AX selectors	None	None	None
Voice AI	STT + TTS + voice cloning	None	None	None
Resource monitoring	Real-time RAM, automatic eviction	None (reads RAM once at startup)	Removed in v4.0	None (most-requested feature)
API compatibility	OpenAI-compatible (chat, embed, speech, transcribe)	OpenAI-compatible (chat, embed)	OpenAI-compatible (chat)	Web UI only
Cloud proxy	Keychain key injection, one base URL	None	None	None
Testing	PiperTest with self-healing + Playwright/Cypress export	None	None	None
Price	Free; Pro $10/mo, Studio $29/mo, Max $49/mo	Free	Free	Free

How It Works

Install

Download the signed DMG from modelpiper.com/download. Drag to Applications. Launch. A starter model downloads automatically.

Connect

Open ModelPiper in your browser, or run claude mcp add toolpiper -- ~/.toolpiper/mcp for MCP.

Build

Chat, transcribe, automate browsers, run tests, build agents — all on localhost.

Everything Stays Local

All inference runs on your Mac's GPU and Neural Engine. No data leaves your machine.

Powered by llama.cpp

The inference engine is upstream llama-server, embedded directly — not a fork. Currently build b9533; the exact version ships in the About panel.

9 Backends, 1 App

llama.cpp, Apple Intelligence, FluidAudio, MLX Audio, Apple Vision OCR, CoreML upscale — coordinated by one process.

Open Standards

OpenAI-compatible API, Model Context Protocol, GGUF models, Playwright/Cypress export. No proprietary lock-in.

Simple Pricing

The model runner is free for everyone — no account, no caps. Paid tiers add voice, media, and developer tools.

Free

forever

Native llama.cpp engine — any GGUF model
Unlimited downloads, multi-model switching
Local OpenAI-compatible API + embeddings
MCP server with all 300+ tools
Transcription (STT) and visual pipeline builder
Free companion apps (Vision, Audio, Media)

Download ToolPiper

Pro

$10

/month

Everything in Free
Push-to-talk dictation, anywhere on your Mac
Text-to-speech — three engines, eight voices
Apple Intelligence on the Neural Engine
Local RAG over your files
Cloud API proxy with Keychain keys
All 9 inference backends

Get Pro

Coming soon

$29

/month

Everything in Pro
Image upscaling (ANE-native)
Video upscaling (60fps real-time)
Video editing pipeline
Pose detection (60fps streaming)
Outreach toolkit

Coming soon

$49

/month

Everything in Studio
CodePiper (IDE AI extension)
PiperTest (self-healing browser tests)
Full browser automation (CDP + AX)
API discovery toolkit
Priority support

Frequently Asked Questions

Do I need Ollama?

No. ToolPiper embeds the same llama.cpp engine Ollama wraps — directly, not as a fork — and the whole runner is free: unlimited GGUF downloads, multi-model switching, the local OpenAI-compatible API, no account. Models are stored as standard GGUF files you can use with any llama.cpp tool, not a proprietary blob format. If you keep Ollama installed, ModelPiper can still connect to it as a backend, but there's nothing it provides that ToolPiper doesn't run natively.

Will my Ollama tools keep working?

Yes - ToolPiper serves the Ollama API itself. Flip on the compatibility listener (Settings → General, off by default) and anything that talks to localhost:11434 talks to ToolPiper: model list, streamed chat, embeddings, pulls, deletes, all on the native engine. It's served as the legacy dialect - every response carries a standards-based deprecation header pointing at the first-party /v1 API, which is where new integrations should land.

What models can I run?

Any GGUF model from HuggingFace (135,000+ available). ToolPiper also ships curated presets tested on Apple Silicon: Qwen 3.5, Llama 3.2, DeepSeek R1, Phi-4, Gemma 3, and more. Each preset shows exact RAM usage so you never download something your Mac can't run.

Does it work with Claude Code?

Yes. One command: claude mcp add toolpiper -- ~/.toolpiper/mcp. Restart Claude Code and all 300+ tools are available. Also works with Cursor, Windsurf, and any MCP-compatible client via stdio or Streamable HTTP transport.

How much RAM do I need?

8GB minimum for small models (0.8B-3B). 16GB recommended for the mainstream sweet spot (7B-8B models alongside normal app usage). 32GB opens up 14B models and multi-model workflows. ToolPiper's resource intelligence monitors RAM in real-time and automatically evicts models under memory pressure.

Is it really free?

The whole model runner is free with no account and no caps: the native llama.cpp engine, unlimited model downloads, multi-model switching, the local OpenAI-compatible API, embeddings, transcription, the visual pipeline builder, and all 300+ MCP tools. Pro ($10/mo) adds push-to-talk dictation, text-to-speech, Apple Intelligence, local RAG, and the cloud API proxy. Studio ($29/mo, coming soon) adds image and video upscaling, video editing, pose detection, and the outreach toolkit. Max ($49/mo, coming soon) adds CodePiper, PiperTest with self-healing, full browser automation, API discovery, and priority support.

Does it work offline?

Yes, completely. All inference runs on your Mac's hardware. Once models are downloaded, everything works without an internet connection — chat, transcribe, speak, embed, OCR, browser automation, testing, and desktop control. The only features that require network access are cloud API proxy and the social/research tools (GitHub, Hacker News, Reddit).

Ready to run AI locally?

One app. Nine backends. Over 300 tools. Zero configuration.

Download for Mac

Everything Ollama does, free. Then it does the rest of your Mac.

Local AI Inference

llama.cpp on Metal GPU

Apple Intelligence

FluidAudio STT/TTS

MLX Audio TTS

Apple Vision OCR

Resource Intelligence

300+ MCP Tools

Core AI (31 tools)

System Control (162 tools)

Browser & Web (28 tools)

Video Creator (17 tools)

Filesystem & Git (18 tools)

Outreach & Social (15 tools)

Analysis & RAG (11 tools)

Capture & Motion (8 tools)

Testing & Integrations (14 tools)

OpenAI-Compatible API

/v1/chat/completions

/v1/embeddings

/v1/audio/speech

/v1/audio/transcriptions

Cloud API Proxy

Developer Tokens

Browser Automation & Testing

AX-Native Selectors

Self-Healing

Visual Recording

Export to Playwright/Cypress

Voice AI & Media

Speech-to-Text

Text-to-Speech

Voice Cloning

Video Upscale

Push-to-Talk

Web Scraping

ToolPiper vs The Alternatives

How It Works

Install

Connect

Build

Everything Stays Local

Powered by llama.cpp

9 Backends, 1 App

Open Standards

Simple Pricing

Frequently Asked Questions

Ready to run AI locally?

Learn More

Local AI Chat

AI Developer Tools

Voice AI