Claude Code runs against any HTTP endpoint that speaks the Anthropic Messages API. Four serious options: ToolPiper, Ollama, vLLM, LM Studio. They are not equivalent. Here is what is actually different.
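Concretely, "speaks the Anthropic Messages API" means accepting this request shape. A minimal sketch in TypeScript; the base URL, model name, and dummy API key are placeholders, not anything a specific server requires:

```ts
// Minimal sketch of the request shape Claude Code sends.
// Base URL, model name, and key are placeholders for your own setup.
const baseURL = "http://localhost:8000"; // e.g. vLLM's default port

const res = await fetch(`${baseURL}/v1/messages`, {
  method: "POST",
  headers: {
    "content-type": "application/json",
    "anthropic-version": "2023-06-01",
    "x-api-key": "unused-by-most-local-servers",
  },
  body: JSON.stringify({
    model: "qwen2.5-coder", // whatever name your server exposes
    max_tokens: 256,
    messages: [{ role: "user", content: "Say hello." }],
  }),
});

const msg = await res.json();
// Anthropic-shaped response: content is an array of typed blocks.
console.log(msg.content?.[0]?.text);
```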
| Feature | ToolPiper | Ollama | vLLM | LM Studio |
|---|---|---|---|---|
| Anthropic Messages API (`POST /v1/messages`) | Native, with provider switching | No | Yes (since v0.7) | Yes (since v0.3.x) |
| OpenAI Chat Completions (`/v1/chat/completions`) | Yes | Yes | Yes | Yes |
| Streaming with cancellation | <200 ms upstream abort on Esc | Yes | Yes | Yes |
| Tool use round-trip | Anthropic + OpenAI shapes | OpenAI only | OpenAI only | OpenAI only |
| Inference backends | 9 (llama.cpp, Apple Intelligence, FluidAudio, MLX, Apple Vision, …) | 1 (llama.cpp) | 1 (vLLM engine) | 1 (llama.cpp) |
| Apple Intelligence (Neural Engine) backend | Yes — first to ship it | No | No | No |
| BYOK cloud proxy (OpenAI / Anthropic / Gemini / etc.) | Yes — Keychain-locked, server-side injection | No | No | No |
| Cross-provider routing in one base URL | Yes — `endpoint_set` MCP tool | No | No | No |
| Conversational provider switching | "Switch to my local one" works in chat | No | No | No |
| Context-aware backend recommendation | `endpoint_recommend` reads conversation size + telemetry | No | No | No |
| Native /model picker integration | Yes — auto-writes `ANTHROPIC_CUSTOM_MODEL_OPTION_*` | No | No | No |
| Zero-config Claude Code install | One click — token + settings.json + MCP register | Manual env vars | Manual env vars + DEFAULT_*_MODEL dance | Manual env vars + DEFAULT_*_MODEL dance |
| MCP server bundled | 147 tools — browser, files, system, video, voice | No | No | No |
| MCP toolbelt injection (when client has no tools) | Header-gated, PiperMatch-retrieved | No | No | No |
| Browser automation (CDP) | 17 AX-native tools | No | No | No |
| Voice AI (STT + TTS + voice clone) | Yes | No | No | No |
| OCR (Apple Vision) | Yes | No | No | No |
| Image / video upscale on Neural Engine | PiperSR — 44 fps real-time 2× video | No | No | No |
| Resource intelligence (auto model eviction) | Real-time RAM monitoring + eviction | Reads RAM once at startup | Manual | Removed in 4.0 |
| Distribution | Native macOS DMG | Homebrew + CLI | Python wheel + GPU container | Cross-platform DMG |
| Sandboxing | Native, signed | No | No | No |
| Price | Free; Pro $9.99/mo for cloud BYOK | Free | Free | Free |
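For the "Manual env vars" columns, pointing Claude Code at one of these servers looks roughly like the sketch below. It leans on Claude Code's documented `ANTHROPIC_BASE_URL` / `ANTHROPIC_AUTH_TOKEN` / `ANTHROPIC_MODEL` overrides; the port and model name are assumptions about your local setup:

```ts
// Sketch: launch Claude Code against a local Anthropic-compatible server.
// Port and model name are examples; adjust to whatever your server exposes.
import { spawn } from "node:child_process";

spawn("claude", {
  stdio: "inherit",
  env: {
    ...process.env,
    ANTHROPIC_BASE_URL: "http://localhost:1234", // e.g. LM Studio's default port
    ANTHROPIC_AUTH_TOKEN: "local",               // most local servers ignore it
    ANTHROPIC_MODEL: "qwen2.5-coder",            // name your server advertises
  },
});
```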
Performance numbers depend heavily on your Mac, your model, and your prompt mix. ToolPiper ships `scripts/anthropic-perf-baseline.ts`, a harness that captures p50/p95 TTFT and tokens/sec across every provider you have configured, including Ollama, vLLM, and LM Studio if you wire them in as endpoints. The output lives at `docs/features/anthropic-perf-baseline.md` and updates on every run. The table below holds the workload definitions; the live comparison is in the repo, and a simplified sketch of the measurement loop follows the table.
| Workload | ToolPiper result | Definition |
|---|---|---|
| Single-turn 2K prompt (TTFT, p50) | See live numbers | Captured per `EndpointConfig` by `scripts/anthropic-perf-baseline.ts`. Runs across every provider you have configured. |
| Multi-turn 20K with tools (tokens/sec, p50) | See live numbers | Same harness — Phase 7 deliverable, populated on every fresh ToolPiper run. |
| Tool-heavy 50K (TTFT, p95) | See live numbers | Skipped automatically on backends whose context window is below 50K — annotated `context_overflow` in the output. |
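To make the workload table concrete, here is a stripped-down version of what such a probe does. This is not the real `anthropic-perf-baseline.ts`: it measures a single request, counts raw stream chunks rather than parsed SSE token deltas, and all endpoint details are placeholders.

```ts
// Hypothetical simplified probe: measure TTFT and rough throughput for one
// streaming /v1/messages call. The real harness repeats this to get p50/p95.
async function probe(baseURL: string, model: string, prompt: string) {
  const start = performance.now();
  const res = await fetch(`${baseURL}/v1/messages`, {
    method: "POST",
    headers: {
      "content-type": "application/json",
      "anthropic-version": "2023-06-01",
    },
    body: JSON.stringify({
      model,
      max_tokens: 512,
      stream: true, // server-sent events
      messages: [{ role: "user", content: prompt }],
    }),
  });

  let ttftMs = 0;
  let chunks = 0;
  const reader = res.body!.getReader();
  for (;;) {
    const { done } = await reader.read();
    if (done) break;
    if (!ttftMs) ttftMs = performance.now() - start; // first streamed bytes
    chunks += 1; // crude proxy for tokens; real harness parses SSE deltas
  }
  const totalS = (performance.now() - start) / 1000;
  return { ttftMs, chunksPerSec: chunks / totalS };
}

console.log(await probe("http://localhost:8000", "qwen2.5-coder", "2K-token prompt here"));
```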
Four ways to run it:

- Neural Engine inference: no download, no quota, no API key. The cheapest, fastest path.
- Local · 32K · Tool use: Qwen-Coder on llama.cpp via the Metal GPU. The best capability/privacy balance.
- BYOK · Keychain: gpt-4o through Claude Code, billed against your account, key locked in Keychain.
- Conversational: "Use my local one." "Try Gemini Flash for this." No env vars. The MCP-native flow (sketched below).
Download ToolPiper, click "Configure for Claude Code", run `claude`. Done.