Comparison · Updated 2026-04-28

The best local backend for Claude Code.

Claude Code runs against any HTTP endpoint that speaks the Anthropic Messages API. There are four serious options: ToolPiper, Ollama, vLLM, and LM Studio. They are not equivalent. Here is what actually differs.
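Concretely, "speaks the Anthropic Messages API" means accepting a request like the one below. A minimal sketch in TypeScript; the base URL, token, and model name are placeholder assumptions, not any product's defaults:

```ts
// Minimal Messages API call against a local backend (placeholder values).
const res = await fetch("http://localhost:8080/v1/messages", {
  method: "POST",
  headers: {
    "content-type": "application/json",
    "anthropic-version": "2023-06-01", // standard Messages API version header
    "x-api-key": "local-dev-token",    // placeholder; local servers vary
  },
  body: JSON.stringify({
    model: "llama3.1-8b", // whatever your backend serves
    max_tokens: 256,
    messages: [{ role: "user", content: "Say hello in five words." }],
  }),
});

const data = await res.json();
// A conforming backend returns an array of content blocks.
console.log(data.content?.[0]?.text);
```

Any backend that accepts this shape (and its streaming variant) can sit behind Claude Code.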

The short version

  • Pick Ollama / vLLM / LM Studio if you want a single local model server and nothing else. They speak the protocol; that is the feature.
  • Pick ToolPiper if you want Claude Code to route across local + cloud providers in the same conversation, with 147 MCP tools, voice, browser automation, and Keychain-locked BYOK — without configuring any of it.
  • ToolPiper is the only backend that ships conversational provider switching ("use my local one") and context-aware recommendations ("you're at 47K, switch to Gemini Flash"). That is the moat, not raw inference speed.

Feature matrix

| Feature | ToolPiper | Ollama | vLLM | LM Studio |
| --- | --- | --- | --- | --- |
| Anthropic Messages API (`POST /v1/messages`) | Native, with provider switching | No | Yes (since v0.7) | Yes (since 0.3.x) |
| OpenAI Chat Completions (`/v1/chat/completions`) | Yes | Yes | Yes | Yes |
| Streaming with cancellation | <200 ms upstream abort on Esc (sketch below the matrix) | Yes | Yes | Yes |
| Tool use round-trip | Anthropic + OpenAI shapes | OpenAI only | OpenAI only | OpenAI only |
| Inference backends | 9 (llama.cpp, Apple Intelligence, FluidAudio, MLX, Apple Vision, …) | 1 (llama.cpp) | 1 (vLLM engine) | 1 (llama.cpp) |
| Apple Intelligence (Neural Engine) backend | Yes (first to ship it) | No | No | No |
| BYOK cloud proxy (OpenAI / Anthropic / Gemini / etc.) | Yes (Keychain-locked, server-side injection) | No | No | No |
| Cross-provider routing in one base URL | Yes (`endpoint_set` MCP tool) | No | No | No |
| Conversational provider switching | "Switch to my local one" works in chat | No | No | No |
| Context-aware backend recommendation | `endpoint_recommend` reads conversation size + telemetry | No | No | No |
| Native /model picker integration | Yes (auto-writes `ANTHROPIC_CUSTOM_MODEL_OPTION_*`) | No | No | No |
| Zero-config Claude Code install | One click (token + `settings.json` + MCP register) | Manual env vars | Manual env vars + `DEFAULT_*_MODEL` dance | Manual env vars + `DEFAULT_*_MODEL` dance |
| MCP server bundled | 147 tools (browser, files, system, video, voice) | No | No | No |
| MCP toolbelt injection (when client has no tools) | Header-gated, PiperMatch-retrieved | No | No | No |
| Browser automation (CDP) | 17 AX-native tools | No | No | No |
| Voice AI (STT + TTS + voice clone) | Yes | No | No | No |
| OCR (Apple Vision) | Yes | No | No | No |
| Image / video upscale on Neural Engine | PiperSR (44 fps real-time 2× video) | No | No | No |
| Resource intelligence (auto model eviction) | Real-time RAM monitoring + eviction | Reads RAM once at startup | Manual | Removed in 4.0 |
| Distribution | Native macOS DMG | Homebrew + CLI | Python wheel + GPU container | Cross-platform DMG |
| Sandboxing | Native, signed | No | No | No |
| Price | Free; Pro $9.99/mo for cloud BYOK | Free | Free | Free |
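The streaming row is worth unpacking. Cancelling is just aborting an in-flight SSE response; what the "<200 ms upstream abort" cell claims is that the abort reaches the inference engine quickly instead of letting generation run to completion. A client-side sketch in TypeScript, under the same placeholder endpoint assumptions as above:

```ts
// Streaming request with cancellation via AbortController.
// Pressing Esc in Claude Code amounts to controller.abort(); a good
// backend propagates that abort upstream and stops generating.
const controller = new AbortController();
setTimeout(() => controller.abort(), 1500); // stand-in for the user's Esc

try {
  const res = await fetch("http://localhost:8080/v1/messages", {
    method: "POST",
    signal: controller.signal,
    headers: {
      "content-type": "application/json",
      "anthropic-version": "2023-06-01",
      "x-api-key": "local-dev-token", // placeholder
    },
    body: JSON.stringify({
      model: "llama3.1-8b",
      max_tokens: 4096,
      stream: true, // server answers with server-sent events
      messages: [{ role: "user", content: "Write a long essay." }],
    }),
  });

  // Read SSE chunks until done or aborted.
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    process.stdout.write(decoder.decode(value));
  }
} catch (err) {
  if ((err as Error).name === "AbortError") console.log("\n[aborted]");
  else throw err;
}
```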

Performance baselines

Performance numbers depend heavily on your Mac, your model, and your prompt mix. ToolPiper ships `scripts/anthropic-perf-baseline.ts`, a harness that captures p50 / p95 TTFT and tokens/sec across every provider you have configured, including Ollama / vLLM / LM Studio if you wire them up as endpoints. The output lives at `docs/features/anthropic-perf-baseline.md` and updates on every run. The numbers below are workload definitions; the live comparison is in the repo, and a simplified TTFT measurement sketch follows the table.

| Workload | ToolPiper | Definition |
| --- | --- | --- |
| Single-turn 2K prompt (TTFT, p50) | See live numbers | Captured per `EndpointConfig` by `scripts/anthropic-perf-baseline.ts`; runs across every provider you have configured. |
| Multi-turn 20K with tools (tokens/sec, p50) | See live numbers | Same harness (Phase 7 deliverable); populated on every fresh ToolPiper run. |
| Tool-heavy 50K (TTFT, p95) | See live numbers | Skipped automatically on backends whose context window is below 50K; annotated `context_overflow` in the output. |
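TTFT here means the time from sending the request to receiving the first streamed event. The following is not `scripts/anthropic-perf-baseline.ts`; it is a simplified sketch of the measurement under the same placeholder endpoint assumptions:

```ts
// Measure time-to-first-event over N sequential runs, report p50 / p95.
async function ttftOnce(): Promise<number> {
  const start = performance.now();
  const res = await fetch("http://localhost:8080/v1/messages", {
    method: "POST",
    headers: {
      "content-type": "application/json",
      "anthropic-version": "2023-06-01",
      "x-api-key": "local-dev-token", // placeholder
    },
    body: JSON.stringify({
      model: "llama3.1-8b",
      max_tokens: 64,
      stream: true,
      messages: [{ role: "user", content: "ping" }],
    }),
  });
  // First SSE chunk is a proxy for the first token.
  await res.body!.getReader().read();
  return performance.now() - start;
}

const samples: number[] = [];
// Sequential on purpose: concurrent runs would contend for the GPU/ANE.
for (let i = 0; i < 20; i++) samples.push(await ttftOnce());
samples.sort((a, b) => a - b);
const pct = (q: number) =>
  samples[Math.min(samples.length - 1, Math.floor(q * samples.length))];
console.log(`p50 ${pct(0.5).toFixed(0)} ms, p95 ${pct(0.95).toFixed(0)} ms`);
```

A real harness would also drain the stream to compute tokens/sec; this only times the first chunk.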

Pick a recipe

Try it in two minutes

Download ToolPiper, click "Configure for Claude Code", and run `claude`. Done.
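If you prefer to see what that click does, the manual equivalent is writing the backend's base URL and token into Claude Code's settings. A hedged sketch (port and token are placeholders; the actual one-click step also handles the model options and MCP registration listed in the matrix):

```ts
// configure-claude.ts: manual stand-in for a one-click setup.
// Merges a base URL and auth token into Claude Code's settings.json.
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { homedir } from "node:os";
import { dirname, join } from "node:path";

const settingsPath = join(homedir(), ".claude", "settings.json");
const settings = existsSync(settingsPath)
  ? JSON.parse(readFileSync(settingsPath, "utf8"))
  : {};

settings.env = {
  ...settings.env,
  ANTHROPIC_BASE_URL: "http://localhost:8080", // your local backend (assumed port)
  ANTHROPIC_AUTH_TOKEN: "local-dev-token",     // placeholder token
};

mkdirSync(dirname(settingsPath), { recursive: true });
writeFileSync(settingsPath, JSON.stringify(settings, null, 2) + "\n");
console.log(`wrote ${settingsPath}; now run claude`);
```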