Best balance of capability and privacy. Qwen runs on your Mac's Metal GPU via llama.cpp, supports tool use, and gives Claude Code 32K of context to work with — without sending a byte to any cloud.
Launch ToolPiper. Open Models → Browse. Search for Qwen. Pick the largest GGUF that fits your RAM — ToolPiper shows the predicted footprint per quantization. Qwen2.5-Coder-14B-Instruct-Q4_K_M is a strong default for 16 GB Macs; Qwen2.5-Coder-32B-Instruct-Q4_K_M for 32 GB+.
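ToolPiper's predicted footprint follows the usual GGUF arithmetic: parameter count times average bits per weight, plus overhead for the KV cache and runtime. A minimal sketch of that estimate — the bits-per-weight averages below are approximate and vary by tensor mix, and the flat overhead figure is an assumption, not ToolPiper's actual predictor:

```python
# Rough rule of thumb for GGUF resident memory: params x bits-per-weight,
# plus a flat allowance for KV cache and runtime overhead.
# Bits-per-weight averages are approximate, not exact per-file values.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.69,
    "Q8_0": 8.5,
}

def estimated_gb(params_billions: float, quant: str, overhead_gb: float = 2.0) -> float:
    """Approximate resident memory in GB for a GGUF model."""
    weights_gb = params_billions * 1e9 * BITS_PER_WEIGHT[quant] / 8 / 1e9
    return round(weights_gb + overhead_gb, 1)

print(estimated_gb(14, "Q4_K_M"))  # 14B comfortably inside 16 GB
print(estimated_gb(32, "Q4_K_M"))  # 32B wants a 32 GB machine
```

This is why the 14B quant is the 16 GB default and the 32B quant needs 32 GB+: the weights alone land around 8.5 GB and 19.5 GB respectively before cache and overhead.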
In ToolPiper → Endpoints → Add Endpoint → llama.cpp (local). Pick the Qwen model you just downloaded. Set the context window to 32K. Save. ToolPiper warms the model in the background; the first request is fast even from a cold start.
ToolPiper → Docs → Claude Code → Configure for Claude Code. We mint a dev token, register the MCP server, and write ~/.claude/settings.json. Your Qwen endpoint shows up in Claude Code's /model picker.
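The written settings look roughly like the sketch below. Claude Code reads an <code class="font-mono text-xs text-white/70">env</code> block from ~/.claude/settings.json; the base URL, port, and token here are placeholders, and the exact keys ToolPiper writes may differ:

```json
{
  "env": {
    "ANTHROPIC_BASE_URL": "http://127.0.0.1:<port>",
    "ANTHROPIC_AUTH_TOKEN": "<toolpiper-dev-token>"
  }
}
```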
Open any terminal. ToolPiper handles the rest.
claude
# /model → "Local Qwen"
# Now every prompt runs on your Mac.

In ToolPiper, edit the endpoint and bump the context window override. ToolPiper re-warms the model with the new size. Claude Code picks up the change the next time it reads settings.json (a fresh claude launch).
GGUF on Metal GPU, llama.cpp, your filesystem. Nothing leaves the Mac. ToolPiper's resource scheduler evicts cleanly when memory pressure hits.
Qwen2.5-Coder is tool-trained — Bash, Edit, Read, Write all work. Claude Code's full feature set, locally.
Hit a hard problem? Say "switch to my OpenAI endpoint" mid-session. ToolPiper reroutes; the conversation continues with cloud horsepower for that one task.
On 16 GB: Qwen2.5-Coder-14B Q4_K_M. On 32 GB: Qwen2.5-Coder-32B Q4_K_M. ToolPiper's model browser shows predicted memory + tokens/sec on your specific Mac, so you don't have to guess.
For most edits, yes. Claude Code's <code class="font-mono text-xs text-white/70">endpoint_recommend</code> tool will suggest a longer-context backend automatically when a conversation overflows — it sees your whole endpoint list and picks the right one.
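The selection logic can be pictured as: among the endpoints whose context window fits the conversation, prefer the smallest. This is a hypothetical sketch to illustrate the idea, not the tool's actual implementation; the endpoint names and sizes are made up:

```python
# Hypothetical sketch: pick the smallest endpoint whose context
# window still fits the conversation's token count.
def recommend(endpoints: list, needed_tokens: int):
    fitting = [e for e in endpoints if e["context"] >= needed_tokens]
    return min(fitting, key=lambda e: e["context"]) if fitting else None

endpoints = [
    {"name": "Local Qwen", "context": 32_768},
    {"name": "OpenAI", "context": 128_000},
]
recommend(endpoints, 40_000)  # overflows 32K, so the 128K backend is chosen
```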
ToolPiper uses the same llama.cpp engine Ollama wraps. The difference is the platform around it: 147 MCP tools, browser automation, an Anthropic proxy with provider switching, and resource intelligence. See the <a href="/compare/claude-code-backends" class="text-primary-400 hover:text-primary-300">comparison page</a>.
Yes. ToolPiper's endpoint serves <code class="font-mono text-xs text-white/70">/v1/chat/completions</code> (OpenAI-shape) and <code class="font-mono text-xs text-white/70">/v1/messages</code> (Anthropic-shape) simultaneously. Same model, both shapes, no extra config.
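The two shapes differ mostly in structure, not content. A sketch of the minimal request bodies — the model name is illustrative, and you would substitute the base URL and port ToolPiper shows for your endpoint:

```python
# Same local endpoint, two wire formats.
# OpenAI shape: POST <base>/v1/chat/completions
openai_req = {
    "model": "qwen2.5-coder-14b-instruct",
    "messages": [{"role": "user", "content": "Hello"}],
}

# Anthropic shape: POST <base>/v1/messages
# (max_tokens is required; a system prompt would be a top-level
# field here rather than a message in the list)
anthropic_req = {
    "model": "qwen2.5-coder-14b-instruct",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Hello"}],
}
```

Either body goes to the same local server with the same dev token; only the path and shape change.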
ToolPiper is a free download. Configure once and Claude Code routes through your Mac.