Local · 32K context · Tool use

Claude Code on local Qwen.

Best balance of capability and privacy. Qwen runs on your Mac's Metal GPU via llama.cpp, supports tool use, and gives Claude Code 32K of context to work with — without sending a byte to any cloud.

Before you start

  • Apple Silicon Mac with at least 16 GB unified memory (32 GB recommended for Qwen-32B; 16 GB fits Qwen-14B comfortably).
  • Claude Code installed.
  • ToolPiper installed (free DMG from modelpiper.com/download).

Setup

  1. Download a Qwen model in ToolPiper

    Launch ToolPiper. Open Models → Browse. Search for Qwen. Pick the largest GGUF that fits your RAM — ToolPiper shows the predicted footprint per quantization. Qwen2.5-Coder-14B-Instruct-Q4_K_M is a strong default for 16 GB Macs; Qwen2.5-Coder-32B-Instruct-Q4_K_M for 32 GB+.
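
    Not sure how much unified memory your Mac has? A quick terminal check (sysctl ships with macOS; the sizing guidance above assumes the rough rule that a Q4_K_M GGUF takes about 0.6 GB per billion parameters, before KV cache):

    shell
    # Print total unified memory in GB
    sysctl -n hw.memsize | awk '{ printf "%.0f GB\n", $1 / 1073741824 }'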

  2. Add a llama.cpp endpoint pointing at the model

    In ToolPiper → Endpoints → Add Endpoint → llama.cpp (local). Pick the Qwen model you just downloaded. Set the context window to 32K. Save. ToolPiper warms the model in the background, so the first request is fast even from a cold start.
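
    Once saved, you can smoke-test the endpoint from a terminal. The port and model name below are placeholders; use whatever your endpoint card in ToolPiper shows:

    shell
    # Minimal OpenAI-shape request against the local endpoint (adjust port and model)
    curl -s http://localhost:8080/v1/chat/completions \
      -H 'Content-Type: application/json' \
      -d '{"model": "qwen2.5-coder", "messages": [{"role": "user", "content": "Say hi"}], "max_tokens": 16}'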

  3. Click "Configure for Claude Code"

    ToolPiper → Docs → Claude Code → Configure for Claude Code. We mint a dev token, register the MCP server, and write ~/.claude/settings.json. Your Qwen endpoint shows up in Claude Code's /model picker.
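
    To confirm the configuration landed, inspect the file directly (the exact keys are ToolPiper's concern; this just verifies the entry exists):

    shell
    # Show the settings Claude Code reads at launch
    cat ~/.claude/settings.json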

  4. Run Claude Code, pick Qwen, ship code

    Open any terminal. ToolPiper handles the rest.

    shell
    claude
    # /model → "Local Qwen"
    # Now every prompt runs on your Mac.
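
    You can also verify the routing without an interactive session; claude -p runs a single prompt in print mode and exits:

    shell
    # One-shot prompt through the local endpoint
    claude -p "Summarize this repo in one sentence."
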
  5. Tune the context window if you want more

    In ToolPiper, edit the endpoint and bump the context window override. ToolPiper re-warms with the new size. Claude Code will pick up the change next time it reads settings.json (a fresh claude launch).
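
    Bigger windows cost real memory: the KV cache grows linearly with context. A back-of-envelope check, assuming Qwen2.5-32B's published shape (64 layers, 8 KV heads, head dim 128) and an fp16 cache:

    shell
    # KV bytes = 2 (K and V) x layers x kv_heads x head_dim x context x 2 (fp16)
    python3 -c "print(2*64*8*128*32768*2 / 2**30, 'GiB')"   # 8.0 GiB at 32K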

Why this recipe

Fully local

GGUF on Metal GPU, llama.cpp, your filesystem. Nothing leaves the Mac. ToolPiper's resource scheduler evicts cleanly when memory pressure hits.

Real tool use

Qwen2.5-Coder is tool-trained — Bash, Edit, Read, Write all work. Claude Code's full feature set, locally.

Burst to cloud anytime

Hit a hard problem? Say "switch to my OpenAI endpoint" mid-session. ToolPiper reroutes; the conversation continues with cloud horsepower for that one task.

Frequently asked

Which Qwen size should I pick?

On 16 GB: Qwen2.5-Coder-14B Q4_K_M. On 32 GB: Qwen2.5-Coder-32B Q4_K_M. ToolPiper's model browser shows predicted memory + tokens/sec on your specific Mac, so you don't have to guess.

Is 32 K context enough for Claude Code?

For most edits, yes. Claude Code's endpoint_recommend tool will suggest a longer-context backend automatically when a conversation overflows — it sees your whole endpoint list and picks the right one.

How does this compare to Ollama?

ToolPiper uses the same llama.cpp engine Ollama wraps. The difference is the platform around it: 147 MCP tools, browser automation, an Anthropic proxy with provider switching, and resource intelligence. See the comparison page at /compare/claude-code-backends.

Can I run Claude Code and other AI clients against the same Qwen model?

Yes. ToolPiper's endpoint serves /v1/chat/completions (OpenAI-shape) and /v1/messages (Anthropic-shape) simultaneously. Same model, both shapes, no extra config.
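
To see both shapes answered by one model, hit each path in turn (the localhost port and model name are placeholders for whatever your endpoint exposes):

shell
# OpenAI shape
curl -s http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "qwen2.5-coder", "messages": [{"role": "user", "content": "hi"}]}'

# Anthropic shape, same model underneath
curl -s http://localhost:8080/v1/messages \
  -H 'Content-Type: application/json' \
  -d '{"model": "qwen2.5-coder", "max_tokens": 64, "messages": [{"role": "user", "content": "hi"}]}'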

Ready to try it?

ToolPiper is a free download. Configure once and Claude Code routes through your Mac.
