You already know how to use ChatGPT. Type a question, get an answer. Write a prompt, get a draft. Paste some code, get a fix. The interaction model isn't what needs reinventing.

What needs reinventing is where it runs.

Every message you send to ChatGPT, Claude, or Gemini gets transmitted to a remote server, processed there, and logged. OpenAI's data retention policies have changed multiple times. Enterprise customers negotiate custom terms. Everyone else trusts the defaults — and the defaults include using your conversations for model improvement unless you explicitly opt out via a setting buried three menus deep.

You can run the same kind of interaction — ask, draft, debug — entirely on your Mac. No data leaves your machine. No account required. No API key. No internet connection.

What You Need (and What You Don't)

You don't need: A terminal. Docker. Python. Homebrew. Ollama. A GPU. An API key. A subscription to anything.

You do need: A Mac with Apple Silicon (M1 or later) and at least 8GB of RAM. That's it. If you bought a Mac in the last four years, you qualify.

How It Works

Local chat runs a large language model directly on your Mac's hardware. The model file — typically 1–4GB for a good conversational model — lives on your disk. When you type a message, the model processes it using your Mac's GPU and Neural Engine. The response streams back to you in real time, just like ChatGPT.

The key difference: the entire loop — your input, the model's computation, the output — happens on your hardware. There's no network request. There's nothing to intercept, log, or store anywhere except your own machine.
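How big is "a model on your disk"? A back-of-envelope sketch makes the 1–4GB figure concrete: a quantized model's size is roughly its parameter count times its bits per weight. (This is a simplification — it ignores embedding tables, metadata, and runtime KV-cache memory — but it's accurate to within rounding for typical quantized models.)

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough on-disk size of a quantized model: parameters x bits, in GB."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# A 3-billion-parameter model quantized to 4 bits per weight:
print(round(model_size_gb(3, 4), 1))  # 1.5 (GB)
```

That's why a 3B model fits comfortably on an 8GB Mac while a 70B model does not.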

The models available for local use aren't toys. Llama 3.2 3B, Qwen 3.5, and similar models are genuinely capable — they write code, explain concepts, draft emails, and hold multi-turn conversations. They're not GPT-4, but for 90% of daily use cases, you won't notice the difference.

The ModelPiper Workflow

In ModelPiper, local chat is the default experience. Install ModelPiper from the Mac App Store, launch it, and a starter model (Qwen 3.5 0.8B) downloads automatically. Within 60 seconds, you're in a chat interface that looks and feels like any cloud AI service — except it's running on localhost.

The chat interface supports markdown rendering, code highlighting, and multi-turn conversations. You can switch between models from a dropdown — once you've downloaded a larger model like Llama 3.2 3B or Qwen 3.5 4B, it's one click to swap.

Behind the scenes, ModelPiper runs llama.cpp with its Metal backend on the GPU. Your Mac's unified memory architecture gives the model direct access to your full RAM — no PCIe bottleneck, no separate VRAM limit. On an M2 MacBook Air with 16GB, Llama 3.2 3B generates 30+ tokens per second. That's faster than most cloud services stream to you.
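Why unified memory matters: local token generation is usually memory-bandwidth-bound, because producing each token streams roughly every model weight through memory once. A quick sketch of the resulting speed ceiling — the ~100 GB/s bandwidth figure for the M2 is an assumed ballpark, not a measured spec:

```python
def tokens_per_sec_ceiling(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on generation speed when each token requires reading
    all model weights from memory once (the usual local-LLM bottleneck)."""
    return bandwidth_gb_s / model_gb

# Assumed ~100 GB/s unified memory, ~1.8 GB 4-bit 3B model:
print(round(tokens_per_sec_ceiling(100, 1.8)))  # 56 tokens/s ceiling
```

Real-world throughput lands below that ceiling, so an observed 30+ tokens per second is consistent with the arithmetic.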

What Makes Local Chat Worth It

Paste confidential code without thinking twice. Client code, proprietary algorithms, internal APIs — none of it leaves your machine.

Brainstorm without self-censoring. When you know a conversation is truly private, you think differently. You ask the dumb questions. You explore the weird ideas. There's no corporate policy review on the other end.

Works on a plane. No Wi-Fi required. The model runs the same at 35,000 feet as it does at your desk.

No usage limits. Send a thousand messages a day. Paste a 10,000-token document. The model doesn't throttle you or tell you to come back in an hour.

Zero ongoing cost per query. After the one-time model download, every conversation is free. Compare that to $20/month for ChatGPT Plus or per-token API pricing that scales with usage.
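To make the economics concrete, here's a back-of-envelope comparison; the per-token rate below is a hypothetical placeholder, since API pricing varies by provider and model:

```python
def monthly_api_cost(tokens_per_day: int, usd_per_million_tokens: float) -> float:
    """Illustrative monthly spend on a metered cloud API (30-day month)."""
    return tokens_per_day * 30 * usd_per_million_tokens / 1e6

# e.g. 50,000 tokens a day at a hypothetical $2 per million tokens:
print(monthly_api_cost(50_000, 2.0))  # 3.0 (dollars/month)
```

The point isn't the exact dollar figure — it's that cloud cost scales linearly with usage, while local cost is zero after the one-time download, regardless of volume.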

When Cloud Is Still Better

Local models are 3–8 billion parameters. GPT-4 and Claude Opus are orders of magnitude larger. For genuinely hard reasoning tasks — complex multi-step math, nuanced legal analysis, novel research synthesis — the big cloud models are still better.

The honest answer: use local for everything that doesn't require the absolute best model, and use cloud for the 10% of tasks that do. Most people discover that 10% is more like 2%.

Try It

Download ModelPiper from the Mac App Store. Wait 60 seconds for the starter model. Start chatting.

Your data stays on your machine. That's not a setting — it's the architecture.

This is part of a series on local-first AI workflows on macOS. Next up: Voice Transcription — local speech-to-text without uploading audio to anyone.