A privacy policy is a promise. A socket table is a fact. Every AI app on the Mac now says some version of "your data stays on your device," and some of them mean it, but you don't have to take anyone's word for it. macOS will show you how many bytes a process sends and every connection it opens, with tools that are already installed.

We're writing this guide because we ship one of those apps. ToolPiper's pitch is that inference happens on your Mac and nothing phones home, and that claim is worth exactly nothing if you can't check it yourself. So here's how to check it. On our app, and on every other AI app you run. The method is the point.

Budget about ten minutes for the built-in tools, or a few days if you want the stronger version with a free outbound firewall. The only Terminal work is pasting two commands.

What does "offline" actually mean for an AI app?

An AI app is verifiably offline only when it makes zero outbound network connections while you use it. Storing chats locally and running inference locally are separate, weaker properties - an app can do both and still send telemetry, analytics, or license check-ins to its own servers.

Marketing blurs three claims that engineering keeps separate. "Your chats are stored locally" means the transcript file lives on your disk instead of in someone's database. "Local inference" means the model computes on your hardware instead of on a datacenter GPU. "Offline," the third claim, means the app opens no outbound connections at all. The first two say nothing about the third: neither tells you what else the app does with your network connection.

Picture an app that checks both visible boxes. Chats in a local SQLite file. A model running on your GPU. And a background task that posts usage stats to an analytics endpoint on every launch. Every sentence on that app's privacy page stays technically true while it phones home daily.

The test in this article detects the third property, the strict one. Zero outbound calls means no telemetry, no analytics, no crash uploads, no account check-ins, no "anonymous usage statistics." When a process opens no sockets to the internet, every weaker privacy claim comes along for free.

Which Mac tools show you an app's network traffic?

macOS ships three that need no installation: Activity Monitor's Network tab (cumulative bytes sent per process), lsof (a snapshot of every open socket), and nettop (live per-process traffic). For monitoring over days instead of minutes, add an outbound firewall - LuLu is free and open source, Little Snitch costs about $59.

Each tool answers a different question. Activity Monitor answers "has this process sent anything since it launched?" with one number you can watch move. lsof answers "what connections exist right now, and to where?" nettop answers "what's flowing through each connection this second?" And the firewalls answer the hardest question, "what will this app try next Tuesday?", by sitting between every process and the network permanently and asking you before anything new gets through.

One vocabulary note before you start. Connections to 127.0.0.1 (the address the hostname localhost resolves to; ::1 is the IPv6 equivalent) are loopback - traffic from your Mac to itself that never touches the network card. Local AI apps use loopback constantly, because a local inference engine is usually an HTTP server the app talks to on a local port. Loopback entries in these tools are not outbound calls. The addresses that matter are the ones that aren't yours.
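If you'd rather not eyeball addresses, the loopback check is easy to automate. This is a minimal sketch using Python's standard ipaddress module. The function name is ours, and 203.0.113.10 is a reserved documentation address standing in for "some remote server."

```python
import ipaddress

def is_loopback(host: str) -> bool:
    """Return True if host is a loopback address (your Mac talking to itself)."""
    host = host.strip("[]")  # lsof prints IPv6 addresses in brackets, e.g. [::1]
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not a numeric address (for example "*" on a listening socket).
        return False

for addr in ["127.0.0.1", "[::1]", "localhost", "203.0.113.10"]:
    print(addr, "->", "loopback" if is_loopback(addr) else "not loopback")
```

Anything this reports as "not loopback" is an address worth a second look in the tools that follow.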

The comparison table below summarizes what each method shows, what it costs, and whether it's a snapshot or a standing guard. You don't need all five. Activity Monitor plus the Wi-Fi test catches the obvious cases, lsof and nettop give you the receipts, and a firewall covers the schedule-based behavior nothing else can.

How do you run the verification test?

Watch the app's Sent Bytes in Activity Monitor while you use its AI features, list its sockets with lsof -i -nP, stream live traffic with nettop -p <pid>, then turn Wi-Fi off and confirm inference still works. A truly offline app shows flat byte counters, no sockets to remote addresses, and full function with no network at all.

The step-by-step section below has the exact commands. Two details separate a real test from a glance.

First, generate load. An idle app sends nothing because it's doing nothing. Run the AI features hard while you watch - long chats, a transcription, whatever the app sells. Telemetry often fires on feature use or app launch, not while the app sits in the background. Quit and relaunch the app mid-test too, with the tools still watching.

Second, know what a pass looks like before you start. Sent Bytes that stays flat while you hammer inference. An lsof listing with no rows pointing at a remote address, ESTABLISHED or in any other state. A model that keeps answering with Wi-Fi off. Anything short of that pattern deserves a follow-up question, and the destination address usually answers it - a connection to a model CDN during a download you started is a different animal from a connection to an analytics domain at every launch.
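For the lsof part of the pass criteria, here is one way to filter a listing down to the rows that matter. Treat it as a sketch under assumptions: it expects the usual nine-column layout of lsof -i -nP with the connection in the NAME column, and the process name ExampleAI and every address in the sample are made up for illustration. The lsof_snapshot helper runs the article's command but is not called here, so the example works on pasted text too.

```python
from __future__ import annotations

import ipaddress
import subprocess

def remote_connections(lsof_text: str, name_filter: str = "") -> list[tuple[str, str, str]]:
    """Return (command, peer, state) for each row whose peer is not loopback."""
    rows = []
    for line in lsof_text.splitlines():
        fields = line.split()
        # Data rows have at least 9 columns; NAME is the 9th. Skip the header.
        if len(fields) < 9 or fields[0] == "COMMAND":
            continue
        command, name = fields[0], fields[8]
        if name_filter.lower() not in command.lower():
            continue
        if "->" not in name:
            continue  # listening or unconnected socket: there is no peer
        peer = name.split("->", 1)[1]
        host = peer.rsplit(":", 1)[0].strip("[]")
        try:
            if ipaddress.ip_address(host).is_loopback:
                continue  # the Mac talking to itself
        except ValueError:
            pass  # unparseable host: keep the row so a human looks at it
        state = fields[9].strip("()") if len(fields) > 9 else ""
        rows.append((command, peer, state))
    return rows

def lsof_snapshot() -> str:
    """Capture the same listing the article uses (requires lsof on the PATH)."""
    return subprocess.run(["lsof", "-i", "-nP"], capture_output=True, text=True).stdout

# Illustrative sample only; these rows are invented, not captured from a real app.
SAMPLE = """\
COMMAND    PID USER  FD  TYPE DEVICE SIZE/OFF NODE NAME
ExampleAI 4242 me   12u  IPv4 0x1    0t0      TCP  127.0.0.1:9998 (LISTEN)
ExampleAI 4242 me   14u  IPv4 0x2    0t0      TCP  127.0.0.1:9998->127.0.0.1:52344 (ESTABLISHED)
ExampleAI 4242 me   20u  IPv4 0x3    0t0      TCP  192.168.1.5:52345->203.0.113.10:443 (ESTABLISHED)
"""

print(remote_connections(SAMPLE, "exampleai"))
```

On the sample, only the third row survives the filter: the listener has no peer and the second row is loopback. An empty list for the app you're testing is the pass condition. To run it against a live listing, pass lsof_snapshot() instead of SAMPLE.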

There's also an attribution gotcha worth knowing. Mac apps can hand large transfers to the system, and that traffic shows up under a daemon called nsurlsessiond rather than under the app's own name. If an app's counters look suspiciously clean, keep an eye on nsurlsessiond while you use it. Rule-based firewalls like Little Snitch attribute that traffic back to the responsible app, which is part of what you're paying for.

What does the test show on ToolPiper?

Local inference in ToolPiper produces zero outbound connections: flat Sent Bytes during chat, no remote sockets in lsof, full function with Wi-Fi off. The only network activity you'll see falls into three buckets: a model download you started from Hugging Face; the Sparkle update check, which runs automatically about every six hours against releases.modelpiper.com; and prompts to a cloud provider if you've added your own API key. The update check will show up in a standing firewall, and that's exactly the kind of transparency this test is for.

Run the steps against ToolPiper and here's what comes back. During a local chat session, Sent Bytes for the ToolPiper process doesn't move. lsof -i -nP | grep -i toolpiper returns loopback entries - the app listens on 127.0.0.1:9998 because the local OpenAI-compatible API is a feature, and the embedded llama-server engine talks to the app over loopback too. No remote addresses. Turn Wi-Fi off mid-conversation and the model keeps answering, because the weights are plain GGUF files on your disk and the compute is on your chip.

Three things do touch the network, and we'd rather name them than have you find them. Downloading a model opens a connection to Hugging Face, since that's where the GGUF files live - you clicked the button, you'll see the traffic, it stops when the download does. Checking for updates connects to releases.modelpiper.com on an automatic schedule, about every six hours. We name the domain here so you can match it against what your firewall shows: one rule, one purpose, inspectable. And if you add your own cloud API key (ToolPiper supports bring-your-own-key for cloud providers), prompts you send to that provider obviously leave your machine. That's what you asked for, and you'll see the TLS connection in nettop like everything else. No keys, no traffic.
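Once a firewall is showing you hostnames, matching them against what an app has disclosed can be reduced to a small lookup. The sketch below is illustrative only. releases.modelpiper.com is the domain named above, while huggingface.co is our assumption for the download host: large files may be served from other domains, so check what your firewall actually shows. If you've added a cloud API key, that provider's hostname is another entry you would add yourself.

```python
def matches(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    host, domain = host.lower().rstrip("."), domain.lower()
    # Require a dot boundary so "nothuggingface.co" does not match "huggingface.co".
    return host == domain or host.endswith("." + domain)

# Disclosed destinations and their stated purpose.
EXPECTED = {
    "releases.modelpiper.com": "update check",
    "huggingface.co": "model download (assumed domain)",
}

def explain(host: str, expected: dict = EXPECTED) -> str:
    """Return the disclosed purpose for a hostname, or 'unexplained'."""
    for domain, purpose in expected.items():
        if matches(host, domain):
            return purpose
    return "unexplained"

# analytics.example.com is a placeholder for any host the app never mentioned.
for seen in ["releases.modelpiper.com", "analytics.example.com"]:
    print(seen, "->", explain(seen))
```

The same table works for any app: fill it from the vendor's own disclosures, and anything that comes back "unexplained" is the follow-up question.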

What you won't find is telemetry, analytics, crash uploads, or account check-ins, because there's no account. We didn't strip tracking out for a privacy release. We never built it.

To be fair about the neighborhood: local runners as a category mostly pass the inference half of this test. Ollama running a model on your Mac stays on your Mac too. The point of the method isn't to catch one app. It's to turn "offline" from a vibe into a checkable property, for every AI app you install from now on.

What can this test miss?

Snapshot tools only show the moment you ran them, so a beacon that fires daily or on specific events can slip past lsof and a glance at Activity Monitor - persistent firewall rules close that gap. The test also shows where traffic went, not what was inside it, and an app update can change behavior, so re-test after updates.

Be straight about the limits, because they decide how much weight each result deserves.

Snapshots miss schedules. An lsof listing at noon says nothing about what the app does at 3am, on its tenth launch, or the first time it sees a new network. This is exactly the gap LuLu and Little Snitch exist to fill: deny by default, alert on every new connection, and stay running for weeks while you go about your work.
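If you want something between a single snapshot and a firewall, you can take snapshots on a timer and record when each remote peer first appears. The sketch below covers only the bookkeeping, with made-up timestamps and documentation-range addresses. Producing the peer sets, for example from repeated lsof runs, is left out. Polling like this still misses any connection that opens and closes between samples, which is why a firewall remains the stronger tool.

```python
from __future__ import annotations

from typing import Iterable

def first_seen(snapshots: Iterable[tuple[str, set[str]]]) -> dict[str, str]:
    """Map each remote peer to the label of the first snapshot that contained it."""
    seen: dict[str, str] = {}
    for label, peers in snapshots:
        for peer in sorted(peers):
            seen.setdefault(peer, label)  # keep the earliest label only
    return seen

# Invented data: nothing at noon, one peer five minutes later, a second peer at 3am.
history = [
    ("12:00", set()),
    ("12:05", {"203.0.113.10:443"}),
    ("03:00", {"203.0.113.10:443", "198.51.100.7:443"}),
]

for peer, when in first_seen(history).items():
    print(f"{peer} first seen at {when}")
```

A peer that first shows up at 3am, long after you stopped using the app, is the kind of scheduled behavior a noon snapshot can't reveal.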

DNS isn't payload, and it isn't attributed. Mac apps usually don't resolve hostnames themselves - a system service called mDNSResponder does lookups on their behalf, so a DNS query for an analytics domain won't appear under the app's name in these tools. A lookup also isn't proof that data was sent. Treat DNS as a lead. The actual connection, which lsof and nettop do attribute to the process, is the evidence.

Bytes, not contents. These tools show that traffic happened and where it went, not what was in it. For a zero-connections result that's fine, since nothing times anything is nothing. When you do see traffic, destination plus timing is usually enough to judge it.

Results expire. The cleanest socket listing describes one build of one app on one day. An update can add a license check, a crash reporter, or a "cloud sync" default. Re-run the ten-minute version after meaningful updates, or let a standing firewall make re-testing automatic.

That's the whole method. Ten minutes of built-in tools for a snapshot, a free firewall for the long game, a re-test after updates. Run it on ToolPiper first if you like - modelpiper.com/download, free, no account - and then run it on everything else in your Dock.

More on what's verifiable: "Is ToolPiper Safe?" walks through the same evidence feature by feature, and our local-first pillar covers why the architecture matters in the first place.