Does Ollama send your prompts to the cloud?

Only if you use its cloud models, which require an ollama.com account and run on datacenter hardware - prompts you send to a -cloud model leave your machine by design. Local models still run entirely on your Mac, and Ollama's privacy policy states it has no access to local prompts or responses.

How much does Ollama's cloud cost?

As of June 2026: the Free tier includes light cloud usage, Pro is $20/month (or $200/year) with roughly 50x the Free tier's usage and three concurrent cloud models, and Max is $100/month with 5x Pro's usage. Usage is metered by GPU time rather than tokens, with limits resetting every five hours and every seven days. Re-check ollama.com/pricing before relying on these numbers - they have changed before.

Is Ollama's no-retention promise trustworthy?

There's no evidence against it, and the written commitments are strong: no logging, no training, transient processing, zero-retention requirements on partners. The structural caveat applies to any hosted inference, not to Ollama specifically: a policy can change, a subpoena can compel, and a partner chain multiplies the parties you're trusting. On-device inference doesn't have that caveat, which is the architectural difference this post is about.

Does ToolPiper have a cloud tier?

No, and it won't. Your content stays on your Mac - no telemetry, no analytics, no tracking, no account check-ins, no cloud inference - and the only thing ToolPiper ever sends is an anonymous benchmark score you choose to publish. Dictation, text-to-speech, and the whole local runner are free. The paid tiers ($10/$29/$49) add on-device capabilities like local RAG over your files and the media tools. You can verify it yourself with Activity Monitor or by running inference with Wi-Fi off.

Should I stop using Ollama because of the cloud pivot?

Not on privacy grounds alone - local Ollama inference remains local and free. The honest reasons to switch are workflow ones: plain GGUF files instead of a blob store, a native Mac interface instead of CLI plus env vars, and a paid roadmap that points at your hardware instead of a datacenter. If those matter to you, the migration takes an afternoon.

Ollama's Cloud Pivot: What It Means for Local AI on Mac

Ollama became the default way to run models locally by being exactly one thing: the tool that kept AI on your machine. In August 2025 it started selling the other thing. Turbo launched at $20/month for datacenter inference, the cloud models followed that September, and today every paid tier on Ollama's pricing page is, at its core, a meter on cloud GPU time.

None of this is hidden, and none of it is wrongdoing. It's a business model, and a defensible one. But it changes what kind of product Ollama is, and if you picked it because inference on your own hardware was the entire point, the change is worth understanding precisely - what still runs locally, what doesn't, and what holds the line in each case.

What is Ollama's cloud, exactly?

Ollama's cloud runs models on datacenter hardware operated by Ollama and its partners instead of on your machine. It requires an ollama.com account, and as of June 2026 it's metered by GPU time across three tiers: Free (light usage), Pro at $20/month, and Max at $100/month, with usage limits that reset every five hours and every seven days.

The cloud catalog is the draw, and it's honest about why: models like a 480B coder or a 671B DeepSeek don't fit in any Mac. Some cloud entries have no local variant at all - including, notably, a Gemini flash preview, a closed Google model. The tool that built its name running open weights on your hardware now also brokers access to proprietary models on someone else's.

Mechanically, a cloud model works just like a local one: ollama signin, then run a model whose tag ends in -cloud. Your prompt goes to hosting that Ollama describes as primarily in the United States, with routing to Europe and Singapore for capacity, on infrastructure from partners it identifies as NVIDIA Cloud Providers. The CLI experience is identical either way, which is the point - and, depending on your threat model, the catch.

What do Ollama's privacy promises say?

Quote them exactly, because the wording is doing the work. The cloud-models announcement: "Ollama's cloud does not retain your data to ensure privacy and security." The current pricing page: "Prompt or response data is never logged or trained on," and for partners, "we require no logging, no training, and zero data retention policies in place." The privacy policy (updated March 2026) commits to processing cloud content "transiently" and not training on inputs or outputs.

Those are good promises. They're the right promises. They are also promises - policies that depend on Ollama and every partner in the chain honoring them, this quarter and every quarter after. When Turbo launched, the top Hacker News comments made exactly this distinction. One put it plainly: working with any cloud provider means your data "can be subpoenaed just like anyone else's." No-retention policies are a real privacy posture. They are not the same posture as the data never leaving your Mac, and Ollama itself spent years teaching users the difference. It's also worth remembering that Ollama operated without any published privacy policy until users asked where it was in mid-2025 - not sinister for a local-only tool, which it was, but a measure of how new this trust surface is.

Is local Ollama still private?

Yes. Models you run locally with Ollama execute on your machine, and Ollama's privacy policy states it does not collect, store, or have access to your local prompts, responses, or content. The cloud pivot has not changed what local inference does.

That deserves to be said without hedging, because the sloppy version of this article would imply otherwise. Local Ollama inference is still local. Signing in is still optional for local use. If you run llama3.2 on your MacBook today, nothing leaves it.

Why does the direction matter if local stays local?

Because products follow their revenue. Every paid Ollama tier buys cloud usage. The Free tier includes cloud usage too - the on-ramp is built into the default experience. The models with the most headline pull are cloud-only. Engineering attention, support, and roadmap follow the meter, and the meter only runs when your prompts leave the building.

This is the incentive gradient that matters over a horizon of years, and it's why "what does the paid tier buy" is the most clarifying question you can ask about any local-AI tool. When the answer is datacenter GPU time, local inference becomes the freemium funnel for a hosting business. Again: legitimate. Also: a different product than the one whose name became a synonym for local.

There's a precedent worth naming. We wrote about Wispr Flow, a dictation tool whose users assumed on-device processing right up until they discovered otherwise. Ollama's situation is much better - the cloud is explicit, opt-in, and documented. But the lesson transfers: privacy that depends on a vendor's current configuration is one product decision away from being something else. Privacy that depends on architecture isn't.

What's the alternative if you want the line held by architecture?

Pick tools whose paid tiers point at your hardware, not away from it. That's the structural test, and it's the one ToolPiper is built to pass: your content stays local - no telemetry, no analytics, no tracking, no account check-ins, no cloud offload - and the only thing it ever sends is an anonymous benchmark score you choose to publish. There is no cloud inference tier, and there will not be a quiet one. Inference happens on your Mac or it doesn't happen.

The free tier is the whole runner: the native llama.cpp engine (upstream build b9533, stated publicly), unlimited GGUF downloads stored as plain named files, multi-model switching, the local OpenAI-compatible API, embeddings, and an MCP server with over 300 tools. Push-to-talk dictation, text-to-speech, voice cloning, and Apple Intelligence on the Neural Engine are free too. No account, no caps, no terminal. The paid tiers buy more software running on the same machine: local RAG over your files, web scraping, and the media tools. Nothing in the price list meters a datacenter, so nothing in the roadmap bends toward one.

You don't have to take the no-cloud claim on faith, which is rather the point. Open Activity Monitor, watch the network, run it with Wi-Fi off. A property you can verify beats a policy you have to trust - that's the whole thesis, and it's testable in an afternoon.

Download ToolPiper at modelpiper.com/download. If you're moving off Ollama, the migration guide gets your models out of the blob store first.

Part of our series on verifiable local-first AI. See Local-First AI on macOS for the architecture argument and Ollama vs ToolPiper for the full comparison.

	Ollama	ToolPiper
Paid product	Metered cloud GPU time (Pro $20/mo, Max $100/mo)	On-device features (Pro $10, Studio $29, Max $49)
Where paid inference runs	Partner datacenters (primarily US; EU/Singapore for capacity)	Your Mac
Account required	For cloud models (ollama signin)	Never
Usage limits	GPU-time caps, 5-hour and 7-day resets	None
Privacy guarantee	No-retention policy (vendor promise)	No outbound calls (verifiable architecture)
Local inference	Free, on your machine, private	Free, on your machine, private
Largest models offered	480B-671B (cloud-only), incl. proprietary	Whatever fits your Mac's RAM

Ollama's Cloud Pivot: What It Means for Local AI on Mac

What is Ollama's cloud, exactly?

What do Ollama's privacy promises say?

Is local Ollama still private?

Why does the direction matter if local stays local?

What's the alternative if you want the line held by architecture?

What Does the Paid Tier Buy? Ollama vs ToolPiper

Frequently Asked Questions

Related

AI Providers