Ollama became the default way to run models locally by being exactly one thing: the tool that kept AI on your machine. In August 2025 it started selling the other thing. Turbo launched at $20/month for datacenter inference, the cloud models followed that September, and today every paid tier on Ollama's pricing page is, at its core, a meter on cloud GPU time.
None of this is hidden, and none of it is wrongdoing. It's a business model, and a defensible one. But it changes what kind of product Ollama is, and if you picked it because inference on your own hardware was the entire point, the change is worth understanding precisely - what still runs locally, what doesn't, and what holds the line in each case.
What is Ollama's cloud, exactly?
Ollama's cloud runs models on datacenter hardware operated by Ollama and its partners instead of on your machine. It requires an ollama.com account, and as of June 2026 it's metered by GPU time across three tiers: Free (light usage), Pro at $20/month, and Max at $100/month. Usage limits reset on two schedules: every five hours and every seven days.
The cloud catalog is the draw, and Ollama is honest about why: models like a 480B-parameter coder or a 671B-parameter DeepSeek won't fit in the memory of a typical Mac. Some cloud entries have no local variant at all - including, notably, a Gemini flash preview, a closed Google model. The tool that built its name running open weights on your hardware now also brokers access to proprietary models on someone else's.
Mechanically, using a cloud model looks just like using a local one: sign in with ollama signin, then run a model whose tag ends in -cloud. Your prompt goes to hosting that Ollama describes as primarily in the United States, with routing to Europe and Singapore for capacity, on infrastructure from partners it identifies as NVIDIA Cloud Providers. The CLI experience is identical either way, which is the point - and, depending on your threat model, the catch.
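Because the command line looks the same either way, the tag is the only visible signal of where a prompt will run. Here is a minimal sketch of a guard a script could apply before sending anything, assuming the -cloud tag convention described above holds. The function name and the example model names are hypothetical, and treating a bare cloud tag as cloud is an extra assumption beyond that convention. It is a naming heuristic, not an Ollama API.

```python
def is_cloud_model(model: str) -> bool:
    """Guess from the tag alone whether a model reference points at the cloud.

    Assumption: cloud models carry a tag ending in "-cloud". A bare "cloud"
    tag is also treated as cloud, which is a further assumption.
    """
    # Split "name:tag" on the last colon; no colon means no tag to inspect.
    _name, sep, tag = model.rpartition(":")
    if not sep:
        return False
    return tag == "cloud" or tag.endswith("-cloud")


def require_local(model: str) -> str:
    """Return the model unchanged, or refuse if it looks cloud-hosted."""
    if is_cloud_model(model):
        raise ValueError(f"{model!r} looks like a cloud model; refusing to send prompts")
    return model
```

A wrapper script could call require_local("llama3.2") before every request, so that a cloud tag pasted in by mistake fails loudly instead of quietly shipping a prompt off the machine.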
What do Ollama's privacy promises say?
Quote them exactly, because the wording is doing the work. The cloud-models announcement: "Ollama's cloud does not retain your data to ensure privacy and security." The current pricing page: "Prompt or response data is never logged or trained on," and for partners, "we require no logging, no training, and zero data retention policies in place." The privacy policy (updated March 2026) commits to processing cloud content "transiently" and not training on inputs or outputs.
Those are good promises. They're the right promises. They are also promises - policies that depend on Ollama and every partner in the chain honoring them, this quarter and every quarter after. When Turbo launched, the top Hacker News comments made exactly this distinction. One put it plainly: working with any cloud provider means your data "can be subpoenaed just like anyone else's." No-retention policies are a real privacy posture. They are not the same posture as the data never leaving your Mac, and Ollama itself spent years teaching users the difference. It's also worth remembering that Ollama operated without any published privacy policy until users asked where it was in mid-2025 - not sinister for a local-only tool, which it was, but a measure of how new this trust surface is.
Is local Ollama still private?
Yes. Models you run locally with Ollama execute on your machine, and Ollama's privacy policy states it does not collect, store, or have access to your local prompts, responses, or content. The cloud pivot has not changed what local inference does.
That deserves to be said without hedging, because the sloppy version of this article would imply otherwise. Local Ollama inference is still local. Signing in is still optional for local use. If you run llama3.2 on your MacBook today, your prompts and responses stay on it.
Why does the direction matter if local stays local?
Because products follow their revenue. Every paid Ollama tier buys cloud usage. The Free tier includes cloud usage too - the on-ramp is built into the default experience. The models with the most headline pull are cloud-only. Engineering attention, support, and roadmap follow the meter, and the meter only runs when your prompts leave the building.
This is the incentive gradient that matters over a horizon of years, and it's why "what does the paid tier buy" is the most clarifying question you can ask about any local-AI tool. When the answer is datacenter GPU time, local inference becomes the freemium funnel for a hosting business. Again: legitimate. Also: a different product than the one whose name became a synonym for local.
There's a precedent worth naming. We wrote about Wispr Flow, a dictation tool whose users assumed on-device processing right up until they discovered otherwise. Ollama's situation is much better - the cloud is explicit, opt-in, and documented. But the lesson transfers: privacy that depends on a vendor's current configuration is one product decision away from being something else. Privacy that depends on architecture isn't.
What's the alternative if you want the line held by architecture?
Pick tools whose paid tiers point at your hardware, not away from it. That's the structural test, and it's the one ToolPiper is built to pass: the app makes zero outbound calls - no telemetry, no account check-ins, no cloud offload. There is no cloud inference tier, and there will not be a quiet one. Inference happens on your Mac or it doesn't happen.
The free tier is the whole runner: the native llama.cpp engine (upstream build b9533, stated publicly), unlimited GGUF downloads stored as plain named files, multi-model switching, the local OpenAI-compatible API, embeddings, and an MCP server with over 300 tools. No account, no caps, no terminal. The paid tiers buy more software running on the same machine - push-to-talk dictation, text-to-speech, Apple Intelligence on the Neural Engine, local RAG, media tools. Nothing in the price list meters a datacenter, so nothing in the roadmap bends toward one.
You don't have to take the no-cloud claim on faith, which is rather the point. Open Activity Monitor and watch the Network tab, or turn Wi-Fi off and confirm inference still runs. A property you can verify beats a policy you have to trust - that's the whole thesis, and it's testable in an afternoon.
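The same check can be scripted for any local-AI tool. The sketch below is one possible approach, not a feature of either product: it assumes macOS's lsof is available and that its -i -n -P output puts connected sockets in a local->remote form, and it lists every remote endpoint that is not a loopback address. The function names are hypothetical.

```python
import ipaddress
import subprocess


def remote_peers(lsof_output: str) -> list[str]:
    """Collect non-loopback remote endpoints from `lsof -i -n -P` text."""
    peers = []
    for line in lsof_output.splitlines():
        for token in line.split():
            if "->" not in token:
                continue  # listening or unconnected sockets have no remote end
            remote = token.split("->", 1)[1]
            # Drop the port, then any IPv6 brackets, to get the bare host.
            host = remote.rsplit(":", 1)[0].strip("[]")
            try:
                if ipaddress.ip_address(host).is_loopback:
                    continue
            except ValueError:
                pass  # unparseable host: report it rather than hide it
            peers.append(remote)
    return peers


def check_process(pid: int) -> list[str]:
    """Snapshot one process's internet sockets and return its remote peers."""
    # -a ANDs the filters; -n and -P keep hosts and ports numeric.
    result = subprocess.run(
        ["lsof", "-a", "-p", str(pid), "-i", "-n", "-P"],
        capture_output=True, text=True, check=False,
    )
    return remote_peers(result.stdout)
```

An empty list from check_process only describes the moment it ran, so it is worth repeating during inference. The Wi-Fi-off test remains the stronger evidence.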
Download ToolPiper at modelpiper.com/download. If you're moving off Ollama, the migration guide gets your models out of the blob store first.
Part of our series on verifiable local-first AI. See Local-First AI on macOS for the architecture argument and Ollama vs ToolPiper for the full comparison.
