Every major cloud AI provider has spent this year defending itself in mainstream news. Training data lawsuits. Privacy reversals. Pricing changes that broke earlier promises. Account terminations without recourse. Internal staff reading user prompts. These are not fringe complaints from privacy maximalists. They are headlines in The Times, The Atlantic, Reuters, and Bloomberg.

The pattern is clear enough now that pretending otherwise is a choice.

The trust argument is over

Cloud AI providers built their business on a quiet exchange: you give us your data, we give you a useful tool. For a while that exchange seemed reasonable. The tools were good. The terms of service were boring. Most people did not read them.

Then the headlines started. Training on user conversations. Retention windows that quietly extended. Government data requests honored without notice. Pricing tiers that punished long context. API keys revoked mid-project. Terms updated unilaterally. Output rate-limited based on internal heuristics nobody could see.

None of these were edge cases. They were the operating model becoming visible.

You can argue about the specifics of any one incident. You cannot argue about the pattern. The same companies that promised your data was safe have been forced, by court order, by leak, by their own product changes, to admit otherwise.

Trust, once broken at this scale, does not repair through a blog post. It repairs through architecture.

The pricing argument is going next

The other half of the cloud bargain was that hosted AI would always be cheaper than running your own. For two years that was true. The infrastructure investment, the model weights, the engineering. All of it was concentrated in a handful of labs that could amortize cost across millions of users. Local hardware could not compete on price-per-token.

That window is closing.

Open source models are doing what open source has always done: catching up faster than anyone forecast, then quietly passing the proprietary stack. Llama, Qwen, DeepSeek, Mistral, Gemma. None of these existed two years ago in a form that mattered. Today, a 7B-parameter open model running on a MacBook handles most of the daily-driver tasks people pay $20 a month in cloud subscriptions for. A 32B model handles most of the rest.

As of this writing, the proof is shipping in real time. Qwen 3.6, Kimi 2.6, and Gemma 4, all roughly in the 27-billion-parameter class, have landed within months of each other and pull within touching distance of frontier cloud models on the benchmarks that matter for daily work. A model that fits in 32GB of unified memory and runs at usable speed on an M-series Mac is now genuinely competitive with what people pay $20 to $200 a month to access remotely.
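
The arithmetic behind "fits in 32GB" is simple enough to show. Here is a back-of-envelope sketch, assuming 4-bit quantization and ignoring runtime overhead like the KV cache; the numbers are illustrative, not any one model's exact footprint:

```swift
// Back-of-envelope weight footprint for an open model.
// Real runtimes add KV cache, activations, and quantization
// overhead on top of the raw weight bytes.
func weightFootprintGB(params: Double, bitsPerWeight: Double) -> Double {
    params * bitsPerWeight / 8 / 1_000_000_000
}

let params = 27e9 // a 27B-class model

print(weightFootprintGB(params: params, bitsPerWeight: 16)) // ~54 GB: does not fit in 32 GB
print(weightFootprintGB(params: params, bitsPerWeight: 4))  // ~13.5 GB: fits, with room for KV cache
```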

Project that trajectory across the next two chip generations. If 27B-class models are the current consumer-laptop ceiling and they are already reaching parity for everyday work, 100B-class models become the high-end laptop ceiling within two M-series generations. The open side will be shipping weights that compete with what data centers run today, running on hardware that fits on a desk.

And the open models are released, not licensed. There is no per-token meter on a model running on your laptop. There is no rate limit. There is no API outage. There is no upstream provider quietly downgrading the model you paid for.

The cloud providers are going to keep raising prices to chase frontier compute. Open-weight models are going to keep getting cheaper to run. Those two lines cross. They have already crossed for everyday tasks. They will cross for the hard tasks too.

The cloud is starting to look like a wrapper

Watch what the major cloud AI providers have been shipping lately. It is not better models. It is calendar integration, email summaries, file management, browser agents, code editors, image editors, voice modes, persistent memory, cross-device sync. Traditional software features wrapped around the model.

That is not a coincidence. The model alone has stopped being a moat. When the gap between your model and an open one narrows each release, you have to compete on everything around the model. And that is software, not inference.

There is a name for that posture. It is the posture of a service provider who knows their unique input is becoming a commodity, and who is racing to build a product the commodity alone cannot deliver. The pivot is rational. It is also an admission. They know what is coming, and they are trying to level the playing field before the underlying weights stop being theirs to charge for.

Watch the regulatory ask

There is one move left for cloud AI providers to defend their position, and it is not technical. It is regulatory.

Expect a push, already starting, to put the most capable AI models under government control: licensing regimes, registration requirements, capability thresholds, audited safety processes. Each will be framed as responsible. Each will also be written in terms that conveniently exclude open-weight models, because open weights cannot be licensed, registered, or audited the way a hosted API can.

When that fight reaches its loudest, ask which companies are positioning themselves as the responsible adults in the room. Ask whose track record gets cited as evidence of safe operation. Ask who happens to have the resources to comply with whatever framework gets proposed.

The same companies that have been quietly logging conversations, training on user prompts, terminating accounts without recourse, and reversing privacy promises will be the ones offering themselves as guardians of a regulated AI future. They will arrive with a "proven track record of responsible AI control and usage." Watch for the phrase. It is coming.

The thesis is not that this is sinister. It is that it is predictable, and that it is one more reason the value of running your own models on your own hardware, on weights you can read and inspect, is going up, not down.

Apple did not drop the ball

The common take is that Apple has been slow on AI. That take is wrong, and it is wrong in a specific way: it judges Apple's AI strategy by the metrics that matter to OpenAI's strategy.

OpenAI is in the cloud inference business. Their KPI is API calls. Apple is not in that business, and API calls are not its metric. Since the M1 shipped in 2020, Apple has been quietly building the strongest local AI hardware in consumer computing, generation over generation.

The Neural Engine. Unified memory architecture. Metal compute. A 16-core ANE even on the base chips. Memory bandwidth that scales linearly into the Max and Ultra. The fact that an M2 Max with 32GB of RAM can hold and serve a 32B-parameter model at usable speed is not an accident. It is the result of a decade of betting that on-device intelligence was the right place to invest silicon.

That bet is starting to pay off in public. The next two generations will make it obvious.

What do the next chips do to this argument?

We do not know the exact specs of the M5 Ultra or the M6. We know the trajectory. Each generation has expanded unified memory ceilings, increased ANE throughput, and improved memory bandwidth. The M4 already runs models that needed a discrete GPU two years ago. The M5 Ultra will hold and serve models that today require a workstation with two H100s.

We worked through the bandwidth math publicly in December 2025, before the M5 launched. The M4 Max runs an 8,533 MT/s LPDDR5x bus at 512 bits wide, giving 546 GB/s. Upgrading to LPDDR6 on the same 512-bit bus puts a Max chip past 900 GB/s. A 1024-bit Ultra at LPDDR6 speeds projects to roughly 1.8 TB/s, more than half of an H100's memory bandwidth, sitting on a desk. The M5 is tracking the curve.

The bet is not on a specific peak number; it is on Apple continuing to upgrade memory, which it has done every generation since the M1. The only way the prediction misses is if Apple stays on LPDDR5x for another cycle, which would be its own kind of news. The live version of this math, including projected M6 Pro/Max/Ultra configurations, is on our Model Fit page. Pick any Mac and see what runs at what speed.
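
The same arithmetic, as code. Taking 14,400 MT/s as the top of the LPDDR6 range is an assumption about which memory grade ships, not a confirmed part:

```swift
// Peak memory bandwidth = transfers per second × bus width in bytes.
func bandwidthGBps(mtPerSec: Double, busBits: Double) -> Double {
    mtPerSec * 1e6 * (busBits / 8) / 1e9
}

print(bandwidthGBps(mtPerSec: 8_533, busBits: 512))    // ≈ 546 GB/s: M4 Max, LPDDR5x
print(bandwidthGBps(mtPerSec: 14_400, busBits: 512))   // ≈ 922 GB/s: LPDDR6 top speed, same bus
print(bandwidthGBps(mtPerSec: 14_400, busBits: 1_024)) // ≈ 1,843 GB/s: projected 1024-bit Ultra

// Why bandwidth is the number that matters: decode is memory-bound, so
// tokens/s is capped at roughly bandwidth divided by resident weight bytes.
print(546.0 / 13.5) // ≈ 40 tokens/s ceiling for a 27B 4-bit model on an M4 Max
```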

The math becomes uncomfortable for cloud-only providers around the M5 Ultra and unmistakable around the M6. A desktop that runs frontier-class models locally, indefinitely, with no per-token cost, sitting on the desk of every serious knowledge worker, is not a market that gets recaptured.

It is the same shape as the photography market in 2008. Phone cameras were not as good as DSLRs. Then they were good enough. Then they were better for what most people actually did. The professional segment did not disappear, but the mass market never went back.

Why we started here, why we started now

The reason PiperKit exists on macOS first is not because we love Apple. It is because we looked at where local AI was going to land first, and Apple Silicon was sitting at that landing site already. The Neural Engine is shipping. Unified memory is shipping. Metal-optimized inference runtimes are shipping. The hardware is here. The software was the gap.

Most of what gets called "local AI software" today is a wrapper around llama.cpp with a chat box. That is a starting point, not a product. People who already know what they want will tolerate it. Everyone else will not.

What was missing was everything that turns inference into something you would actually use:

  • One app instead of seven
  • Voice in, voice out, vision, OCR, image and video upscale, document parsing, all running locally, all coordinated
  • An MCP surface so other tools can use your local AI without shipping data anywhere (sketched in code after this list)
  • A pipeline builder that does not require Python
  • Models that load on demand, share memory intelligently, and unload when something else needs the GPU
  • A developer story that includes browser automation, testing, and code search, with none of it phoning home
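
To make the MCP bullet concrete, here is the rough shape of a tool on that surface. This is a minimal sketch of the protocol's tool-listing contract, not PiperKit's actual registry: the tool and its schema are invented, and the real protocol carries the schema as a nested JSON object rather than a string.

```swift
import Foundation

// An MCP server advertises tools over tools/list; clients invoke them over
// tools/call. Each tool is a name, a description, and a JSON Schema for its
// inputs. The tool below is hypothetical.
struct MCPTool: Codable {
    let name: String
    let description: String
    let inputSchema: String // JSON Schema, flattened to a string for brevity
}

let summarize = MCPTool(
    name: "summarize_document",
    description: "Summarize a local file with the on-device model. Nothing leaves the machine.",
    inputSchema: #"{"type":"object","properties":{"path":{"type":"string"}},"required":["path"]}"#
)

let encoded = try! JSONEncoder().encode(summarize)
print(String(data: encoded, encoding: .utf8)!) // what a client sees when it lists tools
```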

That is what we are building. ToolPiper, VisionPiper, AudioPiper, PiperSR, PiperMatch, PiperTest, PiperProbe, PiperScrape. Each is a tool we wanted to exist on local hardware and could not find. So we built them, and we built them as a single coordinated stack because that is the only way the experience is competitive with what cloud platforms ship.

Concretely, what is shipping today: a unified local server exposing 147 MCP tools across 26 system-action domains. A 453K-parameter open-source super-resolution model we trained from scratch, running at 44 FPS on the M4 Max Neural Engine. An on-device tool-retrieval model fine-tuned to 181/181 top-5 on our test battery. Accessibility-tree-first browser automation. Push-to-talk dictation and command. An HNSW vector store with semantic RAG. Structured tool schemas with per-provider adaptation. None of it phones home.
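
The retrieval half of that stack is easy to state precisely. What follows is the contract the HNSW store satisfies, written as a brute-force reference; HNSW returns the same answers in roughly logarithmic time, and the function names here are ours, for illustration only.

```swift
// The contract a vector store satisfies: given a query embedding, return the
// k nearest stored items by cosine similarity. This brute-force pass is the
// ground truth an HNSW index approximates.
func cosine(_ a: [Float], _ b: [Float]) -> Float {
    let dot = zip(a, b).map(*).reduce(0, +)
    let normA = a.map { $0 * $0 }.reduce(0, +).squareRoot()
    let normB = b.map { $0 * $0 }.reduce(0, +).squareRoot()
    return dot / (normA * normB)
}

func topK(query: [Float], corpus: [(id: String, vec: [Float])], k: Int) -> [String] {
    corpus
        .map { (id: $0.id, score: cosine(query, $0.vec)) }
        .sorted { $0.score > $1.score }
        .prefix(k)
        .map(\.id)
}
```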

Why Swift, and why Windows is next

We bet on Swift, and we bet specifically on shipping native Swift apps rather than wrapping a JavaScript runtime around a local model. The reason is the same as the rest of the thesis: when latency budgets are tight and the model is sitting on the chip in your laptop, the bridge layer between the model and the user matters. Swift gives us first-class access to Apple's hardware (Neural Engine kernels, Metal compute, Accelerate, Core ML) without the marshaling overhead other runtimes pay. That is what makes a 44 FPS video upscale on the M4 Max possible. It is also what makes voice-in, voice-out feel like a conversation instead of a transaction.
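
What "first-class access" means in practice: routing a model to the Neural Engine is a configuration choice in Core ML, not a separate runtime. A minimal sketch; the model path is a placeholder.

```swift
import CoreML

// Ask Core ML to prefer the Neural Engine, falling back to CPU when an op
// is unsupported there. The compiled model path below is a placeholder.
let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine

let url = URL(fileURLWithPath: "/path/to/Upscaler.mlmodelc")
let model = try! MLModel(contentsOf: url, configuration: config)
// From here, model.prediction(from:) runs with the ANE preferred.
```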

That choice used to be a tradeoff with portability. It is starting not to be.

Swift has a formal Windows working group. The compiler runs on Windows. The standard library runs on Windows. The cross-platform toolchain is improving release over release, with serious investment from the Swift project itself, because the language is no longer being treated as a Mac-only tool. Our codebase is along for that ride. The parts of PiperKit that are not Apple-platform-specific (the inference orchestration, the MCP server, the pipeline runtime, the test format, the tool registry, large portions of model management) are already written in portable Swift. They will run on Windows when we are ready to ship there.

That is the order of operations. Finish the macOS story. Then take the same stack to Windows, where the on-device AI tailwind is starting to blow too.

What we are not saying

We are not saying cloud AI is over today. Frontier models still run in data centers. Some workloads (long-running agent runs, multi-million-token context, specialized models that have no open equivalent yet) will stay in the cloud for years. We use those workloads when they are the right tool, and our products integrate with them.

We are saying the trajectory is clear. We are saying the trust has been spent and there is no easy way for the providers to earn it back. We are saying the cost curves favor local. We are saying the hardware is already here for most tasks and will be here for the rest within two chip generations.

If that thesis is right, the question is not whether local AI on Apple Silicon becomes the default. It is who builds the software that makes it usable when the hardware finally outpaces the cloud bargain.

What we are building toward

Every part of PiperKit is built on the assumption that the unit of AI compute is going to be the chip on your desk, not the chip in someone else's data center. That assumption shapes every decision we make:

  • The Neural Engine first, because that is the hardware that scales.
  • Native Swift apps, because the bridge layer between model and user matters when latency budgets are tight.
  • MCP everywhere, because agents will run locally too.
  • Our own on-device models, PiperSR and PiperMatch, fine-tuned for the chips they run on.

The cost-quality frontier is moving toward small specialized models that know their hardware, not bigger general ones in someone else's data center.

None of that is a moonshot. It is a product roadmap that becomes more obvious each chip generation. We are shipping it now because the people who care most, the ones who already feel the cloud bargain breaking, are the people we want to build with first.

Local AI is what is left when the trust runs out and the prices keep climbing. That is not a problem. It is the most interesting place a software company could be working right now, and probably the most interesting place for the next decade. PiperKit exists to build for it: on the hardware Apple is about to ship, on weights you can read, with the data never leaving your machine.

PiperKit LLC builds ModelPiper, ToolPiper, VisionPiper, AudioPiper, and PiperSR. Free tier covers chat and transcription. Pro is $10 a month. Everything runs on your Mac.