Open your Mac's dictation settings and check whether on-device processing is enabled. If it's not - and for most users, it's not - every word you dictate gets sent to Apple's servers. That's Apple's default behavior. Third-party dictation apps are worse. Wispr Flow sends audio to OpenAI and Meta. Otter.ai sends audio to its own cloud. Google Docs voice typing routes through Google's servers. None of them offer a true on-device option.
Your audio is someone else's data. That's the default state of voice input on every Mac in 2026.
Why every dictation app uses the cloud
Cloud speech-to-text (STT) models are larger, trained on more data, and generally more accurate than local models - especially for edge cases like heavy accents, background noise, and domain-specific jargon. Running a large model in a data center is cheaper per inference because the cost is amortized across millions of users.
For the companies building these products, cloud processing has a second advantage: data. Audio recordings, transcription corrections, and usage patterns are valuable for model improvement. Wispr Flow's initial privacy policy allowed using user content for AI training by default. After a public controversy in 2025, they changed this to opt-in. But the audio still goes to the cloud regardless of your training preference.
The business incentive is clear. Cloud processing enables subscription pricing (ongoing server costs justify ongoing charges), provides training data, and allows using the largest possible models. The user's incentive is different - they want their words transcribed accurately and privately. These incentives aren't aligned.
What changed: the Neural Engine
Every Mac with Apple Silicon has a Neural Engine - dedicated hardware designed specifically for machine learning inference. Not a GPU. Not a CPU core repurposed for AI. A separate processing unit optimized for the matrix operations neural networks require.
Most of the time, the Neural Engine sits idle. Your CPU handles your apps, your GPU handles graphics, and the Neural Engine waits. Speech-to-text is a near-perfect workload for it - relatively small models, real-time audio processing, latency-sensitive results.
FluidAudio's Parakeet TDT V3 is a 0.6B parameter STT model compiled for the Neural Engine. It processes speech at roughly 210x realtime - a 10-second utterance transcribes in under 50 milliseconds. With the model loaded as a keep-warm backend, end-to-end latency from key release to text at cursor is approximately 140 milliseconds. That's faster than cloud dictation, which adds 200-500ms of network overhead before processing even begins.
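The arithmetic behind those numbers is worth making explicit. A quick sketch, using only the figures quoted above (210x realtime, ~140 ms end-to-end, 200-500 ms of cloud network overhead):

```python
# Rough latency budget for a 10-second utterance, using the figures above.

RTF = 210            # realtime factor: seconds of audio processed per second of compute
UTTERANCE_S = 10.0   # length of the spoken clip in seconds

local_inference_ms = UTTERANCE_S / RTF * 1000
print(f"local inference: {local_inference_ms:.0f} ms")  # ~48 ms, under 50 ms

# End-to-end local latency (key release -> text at cursor) is ~140 ms, so
# model inference is only about a third of the budget; the rest is audio
# capture and text-insertion overhead.
LOCAL_E2E_MS = 140

# Cloud dictation pays network overhead *before* inference even starts.
cloud_network_ms = (200, 500)
print(f"cloud network overhead alone: {cloud_network_ms[0]}-{cloud_network_ms[1]} ms "
      f"vs {LOCAL_E2E_MS} ms local end-to-end")
```

Even the low end of the cloud's network overhead exceeds the entire local round trip.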
Local isn't just more private. It's faster.
What private dictation actually means
"Private" gets overused in tech marketing. Here's what it means concretely for voice dictation.
When you speak into ActionPiper, the audio is captured by your Mac's microphone, processed by the Parakeet model on the Neural Engine, and the resulting text is inserted at your cursor. At no point does audio data leave your machine. There's no network request, no server, no cloud fallback. If your Wi-Fi is off, dictation works identically.
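One way to reason about the "no network request" claim: a genuinely local pipeline keeps working even when socket creation is impossible. A toy sketch - the `transcribe_locally` stub stands in for an on-device model and is not ActionPiper's actual API:

```python
import socket

def transcribe_locally(audio_samples):
    """Stand-in for an on-device STT call: pure computation, no I/O."""
    # A real model would run inference here; the point is that nothing
    # in the pipeline needs a network connection.
    return "hello world" if audio_samples else ""

# Simulate "Wi-Fi off", but stricter: make *any* socket creation fail.
def _no_network(*args, **kwargs):
    raise OSError("network disabled for this test")

socket.socket = _no_network  # any hidden cloud fallback would now crash

text = transcribe_locally([0.1, 0.2, 0.3])
print(text)  # still works: the pipeline never touches the network
```

A cloud-backed app fails this test immediately; a local one doesn't notice.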
Some dictation apps capture screenshots of your active window to provide "context-aware" formatting. That means images of your screen - whatever sensitive document, codebase, or communication is visible - get sent to external servers alongside your audio. ActionPiper doesn't capture your screen. It doesn't read your window contents. It transcribes your voice and inserts the text. That's all.
There's no usage telemetry, no analytics, no crash reporting to external services. You don't create an account, provide an email, or agree to terms that grant data processing rights. You download a DMG, install it, and use it.
Who needs this
Some people want privacy as a matter of principle. For many others, it's a professional requirement.
Attorneys dictating case notes, contract terms, or legal strategy into a cloud-based transcription service create a data exposure most compliance teams would flag immediately. Healthcare workers face similar constraints - HIPAA requires a BAA with any service that processes protected health information, and most dictation apps don't offer one. Developers dictating about proprietary software may be leaking trade secrets, unreleased feature names, or security-sensitive details through a third party's infrastructure.
Financial professionals, journalists protecting sources, researchers handling unpublished findings - the list goes on. But you don't need to be in a regulated industry. Ask most people directly - "Would you like your dictation app to record your screen and send the screenshot to a server alongside your audio?" - and they'd say no. Privacy should be the default. Cloud processing should be the opt-in.
The bigger picture: AI shouldn't live in someone else's data center
Private dictation isn't a niche concern. It's the leading edge of a larger question about where your AI runs.
The current model - upload your data, pay monthly, trust the provider - exists because local hardware couldn't match cloud inference quality. That gap is closing fast, and STT proves it. Voice input is latency-critical: sending audio over the wire adds hundreds of milliseconds before processing even starts, while on-device STT on the Neural Engine finishes the entire job in about 140 milliseconds. The local version isn't just more private - it's physically faster, because there's no network to traverse.
Think about what a $15/month cloud STT subscription actually buys you. Someone else's GPU time to run a model that fits on hardware you already own. Your Mac's Neural Engine processes speech at 210x realtime. You paid for that silicon at the Apple Store. Paying again, monthly, for a cloud service to do the same work slower while capturing your audio - that's a transitional pricing model, not an enduring one.
Local models get smaller and faster every quarter. The accuracy gap narrows with each generation. In a few years, paying a subscription for speech-to-text will seem as dated as paying per minute for long-distance calls. The capability was always going to move to the device.
The accuracy trade-off, honestly
The single biggest argument for cloud dictation is accuracy. Cloud models have more parameters, more training data, and more compute per inference. For difficult conditions - heavy accents, noisy environments, rare vocabulary - cloud models are measurably better.
The Parakeet model on the Neural Engine delivers Whisper-class accuracy for standard English in reasonably quiet environments. For the majority of use cases - writing emails, taking notes, drafting messages, commenting code - local accuracy is indistinguishable from cloud. Where you'll notice a difference is heavy accents, non-native English, noisy environments, specialized medical or legal terminology, and languages beyond the primary 25 that Parakeet supports.
For those edge cases, cloud models have an advantage. The question is whether that margin is worth sending all your audio - including the 90% that local models handle perfectly - to external servers. For most users, it isn't.
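Claims like "measurably better" are usually quantified as word error rate (WER): word-level edit distance between the reference transcript and the hypothesis, divided by the reference length. A minimal implementation - the example sentences are made up for illustration, not benchmark data:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard Levenshtein distance over words, via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("set volume to fifty percent", "set volume to fifty percent"))    # 0.0
print(wer("set volume to fifty percent", "set volume to fifteen percent"))  # 0.2
```

A few points of WER on hard audio is the margin cloud models buy - weigh that against sending every utterance off-device.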
How ActionPiper handles private dictation
ActionPiper is a free macOS menu bar app with two voice input modes, both fully on-device.
Push-to-talk dictation (Right Option) - hold the key, speak naturally, release. Parakeet transcribes on the Neural Engine and inserts text at your cursor in whatever app is focused. 140ms end-to-end. Works offline.
Push-to-command (Right Command) - hold the key, speak a natural language instruction, release. The STT engine transcribes, a local LLM interprets the command against 26 action domains, and your Mac executes it. "Turn on dark mode." "Set volume to fifty percent." "Snap this window to the left half." All local.
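The command mode's three-stage shape (transcribe, interpret, execute) can be illustrated with a deliberately simplified router. Everything here is hypothetical: the domain names, the keyword matching, and the handlers are illustration only - a real local LLM classifies intent far more robustly than string matching:

```python
# Toy pipeline: STT output -> intent interpretation -> action dispatch.
# Handlers just return strings; a real app would call system APIs.

def set_dark_mode(on: bool) -> str:
    return f"appearance -> {'dark' if on else 'light'}"

def set_volume(percent: int) -> str:
    return f"volume -> {percent}%"

def interpret(command: str) -> str:
    """Map a transcribed command to a (hypothetical) action domain."""
    text = command.lower()
    if "dark mode" in text:
        return set_dark_mode("off" not in text)
    if "volume" in text:
        words_to_num = {"fifty": 50, "twenty": 20}  # tiny stand-in vocabulary
        for word, num in words_to_num.items():
            if word in text:
                return set_volume(num)
    return "no matching action domain"

print(interpret("Turn on dark mode"))            # appearance -> dark
print(interpret("Set volume to fifty percent"))  # volume -> 50%
```

The structural point survives the simplification: every stage is a local function call, so the whole chain runs without a network.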
ActionPiper requires ToolPiper (free, Mac App Store) for the STT engine and local LLM.
Data flow comparison
| App | Processing | Audio sent to cloud | Screenshots captured | Account required | Price |
|---|---|---|---|---|---|
| ActionPiper | On-device (Neural Engine) | No | No | No | Free |
| Wispr Flow | Cloud (OpenAI/Meta) | Yes, always | Yes | Yes | $15/month |
| Apple Dictation (default) | Cloud (Apple) | Yes, by default | No | Apple ID | Free |
| Apple Dictation (on-device) | On-device | No | No | Apple ID | Free |
| Otter.ai | Cloud | Yes | No | Yes | $10-25/month |
| Whisper.cpp | On-device | No | No | No | Free (CLI only) |
Setup
1. Download ActionPiper (DMG, direct download).
2. Install ToolPiper from the Mac App Store (free) to get the STT engine.
3. Grant Accessibility permission when macOS prompts you.
4. Hold Right Option, speak, release.

Text appears at your cursor. Everything ran on your Mac.
ActionPiper is part of the ModelPiper family of local AI tools for Mac. See also: Wispr Flow Alternative, Push-to-Talk AI on Mac, Voice Coding on Mac.