Open your Mac's dictation settings and check whether on-device processing is enabled. If it's not - and for most users, it's not - every word you dictate gets sent to Apple's servers. That's Apple's default behavior. Third-party dictation apps are worse. Wispr Flow sends audio to OpenAI and Meta. Otter.ai sends audio to its own cloud. Google Docs voice typing routes through Google's servers. None of them offer a true on-device option.
Your audio is someone else's data. That's the default state of voice input on every Mac in 2026.
Why every dictation app uses the cloud
Cloud speech-to-text (STT) models are larger, trained on more data, and generally more accurate than local models - especially for edge cases like heavy accents, background noise, and domain-specific jargon. Running a large model in a data center is cheaper per inference because the cost is amortized across millions of users.
For the companies building these products, cloud processing has a second advantage: data. Audio recordings, transcription corrections, and usage patterns are valuable for model improvement. Wispr Flow's initial privacy policy allowed using user content for AI training by default. After a public controversy in 2025, they changed this to opt-in. But the audio still goes to the cloud regardless of your training preference.
The business incentive is clear. Cloud processing enables subscription pricing (ongoing server costs justify ongoing charges), provides training data, and allows using the largest possible models. The user's incentive is different - they want their words transcribed accurately and privately. These incentives aren't aligned.
What changed: the Neural Engine
Every Mac with Apple Silicon has a Neural Engine - dedicated hardware designed specifically for machine learning inference. Not a GPU. Not a CPU core repurposed for AI. A separate processing unit optimized for the matrix operations neural networks require.
Most of the time, the Neural Engine sits idle. Your CPU handles your apps, your GPU handles graphics, and the Neural Engine waits. Speech-to-text is a near-perfect workload for it - relatively small models, real-time audio processing, latency-sensitive results.
FluidAudio's Parakeet TDT V3 is a 0.6B parameter STT model compiled for the Neural Engine. It processes speech at roughly 210x realtime - a 10-second utterance transcribes in under 50 milliseconds. With the model loaded as a keep-warm backend, end-to-end latency from key release to text at cursor is approximately 140 milliseconds. That's faster than cloud dictation, which adds 200-500ms of network overhead before processing even begins.
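The arithmetic behind those numbers is worth making explicit. A quick sketch, using only the figures quoted above (210x realtime, ~140 ms end-to-end, 200-500 ms of cloud network overhead):

```python
# Rough latency budget for a 10-second utterance, using the figures above.

RTF = 210            # realtime factor: seconds of audio processed per second of compute
UTTERANCE_S = 10.0   # length of the spoken clip in seconds

local_inference_ms = UTTERANCE_S / RTF * 1000
print(f"local inference: {local_inference_ms:.0f} ms")  # ~48 ms, under 50 ms

# End-to-end local latency (key release -> text at cursor) is ~140 ms, so
# model inference is only about a third of the budget; the rest is audio
# capture and text-insertion overhead.
LOCAL_E2E_MS = 140

# Cloud dictation pays network overhead *before* inference even starts.
cloud_network_ms = (200, 500)
print(f"cloud network overhead alone: {cloud_network_ms[0]}-{cloud_network_ms[1]} ms "
      f"vs {LOCAL_E2E_MS} ms local end-to-end")
```

Even the low end of the cloud's network overhead exceeds the entire local round trip.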
Local isn't just more private. It's faster.
What private dictation actually means
"Private" gets overused in tech marketing. Here's what it means concretely for voice dictation.
When you speak into ActionPiper, the audio is captured by your Mac's microphone, processed by the Parakeet model on the Neural Engine, and the resulting text is inserted at your cursor. At no point does audio data leave your machine. There's no network request, no server, no cloud fallback. If your Wi-Fi is off, dictation works identically.
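One way to reason about the "no network request" claim: a genuinely local pipeline keeps working even when socket creation is impossible. A toy sketch - the `transcribe_locally` stub stands in for an on-device model and is not ActionPiper's actual API:

```python
import socket

def transcribe_locally(audio_samples):
    """Stand-in for an on-device STT call: pure computation, no I/O."""
    # A real model would run inference here; the point is that nothing
    # in the pipeline needs a network connection.
    return "hello world" if audio_samples else ""

# Simulate "Wi-Fi off", but stricter: make *any* socket creation fail.
def _no_network(*args, **kwargs):
    raise OSError("network disabled for this test")

socket.socket = _no_network  # any hidden cloud fallback would now crash

text = transcribe_locally([0.1, 0.2, 0.3])
print(text)  # still works: the pipeline never touches the network
```

A cloud-backed app fails this test immediately; a local one doesn't notice.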
Some dictation apps capture screenshots of your active window to provide "context-aware" formatting. That means images of your screen - whatever sensitive document, codebase, or communication is visible - get sent to external servers alongside your audio. ActionPiper doesn't capture your screen. It doesn't read your window contents. It transcribes your voice and inserts the text. That's all.
There's no usage telemetry, no analytics, no crash reporting to external services. You don't create an account, provide an email, or agree to terms that grant data processing rights. You download a DMG, install it, and use it.
Who needs this
Some people want privacy as a matter of principle. For many others, it's a professional requirement.
Attorneys dictating case notes, contract terms, or legal strategy into a cloud-based transcription service create a data exposure most compliance teams would flag immediately. Healthcare workers face similar constraints - HIPAA requires a BAA with any service that processes protected health information, and most dictation apps don't offer one. Developers dictating about proprietary software may be leaking trade secrets, unreleased feature names, or security-sensitive details through a third party's infrastructure.
Financial professionals, journalists protecting sources, researchers handling unpublished findings - the list goes on. But you don't need to be in a regulated industry. Ask most people directly - "Would you like your dictation app to record your screen and send the screenshot to a server alongside your audio?" - and they'd say no. Privacy should be the default. Cloud processing should be the opt-in.
The bigger picture: AI shouldn't live in someone else's data center
Private dictation isn't a niche concern. It's the leading edge of a larger question about where your AI runs.
The current model - upload your data, pay monthly, trust the provider - exists because local hardware couldn't match cloud inference quality. That gap is closing fast, and STT proves it. Voice input is latency-critical: sending audio over the wire adds hundreds of milliseconds before processing even starts, while on-device STT on the Neural Engine finishes the entire job in about 140 milliseconds. The local version isn't just more private - it's physically faster, because there's no network to traverse.
Think about what a $15/month cloud STT subscription actually buys you. Someone else's GPU time to run a model that fits on hardware you already own. Your Mac's Neural Engine processes speech at 210x realtime. You paid for that silicon at the Apple Store. Paying again, monthly, for a cloud service to do the same work slower while capturing your audio - that's a transitional pricing model, not an enduring one.
Local models get smaller and faster every quarter. The accuracy gap narrows with each generation. In a few years, paying a subscription for speech-to-text will seem as dated as paying per minute for long-distance calls. The capability was always going to move to the device.
The accuracy trade-off, honestly
The single biggest argument for cloud dictation is accuracy. Cloud models have more parameters, more training data, and more compute per inference. For difficult conditions - heavy accents, noisy environments, rare vocabulary - cloud models are measurably better.
The Parakeet model on the Neural Engine delivers Whisper-class accuracy for standard English in reasonably quiet environments. For the majority of use cases - writing emails, taking notes, drafting messages, commenting code - local accuracy is indistinguishable from cloud. Where you'll notice a difference is heavy accents, non-native English, noisy environments, specialized medical or legal terminology, and languages beyond the primary 25 that Parakeet supports.
For those edge cases, cloud models have an advantage. The question is whether that margin is worth sending all your audio - including the 90% that local models handle perfectly - to external servers. For most users, it isn't.
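Claims like "measurably better" are usually quantified as word error rate (WER): word-level edit distance between the reference transcript and the hypothesis, divided by the reference length. A minimal implementation - the example sentences are made up for illustration, not benchmark data:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard Levenshtein distance over words, via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("set volume to fifty percent", "set volume to fifty percent"))    # 0.0
print(wer("set volume to fifty percent", "set volume to fifteen percent"))  # 0.2
```

A few points of WER on hard audio is the margin cloud models buy - weigh that against sending every utterance off-device.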
How ActionPiper handles private dictation
ActionPiper is a free macOS menu bar app with two voice input modes, both fully on-device.
Push-to-talk dictation (Right Option) - hold the key, speak naturally, release. Parakeet transcribes on the Neural Engine and inserts text at your cursor in whatever app is focused. 140ms end-to-end. Works offline.
Push-to-command (Right Command) - hold the key, speak a natural language instruction, release. The STT engine transcribes, a local LLM interprets the command against 26 action domains, and your Mac executes it. "Turn on dark mode." "Set volume to fifty percent." "Snap this window to the left half." All local.
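The command mode's three-stage shape (transcribe, interpret, execute) can be illustrated with a deliberately simplified router. Everything here is hypothetical: the domain names, the keyword matching, and the handlers are illustration only - a real local LLM classifies intent far more robustly than string matching:

```python
# Toy pipeline: STT output -> intent interpretation -> action dispatch.
# Handlers just return strings; a real app would call system APIs.

def set_dark_mode(on: bool) -> str:
    return f"appearance -> {'dark' if on else 'light'}"

def set_volume(percent: int) -> str:
    return f"volume -> {percent}%"

def interpret(command: str) -> str:
    """Map a transcribed command to a (hypothetical) action domain."""
    text = command.lower()
    if "dark mode" in text:
        return set_dark_mode("off" not in text)
    if "volume" in text:
        words_to_num = {"fifty": 50, "twenty": 20}  # tiny stand-in vocabulary
        for word, num in words_to_num.items():
            if word in text:
                return set_volume(num)
    return "no matching action domain"

print(interpret("Turn on dark mode"))            # appearance -> dark
print(interpret("Set volume to fifty percent"))  # volume -> 50%
```

The structural point survives the simplification: every stage is a local function call, so the whole chain runs without a network.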
ActionPiper requires ToolPiper (free, Mac App Store) for the STT engine and local LLM.
Data flow comparison
| App | Processing | Audio sent to cloud | Screenshots captured | Account required | Price |
|---|---|---|---|---|---|
| ActionPiper | On-device (Neural Engine) | No | No | No | Free |
| Wispr Flow | Cloud (OpenAI/Meta) | Yes, always | Yes | Yes | $15/month |
| Apple Dictation (default) | Cloud (Apple) | Yes, by default | No | Apple ID | Free |
| Apple Dictation (on-device) | On-device | No | No | Apple ID | Free |
| Otter.ai | Cloud | Yes | No | Yes | $10-25/month |
| Whisper.cpp | On-device | No | No | No | Free (CLI only) |
Setup
1. Download ActionPiper (DMG, direct download).
2. Install ToolPiper from the Mac App Store (free) to get the STT engine.
3. Grant Accessibility permission when macOS prompts you.
4. Hold Right Option, speak, release.

Text appears at your cursor. Everything ran on your Mac.
ActionPiper is part of the ModelPiper family of local AI tools for Mac. See also: Wispr Flow Alternative, Push-to-Talk AI on Mac, Voice Coding on Mac.