You just finished an hour-long meeting. Or a lecture. Or a brainstorming session you recorded on your phone. Now you need it in text.
The cloud options are fine until you think about what's in the audio. Client names. Revenue numbers. Medical details. Legal strategy. HR conversations. The moment you upload that audio to Otter.ai or Google's speech API, it exists on someone else's infrastructure, subject to their retention policies and their security posture.
There's a better option: transcribe it on your Mac, locally, with Whisper-class accuracy, and never send the audio anywhere.
How Local Speech-to-Text Actually Works
Speech-to-text models have improved dramatically in the last few years. OpenAI's Whisper model, released in 2022, was a watershed: it showed that a single model could handle multiple languages, accents, background noise, and domain-specific vocabulary at near-human accuracy.
The models that run locally on your Mac today come out of that same wave of research. Parakeet, the STT model that ships with ModelPiper's ToolPiper engine, runs on Apple's Neural Engine via CoreML. The Neural Engine is dedicated silicon designed specifically for machine learning workloads. It processes audio faster than real time, meaning a 30-minute recording transcribes in well under 30 minutes.
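To make "faster than real time" concrete, here is a minimal Python sketch of the real-time factor (RTF), a common way to express transcription speed: processing time divided by audio duration, where anything below 1.0 is faster than real time. The 90-second processing time is a made-up figure for illustration, not a measured ToolPiper benchmark.

```python
def real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Return processing time divided by audio duration.

    A value below 1.0 means the transcription finished faster than
    the recording would take to play back.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def is_faster_than_real_time(audio_seconds: float, processing_seconds: float) -> bool:
    """True when the recording was transcribed in less time than its length."""
    return real_time_factor(audio_seconds, processing_seconds) < 1.0


# Hypothetical numbers: a 30-minute recording transcribed in 90 seconds.
audio = 30 * 60
processing = 90
print(f"RTF: {real_time_factor(audio, processing):.2f}")
print(f"Faster than real time: {is_faster_than_real_time(audio, processing)}")
```

If you time a transcription on your own Mac, plugging the two durations into this function gives you a number you can compare across recordings and machines.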
The quality is real. These aren't the speech-to-text engines of five years ago that turned every third word into gibberish. Modern local models handle crosstalk, accents, filler words, and technical vocabulary with accuracy that's genuinely comparable to cloud services.
The ModelPiper Workflow
Open ModelPiper and load the Voice Input template. Hit record or drag in an audio file. The audio runs through ToolPiper's FluidAudio STT engine on the Neural Engine, and text streams back in real time.
That's the basic workflow. The real power comes from what you do next: ModelPiper's pipeline builder lets you chain transcription with other blocks, so the transcript becomes the input to whatever step follows.
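The idea of chaining blocks can be sketched in a few lines of Python. This is a conceptual illustration only, not ModelPiper's API: every name here (chain, fake_transcribe, strip_fillers, as_note) is hypothetical, and the transcription step is a stand-in that returns canned text instead of running a model.

```python
from functools import reduce
from typing import Callable, Sequence

# A "block" here is just a function that takes text and returns text.
Block = Callable[[str], str]

FILLERS = {"um", "uh", "er"}


def chain(blocks: Sequence[Block]) -> Block:
    """Compose blocks so each one receives the previous block's output."""
    def run(value: str) -> str:
        return reduce(lambda acc, block: block(acc), blocks, value)
    return run


def fake_transcribe(audio_path: str) -> str:
    """Stand-in for a speech-to-text block; returns canned text."""
    return "Um, we should uh ship on Friday."


def strip_fillers(text: str) -> str:
    """Drop common filler words, ignoring case and trailing punctuation."""
    kept = [w for w in text.split() if w.lower().strip(",.") not in FILLERS]
    return " ".join(kept)


def as_note(text: str) -> str:
    """Format the cleaned transcript as a one-line note."""
    return f"NOTE: {text}"


pipeline = chain([fake_transcribe, strip_fillers, as_note])
print(pipeline("memo.m4a"))
```

The point of the design is that each block only needs to agree on its input and output, so you can swap the cleanup step for a summarizer or a translator without touching the transcription step.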
What People Actually Use This For
Meeting transcription. Record the meeting (with consent), drop the audio into ModelPiper, get a full transcript. No subscription to a cloud transcription service. No uploading confidential business discussions to a third party.
Voice memos to text. Talk into your Mac's microphone when you're thinking through a problem. Get text back. Faster than typing, more complete than the notes you'd take by hand.
Lecture notes. Record a talk, a webinar, a conference session. Get searchable text. Highlight the parts that matter.
Interview transcription. Journalists, researchers, and recruiters all deal with hours of recorded conversation. Local transcription means source confidentiality stays intact.
Accessibility. For anyone who processes audio better as text, or who needs to make audio content searchable and accessible.
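Once audio is text, searching it is ordinary string handling. As a minimal sketch, assuming you have saved or copied a transcript as plain text with one utterance per line, a case-insensitive keyword search might look like this in Python. The function name and the sample transcript are invented for illustration.

```python
def search_transcript(transcript: str, keyword: str) -> list[tuple[int, str]]:
    """Return (line number, line text) for each line containing the keyword.

    Matching is case-insensitive; line numbers start at 1.
    """
    needle = keyword.casefold()
    return [
        (number, line.strip())
        for number, line in enumerate(transcript.splitlines(), start=1)
        if needle in line.casefold()
    ]


# Invented sample transcript, one utterance per line.
sample = (
    "Welcome everyone.\n"
    "The budget review is next week.\n"
    "Action item: send the Budget draft."
)

for number, line in search_transcript(sample, "budget"):
    print(f"{number}: {line}")
```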
Parakeet v3: 25 Languages, One Model
ToolPiper ships with Parakeet v3, which supports 25 European languages, including English, Spanish, French, German, Portuguese, Italian, and Dutch. Language detection is automatic, so you don't need to specify what language is being spoken.
This matters for multilingual meetings, international conference talks, or any scenario where the audio isn't exclusively English. One model handles it all.
Try It
Download ModelPiper, install ToolPiper, and load the Voice Input template. Record something or drop in an audio file. Watch the transcription appear.
Everything runs on your Mac's Neural Engine. The audio never leaves your machine.
This is part of a series on local-first AI workflows on macOS. Next up: Text to Speech — have your Mac read anything aloud with AI-quality voice synthesis.