Your Mac has had text-to-speech built in since the 1980s. "Hello, I'm Macintosh" was the original 1984 demo. The problem is that Apple's built-in voices still sound like they're from 2012 — robotic cadence, flat intonation, the uncanny valley of voice synthesis.
Modern AI text-to-speech is different. The voices sound human. They handle emphasis, pacing, and natural pauses. They're far less likely to stumble over acronyms or mispronounce technical terms the way rule-based systems do.
And now they run locally, on your Mac's GPU, without sending your text to any cloud service.
Why This Matters Beyond Accessibility
Text-to-speech isn't just an accessibility feature (though it's an important one). There are practical daily use cases that most people don't consider:
Proofreading by ear. When you read your own writing silently, your brain auto-corrects errors. Hearing it read aloud exposes awkward phrasing, missing words, and rhythm problems immediately. Professional writers and editors have used this technique for decades.
Consuming long documents hands-free. A 20-page report you don't have time to read becomes a 30-minute listen during your commute or workout.
Learning and retention. Hearing information engages different memory pathways than reading. For studying, reviewing notes, or absorbing new material, audio reinforcement helps.
Content creation. Narrate blog posts, create audio versions of written content, produce voiceovers for demos — all without recording yourself or paying a voice actor.
How It Works on Your Mac
ToolPiper bundles two TTS backends. FluidAudio TTS runs on Apple's Neural Engine via CoreML — fast, efficient, good quality. MLX Audio TTS runs on the Metal GPU — higher quality voices with more natural prosody, at the cost of slightly more compute.
Both run entirely on your hardware. The text you synthesize never leaves your machine. You can feed it confidential documents, personal notes, draft emails to a difficult client — it doesn't matter, because there's no server on the other end.
The ModelPiper Workflow
Open ModelPiper, load the Text to Speech template. Type or paste text. Hit run. Audio plays back immediately, and you can download the result as a file.
The response block auto-plays the generated audio. For longer texts, synthesis streams — you start hearing the first sentence while the rest is still being generated.
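The streaming idea is simple to sketch: split the text into sentences, synthesize each one as its own chunk, and hand chunks to the player as they finish. This is an illustrative sketch, not ModelPiper's actual implementation; `synthesize_sentence` is a hypothetical stand-in for whichever TTS backend is loaded.

```python
import re

def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after ., !, or ? followed by whitespace."""
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [p for p in parts if p]

def synthesize_sentence(sentence: str) -> bytes:
    """Hypothetical stand-in for a real TTS backend call returning audio bytes."""
    return sentence.encode("utf-8")  # placeholder, not real audio

def stream_tts(text: str):
    """Yield audio sentence by sentence so playback can start immediately."""
    for sentence in split_sentences(text):
        yield synthesize_sentence(sentence)

# The first chunk is available before the rest of the text is synthesized.
chunks = list(stream_tts("First sentence plays right away. The rest follows."))
```

A real backend would return PCM or compressed audio rather than bytes of text, but the shape of the loop is the same: the generator yields as soon as each sentence is done.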
Combining With Other Workflows
Because ModelPiper is a pipeline builder, TTS isn't a dead end — it's a building block. The real utility comes from chaining it with other workflows:
Transcribe & Read: Drop in an audio file → transcribe with STT → clean up with an LLM → read back with TTS. Useful when you have a rough recording and want a polished audio version.
Translate & Speak: Type in English → translate with an LLM → speak the translation with TTS. Instant multilingual audio output.
Summarize & Narrate: Paste a long document → summarize with an LLM → speak the summary with TTS. Turn a 20-page PDF into a 3-minute audio brief.
These aren't hypothetical — they're pipeline templates you can build in ModelPiper's visual editor by connecting blocks.
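Conceptually, each of those templates is just function composition: the output of one block becomes the input of the next. Here is a minimal sketch of the Transcribe & Read chain, with hypothetical stubs standing in for the real STT, LLM, and TTS stages (the names and behavior are illustrative, not ModelPiper's API):

```python
def transcribe(audio: bytes) -> str:
    """Hypothetical STT stub, standing in for a local speech-to-text model."""
    return "um so the quarterly numbers are uh looking good"

def clean_up(text: str) -> str:
    """Hypothetical LLM stub; a real pipeline would prompt a local model."""
    fillers = {"um", "uh", "so"}
    return " ".join(w for w in text.split() if w not in fillers).capitalize()

def speak(text: str) -> bytes:
    """Hypothetical TTS stub, standing in for the synthesis backend."""
    return text.encode("utf-8")  # placeholder, not real audio

def transcribe_and_read(audio: bytes) -> bytes:
    # Each stage's output feeds the next, like connected blocks in the editor.
    return speak(clean_up(transcribe(audio)))
```

Swap `clean_up` for a summarizer and you have Summarize & Narrate; swap it for a translator and you have Translate & Speak. The visual editor is doing exactly this kind of composition, just with wires instead of function calls.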
Try It
Download ModelPiper, install ToolPiper, and load the Text to Speech template. Paste something, hit run, and listen.
Your text stays on your Mac. The voice is generated on your hardware.
This is part of a series on local-first AI workflows on macOS. Next up: Voice Chat — talk to an AI on your Mac and hear it respond, entirely locally.