You're on a video call with a colleague in São Paulo. Your Portuguese is limited to "obrigado." Their English is functional but slow. The conversation is productive but exhausting for both of you.

Google Translate exists. Apple Translate exists. But both require typing, tapping, or holding your phone up to the screen. And both send everything through their servers.

A live translation pipeline running locally on your Mac changes this interaction completely. Speak English. Hear Portuguese. Or the reverse. Real-time, on-device, no cloud.

How the Pipeline Works

Live translation chains three models: the same pipeline as voice chat, but with a translation step in the middle.

Stage 1: Speech-to-Text. Your spoken words are transcribed. Parakeet v3 handles 25 languages — it detects which language you're speaking automatically.

Stage 2: Translation (LLM). The transcribed text is sent to a language model with a translation prompt. Modern LLMs are surprisingly good at translation — they handle idioms, context, and natural phrasing better than traditional machine translation models because they understand meaning, not just word-for-word substitution.

Stage 3: Text-to-Speech. The translated text is spoken aloud in the target language.

The result: you speak in your language, and the translation plays through your speakers (or headphones) in the target language.
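The three stages above can be sketched as a simple function chain. The stage functions below are hypothetical stubs, not ModelPiper's actual API; in a real pipeline each would call a local STT, LLM, and TTS model.

```python
def transcribe(audio: bytes) -> str:
    """Stage 1: speech-to-text (stub standing in for a local STT model)."""
    return audio.decode("utf-8")  # pretend the audio bytes "are" the transcript

def translate(text: str, target: str = "Brazilian Portuguese") -> str:
    """Stage 2: translation (stub; a real pipeline prompts a local LLM)."""
    canned = {"Thank you": "Obrigado"}
    return canned.get(text, f"[{target}] {text}")

def speak(text: str) -> bytes:
    """Stage 3: text-to-speech (stub; a real pipeline synthesizes audio)."""
    return text.encode("utf-8")

def live_translate(audio: bytes) -> bytes:
    """Chain the three stages: STT -> translation -> TTS."""
    return speak(translate(transcribe(audio)))

print(live_translate(b"Thank you"))  # b'Obrigado'
```

The point of the shape is that each stage only sees text (or audio) from the previous one, which is why swapping the translation prompt changes the whole pipeline's behavior without touching the other stages.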

The ModelPiper Workflow

Load the Live Translate template. It's pre-wired: Audio Capture → STT → LLM (translation) → TTS → Response.

The LLM block has a system prompt configured for translation. By default it translates to English, but you can change the target language by editing the prompt — "Translate the following text to Brazilian Portuguese" or whatever you need.
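A translation prompt of this kind usually has a simple chat shape: a system message that pins the task and target language, and a user message carrying the transcript. The exact wording of ModelPiper's template isn't documented here, so this is a sketch of the general pattern, not the template itself.

```python
def build_messages(transcript: str, target: str = "English") -> list[dict]:
    """Build a chat-style translation request: the system prompt sets the
    task and target language, the user turn carries the transcribed speech."""
    system = (
        f"You are a translator. Translate the following text to {target}. "
        "Reply with the translation only, no explanations."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": transcript},
    ]

msgs = build_messages("Bom dia, tudo bem?", target="English")
```

The "translation only, no explanations" instruction matters in a live pipeline: anything extra the model says would be spoken aloud by the TTS stage.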

Practical Use Cases

International business calls. Not everyone on a global team speaks the same language fluently. A local translation pipeline running alongside your video call gives you real-time support without installing third-party plugins or routing audio through cloud services.

Travel preparation. Practice conversations in a language you're learning. Speak English, hear the translation, repeat it back. The STT will transcribe your attempt so you can compare.

Content localization. Have a script or presentation in English that needs to exist in another language? Speak it through the pipeline and get both a written translation and an audio version.

Document translation with voice output. Paste text into the input block instead of using audio capture, and the pipeline translates and speaks it. Useful for reading foreign-language emails or documents aloud in a language you understand.
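The text-input variant is just the same chain minus Stage 1. A minimal sketch, again with hypothetical stubs in place of the real local models:

```python
def translate(text: str, target: str = "English") -> str:
    """Stub translator; a real pipeline would prompt a local LLM."""
    return {"Bonjour": "Hello"}.get(text, text)

def speak(text: str) -> bytes:
    """Stub TTS returning the "audio" as bytes."""
    return text.encode("utf-8")

def translate_and_speak(pasted_text: str) -> bytes:
    """Pasted text skips audio capture and STT entirely:
    translation -> TTS is the whole pipeline."""
    return speak(translate(pasted_text))
```

Because the stages are loosely coupled, dropping the front of the chain is all it takes to turn a live-speech pipeline into a read-aloud document translator.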

Why Local Translation Matters

Translation services see everything. Every sentence you translate through Google Translate is data that Google uses. For casual translations — restaurant menus, road signs — that's fine. For business communications, legal documents, or private conversations, it's not.

Local translation means the content of your conversations stays on your machine. No logs. No data retention. No third-party access.

Language Quality

A fair question: are local models as good at translation as Google Translate or DeepL?

For major language pairs — English to Spanish, French, German, Portuguese, Chinese, Japanese — a 3B-parameter model is genuinely good. It handles conversational language, technical terminology, and idiomatic expressions well. It's not perfect, and it occasionally misses nuance, but for practical communication it's more than adequate.

For less common language pairs, cloud services still have an edge because they're trained on more parallel data. But the gap is closing fast.

Try It

Download ModelPiper, install ToolPiper, and load the Live Translate template. Edit the translation prompt to your target language. Speak something, and hear the translation.

Your words stay on your Mac. The translation happens on your hardware.

This is part of a series on local-first AI workflows on macOS. Next up: Voice Cloning — replicate any voice, entirely on your Mac.