---
title: "Transcribe and Summarize on Mac: Audio In, Key Points Out"
description: "Drop a meeting recording into ModelPiper and get a structured summary - decisions, action items, key points - without uploading confidential audio to any cloud service."
date: 2026-03-08
author: "Ben Racicot"
tags: ["Transcription", "Speech to Text", "Text Generation", "Privacy", "macOS", "Meetings", "Productivity"]
type: "article"
canonical: "https://modelpiper.com/blog/transcribe-summarize-mac/"
---

# Transcribe and Summarize on Mac: Audio In, Key Points Out

> Drop a meeting recording into ModelPiper and get a structured summary - decisions, action items, key points - without uploading confidential audio to any cloud service.

## TL;DR

A meeting recording goes in; a structured summary - decisions, action items, key points - comes out, and the audio never touches a cloud service. Two local models chain together: speech-to-text transcribes, then an LLM summarizes.

You have a 45-minute meeting recording. **You don't need a 12-page transcript. You need the five decisions that were made, the three action items assigned, and the one thing everyone disagreed about.**

Cloud services like Otter.ai and Fireflies do this - they transcribe, then summarize. They're also listening to every word of your meeting, storing it on their servers, and processing it under terms of service that give them broad rights to use the data for product improvement.

For internal strategy meetings, HR conversations, legal discussions, or any recording with confidential content, that's a problem. Not a theoretical one - a practical one that compliance teams and privacy-conscious individuals think about constantly.

The same workflow runs locally. Audio goes in, key points come out, and the recording never leaves your machine.

## How does the transcribe-and-summarize pipeline work?

This workflow chains two AI models in sequence.

**Step 1: Speech-to-Text.** The audio recording is transcribed by Parakeet, running on the Neural Engine. This produces a full text transcript - every word spoken, in order.

**Step 2: Summarization.** The transcript is passed to a language model (Llama, Qwen, etc.) with a summarization prompt. The LLM extracts the key points, decisions, action items - whatever you ask for.

The elegance is in the chaining. You don't manually copy the transcript and paste it into a chat window. The pipeline feeds the output of Step 1 directly into Step 2.
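Conceptually, the chaining is just function composition: the transcript returned by step 1 becomes the input to step 2. Here is a minimal Python sketch of that data flow - the function bodies are stubs standing in for the real Parakeet and LLM calls, and none of these names are ModelPiper's actual API:

```python
# Conceptual sketch of the two-step pipeline. Model calls are stubbed;
# in ModelPiper the visual editor wires these blocks together.

def transcribe(audio_path: str) -> str:
    """Step 1: speech-to-text. Stand-in for Parakeet on the Neural Engine."""
    return "Alice: let's ship Friday. Bob: agreed. Action: Bob updates the docs."

def summarize(transcript: str, system_prompt: str) -> str:
    """Step 2: the LLM receives the transcript plus a summarization prompt."""
    prompt = f"{system_prompt}\n\nTranscript:\n{transcript}"
    # A real local LLM call would go here; returning the assembled prompt
    # shows exactly what the model sees.
    return prompt

def pipeline(audio_path: str, system_prompt: str) -> str:
    # The output of step 1 feeds directly into step 2 - no copy/paste.
    return summarize(transcribe(audio_path), system_prompt)

result = pipeline("meeting.m4a", "Extract decisions and action items.")
```

The point of the sketch is the last function: the user only ever supplies the audio and the prompt; the intermediate transcript is handled by the pipeline.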

## How do you transcribe and summarize in ModelPiper?

Load the **Transcribe & Summarize** template. It's pre-wired: Audio Capture → STT → LLM → Response.

Record directly or drag in an audio file. The STT engine transcribes it, then the LLM processes the transcript according to its system prompt. The default prompt extracts a structured summary, but you can customize it - ask for action items only, a bullet-point recap, a formal meeting minutes format, or whatever you need.

## How do you customize the summary format?

The LLM's system prompt controls what comes out. Some useful variations:

**Meeting minutes format:** "Extract attendees, decisions made, action items with owners, and unresolved questions. Format as formal meeting minutes."

**Key decisions only:** "List only the decisions that were made in this meeting. Ignore discussion, small talk, and tangents. Be concise."

**Client-ready summary:** "Summarize this conversation for a client update email. Professional tone, focus on outcomes and next steps."

**Technical review:** "Extract all technical decisions, architecture choices, and implementation commitments. Flag any disagreements or unresolved technical questions."

You edit the system prompt directly in the LLM block. No code, no configuration files - it's a text field in the visual pipeline editor.
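Under the hood, each variation is just a different system prompt paired with the same transcript. A sketch of that structure, using the common chat-message convention (the prompt names and `build_messages` helper are illustrative, not ModelPiper internals):

```python
# Each summary style is just a different system prompt; the pipeline
# and the transcript stay the same.
SUMMARY_PROMPTS = {
    "minutes": "Extract attendees, decisions made, action items with owners, "
               "and unresolved questions. Format as formal meeting minutes.",
    "decisions": "List only the decisions that were made in this meeting. "
                 "Ignore discussion, small talk, and tangents. Be concise.",
    "client": "Summarize this conversation for a client update email. "
              "Professional tone, focus on outcomes and next steps.",
}

def build_messages(transcript: str, style: str) -> list[dict]:
    """Assemble a chat-style payload: the system prompt sets the output
    format, and the transcript rides along as the user message."""
    return [
        {"role": "system", "content": SUMMARY_PROMPTS[style]},
        {"role": "user", "content": transcript},
    ]

messages = build_messages("Alice: we'll launch Tuesday.", "decisions")
```

Swapping `"decisions"` for `"minutes"` or `"client"` changes only the system message - which is exactly what editing the text field in the LLM block does.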

## Why does privacy matter for meeting recordings?

**Meeting recordings are some of the most sensitive content in any organization.** They contain unguarded opinions, salary discussions, strategic plans, personnel decisions, competitive intelligence, and casual remarks that were never meant to be documented.

Uploading them to a cloud transcription service means those recordings exist on third-party infrastructure. Local transcription and summarization means they don't. The audio file stays on your disk. The transcript exists in your app. The summary is on your screen. That's it.

## Try It

Download [ModelPiper](https://modelpiper.com), install ToolPiper, and load the Transcribe & Summarize template. Record something or drop in a file. Edit the system prompt to get the summary format you want.

Audio, transcript, and summary - all on your Mac. Nothing else involved.

_This is part of a series on [local-first AI workflows on macOS](/blog/local-first-ai-macos). Next up: [Live Translation](/blog/live-translation-mac-local) - speak in one language, hear the translation spoken back._

## Steps

### 1. Install ToolPiper and load the template

Install ToolPiper from [modelpiper.com/download](https://modelpiper.com/download). Open ModelPiper and load the Transcribe & Summarize template - it pre-wires Audio Capture → STT → LLM → Response.

### 2. Record or drop in an audio file

Record directly through ModelPiper's Audio Capture block, or drag in an existing recording (MP3, WAV, M4A). The STT engine accepts common audio formats and handles multiple languages automatically.

### 3. Customize the summary prompt

Edit the LLM block's system prompt to control the output format. Ask for meeting minutes, key decisions only, action items with owners, or a client-ready executive summary. The default extracts a structured summary.

### 4. Run the pipeline

Hit run. The audio transcribes on the Neural Engine, then the transcript passes to the LLM for summarization. Both the full transcript and the summary appear in the output - you can access either one.
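One way to picture "both outputs appear" is that the pipeline keeps the intermediate transcript alongside the final summary rather than discarding it. A tiny sketch under that assumption (stub strings in place of the real model calls; not ModelPiper's internals):

```python
def run_pipeline(audio_path: str, system_prompt: str) -> dict[str, str]:
    # Stand-ins for the real model calls; the shape of the result is
    # the point here, not the model output.
    transcript = f"[transcript of {audio_path}]"
    summary = f"[summary per prompt: {system_prompt}]"
    # Both stages' outputs are preserved, not just the final summary.
    return {"transcript": transcript, "summary": summary}

outputs = run_pipeline("standup.m4a", "Action items only.")
```

Keeping both means you can save the summary for the team and still search the transcript later for an exact quote.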

## FAQ

### Can I customize what the summary extracts?

Yes. The LLM's system prompt controls the output format. You can ask for meeting minutes with attendees and action items, key decisions only, a client-ready executive summary, or a technical review focusing on architecture choices. Edit the prompt directly in the visual pipeline editor - no code required.

### How long does it take to transcribe and summarize a one-hour meeting?

Transcription runs faster than real time on the Neural Engine - a 60-minute recording typically transcribes in 15-30 minutes depending on your Mac's chip. Summarization by the LLM adds another 10-30 seconds. Total processing time is significantly less than the recording length.

### Can I transcribe and summarize in languages other than English?

Yes. Parakeet v3 supports 25 European languages with automatic detection. The LLM can summarize in the original language or translate the summary to a different language. For a multilingual meeting, the pipeline handles mixed-language audio automatically.

### Is the full transcript preserved or just the summary?

Both. The pipeline shows the full transcript as intermediate output and the summary as the final output. You can access, copy, or save either one. The transcript is useful for searching specific quotes or verifying the summary's accuracy.
