Drop an image. The AI describes what's in it. Then it reads that description aloud.

That's the Image Narrator workflow in one sentence. It sounds simple, but the use cases are more interesting than you'd expect.

What This Actually Does

The Image Narrator chains two models. First, a vision-capable language model analyzes the image and generates a text description. Then, a text-to-speech engine reads that description aloud.
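The chain itself is simple enough to sketch in a few lines. This is an illustrative sketch only, not ModelPiper's actual implementation: it assumes a local OpenAI-compatible vision endpoint (the URL and model name are placeholders) for stage one, and macOS's built-in `say` command for stage two.

```python
import base64
import subprocess

# Placeholder: a locally served, OpenAI-compatible vision endpoint.
VISION_URL = "http://localhost:8080/v1/chat/completions"


def build_vision_request(image_bytes: bytes, prompt: str) -> dict:
    """Stage 1: package the image and prompt for the vision model."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "local-vision-model",  # placeholder model name
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }


def speak(text: str) -> None:
    """Stage 2: hand the generated description to macOS's on-device TTS."""
    subprocess.run(["say", text], check=True)
```

POST the request payload to the endpoint, take the returned description, and pass it to `speak` — that's the whole pipeline: one vision call, one TTS call, nothing in between but text.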

The vision model doesn't just list objects — it understands context, relationships, composition, and meaning. Show it a chart, and it describes the trend. Show it a photo, and it narrates the scene. Show it a diagram, and it explains the structure.

Adding TTS on top turns this from a text output into an audio experience. You can process images hands-free — drop them in, listen to the descriptions while doing something else.

The ModelPiper Workflow

Load the Image Narrator template. It's pre-wired: Text prompt + Image → Vision Model → TTS → Response (auto-play).

Drag in an image or capture one with VisionPiper. The text prompt controls what kind of description you get — you can ask for a brief summary, a detailed analysis, a creative interpretation, or a specific extraction.

Use Cases

Accessibility. For users with visual impairments, image narration provides an audio description of visual content that would otherwise be inaccessible. Photos, charts, infographics, screenshots — all described and spoken aloud.

Content creation. Generate alt text for blog images, create audio descriptions for video content, produce image-based podcast segments. Drop in an image, get a description, edit if needed, and you have professional alt text or narration copy.

Photo review. Need to go through a large batch of photos — from a shoot, a trip, a project — and want a quick audio summary of each one? Drop them through the pipeline and listen while you sort.

Education. Diagrams, illustrations, maps, and visual aids can be narrated automatically for study materials or accessible course content.

Data visualization narration. Charts and graphs are inherently visual. For reports that need to be accessible or consumable as audio, the Image Narrator can describe what a chart shows — trends, outliers, comparisons — and speak it aloud.

Customizing the Description

The text prompt in the pipeline controls the output style:

Alt text: "Describe this image concisely for use as alt text. Focus on what's visually important."

Detailed analysis: "Analyze this image in detail. Describe the composition, the subjects, the setting, any text visible, and the overall mood."

Technical description: "Describe the technical content of this image. If it's a diagram, explain the structure. If it's a chart, describe the data."

Creative narration: "Narrate this image as if you're describing a scene in a documentary. Be evocative and vivid."

Different prompts, same pipeline, dramatically different outputs.
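In script form, swapping styles could be as simple as a lookup table. A sketch: the prompt strings are taken from the presets above, but the preset keys and function name are my own illustrative naming.

```python
# Prompt presets from above; keys are illustrative names.
PROMPT_PRESETS = {
    "alt_text": ("Describe this image concisely for use as alt text. "
                 "Focus on what's visually important."),
    "detailed": ("Analyze this image in detail. Describe the composition, "
                 "the subjects, the setting, any text visible, and the "
                 "overall mood."),
    "technical": ("Describe the technical content of this image. If it's a "
                  "diagram, explain the structure. If it's a chart, "
                  "describe the data."),
    "documentary": ("Narrate this image as if you're describing a scene in "
                    "a documentary. Be evocative and vivid."),
}


def pick_prompt(style: str) -> str:
    """Return the prompt for a style, falling back to alt text."""
    return PROMPT_PRESETS.get(style, PROMPT_PRESETS["alt_text"])
```

The rest of the pipeline never changes; only the string fed to the vision model does.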

Try It

Download ModelPiper, install ToolPiper, and load the Image Narrator template. Drop in an image, and listen.

The image is analyzed on your GPU. The voice is synthesized on your hardware. Nothing uploaded.

This is part of a series on local-first AI workflows on macOS. Next up: Deep Reasoning — complex problem-solving with reasoning models, locally.