Free on Mac App Store

Screen capture that talks to AI.

Capture any region, record video, export GIFs, and stream live to vision models. A steerable camera for your screen.

VisionPiper is a standalone menu bar app for macOS that captures screen regions with precision borders, records H.264 video, trims recordings and exports them as GIF, WebP, or MP4, and streams frames over WebSocket to ToolPiper's vision models at 30fps. Think of it as a screen-mounted camera you can point at anything.

Region Selection

Precision Screen Capture

8-Window Border

Four edges + four corners, pass-through interior. Click through the capture region to interact with everything inside it normally.

Movable While Recording

Drag the border to follow content while recording is active. Like a steerable camera, the region moves with you through multi-step workflows across panels.

Region Persistence

Last capture region is remembered between sessions. Launch VisionPiper and your previous region is already set.

Keyboard Shortcuts

Capture, toggle recording, and adjust the region from the keyboard. Fast enough for rapid bug documentation.

Multi-Monitor

Works across all connected displays. Place the capture region on any monitor regardless of resolution or scaling.

High DPI

Retina-aware capture at native resolution. Every pixel captured at the display's actual density.
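As a rough illustration of what native resolution means, a region measured in points maps to more pixels on a high-density display. The sketch below assumes a backing scale factor of 2.0 for a Retina display and 1.0 for a standard one; `pixel_size` is a hypothetical helper, not part of VisionPiper.

```python
def pixel_size(width_pt: float, height_pt: float, scale: float) -> tuple[int, int]:
    """Convert a region measured in points to pixels for a given backing scale."""
    return round(width_pt * scale), round(height_pt * scale)


# The same 640 x 360 point region on two displays (scale values are assumptions).
print(pixel_size(640, 360, 2.0))  # (1280, 720)
print(pixel_size(640, 360, 1.0))  # (640, 360)
```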

Recording & Export

H.264 Video

SCStream + AVAssetWriter for efficient, hardware-accelerated recording. Captures the selected region as .mp4 with minimal CPU overhead.

GIF Export

Built-in cgif + libimagequant for high-quality, small-file GIFs. Spends its color budget on pixels that change, not on static backgrounds.
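One common way GIF encoders avoid spending bytes on static backgrounds is to store only the rectangle that changed between consecutive frames. The sketch below shows that idea in plain Python. It is an illustration only, not cgif's or VisionPiper's actual code, and `changed_bounds` is a hypothetical name.

```python
def changed_bounds(prev, curr):
    """Return (left, top, right, bottom) of the pixels that differ, or None.

    Frames are equal-sized lists of rows. right and bottom are exclusive.
    """
    # Rows that contain at least one changed pixel.
    rows = [y for y, (a, b) in enumerate(zip(prev, curr)) if a != b]
    if not rows:
        return None  # identical frames: nothing new to encode
    # Columns of every changed pixel within those rows.
    cols = [
        x
        for y in rows
        for x, (p, q) in enumerate(zip(prev[y], curr[y]))
        if p != q
    ]
    return min(cols), rows[0], max(cols) + 1, rows[-1] + 1


# A 6 x 4 frame where only two pixels change between frames.
before = [[0] * 6 for _ in range(4)]
after = [row[:] for row in before]
after[1][2] = 1
after[2][4] = 1
print(changed_bounds(before, after))  # (2, 1, 5, 3)
```

Only the 3 x 2 rectangle needs to be written for the second frame. The static pixels around it cost nothing.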

WebP Export

Modern format for web-ready screen captures. Full color and alpha channel support, smaller files than GIF.

Trim Editor

Cut start and end of recordings before export. No external editor needed. Preview, trim, and export in one step.
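Trimming comes down to mapping two timestamps to a range of frames. A minimal sketch, assuming a constant frame rate and trim points given in seconds (`trim_frame_range` is a hypothetical helper, not VisionPiper's API):

```python
def trim_frame_range(duration_s: float, fps: float, start_s: float, end_s: float):
    """Map trim points in seconds to a half-open range of frame indices."""
    if not 0 <= start_s < end_s <= duration_s:
        raise ValueError("trim points must satisfy 0 <= start < end <= duration")
    first = round(start_s * fps)  # first frame kept
    stop = round(end_s * fps)     # one past the last frame kept
    return first, stop


# Keep 1.5 s to 8.0 s of a 10-second recording at 30 fps.
first, stop = trim_frame_range(10.0, 30, 1.5, 8.0)
print(first, stop, stop - first)  # 45 240 195
```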

ToolPiper Integration

Live AI Streaming

30fps WebSocket Stream

Frames are JPEG-encoded with Metal and streamed to ToolPiper on port 10000. Hardware-accelerated encoding keeps CPU usage low during continuous streaming.
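For a sense of what a consumer of such a stream has to handle, here is a small sketch. It assumes each WebSocket binary message carries exactly one complete JPEG, which the page does not specify, and it uses a made-up average frame size of 150 KB. All function names are hypothetical.

```python
JPEG_SOI = b"\xff\xd8"  # start-of-image marker that opens every JPEG
JPEG_EOI = b"\xff\xd9"  # end-of-image marker that closes every JPEG


def is_complete_jpeg(payload: bytes) -> bool:
    """Cheap sanity check: payload starts with SOI and ends with EOI."""
    return (
        len(payload) >= 4
        and payload.startswith(JPEG_SOI)
        and payload.endswith(JPEG_EOI)
    )


def frame_interval_ms(fps: float) -> float:
    """Time budget per frame at a given frame rate."""
    return 1000.0 / fps


def stream_rate_mbit(avg_frame_bytes: int, fps: float) -> float:
    """Approximate stream bandwidth in megabits per second."""
    return avg_frame_bytes * 8 * fps / 1_000_000


print(round(frame_interval_ms(30), 1))  # 33.3
print(stream_rate_mbit(150_000, 30))    # 36.0 (150 KB per frame is an assumed figure)
print(is_complete_jpeg(b"\xff\xd8" + b"\x00" * 10 + b"\xff\xd9"))  # True
```

At 30fps the receiver has about 33 ms per frame, so it helps to drop frames that fail the check or arrive late instead of queueing them.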

Screen Q&A

Ask AI questions about what's on your screen in real time. The vision model sees exactly what you see, updated every frame.

Image Narration

A vision model describes the screen content and text-to-speech (TTS) reads it aloud. An accessibility workflow that works with any on-screen content.

OCR Pipeline

Extract text from any screen region using Apple Vision OCR. No cloud API, no rate limits. Runs entirely on-device.

VisionPiper streaming screen content to a vision model for analysis

VisionPiper vs Screen Capture Tools

| Feature | VisionPiper | macOS Screenshot | CleanShot X | Kap |
| --- | --- | --- | --- | --- |
| Region selection | Click-through 8-window border, persisted | Crosshair selection, not saved | Crosshair + pinned overlay | Full screen or window only |
| Record video | H.264 .mp4, hardware-accelerated | QuickTime (separate app) | H.264, HEVC, GIF | H.264, WebM, GIF |
| GIF export | Built-in, optimized cgif + libimagequant | No | Yes, with compression | Yes, basic |
| Trim editor | Built-in, cut start/end before export | No (requires iMovie/QuickTime) | Yes | Yes |
| AI streaming | 30fps WebSocket to local vision models | No | No | No |
| Movable region | Drag border during recording | No | No | No |
| Price | Free | Free (built-in) | $29 one-time | Free (open source) |
| AI integration | ToolPiper vision models, OCR, narration | None | None | None |

How It Works

1. Install

Free from the Mac App Store. Grant screen recording permission when prompted.

2. Select Region

Click and drag to select any area of your screen. Borders appear with pass-through interior so you can interact normally.

3. Capture

Screenshot, record video, or stream live to ToolPiper's vision models for AI analysis.

On-Device Only

Screen content stays on your Mac. Live streaming goes to localhost, not the cloud.

Free

No subscription, no watermarks, no feature gates. Free on the Mac App Store.

Frequently Asked Questions

Does it work without ToolPiper?

Yes. Screen capture, video recording, trim editing, and GIF/WebP/MP4 export all work standalone. ToolPiper is only needed for AI streaming — sending frames to vision models for Screen Q&A, image narration, and OCR.

What macOS version is required?

macOS 26 or later on Apple Silicon (M1 or newer). VisionPiper is built on ScreenCaptureKit and Metal and supports Apple Silicon Macs only.

Can I click through the capture region?

Yes. The 8-window architecture uses four edge windows and four corner windows with a pass-through interior. Everything inside the border is fully interactive — clicks, drags, and scrolls pass through to the underlying app.
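The geometry behind this can be sketched in a few lines. The example below computes eight rectangles around a region and checks that none of them covers the interior. It assumes a top-left origin with y increasing downward and a border thickness of 4, and `Rect`, `border_rects`, and `overlaps` are hypothetical names, not VisionPiper's actual implementation.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: float
    y: float
    width: float
    height: float


def border_rects(region: Rect, thickness: float) -> dict[str, Rect]:
    """Return eight rectangles that frame `region` without covering it."""
    x, y, w, h = region
    t = thickness
    return {
        # Four edges run along the sides of the region, just outside it.
        "top": Rect(x, y - t, w, t),
        "bottom": Rect(x, y + h, w, t),
        "left": Rect(x - t, y, t, h),
        "right": Rect(x + w, y, t, h),
        # Four corners fill the gaps where the edges meet.
        "top_left": Rect(x - t, y - t, t, t),
        "top_right": Rect(x + w, y - t, t, t),
        "bottom_left": Rect(x - t, y + h, t, t),
        "bottom_right": Rect(x + w, y + h, t, t),
    }


def overlaps(a: Rect, b: Rect) -> bool:
    """True if two rectangles share any interior area."""
    return (
        a.x < b.x + b.width and b.x < a.x + a.width
        and a.y < b.y + b.height and b.y < a.y + a.height
    )


region = Rect(100, 200, 640, 360)
frames = border_rects(region, thickness=4)
# No border rectangle covers the interior, so nothing sits between
# the pointer and the app underneath.
print(any(overlaps(r, region) for r in frames.values()))  # False
```

Because the border is built from separate pieces instead of one transparent overlay, there is no window over the interior to intercept clicks, drags, or scrolls.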

Does it record system audio?

No. VisionPiper captures video only. For audio capture, use AudioPiper — a separate free app that records mic, system audio, and per-app audio via Core Audio Taps.

Your screen, analyzed by AI.

Capture a region, stream to a vision model, get answers — without uploading screenshots anywhere.