You have old footage. A screen recording at 360p. A webcam archive. Drone video shot at low resolution to save storage. A conference talk recorded on a phone held by someone in the back row. The content is valuable but the resolution makes it hard to use — too blurry for a presentation, too pixelated for a portfolio, too small for a modern display.

Cloud video upscaling exists but it's slow, expensive, and requires uploading your video to a third party. Topaz Video AI costs $199 and runs locally but is a dedicated app with its own learning curve. Most online services charge per minute of video, add watermarks to free-tier output, and take hours to process because you're in a queue behind everyone else.

Your Mac's Neural Engine can upscale video faster than realtime. A 10-minute 360p video upscaled to 720p takes about 6.5 minutes on an M4 Max — and produces clean, sharp output with audio intact.

How does AI video upscaling work?

Video upscaling applies a super-resolution model to every frame. The model — a small neural network trained on pairs of low-resolution and high-resolution images — predicts what high-resolution detail should exist in each frame. It's not interpolating pixels like bicubic scaling; it's reconstructing textures, edges, and fine detail based on learned patterns.
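One common building block in super-resolution networks is sub-pixel (pixel-shuffle) upsampling: the network predicts r² values per output pixel as extra channels, then rearranges them into a higher-resolution grid. The source doesn't specify PiperSR's internal layers, so treat this NumPy sketch as illustrative of the general technique, not the actual model:

```python
import numpy as np

def pixel_shuffle(x, r):
    # Depth-to-space: (C*r*r, H, W) -> (C, H*r, W*r)
    c2, h, w = x.shape
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)  # -> (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)

# A 2x model would emit 4 channels per grayscale pixel at input resolution...
feat = np.random.rand(4, 360, 640).astype(np.float16)
out = pixel_shuffle(feat, 2)
print(out.shape)  # (1, 720, 1280) -- one channel at double resolution
```

The learned part is in the convolutions that produce those channels; the shuffle itself is just a cheap, lossless rearrangement.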

The challenge with video is throughput. A 30 FPS video needs 30 frames processed per second just to match realtime. Each frame at 360p is 640×360 pixels. At 2x upscale, the output is 1280×720 — 921,600 pixels per frame, predicted individually by the neural network.
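To make that scale concrete, here's the arithmetic using the sizes from the paragraph above (nothing here is measured):

```python
in_w, in_h, scale, fps = 640, 360, 2, 30

out_pixels = (in_w * scale) * (in_h * scale)
print(out_pixels)        # 921600 pixels per output frame
print(out_pixels * fps)  # 27648000 pixels to predict per second just to match realtime
```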

Most implementations tile each frame into small patches, run inference on each patch, and stitch the results. This works but it's slow — the overhead of dispatching dozens of small inference calls per frame is enormous. ToolPiper takes a different approach: full-frame inference. The entire 640×360 frame goes through the model as a single operation, eliminating 96% of the scheduling overhead.

The result: 44.4 FPS sustained on real-world H.264 video. That's roughly 1.5x the 30 FPS playback rate, so the upscale finishes before you'd finish watching the original.

Why local video upscaling matters

Your video never leaves your machine. Client footage, security camera recordings, personal videos, unreleased content — none of it gets uploaded to any server. The processing happens entirely on your hardware.

No queue. Cloud services process jobs sequentially across all users. Your 10-minute video might wait behind someone's two-hour film. Locally, processing starts immediately and runs at full hardware speed.

No per-video cost. Cloud video upscaling charges per minute or per video — often $0.50–2.00 per minute of footage. Locally, every video is free after the one-time app install.

No watermark. Free tiers of cloud services watermark output. Locally, the output is clean.

Audio is preserved automatically. ToolPiper remuxes the original audio track unchanged into the upscaled video. No re-encoding, no quality loss, no sync issues. The audio stream passes through untouched.

What You Need

You don't need: A terminal. FFmpeg. Python. An API key. A subscription to Topaz Video AI. A dedicated GPU.

You do need: A Mac with Apple Silicon (M1 or later) and at least 8GB of RAM. The model is bundled — no separate download.

The model: PiperSR 2x Video

PiperSR is a 453,388-parameter super-resolution model purpose-built for Apple Silicon. The video variant accepts full 640×360 frames as single tensors (no tiling), with batch normalization fused into convolutions to minimize operation count. The entire model is 928 KB in CoreML FP16 format — small enough to fit in the Neural Engine's on-chip SRAM.
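A quick back-of-the-envelope check on the size claim, using the parameter count above (attributing the remaining bytes to CoreML package metadata is my assumption):

```python
params = 453_388
weight_bytes = params * 2          # FP16 = 2 bytes per parameter
print(weight_bytes)                # 906776 bytes of raw weights
print(round(weight_bytes / 1024))  # ~886 KiB, close to the 928 KB bundled size
```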

The pipeline is double-buffered across three hardware units simultaneously:

  • CPU: Converts the input frame from pixel buffer to Float16 tensor (0.3ms)
  • Neural Engine: Runs the super-resolution prediction (20.8ms)
  • GPU: Converts the output tensor back to a pixel buffer via a Metal shader (1.3ms)

While the GPU converts frame N's output, the Neural Engine runs inference on frame N+1 and the CPU prepares frame N+2's input. All three run in parallel. The effective per-frame time is ~22ms, yielding 44–46 FPS on real-world content.
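The steady-state behavior of a stage pipeline like this reduces to a few lines of arithmetic on the stage timings listed above. Fully overlapped, throughput is set by the slowest stage; the observed ~22ms per frame sits between that ideal and the fully serial case:

```python
cpu_ms, ane_ms, gpu_ms = 0.3, 20.8, 1.3

serial_ms = cpu_ms + ane_ms + gpu_ms        # stages back-to-back: 22.4 ms/frame
pipelined_ms = max(cpu_ms, ane_ms, gpu_ms)  # overlapped: bottleneck stage dominates

print(round(1000 / serial_ms, 1))     # 44.6 FPS with no overlap at all
print(round(1000 / pipelined_ms, 1))  # 48.1 FPS ideal pipelined ceiling
```

The real pipeline lands near the serial figure because buffer handoffs and scheduling aren't free, but the Neural Engine's 20.8ms stage is clearly the term worth optimizing.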

The model achieves 37.54 dB PSNR on the Set5 benchmark — 3.88 dB above bicubic interpolation. For a deeper look at the engineering, see How We Achieved 44 FPS Video Upscale on Apple Neural Engine.
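PSNR is a standard fidelity metric: 10·log₁₀(peak²/MSE) between the model's output and the ground-truth high-resolution image, in decibels. A minimal implementation (the Set5 figure above is the authors' measurement, not reproduced here):

```python
import numpy as np

def psnr(ref, test, peak=1.0):
    # Peak signal-to-noise ratio in dB; higher means closer to the reference.
    mse = np.mean((np.asarray(ref, float) - np.asarray(test, float)) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

ref = np.zeros((4, 4))
out = np.full((4, 4), 0.5)  # uniform error of 0.5 everywhere
print(round(psnr(ref, out), 2))  # 6.02 dB
```

Because the scale is logarithmic, the quoted 3.88 dB gain over bicubic corresponds to roughly a 2.4x reduction in mean squared error.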

The ModelPiper Workflow

Load the Local Video Upscale template. The pipeline has three nodes: a video input, the upscale processor (PiperSR 2x Video), and the output display.

Drop a video file onto the input node — MP4 or MOV. The upscale starts immediately. Progress streams in real time via SSE (server-sent events), showing frame count and estimated completion. When it's done, the upscaled video appears in the output node with audio intact, ready to save.

The upscale is also available via the /v1/video/upscale REST endpoint and the video_upscale MCP tool for scripted or automated workflows.
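For scripting, a request against that endpoint might look like the sketch below. Only the /v1/video/upscale path comes from the app; the host, port, header, and raw-MP4 body convention are illustrative assumptions, so check ToolPiper's API documentation for the real shape:

```python
import urllib.request

# Hypothetical request shape: the endpoint path is documented, but the host,
# port, and raw-body format below are assumptions for illustration.
payload = b"\x00\x00\x00\x18ftyp"  # stand-in bytes; in practice, read your MP4 from disk
req = urllib.request.Request(
    "http://localhost:8080/v1/video/upscale",
    data=payload,
    headers={"Content-Type": "video/mp4"},
    method="POST",
)
print(req.full_url)
# urllib.request.urlopen(req) would submit it; the response format is not shown here.
```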

Resolution and limitations

The full-frame pipeline is currently optimized for 360p → 720p (640×360 input, 1280×720 output). This is the resolution-locked fast path that achieves 44 FPS.

Other input resolutions fall back to the tiled pipeline, which processes 128×128 tiles sequentially. The tiled path produces the same visual quality but runs at 5–10 FPS depending on resolution — functional but not realtime.
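The cost difference between the two paths is mostly dispatch count. A sketch of how many 128×128 tiles a frame requires, assuming non-overlapping tiling (real tilers usually overlap tile edges to hide seams, which only adds dispatches):

```python
import math

def tile_count(w, h, tile=128):
    # Number of tile-sized inference calls to cover a w x h frame.
    return math.ceil(w / tile) * math.ceil(h / tile)

print(tile_count(640, 360))    # 15 dispatches per 360p frame vs. 1 full-frame call
print(tile_count(1920, 1080))  # 135 dispatches per 1080p frame
```

Each dispatch pays fixed scheduling overhead before any useful work happens, which is why the tiled path falls so far behind the single-call fast path.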

Additional resolution-specific models (480p → 960p, 540p → 1080p) can be exported from the PiperSR training pipeline. They're not bundled yet to keep the app size small, but the architecture supports them.

Performance benchmarks are from an M4 Max. The M1 has the same 16 Neural Engine cores but an older microarchitecture — expect lower but still-usable throughput.

Try It

Download ModelPiper, install ToolPiper, and load the Local Video Upscale template. Drop a video file and watch it upscale. The model is bundled — nothing to download.

This is part of a series on local-first AI workflows on macOS. See also: Image Upscale — the same technology for still images at up to 4x resolution.