---
title: "Local Video Upscale on Mac: 2x Resolution at Realtime Speed"
description: "Upscale video from 360p to 720p at 44 FPS on your Mac's Neural Engine - no cloud upload, no watermark, audio preserved automatically."
date: 2026-03-19
author: "Ben Racicot"
tags: ["Video Upscale", "Super Resolution", "Neural Engine", "CoreML", "Privacy", "macOS", "Apple Silicon"]
type: "article"
canonical: "https://modelpiper.com/blog/local-video-upscale-mac/"
---

# Local Video Upscale on Mac: 2x Resolution at Realtime Speed

> Upscale video from 360p to 720p at 44 FPS on your Mac's Neural Engine - no cloud upload, no watermark, audio preserved automatically.

## TL;DR

ToolPiper upscales video 2x (360p to 720p) at 44 FPS on Apple Silicon - faster than realtime playback. PiperSR, a 453K-parameter super-resolution model, runs entirely on your Mac's Neural Engine. No cloud upload, no watermark, no per-video fee. Audio is preserved automatically.

You have old footage. A screen recording at 360p. A webcam archive. Drone video shot at low resolution to save storage. A conference talk recorded on a phone held by someone in the back row. The content is valuable but the resolution makes it hard to use - too blurry for a presentation, too pixelated for a portfolio, too small for a modern display.

Cloud video upscaling exists but it's slow, expensive, and requires uploading your video to a third party. Topaz Video AI costs $199 and runs locally but is a dedicated app with its own learning curve. Most online services charge per minute of video, add watermarks to free-tier output, and take hours to process because you're in a queue behind everyone else.

Your Mac's Neural Engine can upscale video faster than realtime. A 10-minute 360p video upscaled to 720p takes about 6.5 minutes on an M4 Max - and produces clean, sharp output with audio intact.

## How does AI video upscaling work?

Video upscaling applies a super-resolution model to every frame. The model - a small neural network trained on pairs of low-resolution and high-resolution images - predicts what high-resolution detail should exist in each frame. It's not interpolating pixels like bicubic scaling; it's reconstructing textures, edges, and fine detail based on learned patterns.

The challenge with video is throughput. A 30 FPS video needs 30 frames processed per second just to match realtime. Each frame at 360p is 640×360 pixels. At 2x upscale, the output is 1280×720 - 921,600 pixels per frame, predicted individually by the neural network.
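The throughput budget above can be checked with a few lines of arithmetic (all numbers come directly from the text):

```python
fps = 30                            # source playback rate
in_w, in_h = 640, 360               # 360p frame size
out_w, out_h = in_w * 2, in_h * 2   # 2x upscale -> 1280x720

out_pixels = out_w * out_h          # 921,600 pixels predicted per output frame
budget_ms  = 1000 / fps             # ~33.3 ms per frame to keep pace with playback
```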

Most implementations tile each frame into small patches, run inference on each patch, and stitch the results. This works but it's slow - the overhead of dispatching dozens of small inference calls per frame is enormous. ToolPiper takes a different approach: full-frame inference. The entire 640×360 frame goes through the model as a single operation, eliminating 96% of the scheduling overhead.
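A rough sketch of why dispatch count matters - assuming the 128×128 tile size the article mentions later for the tiled fallback, and ignoring the tile overlap used in practice to hide seams, which pushes the count (and the savings) higher:

```python
import math

def tile_count(w, h, tile, stride):
    """Tiles needed to cover a w x h frame (last row/column clamped to the edge)."""
    cols = math.ceil(max(w - tile, 0) / stride) + 1
    rows = math.ceil(max(h - tile, 0) / stride) + 1
    return cols * rows

# Non-overlapping 128x128 tiles over a 640x360 frame:
n = tile_count(640, 360, 128, 128)   # 5 columns x 3 rows = 15 inference dispatches
saved = 1 - 1 / n                    # ~93% fewer dispatches with full-frame inference
# Overlapping tiles raise the per-frame count into the dozens, which is
# where a ~96% reduction in scheduling calls comes from.
```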

The result: **44.4 FPS sustained on real-world H.264 video - nearly 1.5x the 30 FPS playback rate** - the upscale finishes before you'd finish watching the original.

## Why does local video upscaling matter?

**Your video never leaves your machine.** Client footage, security camera recordings, personal videos, unreleased content - none of it gets uploaded to any server. The processing happens entirely on your hardware.

**No queue.** Cloud services process jobs sequentially across all users. Your 10-minute video might wait behind someone's two-hour film. Locally, processing starts immediately and runs at full hardware speed.

**No per-video cost.** Cloud video upscaling charges per minute or per video - often $0.50-2.00 per minute of footage. Locally, every video is free after the one-time app install.

**No watermark.** Free tiers of cloud services watermark output. Locally, the output is clean.

**Audio is preserved automatically.** ToolPiper remuxes the original audio track unchanged into the upscaled video. No re-encoding, no quality loss, no sync issues. The audio stream passes through untouched.

## What do you need for local video upscaling?

**You don't need:** A terminal. FFmpeg. Python. An API key. A subscription to Topaz Video AI. A dedicated GPU.

**You do need:** A Mac with Apple Silicon (M1 or later) and at least 8GB of RAM. The model is bundled - no separate download.

## What is PiperSR and how does it upscale video?

**PiperSR is a 453,388-parameter super-resolution model purpose-built for Apple Silicon.** The video variant accepts full 640×360 frames as single tensors (no tiling), with batch normalization fused into convolutions to minimize operation count. The entire model is 928 KB in CoreML FP16 format - small enough to fit in the Neural Engine's on-chip SRAM.
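A quick size sanity check on those numbers - the gap between raw weights and file size is presumably CoreML structure and metadata, an inference rather than a stated figure:

```python
params     = 453_388        # parameter count quoted in the text
bytes_fp16 = params * 2     # FP16 stores each weight in 2 bytes

weights_kb = bytes_fp16 / 1000   # ~906.8 KB of raw weights
# The quoted 928 KB file size leaves roughly 21 KB for the CoreML model
# structure and metadata on top of the raw FP16 weights (an inference,
# not a figure stated in the text).
```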

The pipeline is double-buffered across three hardware units simultaneously:

-   **CPU:** Converts the input frame from pixel buffer to Float16 tensor (0.3ms)
-   **Neural Engine:** Runs the super-resolution prediction (20.8ms)
-   **GPU:** Converts the output tensor back to a pixel buffer via a Metal shader (1.3ms)

While the GPU converts frame N-1's output, the Neural Engine runs frame N, and the CPU prepares frame N+1's input. All three run in parallel. The effective per-frame time is ~22ms - yielding 44-46 FPS on real-world content.
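In a double-buffered pipeline, steady-state throughput is bounded by the slowest stage. Using the stage times above, a minimal sketch of where the ~22 ms figure sits:

```python
# Per-frame stage times quoted in the text (milliseconds):
cpu_in  = 0.3    # CPU: pixel buffer -> Float16 tensor
ane     = 20.8   # Neural Engine: super-resolution prediction
gpu_out = 1.3    # GPU: output tensor -> pixel buffer (Metal shader)

serial_ms    = cpu_in + ane + gpu_out      # 22.4 ms if the stages ran back to back
pipelined_ms = max(cpu_in, ane, gpu_out)   # 20.8 ms bottleneck with perfect overlap

serial_fps = 1000 / serial_ms              # ~44.6 FPS
ideal_fps  = 1000 / pipelined_ms           # ~48.1 FPS ceiling
# The article's ~22 ms effective per-frame time (44-46 FPS) lands in this
# range: the 20.8 ms Neural Engine bottleneck plus a small per-frame
# synchronization cost.
```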

The model achieves 37.54 dB PSNR on the Set5 benchmark - 3.88 dB above bicubic interpolation. For a deeper look at the engineering, see [How We Achieved 44 FPS Video Upscale on Apple Neural Engine](/blog/pipersr-44fps-video-upscale-apple-neural-engine).
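PSNR is a log-scale measure of mean squared error against a reference image. A minimal illustration of the metric (not the benchmark code):

```python
import math

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)

# PSNR is logarithmic in mean squared error, so the model's +3.88 dB over
# bicubic corresponds to roughly 2.4x lower MSE: 10 ** (3.88 / 10) ~= 2.44.
```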

## How do you upscale video in ModelPiper?

Load the **Local Video Upscale** template. The pipeline has three nodes: a video input, the upscale processor (PiperSR 2x Video), and the output display.

Drop a video file onto the input node - MP4 or MOV. The upscale starts immediately. Progress streams in real time via SSE, showing frame count and estimated completion. When it's done, the upscaled video appears in the output node with audio intact, ready to save.

The upscale is also available via the `/v1/video/upscale` REST endpoint and the `video_upscale` MCP tool for scripted or automated workflows.
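A hypothetical client call might look like the following - the endpoint path is from the text, but the host, port, and request body shape are illustrative assumptions, not the documented API:

```python
# Hypothetical client for the /v1/video/upscale endpoint. The path comes
# from the article; the host, port, and JSON body fields are illustrative
# guesses - check the app's API reference for the real request format.
import json
import urllib.request

def build_upscale_request(video_path, base="http://localhost:8080"):
    body = json.dumps({"input": video_path, "scale": 2}).encode()
    return urllib.request.Request(
        f"{base}/v1/video/upscale",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_upscale_request("/tmp/talk-360p.mp4")
```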

## What are the resolution limits and trade-offs?

The full-frame pipeline is currently optimized for 360p → 720p (640×360 input, 1280×720 output). This is the resolution-locked fast path that achieves 44 FPS.

Other input resolutions fall back to the tiled pipeline, which processes 128×128 tiles sequentially. The tiled path produces the same visual quality but runs at 5-10 FPS depending on resolution - functional but not realtime.

Additional resolution-specific models (480p → 960p, 540p → 1080p) can be exported from the PiperSR training pipeline. They're not bundled yet to keep the app size small, but the architecture supports them.

Performance benchmarks are from an M4 Max. The M1 has the same 16 Neural Engine cores but an older microarchitecture - expect lower but still-usable throughput.

## Try It

Download [ModelPiper](https://modelpiper.com), install ToolPiper, and load the Local Video Upscale template. Drop a video file and watch it upscale. The model is bundled - nothing to download.

_This is part of a series on [local-first AI workflows on macOS](/blog/local-first-ai-macos). See also: [Image Upscale](/blog/local-image-upscale-mac) - the same technology for still images at up to 4x resolution._

## FAQ

### What video formats are supported?

MP4 and MOV with H.264 encoding. The output is H.264 High profile MP4 with the original audio track remuxed unchanged.

### Can I upscale 1080p to 4K?

Not yet with the full-speed pipeline. The optimized full-frame path is currently 360p → 720p only. Other resolutions use the tiled fallback at 5-10 FPS. Higher-resolution models are technically possible but aren't bundled yet.

### Does the upscale work on screen recordings?

Yes. Screen recordings, webcam footage, game capture, and any other H.264 video. PiperSR handles text, UI elements, and synthetic content well - it was trained on diverse content including screenshots and digital graphics.

### Can I upscale a live video stream?

Yes. ToolPiper supports real-time video upscale via WebSocket on port 10004. External apps can send frames and receive upscaled output in real time. The full-frame pipeline achieves the same ~44 FPS throughput on streaming frames. This is a Pro feature.
