You have a scanned contract. A photo of a whiteboard from a meeting. A screenshot of a table you can't copy from. A PDF where the text layer is wrong or missing entirely.
You need the text out of these images, and you need it now.
The cloud options work fine: Google Drive's OCR, Adobe Acrobat's cloud service, various API-based solutions. They're accurate and fast. They also mean your scanned contracts, whiteboard photos, and documents are uploaded to third-party servers for processing.
For personal documents, that's your call. For client contracts, medical records, legal documents, financial statements, or anything under NDA — sending it to a cloud OCR service is a decision that should give you pause.
macOS has Apple Vision OCR built into the operating system. It's fast, it's accurate, and it runs entirely on-device. The problem is accessing it — Apple exposes it through developer APIs, not through a user-facing tool that lets you drop in a document and get text back.
ToolPiper fixes that. It wraps Apple Vision OCR in a REST endpoint and makes it available as a pipeline block in ModelPiper.
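Because the OCR runs behind a local REST endpoint, any HTTP client can call it. The endpoint path, port, and response shape below are assumptions for illustration — ToolPiper's actual API may differ — but a minimal client sketch looks something like this:

```python
import json
import urllib.request

# Hypothetical local endpoint; ToolPiper's real path and port may differ.
TOOLPIPER_OCR_URL = "http://localhost:8080/vision/ocr"

def build_ocr_request(image_bytes: bytes) -> urllib.request.Request:
    """Wrap raw image bytes in a POST request to the local OCR endpoint."""
    return urllib.request.Request(
        TOOLPIPER_OCR_URL,
        data=image_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )

def extract_text(response_body: str) -> str:
    """Pull recognized text out of an assumed {"text": "..."} JSON response."""
    return json.loads(response_body).get("text", "")

# Usage (with ToolPiper running locally):
#   with open("contract.png", "rb") as f:
#       req = build_ocr_request(f.read())
#   text = extract_text(urllib.request.urlopen(req).read().decode())
```

The point of the sketch: the document travels over localhost, never over the internet.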
How Apple Vision OCR Differs From Cloud OCR
Apple Vision OCR runs on Apple's Neural Engine. It's optimized specifically for Apple Silicon and uses the same framework that powers Live Text in Photos and the Camera app. It handles:
Printed text with high accuracy across multiple fonts, sizes, and orientations.
Handwritten text — surprisingly well. Apple has invested heavily in handwriting recognition, and the results on reasonably legible writing are good.
Multiple languages recognized automatically, without specifying the language in advance.
Structured documents — it understands layout, columns, tables, headers, and reading order, not just individual characters.
The key advantage over cloud OCR: zero latency from network round trips, and the document never leaves your device.
The ModelPiper Workflow
Load the Document OCR template. Drag in an image or PDF. ToolPiper sends it through Apple Vision OCR and returns the extracted text.
That's the basic use. But because this is a pipeline block, you can chain it with other operations:
OCR → Summarize: Extract text from a long document, then pass it to an LLM for summarization. Drop in a 10-page scanned PDF, get a one-paragraph summary.
OCR → Translate: Extract text from a foreign-language document, then translate it with an LLM.
OCR → Structured Extract: Extract text from invoices, receipts, or forms, then use an LLM to pull out specific fields (amounts, dates, names, addresses) into structured data.
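To make the structured-extract step concrete, here is a rough sketch using simple regexes as a stand-in for the LLM pass, assuming the OCR stage has already produced plain text. (Real invoices vary far too much for patterns like these — which is exactly why an LLM does this job better in practice.)

```python
import re

def extract_invoice_fields(ocr_text: str) -> dict:
    """Pull a few common invoice fields out of OCR'd text.

    Regex stand-in for the LLM extraction step: each pattern targets
    one field and returns None when the field isn't found.
    """
    amount = re.search(r"\$\s?(\d[\d,]*\.\d{2})", ocr_text)
    date = re.search(r"\b(\d{4}-\d{2}-\d{2})\b", ocr_text)
    invoice_no = re.search(r"Invoice\s*#?\s*(\w+)", ocr_text, re.IGNORECASE)
    return {
        "amount": amount.group(1) if amount else None,
        "date": date.group(1) if date else None,
        "invoice_number": invoice_no.group(1) if invoice_no else None,
    }

sample = "Invoice #A1023\nDate: 2024-03-15\nTotal due: $1,249.50"
print(extract_invoice_fields(sample))
# → {'amount': '1,249.50', 'date': '2024-03-15', 'invoice_number': 'A1023'}
```

Swap the regexes for an LLM prompt ("return amount, date, and invoice number as JSON") and you have the OCR → Structured Extract chain.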
Beyond Basic OCR: Apple Vision Endpoints
ToolPiper doesn't just do text extraction. It exposes multiple Apple Vision endpoints through its REST API, including image classification, face detection, body pose estimation, barcode reading, object saliency, document segmentation, and more.
For the OCR use case specifically, the pipeline is straightforward: image in, text out. But having the full Apple Vision stack available locally opens up workflows that most people associate with cloud-only services.
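Conceptually, chaining pipeline blocks is just function composition: each block's output feeds the next block's input. A toy sketch with stub stages (the stage bodies are placeholders — in ModelPiper the real blocks call ToolPiper and an LLM):

```python
from functools import reduce
from typing import Callable

# Stub stages standing in for real pipeline blocks.
def ocr(image_path: str) -> str:
    return f"<text extracted from {image_path}>"

def summarize(text: str) -> str:
    return f"<summary of {text}>"

def pipeline(*stages: Callable) -> Callable:
    """Compose stages left to right: each stage consumes the previous output."""
    return lambda x: reduce(lambda acc, stage: stage(acc), stages, x)

ocr_then_summarize = pipeline(ocr, summarize)
print(ocr_then_summarize("contract.png"))
# → <summary of <text extracted from contract.png>>
```

The OCR → Translate and OCR → Structured Extract chains have the same shape; only the downstream stage changes.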
Try It
Download ModelPiper, install ToolPiper, and load the Document OCR template. Drop in an image with text — a photo of a document, a screenshot, a scanned page.
Text extraction happens on Apple's Neural Engine. Your documents never leave your Mac.
This is part of a series on local-first AI workflows on macOS. Next up: Image Narration — AI describes what it sees and reads the description aloud.