Ollama runs one model at a time. Send it a prompt, get a response. For single-turn chat, that's enough.

But useful work often chains capabilities. You record a meeting, transcribe the audio, summarize the transcript, and extract action items. Each step needs a different model or tool: a speech-to-text model, a language model for summarization, maybe another pass for structured extraction. Ollama handles one of those steps. The orchestration - deciding what feeds into what, moving data between models, handling failures - is your problem.

The usual workaround is a Python script that calls Ollama's API in a loop, piping output from one model into the next. It works. Then you want to swap the summarization model, or add a translation step, or figure out why step three produced garbage. Now you're maintaining a custom orchestration layer for something that should be a drag-and-drop operation.
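That glue script usually looks something like this - a minimal sketch against Ollama's /api/generate endpoint, assuming a local server on the default port; the model name and file path are placeholders:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> dict:
    """Request body for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """One blocking call to Ollama; returns the model's full response text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Usage (with a local Ollama server running):
#   transcript = open("meeting.txt").read()
#   summary = generate("llama3.2:3b", f"Summarize this transcript:\n{transcript}")
#   actions = generate("llama3.2:3b", f"List the action items:\n{summary}")
```

Every new step means another function, another error path, and another script to maintain.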

That's the gap pipelines fill: a visual way to chain models and capabilities without writing glue code. And because ToolPiper connects to Ollama as a backend, the models you already have downloaded work as blocks in any pipeline.

What is a pipeline in ToolPiper?

A pipeline is a visual workflow where each block represents a model or operation. Data flows from one block to the next through connections you draw on a canvas. You're building workflows, not writing prompts.

Each block has a type: text generation, speech-to-text, text-to-speech, OCR, embedding, image upscale, and others. When a block's type is text generation, it can use any model from any connected provider - your Ollama models, ToolPiper's built-in llama.cpp engine, or a cloud API. The pipeline builder handles the data flow between blocks automatically.

Pipelines are data-driven. The configuration for each block (which model, what parameters, how to transform the output) is stored as JSON. You can duplicate a pipeline, swap out one block, and have a variation running in seconds. No code to maintain.
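The stored format isn't documented here, so purely as an illustration, a text generation block's configuration might be JSON along these lines (field names are hypothetical):

```json
{
  "type": "text_generation",
  "provider": "ollama",
  "model": "llama3.2:3b",
  "parameters": { "temperature": 0.3 },
  "system_prompt": "Summarize the following transcript in 3-5 bullet points."
}
```

Swapping the summarization model, in this framing, is a one-field edit rather than a code change.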

What can you build with Ollama models in a pipeline?

Here are three example pipelines that use Ollama as the language model backend. Each solves a real problem that single-model chat can't.

Voice conversation: STT → LLM → TTS

The simplest multi-model pipeline. A speech-to-text block (Parakeet v3) transcribes your voice input. The transcript feeds into an Ollama chat model for reasoning. The model's response streams into a text-to-speech block (PocketTTS, Soprano, or Orpheus) that reads it aloud.

Three blocks, three different AI capabilities, one workflow. ToolPiper ships this as the tp-local-voice-chat template. For a deeper walkthrough, see voice chat with Ollama on Mac.

Document Q&A: OCR → Embed → Index → Chat

Drop a scanned PDF into the pipeline. An OCR block (Apple Vision) extracts the text. An embedding block converts the text into vectors and indexes them in a local collection. A chat block with RAG context answers questions about the document, citing specific passages.
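Under the hood, the embed-and-retrieve steps amount to something like this sketch - it assumes Ollama's /api/embeddings endpoint and an embedding model such as nomic-embed-text already pulled; the pipeline blocks do the equivalent for you:

```python
import json
import math
import urllib.request

EMBED_URL = "http://localhost:11434/api/embeddings"

def embed(text: str, model: str = "nomic-embed-text") -> list:
    """Get an embedding vector from Ollama for one chunk of text."""
    body = json.dumps({"model": model, "prompt": text}).encode()
    req = urllib.request.Request(EMBED_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["embedding"]

def cosine(a: list, b: list) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def top_k(query_vec: list, index: list, k: int = 3) -> list:
    """index is a list of (chunk_text, vector) pairs; return the k closest chunks."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# Usage (with Ollama running and nomic-embed-text pulled):
#   chunks = ["Clause 4 covers termination.", "Payment is due in 30 days."]
#   index = [(c, embed(c)) for c in chunks]
#   hits = top_k(embed("When is payment due?"), index, k=1)
```

The retrieved chunks are then prepended to the chat block's prompt as context - the RAG step the pipeline wires up for you.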

This is the workflow that makes local AI practical for professionals with document-heavy work. Contracts, research papers, internal reports - they stay on your Mac and become searchable through natural language.

Multilingual content: Chat → Translate → TTS

Ask your Ollama model a question in English. The response feeds into a second chat block with a translation system prompt. The translated text streams into a TTS block that reads it aloud in the target language.

Each block handles one task. The pipeline builder handles the wiring. Changing the target language is a one-field edit in the translation block's system prompt, not a code change.

How does Ollama fit into the pipeline?

If you already use Ollama, you have a library of downloaded models. Connecting Ollama as a provider in ToolPiper makes every one of those models available as a block in the pipeline builder. No re-downloading, no format conversion - they appear in the model dropdown alongside ToolPiper's built-in models and any cloud APIs you've configured.

This is the practical advantage for Ollama users: you've already invested time finding and pulling the right models for your tasks. A 7B coding model, a 3B fast-chat model, maybe a 13B for complex reasoning. In a pipeline, you can use each one where it's strongest - the fast model for classification, the large one for generation - without managing separate API calls or scripts.
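In script form, that per-task routing is exactly the bookkeeping you'd otherwise maintain yourself - a sketch against Ollama's /api/chat endpoint, with a hypothetical routing table and placeholder model names:

```python
import json
import urllib.request

CHAT_URL = "http://localhost:11434/api/chat"

# Hypothetical routing table: a small model for cheap classification,
# a larger one for generation. Model names are placeholders.
MODELS = {"classify": "llama3.2:3b", "generate": "llama3.1:8b"}

def pick_model(task: str) -> str:
    """Route a task name to the model suited for it."""
    return MODELS[task]

def chat(model: str, system: str, user: str) -> str:
    """One non-streaming call to Ollama's chat endpoint."""
    body = json.dumps({
        "model": model,
        "stream": False,
        "messages": [{"role": "system", "content": system},
                     {"role": "user", "content": user}],
    }).encode()
    req = urllib.request.Request(CHAT_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["message"]["content"]

# Usage (with Ollama running):
#   label = chat(pick_model("classify"),
#                "Answer with one word: bug or feature.",
#                "The app crashes on launch.")
#   reply = chat(pick_model("generate"), "You are a support engineer.",
#                f"Draft a response to this {label} report.")
```

In a pipeline, each block's model dropdown replaces the routing table, and the connections replace the call sequencing.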

The connection works through Ollama's API on localhost:11434. You'll need OLLAMA_ORIGINS configured for the browser-based pipeline builder to reach Ollama. Or use ToolPiper's built-in engine for the LLM blocks, which doesn't require CORS setup.
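One way to configure that on macOS (the "*" origin is permissive; a specific origin is safer if you know which one the pipeline builder uses):

```shell
# Allow cross-origin requests for this session, if you start Ollama from a shell:
OLLAMA_ORIGINS="*" ollama serve

# Or set it persistently for the Ollama menu bar app, then restart Ollama:
launchctl setenv OLLAMA_ORIGINS "*"
```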

How do you build a pipeline from scratch?

We'll build a two-block pipeline: transcribe an audio clip, then summarize the transcript with an Ollama model. A practical workflow for meeting notes, podcast summaries, or lecture review.

Open the pipeline builder in ToolPiper. The canvas starts empty. Drag an STT block from the block palette onto the canvas. This block will handle speech-to-text using Parakeet v3.

Drag a text generation block next to it. Click the block to open its settings and select an Ollama model - Llama 3.2 3B is a good default for summarization tasks. Set the system prompt to something like "Summarize the following transcript in 3-5 bullet points. Be concise."

Draw a connection from the STT block's output port to the text generation block's input port. The line represents data flow: when the STT block finishes transcribing, the text automatically feeds into the summarization model.

Hit run. Drop an audio file into the STT block's input. Parakeet transcribes it, the transcript flows to your Ollama model, and the summary appears in the output panel. Two models, one click, no scripting.

Want to extend it? Add a TTS block after the summary to read the bullet points aloud. Add a translation block between summarization and TTS to get the summary in another language. Each extension is another block and another connection, not another script to maintain.

What are the limitations of local pipelines?

Latency compounds. Each block in the pipeline adds processing time. A three-block voice pipeline (STT + LLM + TTS) adds roughly 1-2 seconds of total overhead on an M2 Max. A five-block pipeline with OCR, embedding, retrieval, chat, and TTS will take longer. For real-time interactive workflows, keep the chain short. For batch processing (transcribe a folder of audio files), latency per step matters less.

Memory adds up. Each model block that loads a different model needs its own RAM allocation. A voice chat pipeline with STT + 3B LLM + TTS needs about 3GB. A document Q&A pipeline with OCR + embeddings + 7B LLM might need 6-7GB. ToolPiper's resource monitor shows whether a pipeline's models fit before you run it.

Single-model chat doesn't need a pipeline. If you're asking a question and reading an answer, the pipeline builder is overhead. Open the chat panel and talk to the model directly. Pipelines earn their complexity when the workflow involves more than one capability - voice, vision, translation, document processing, or model chaining.

Pipeline debugging takes patience. When a multi-block workflow produces unexpected output, you need to inspect each block's input and output to find where the chain went wrong. The pipeline builder shows per-block results, which helps. But debugging a three-step pipeline is inherently more work than debugging a single prompt.

Download ToolPiper at modelpiper.com and open the pipeline builder. If you have Ollama models, they show up as options in every text generation block.

This is part of a series on Ollama frontends for Mac. See also: Voice Chat With Ollama - the simplest pipeline you can build. Next: Ollama Vision GUI on Mac - use LLaVA without the terminal.