What's broken with Mac automation in 2026?
Mac automation is stuck in two worlds. Shortcuts is visual but limited. It cannot control window layouts, adjust display brightness based on context, or respond to natural language. AppleScript is powerful but requires learning a 30-year-old scripting language that most developers avoid and most users have never seen. Keyboard Maestro, BetterTouchTool, and Raycast each solve pieces of the puzzle, but none combine voice input, AI reasoning, and system control into a single interface.
The result: Mac users have the most capable hardware in consumer computing and the most fragmented automation story. You need three or four apps to do what "hey, make it dark and move Slack to the left half" should accomplish in one sentence. Shortcuts handles some system toggles but not window management. BetterTouchTool handles window snapping but not natural language. Raycast handles AI text but not system actions. Keyboard Maestro handles everything but requires building macros by hand in a visual programming environment.
Every tool in this category requires you to learn its language: Shortcuts' visual blocks, AppleScript's English-like syntax, Keyboard Maestro's condition trees, Hammerspoon's Lua bindings, Raycast's extension API. The user already knows what they want. "Turn down the brightness." "Open Safari and put it next to my terminal." The gap is not capability. It is translation from intent to action. The missing layer is not another automation app. It is a natural language interface that sits above the system APIs and routes human intent to machine execution.
The state of Mac automation (April 2026)
The Mac automation landscape is fragmented. Each tool occupies a specific niche, and none of them combine AI interpretation, voice activation, and broad system control into a single workflow.
Apple Shortcuts
Shortcuts arrived on the Mac in macOS Monterey (2021) as Automator's successor and has improved steadily since. As of macOS Sequoia, Shortcuts supports over 900 built-in actions across Apple apps and system functions. Third-party apps can expose actions to Shortcuts through the App Intents framework.
The limitations are real. Shortcuts has no AI integration for natural language command interpretation. Building a multi-step automation requires dragging blocks, connecting variables, and debugging in a visual editor that becomes unwieldy for anything beyond five steps. Window management is minimal. System-level controls like Spaces, display brightness, and audio device switching are either absent or require workarounds. There is no voice activation beyond triggering a named Shortcut through Siri, which requires knowing the exact Shortcut name.
Apple Intelligence improvements in macOS Sequoia added some Siri enhancements, including on-device processing for simple requests and better contextual understanding. But as of March 2026, Siri's automation capabilities remain limited to Apple's predefined intent set. You cannot extend Siri with custom action domains or teach it new system commands.
Raycast (v1.87, $8/month Pro)
Raycast has become the default launcher for power users, replacing Alfred for many. The AI extension ($8/month with Raycast Pro) adds GPT-4o, Claude, and other cloud models for text generation, translation, and summarization directly in the launcher. Raycast also ships window management, clipboard history, and snippets as built-in features.
Raycast's AI is conversational, not action-oriented. It can generate text and answer questions, but it cannot toggle dark mode, adjust display brightness, manage Spaces, control audio devices, or simulate keyboard input. The window management is manual: keyboard shortcuts that you configure, not AI-interpreted commands. There is no voice activation. Raycast is an excellent launcher with AI text features bolted on, not an AI automation platform.
Alfred (v5.6, one-time purchase)
Alfred remains the most customizable launcher on macOS. Workflows support AppleScript, shell scripts, Python, and JavaScript. The community has built thousands of workflows covering everything from Spotify control to Jira integration. Alfred's strength is its extensibility and one-time pricing model (Powerpack, roughly $40).
Alfred has no built-in AI integration. Community members have built ChatGPT workflows, but these are text-only and require API keys. There is no voice input, no system action interpretation, and no MCP tool exposure. Alfred is a launcher with scripting capabilities, not an AI automation tool.
BetterTouchTool (v4.6, $22 license)
BetterTouchTool is the Swiss Army knife of input customization. It maps gestures, key sequences, trackpad actions, Touch Bar buttons, and Stream Deck presses to macOS actions. Window snapping is excellent. The automation capabilities are deep if you invest time in the configuration UI.
BTT added an AI Actions feature in 2024 that sends prompts to OpenAI or local models. The focus is text transformation (summarize clipboard contents, translate selected text), not system control. BTT's automation is trigger-mapped, not intent-interpreted. You configure a specific gesture to trigger a specific action. There is no natural language command layer that interprets "mute my Mac and move this window to the right" as two system actions.
Keyboard Maestro (v11.0, $36 license)
Keyboard Maestro is the most powerful general-purpose automation tool on macOS. It can script nearly anything: UI element interaction, conditional logic, loops, variables, file operations, network requests, clipboard manipulation, and timed triggers. Professional users build genuinely complex automation workflows.
The learning curve is significant. Building a macro that "moves the frontmost window to the left half of the screen if it's Safari, otherwise moves it to the right half" requires understanding Keyboard Maestro's condition system, variable model, and UI element targeting. There is no AI interpretation. There is no voice activation. Every automation is hand-built in a visual programming environment.
Hammerspoon (free, open source)
Hammerspoon bridges Lua scripting to macOS APIs. If you can write Lua, you can automate almost anything. Window management, hotkey bindings, Wi-Fi event handlers, USB device watchers, menu bar items. The community Spoons library provides pre-built modules.
Hammerspoon requires programming skills. There is no GUI, no AI, and no voice input. It is a tool for developers who want full control and are willing to write code for it.
macOS Accessibility APIs and the sandbox wall
A recurring theme across these tools is the tension between macOS security and automation capability. Apple's App Sandbox, required for Mac App Store distribution, blocks access to the accessibility APIs that power window management, input simulation, and process control. Every serious automation tool on macOS, including Keyboard Maestro, BetterTouchTool, Hammerspoon, and ActionPiper, distributes outside the App Store for this reason.
Apple's direction is clear: they want automation to flow through App Intents and Shortcuts, which are sandboxed and permission-gated. This is a reasonable security model, but it means system-level automation tools will always exist in tension with Apple's platform policies. Users who want broad macOS control must grant explicit accessibility permissions and accept apps distributed as DMGs. That is the trade-off, and it applies equally to every tool in this category.
The gap
No existing Mac automation tool combines voice activation, AI command interpretation, broad system control, and MCP tool exposure. Raycast has AI but only for text. BetterTouchTool has gestures but no intent interpretation. Keyboard Maestro has power but no natural language. Hammerspoon has depth but requires programming. Apple Shortcuts has breadth but no AI integration. Each tool solves part of the problem. None solve the whole thing.
The natural language interface layer
Our thesis for Mac automation: natural language is the missing interface layer for desktop control. Every automation tool on Mac requires you to learn its language. Shortcuts has visual blocks. AppleScript has syntax. Keyboard Maestro has a macro editor. Raycast has an extension API. The user already knows what they want: "turn down the brightness" or "open Safari and put it next to my terminal." The gap is translation from intent to action.
ActionPiper bridges this with a two-stage architecture. First, a speech-to-text (STT) model converts speech to text. FluidAudio's Parakeet model runs on the Neural Engine with approximately 140ms end-to-end latency. Second, a local LLM interprets the intent and routes it to one of 142 actions across 26 domains. The LLM receives structured tool definitions, not freeform text, so it knows exactly what actions are available and what parameters they accept. "Make it dark" matches action_appearance with parameter darkMode: true. "Move Slack left" matches action_window with app: Slack, position: left-half. The model is not guessing. It is selecting from a defined action space.
This architecture has a key advantage over every alternative: the model sees the complete action surface at once. Traditional automation tools require you to know that brightness is in System Settings, that window snapping needs a third-party app, and that audio device switching is buried in a Sound menu bar icon. The LLM sees all 26 domains simultaneously. "Mute my Mac, go dark, and set brightness to 30%" is three tool calls to three different domains, dispatched in sequence, resolved in under a second. The user does not need to know which domain handles which capability.
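The "defined action space" idea can be sketched in a few lines. Everything below is illustrative: the tool names (action_audio, action_appearance, action_display) and their parameter schemas are hypothetical stand-ins for ActionPiper's real definitions, which are not public.

```python
# Illustrative sketch: a constrained action space as tool definitions,
# plus a dispatcher that validates and executes the LLM's tool calls
# in sequence. All names here are hypothetical.

TOOLS = {
    "action_audio":      {"params": {"muted": bool, "volume": int}},
    "action_appearance": {"params": {"darkMode": bool}},
    "action_display":    {"params": {"brightness": int}},
}

def dispatch(calls):
    """Check each call against its schema, then execute in order."""
    results = []
    for name, args in calls:
        schema = TOOLS[name]["params"]     # unknown tool -> KeyError, not a guess
        for key, value in args.items():
            assert key in schema and isinstance(value, schema[key])
        results.append(f"{name}({args})")  # a real app would call a native API here
    return results

# "Mute my Mac, go dark, and set brightness to 30%" -> three tool calls
calls = [
    ("action_audio",      {"muted": True}),
    ("action_appearance", {"darkMode": True}),
    ("action_display",    {"brightness": 30}),
]
print(dispatch(calls))
```

The point of the sketch is the constraint: the model can only emit calls that validate against a known schema, which is what makes a small local model reliable at this task.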
The other key architectural advantage: no sandbox restrictions. ActionPiper is distributed as a DMG, not through the App Store, specifically because App Store sandboxing prevents the system-level access that desktop automation requires. Shortcuts runs in a sandbox. Siri runs in a sandbox. ActionPiper has full access to accessibility APIs, window management, display control, audio routing, network configuration, and process management. This is a deliberate distribution tradeoff. The Mac App Store provides discoverability and automatic updates. Direct distribution provides the system access that makes real automation possible. Every serious automation tool on macOS, including Keyboard Maestro, BetterTouchTool, and Hammerspoon, makes the same choice for the same reason.
What's coming
Mac automation is moving toward AI-native control, and several developments are worth tracking.
Our roadmap
More action domains. ActionPiper currently covers 26 domains with roughly 142 actions. Planned additions include deeper per-app integration (controlling specific application features beyond basic window management), multi-step macros with conditional logic ("if the battery is below 20%, enable low power mode and reduce brightness"), and scheduled actions that trigger on system events.
Custom hotkey mapping. The Right Option and Right Command keys are currently fixed assignments. Configurable hotkey bindings are planned, allowing users to assign push-to-talk to any key combination.
Context-aware commands. Future versions will use the frontmost application and current system state as context for command interpretation. "Make this bigger" would resize a window in Finder, zoom in on a document in Preview, or increase font size in an editor, depending on what's active.
Industry horizon
Apple Intelligence and Siri. Apple's WWDC 2025 announcements expanded on-device Siri capabilities, and rumors suggest WWDC 2026 will further extend Siri's ability to control third-party apps through App Intents. If Apple opens system-level automation to Siri with AI interpretation, the entire landscape shifts. As of March 2026, this remains rumored but unconfirmed.
Raycast AI Extensions. Raycast has signaled interest in expanding its AI capabilities beyond text generation. A system action layer for Raycast AI would compete directly with ActionPiper's approach, though Raycast's cloud-dependent AI model limits privacy-focused workflows.
MCP adoption. The Model Context Protocol is gaining traction across AI tools. As of March 2026, Claude Code, Cursor, Windsurf, and several other AI development tools support MCP natively. System automation tools that expose MCP-compatible interfaces become more valuable as this ecosystem grows. ActionPiper's 29 MCP tools are already usable from any MCP client. As MCP adoption increases, the line between "AI coding assistant" and "AI system automation" blurs. A developer asking Claude Code to "mute my Mac, switch to dark mode, and focus the terminal" is not switching tools. They are using one interface for both code and system control.
How ToolPiper handles this today
ActionPiper is a standalone macOS menu bar app that ships as a DMG (not App Store, because the sandbox prohibits the accessibility APIs required for system control). It runs in the background using roughly 20MB of memory and registers two global hotkeys for voice input.
Push-to-talk dictation
Hold the Right Option key, speak naturally, release. FluidAudio's Parakeet STT model transcribes your speech on the Neural Engine with approximately 140ms end-to-end latency. The transcribed text is pasted at your current cursor position, in any application. No app switching, no clipboard management, no cloud round-trip.
Ready to try it? Set up push-to-talk dictation: it works immediately after installing ActionPiper.
Push-to-command
Hold the Right Command key, speak a natural language instruction, release. The STT engine transcribes your speech, a local LLM interprets the command against 26 action domain tool definitions, ActionRouter dispatches the action through native macOS APIs, and a notification confirms what happened. The entire pipeline runs locally: Neural Engine for STT, Metal GPU for LLM inference, native APIs for execution.
Example commands: "Turn on dark mode." "Set volume to fifty percent." "Snap this window to the left half." "Open Safari." "Turn off Wi-Fi." "Start the screensaver." The LLM handles natural language variation, so "make it dark" and "switch to dark mode" resolve to the same action.
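The pipeline's stages can be sketched as stubbed functions. Each function name here is a hypothetical stand-in for a real component (the STT engine, the LLM, ActionRouter), not ActionPiper's actual code.

```python
# Minimal sketch of the push-to-command pipeline; each stub stands in
# for a real component in the local pipeline.

def transcribe(audio: bytes) -> str:
    return "turn on dark mode"                # stands in for on-device STT

def interpret(text: str) -> tuple:
    # Stands in for the local LLM selecting from structured tool
    # definitions; "make it dark" and "switch to dark mode" would
    # both resolve to this same call.
    return ("action_appearance", {"darkMode": True})

def execute(tool: str, args: dict) -> str:
    return f"{tool} -> {args}"                # stands in for native macOS API calls

def push_to_command(audio: bytes) -> str:
    text = transcribe(audio)
    tool, args = interpret(text)
    return execute(tool, args)                # a notification would confirm this

print(push_to_command(b""))
```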
26 action domains, 29 MCP tools
ActionPiper exposes approximately 142 individual actions across 26 domains: accessibility, app management, appearance, audio, bluetooth, calendar, contacts, defaults, desktop, display, dock, finder, focus, input, location, media, network, notification, power, process, reminders, shortcut, spaces, storage, system, and window management. Related domains are grouped into 29 MCP tools, each with structured parameter schemas.
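A grouped MCP tool definition might look roughly like the following JSON-Schema-shaped structure. This is a guess at the shape only; ActionPiper's actual tool names, descriptions, and parameter schemas may differ.

```python
# Hypothetical shape of one MCP tool grouping appearance-related actions.
import json

appearance_tool = {
    "name": "appearance",
    "description": "Dark mode and display appearance settings",
    "inputSchema": {
        "type": "object",
        "properties": {
            "action": {"type": "string",
                       "enum": ["set_dark_mode", "toggle_dark_mode"]},
            "enabled": {"type": "boolean"},
        },
        "required": ["action"],
    },
}

# MCP clients receive definitions like this and present them to the model.
print(json.dumps(appearance_tool, indent=2))
```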
These tools work in three contexts: ModelPiper chat (type a command and the AI dispatches it), any MCP client like Claude Code or Cursor (system actions alongside your development tools), and push-to-command voice (speak and release). All three interfaces call the same underlying action system.
Ready to try it? Set up AI desktop automation: install ActionPiper and start controlling your Mac through natural language.
MCP integration for developers
For developers using Claude Code, Cursor, or other MCP-capable tools, ActionPiper's system actions become part of your AI workflow. "Mute my Mac, switch to dark mode, and open the project in Finder" is a single prompt. The setup is one command: claude mcp add toolpiper -- ~/.toolpiper/mcp. All 29 action tools appear alongside ToolPiper's other capabilities (browser automation, testing, inference, and more).
Models and hardware
Mac automation requires two models working in sequence: a speech-to-text model for voice input, and an LLM for command interpretation. Both run locally on Apple Silicon.
Speech-to-text: FluidAudio Parakeet TDT V3. This is the STT model that powers push-to-talk. It runs on the Neural Engine at approximately 210x realtime, meaning it processes a 10-second utterance in under 50ms. The model stays loaded in memory as a keep-warm backend, eliminating cold-start delays. End-to-end latency from key release to text insertion is approximately 140ms. FluidAudio handles the ANE compilation and audio preprocessing. You do not interact with the model directly.
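The throughput claim is easy to verify from the numbers given: at 210x realtime, compute time is audio duration divided by the speedup factor.

```python
# Sanity check on the stated STT throughput: a 10-second utterance
# at ~210x realtime takes well under 50 ms of compute.
utterance_s = 10.0
speedup = 210.0
compute_ms = utterance_s / speedup * 1000
print(round(compute_ms, 1))  # ~47.6 ms
```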
Command interpretation: any local LLM via llama.cpp. For push-to-command, the transcribed speech is sent to a local LLM along with tool definitions for all 29 MCP action tools. The LLM selects the right tool and fills in parameters. A 3B model (Llama 3.2 3B, roughly 3GB RAM) handles straightforward commands reliably. An 8B model (Llama 3.1 8B, roughly 6GB RAM) improves accuracy for ambiguous or multi-step instructions. The LLM runs on the Metal GPU via llama.cpp.
Hardware requirements. Any Apple Silicon Mac (M1 or later) runs the full pipeline. The STT model uses the Neural Engine, which is idle during most workloads. The LLM uses the Metal GPU. ActionPiper itself uses roughly 20MB. With a 3B LLM and the STT model loaded, expect approximately 5GB total memory usage for the automation pipeline. A Mac with 16GB handles this comfortably alongside normal workloads. An 8GB machine can run it but with less headroom for other applications.
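A rough budget with the figures from this section adds up as follows. The STT-plus-overhead line is an assumption chosen to match the stated ~5GB total; the text does not break it down.

```python
# Rough memory budget for the local automation pipeline.
llm_3b_gb = 3.0            # Llama 3.2 3B resident weights (from the text)
app_gb = 0.02              # ActionPiper process, ~20MB (from the text)
stt_and_buffers_gb = 2.0   # assumed: Parakeet model, KV cache, audio buffers
total_gb = llm_3b_gb + app_gb + stt_and_buffers_gb
print(round(total_gb, 1))  # ~5 GB, comfortable on a 16GB Mac
```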
Local versus cloud automation
The most common question about local AI automation is whether it is actually better than cloud-based alternatives. The honest answer depends on what you value.
Privacy. Local automation processes everything on your hardware. No audio recordings leave your Mac. No command transcripts are sent to external servers. No system state information is shared with cloud providers. This is a hard requirement for some users and irrelevant to others. If you work with sensitive information or in regulated environments, local processing is not a nice-to-have. It is mandatory.
Latency. ActionPiper's push-to-talk pipeline runs at 140ms end-to-end. Cloud-based voice assistants add 200-500ms of network latency before processing even begins. For single commands, the difference is subtle. For rapid-fire voice commands during a workflow, the gap becomes noticeable. Local processing also works without an internet connection, which matters on planes, in cafes with poor Wi-Fi, and during network outages.
Flexibility. Siri supports Apple's predefined intent set. You cannot add custom action domains, define new parameter schemas, or extend the command vocabulary beyond what Apple ships. ActionPiper's 26 domains and 142 actions are the current set. When new domains are added, they become available immediately through the same voice and MCP interfaces. The LLM interprets natural language against whatever tool definitions are present.
Integration. Cloud assistants like Siri and Google Assistant operate in their own context. You speak to them, they respond, and you go back to what you were doing. ActionPiper's MCP tools integrate directly into development environments. Claude Code, Cursor, and Windsurf can dispatch system actions as part of larger AI workflows. This is a fundamentally different interaction model: system automation as a tool call within a broader task, not a separate conversational exchange.
Quality tradeoff. Cloud LLMs (GPT-4o, Claude) are more capable than local 3B or 8B models for complex reasoning. For automation command interpretation, this gap matters less than you might expect. Tool selection from a defined set of 29 tools with structured schemas is a constrained task. A 3B model handles it reliably for single-step commands. Multi-step commands with ambiguity benefit from larger models, and ToolPiper supports routing to cloud LLMs if you prefer accuracy over privacy for command interpretation.