The MCP testing landscape in 2026
MCP (Model Context Protocol) turned AI coding assistants into tool-using agents. Instead of just generating code, they can now call functions, read data, and drive external systems. Browser testing is one of the most natural fits for this pattern. An AI agent can look at a page, decide what to test, create the test, and run it, all through structured tool calls.
The ecosystem has responded. Four MCP servers now claim testing capabilities, but they approach the problem from very different angles.
Playwright MCP is the official Microsoft server for Playwright. It exposes around 25 tools for navigation and interaction. The AI agent can open pages, click elements, fill forms, and take screenshots. It uses Playwright's snapshot mode by default, which reads a structured accessibility tree representation. But there are gaps: no built-in assertion tools, no recording, no self-healing, and no test persistence. The agent can drive a browser but can't save or replay what it did. Microsoft now recommends Playwright CLI over MCP for coding agents because a typical MCP session consumes around 114,000 tokens, while the CLI approach uses about 27,000 for the same task.
Chrome DevTools MCP is Google's official server. It exposes roughly 29 tools organized around debugging: console messages, network requests, performance traces, JavaScript evaluation, and basic browser automation (navigate, click, fill, screenshot). It's built for debugging, not testing. There are no assertion primitives, no test format, no recording, and no export. You can reproduce a bug through it, but you can't create a reusable test.
Cypress Cloud MCP launched in beta on March 17, 2026. It's a remote MCP server that connects AI assistants to Cypress Cloud for read-only access to test run data: statuses, failure details, error messages, stack traces, and flaky test reports. It requires a personal access token and Cypress Cloud account. The server is scoped to data retrieval and debugging. It cannot create tests, run tests, or modify test code. It answers "what happened in my last CI run?" but not "write me a test for this page."
None of these servers cover the full testing lifecycle through MCP. They each handle a piece: Playwright MCP drives browsers, Chrome DevTools MCP debugs them, Cypress Cloud MCP reads test results. Creating, persisting, running, healing, and exporting tests through MCP tools requires something else.
MCP testing tools: the comparison table
This table compares what each MCP server actually exposes for testing workflows. Not marketing features. Actual tools an AI agent can call.
The differences are structural, not incremental. Playwright MCP and Chrome DevTools MCP give AI agents browser control. Cypress Cloud MCP gives AI agents test result visibility. ToolPiper gives AI agents the complete testing lifecycle: see the page, create the test, run it, heal it, and export it. These are different problems with different tool requirements.
The complete workflow: AI writes a test from scratch
Here's how an AI agent actually creates and runs a browser test through MCP, step by step. This works with Claude Code, Cursor, Windsurf, or any MCP-compatible client connected to ToolPiper.
Step 1: Take a snapshot. The agent calls browser_snapshot. This returns the current page's accessibility tree as structured plain text. Not a screenshot. Not raw HTML. A semantic representation of every interactive element: buttons, links, inputs, headings, landmarks. A page with 2,000 DOM nodes might produce 200 AX nodes, and those 200 are the ones that matter for testing.
browser_snapshot
Result:
Page: Login - Example App
URL: https://app.example.com/login
---
navigation "Main"
link "Home"
link "Pricing"
link "Docs"
heading "Sign in to your account" [level=1]
form "Login"
textbox "Email address" [required]
textbox "Password" [required]
checkbox "Remember me"
button "Sign In"
link "Forgot password?"

The output is plain text, not JSON. It's designed for LLM consumption: compact, structured, and immediately readable. The agent now understands every interactive element on the page, their roles, their labels, and their hierarchy.
Step 2: Reason about what to test. The agent reads the AX snapshot and decides what to test. For a login page, the obvious tests are: successful login, empty field validation, wrong password handling, and the forgot password link. The agent doesn't need special browser automation capabilities for this step. It reads text and reasons about it. Any model can do this.
Step 3: Generate test steps. The agent constructs a test as a sequence of typed steps. Each step has an action (navigate, click, fill, assert), a selector in AX format, and a value where applicable.
Steps:
1. navigate https://app.example.com/login
2. fill label:Email address [email protected]
3. fill label:Password secret123
4. click role:button:Sign In
5. assert url contains "dashboard"
6. assert visible role:heading:Dashboard

Step 4: Save the test. The agent calls test_save with the steps, a name, and a description. The test is persisted as a session on disk. It's now a reusable artifact that can be run, edited, exported, or shared.
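As a sketch, the typed step list above could be modeled like this in TypeScript. The field names and types here are assumptions for illustration, not ToolPiper's documented PiperTest schema:

```typescript
// Hypothetical shape of a typed test step (field names are assumptions).
type Action = "navigate" | "click" | "fill" | "assert";

interface TestStep {
  action: Action;
  selector?: string; // AX-format selector, e.g. "role:button:Sign In"
  value?: string;    // input value, URL, or assertion argument
  assert?: "visible" | "url" | "text";
}

// The six steps of the login happy path from the example above.
const loginHappyPath: TestStep[] = [
  { action: "navigate", value: "https://app.example.com/login" },
  { action: "fill", selector: "label:Email address", value: "[email protected]" },
  { action: "fill", selector: "label:Password", value: "secret123" },
  { action: "click", selector: "role:button:Sign In" },
  { action: "assert", assert: "url", value: "dashboard" },
  { action: "assert", assert: "visible", selector: "role:heading:Dashboard" },
];
```

Because every step is data rather than code, the same list can be saved, replayed, healed, or rendered into Playwright or Cypress syntax later.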
Step 5: Run the test. The agent calls test_run with the session ID. The runner walks through each step, executes the action via CDP, and evaluates the assertions. If a selector doesn't match, the self-healing engine activates: fuzzy AX matching in 5-15ms, with optional AI-assisted healing as a fallback. The run returns a structured result:
test_run session_id="ts-abc123"
Result:
Test: Login Happy Path
Status: PASSED
Steps: 6/6 passed
Duration: 2.3s
---
1. navigate https://app.example.com/login PASSED 120ms
2. fill label:Email address PASSED 18ms
3. fill label:Password PASSED 15ms
4. click role:button:Sign In PASSED 22ms
5. assert url contains "dashboard" PASSED 890ms
6. assert visible role:heading:Dashboard PASSED 12ms
---
Health: 0 console errors, 0 network failures
Heals: 0

Step 6: Export to Playwright. The agent calls test_export with the session ID and target framework. The export renderer maps AX selectors to Playwright's native format:
test_export session_id="ts-abc123" framework="playwright"
Result:
// login-happy-path.spec.ts
import { test, expect } from '@playwright/test';
test('Login Happy Path', async ({ page }) => {
await page.goto('https://app.example.com/login');
await page.getByLabel('Email address').fill('[email protected]');
await page.getByLabel('Password').fill('secret123');
await page.getByRole('button', { name: 'Sign In' }).click();
await expect(page).toHaveURL(/dashboard/);
await expect(
page.getByRole('heading', { name: 'Dashboard' })
).toBeVisible();
});

Six tool calls. A complete test from nothing to exportable Playwright code. The agent handled the entire lifecycle: observe the page, decide what to test, create the test, run it, verify it passes, and generate CI-ready code.
What makes ToolPiper's MCP tools different
Tool count isn't the differentiator. It's what the tools return and how they compose. Four design decisions set ToolPiper's MCP testing tools apart from the alternatives.
Semantic plain text output
Every tool returns structured plain text, not raw JSON. When browser_action clicks a button, the response includes a semantic AX diff showing what changed on the page:
browser_action click role:button:Sign In
Result:
Clicked: button "Sign In"
---
AX Diff:
- form "Login"
- textbox "Email address"
- textbox "Password"
- button "Sign In"
+ heading "Welcome back" [level=1]
+ paragraph "Redirecting to dashboard..."
+ progressbar "Loading"

Added nodes are marked with +, removed with -, modified with ~. The AI agent sees exactly what the action did to the page without taking another full snapshot. This keeps token usage low and gives the model precise information about state transitions.
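Since each snapshot is one node per line, a diff like this can be approximated with a line-based set comparison. This is a simplified sketch, not ToolPiper's actual algorithm (which also tracks modified nodes with ~):

```typescript
// Line-based AX diff sketch: report nodes removed from `before`
// and nodes added in `after`. Unchanged lines are omitted.
function axDiff(before: string, after: string): string[] {
  const beforeSet = new Set(before.split("\n").map((l) => l.trim()).filter(Boolean));
  const afterSet = new Set(after.split("\n").map((l) => l.trim()).filter(Boolean));
  const out: string[] = [];
  for (const line of beforeSet) if (!afterSet.has(line)) out.push(`- ${line}`);
  for (const line of afterSet) if (!beforeSet.has(line)) out.push(`+ ${line}`);
  return out;
}
```

The payoff is in the omissions: a page with hundreds of stable nodes produces a diff of only the handful that changed.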
Playwright MCP returns the full AX tree on every interaction. After several tool calls, the context window fills with stale page states. ToolPiper's diff-based approach means the agent always has fresh, compact information about what just changed.
Self-healing actions
When browser_action targets a selector that doesn't match, it doesn't fail immediately. The action enters a healing loop: query the AX tree for same-role candidates, score them by name similarity using Levenshtein distance, and execute on the best match if confidence is high. A button renamed from "Submit" to "Save" heals in 5-15ms with zero network calls. The response includes what was healed:
Clicked: button "Save" (healed from "Submit", confidence: high)

No other MCP server has built-in self-healing. Playwright MCP fails the action. Chrome DevTools MCP fails the action. The AI agent would need to catch the error, take a new snapshot, reason about what changed, and retry. That costs tokens, time, and context window space. With ToolPiper, the tool handles it automatically.
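The healing loop described above can be sketched in a few lines. The names (AxNode, heal) are illustrative, and the real engine layers confidence scoring and the optional AI fallback on top of this same-role, name-similarity core:

```typescript
interface AxNode { role: string; name: string }

// Classic dynamic-programming Levenshtein edit distance.
function levenshtein(a: string, b: string): number {
  const d = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)),
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      d[i][j] = Math.min(
        d[i - 1][j] + 1, // deletion
        d[i][j - 1] + 1, // insertion
        d[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1), // substitution
      );
    }
  }
  return d[a.length][b.length];
}

// Pick the same-role candidate whose name is closest to the stale target.
// A production engine would also apply a confidence threshold before acting.
function heal(target: AxNode, tree: AxNode[]): AxNode | null {
  const candidates = tree.filter((n) => n.role === target.role);
  let best: AxNode | null = null;
  let bestDist = Infinity;
  for (const c of candidates) {
    const dist = levenshtein(target.name.toLowerCase(), c.name.toLowerCase());
    if (dist < bestDist) { bestDist = dist; best = c; }
  }
  return best;
}
```

Filtering by role first is what keeps this fast and safe: a renamed button only ever heals to another button, never to a link or heading that happens to share a word.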
Assertions with polling
browser_assert provides seven assertion types (visible, hidden, text, url, count, attribute, console) with built-in polling and configurable timeouts. The assertion retries until the condition is met or the timeout expires, then captures an AX snapshot on failure.
Playwright MCP has no assertion tools. An AI agent using Playwright MCP would need to take a snapshot, parse the result, and evaluate conditions in its own reasoning. That's brittle: the agent might check too early, miss a loading state, or misinterpret the snapshot. ToolPiper's assertions handle timing internally.
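The retry behavior amounts to a polling loop around an async predicate. A minimal sketch, with illustrative names and defaults:

```typescript
// Retry `check` until it returns true or the timeout expires.
// The real tool would capture an AX snapshot on the failure path.
async function assertWithPolling(
  check: () => Promise<boolean>,
  timeoutMs = 5000,
  intervalMs = 100,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await check()) return; // condition met
    if (Date.now() >= deadline) {
      throw new Error("assertion timed out");
    }
    await new Promise((r) => setTimeout(r, intervalMs)); // wait before retrying
  }
}
```

This is why the agent never has to reason about loading spinners or slow redirects: the tool keeps checking until the page settles or the timeout proves the assertion false.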
Full test lifecycle
The six test tools (test_list, test_get, test_save, test_delete, test_run, test_export) give the AI agent persistent test management. Tests are saved to disk, rerunnable, editable, and exportable. An agent can build a test suite over multiple sessions, come back tomorrow, list existing tests, modify one, run the full suite, and export the results.
With Playwright MCP or Chrome DevTools MCP, everything is ephemeral. The agent drives a browser session and the interactions evaporate when the session ends. There's no "save this sequence of actions as a test" tool. There's no "run that test I made yesterday" tool. The AI has to regenerate everything from scratch each time.
Without MCP: the provider-agnostic approach
MCP is powerful, but it's not the only path. ToolPiper's browser tools work just as well without MCP by injecting the AX tree directly into conversation context.
The approach is simple. Take a browser snapshot via ToolPiper's REST API (GET /v1/browser/snapshot). The response is the same semantic plain text that the MCP tool returns. Paste it into any AI conversation, with any model, on any provider. Claude, GPT-4, Gemini, a local Llama model, anything that can read text and generate structured output.
The AI reads the AX snapshot, generates PiperTest steps as JSON, and you save them via the REST API (POST /v1/test-sessions). Run them via POST /v1/test-sessions/:id/run. Export via GET /v1/test-sessions/:id/export. The full lifecycle works through HTTP, no MCP required.
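Putting those endpoints together, an orchestration sketch might look like the following. The paths are the ones quoted above, but the base URL, port, and request/response shapes are assumptions, not documented API details:

```typescript
// Sketch of the MCP-free lifecycle over HTTP against a local ToolPiper
// instance. Base URL and JSON shapes are assumptions for illustration.
const BASE = "http://localhost:9998/v1";

async function createAndRunTest(steps: unknown[], name: string): Promise<string> {
  // 1. Snapshot the current page (plain-text AX tree) to feed to any model.
  const snapshot = await fetch(`${BASE}/browser/snapshot`).then((r) => r.text());
  if (!snapshot) throw new Error("empty snapshot");
  // ...the model reads `snapshot` and produces `steps`...

  // 2. Persist the generated steps as a test session.
  const session = await fetch(`${BASE}/test-sessions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ name, steps }),
  }).then((r) => r.json());

  // 3. Run the session, then export Playwright code.
  await fetch(`${BASE}/test-sessions/${session.id}/run`, { method: "POST" });
  return fetch(`${BASE}/test-sessions/${session.id}/export?framework=playwright`)
    .then((r) => r.text());
}
```

The orchestrator here could be anything: a CI job, a cron script, or a custom agent wrapping any model provider.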
This matters for three scenarios:
AI providers without MCP support. Not every model host supports MCP. OpenAI's API, Google's Gemini API, and many open-source model APIs don't have native MCP clients. But they all accept text input and produce text output. Feed them the AX tree, get test steps back.
Custom integrations. If you're building your own testing agent, you might not want MCP's overhead. A simple HTTP client that calls ToolPiper's REST API is lighter weight and gives you full control over the orchestration logic.
Local models. A 7B parameter model running on your Mac through llama.cpp can read an AX snapshot and generate test steps. The snapshot is compact plain text, not a 100KB JSON blob. Small models handle it fine because there's nothing to parse, just structured text to reason about.
The MCP tools are the convenient path. The REST API is the universal path. Both produce the same result: AI-generated tests stored in PiperTest format, runnable with self-healing, exportable to Playwright or Cypress.
Example: Claude Code writing a login test
Here's a real interaction with Claude Code connected to ToolPiper via MCP. The human provides a single instruction. The agent does the rest.
Human: "Write a test for the login page at https://staging.myapp.com/login. Test successful login and invalid password."
What the agent does:
- Calls browser_action with navigate to open the URL
- Calls browser_snapshot to read the page structure
- Reads the AX tree: finds the email field, password field, sign in button, and error message area
- Generates two test sessions: "Login - Happy Path" and "Login - Invalid Password"
- Calls test_save twice to persist both tests
- Calls test_run on the happy path test, reads the result (all steps passed)
- Calls test_run on the invalid password test, reads the result (assertion on error message passed)
- Calls test_export on both tests with target "playwright"
- Reports: "Created 2 tests, both passing. Exported Playwright code for both. The happy path test has 6 steps and the invalid password test has 7 steps including an assertion on the error message text."
Total tool calls: 8. Total time: under 30 seconds. Total code written by the human: zero. The agent handled navigation, page analysis, test design, step generation, execution, verification, and export through MCP tools.
The same workflow works in Cursor, Windsurf, or any MCP-compatible client. The tools are identical. The agent's reasoning varies by model, but the tool interface is the same.
What the AI is good (and bad) at
AI-driven test authoring isn't magic. The agent is good at specific things and genuinely bad at others.
Good at: generating happy path coverage. Given an AX snapshot of a form, the agent can generate fill-and-submit steps, navigate to the result page, and assert on visible outcomes. This is mechanical work that the agent handles consistently.
Good at: expanding a recording. Record a 5-step test manually, then ask the agent to add error handling: "What happens if I submit an empty form?" The agent reads the existing test, takes a snapshot, and adds steps for the empty-field case. This turns a basic recording into a thorough test.
Good at: generating assertions. The agent can look at a page state and suggest what to assert: "The heading should say Dashboard. The user's name should appear in the nav. The URL should contain /dashboard." Humans often forget assertions because they're focused on the interaction flow. The agent fills in the verification gaps.
Bad at: knowing which edge cases matter. The agent doesn't know that your app has a race condition when two users edit the same document simultaneously. It doesn't know that the discount code field breaks when the code contains a special character. These are business-logic edge cases that require domain knowledge.
Bad at: complex auth flows. OAuth redirects, CAPTCHA challenges, MFA tokens. These are inherently dynamic and require either environment setup (test accounts, bypass tokens) or manual intervention. The agent can record the flow once, but it can't replicate a time-based OTP code.
Bad at: replacing test strategy. The agent generates tests for what it can see. It doesn't decide that you need load testing, or that the checkout flow needs to be tested with 15 different payment methods, or that the admin panel needs tests at all. Test strategy is a human decision.
Use the AI for the tedious work: recording flows, generating assertions, expanding test coverage for visible UI. Apply human judgment for the strategy: what to test, which edge cases matter, when a test is actually valuable.
Honest limitations
ToolPiper's MCP testing tools have real constraints that affect who they work for.
Chrome only. All browser tools use Chrome DevTools Protocol. Firefox and Safari are not supported. If your CI matrix requires cross-browser testing, you need to export to Playwright and run the generated code across browsers there. The authoring happens in Chrome. The CI execution can be multi-browser via the export.
macOS only. ToolPiper is a native macOS app that runs on Apple Silicon (M1 or later). There's no Windows or Linux version. The exported Playwright and Cypress code runs on any platform, but the MCP server and test runner require a Mac.
Token efficiency varies by client. Claude Code handles 20 tools well because Anthropic designed MCP with large tool registries in mind. Other clients may struggle with tool selection when presented with ToolPiper's full 104-tool catalog. If your MCP client gets confused by tool count, you can limit the scope by configuring which tool tiers to expose.
Self-healing has confidence thresholds. Fuzzy AX matching works for label renames and element reordering. It doesn't work for fundamental page restructures where the original element's role and name have both changed completely. In those cases, the test correctly fails and AI-assisted healing (if enabled) attempts a structural repair.
AI-generated tests need review. An AI agent can produce a test that passes today but checks the wrong things. The assertions might be too loose ("page contains some text") or too tight ("heading equals exact string that changes with every deploy"). Human review of AI-generated tests is still essential.
Getting started
Install ToolPiper from modelpiper.com. Connect it to your MCP client with one command:
claude mcp add toolpiper -- ~/.toolpiper/mcp

For Cursor, Windsurf, or other clients that use JSON config, point to the same binary path. For HTTP-based clients, use http://localhost:9998/mcp as the Streamable HTTP endpoint.
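For those JSON-config clients, the entry typically follows the common mcpServers shape. A sketch, assuming the binary path above (the config file location varies by client, and some clients require an absolute path instead of ~):

```json
{
  "mcpServers": {
    "toolpiper": {
      "command": "~/.toolpiper/mcp"
    }
  }
}
```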
Open Chrome, navigate to whatever you want to test, and ask your AI agent: "Take a snapshot of this page and write a test for the login flow." The agent handles the rest.
This is part of a series on AI-powered testing workflows. For the visual testing guide (no AI required), see Visual Testing on Mac. For the self-healing deep-dive, see Self-Healing Test Selectors. For the full MCP server overview (all 104 tools), see Local MCP Server on Mac.