End-to-end tests are the most valuable and least written tests. Every engineering team knows they should have them. Nobody has time to write them.
The testing pyramid has been gospel for over a decade: unit tests at the base, integration tests in the middle, end-to-end tests at the top. And at most companies, that top layer is empty. Not because people disagree with it, but because the cost of authoring and maintaining E2E tests is too high relative to everything else competing for developer time.
The result is predictable. Bugs that unit tests can't catch - broken flows, missing form validations, auth redirects that go nowhere - ship to production and get caught by users. The most important layer of testing is the one nobody writes.
Why are end-to-end tests so hard to maintain?
Playwright and Cypress are genuinely good tools. They're fast, well-documented, and handle the hard problems of browser automation - waiting for elements, managing async state, handling network timing. If you're writing E2E tests professionally, you're probably using one of them.
The problem isn't the frameworks. It's the authoring model.
Writing a Playwright test means writing code. You need an IDE, a Node.js environment, a test configuration file, and familiarity with the framework's API. For a QA engineer who thinks in user flows - "click this, type that, verify this appears" - the gap between what they want to express and what they need to write is enormous.
Codegen tools try to bridge this gap. Playwright has a recorder. Cypress has a basic one too. But the code they generate relies on brittle selectors - CSS classes, generated IDs, deeply nested XPath expressions. These selectors are implementation details. They break when class names change, when components are refactored, when the CSS-in-JS library changes its hashing strategy, when a developer renames a wrapper div.
When selectors break, someone has to open the test file, find the failing selector, inspect the updated DOM, write a new selector, and re-run the test. Multiply that by fifty tests and three UI refactors a quarter, and you understand why the E2E test suite is the first thing teams abandon when velocity pressure increases.
What if testing was visual?
The core idea behind PiperTest is simple. What if you could create a test by using your app, and maintain it by running it?
Record your interaction. Browse the app the way a user would. Click buttons, fill forms, navigate pages. The recorder captures every interaction as a structured step with rich metadata about what you clicked and why it was identifiable.
Replay it. The test runner walks through each step, executes the action, and verifies the result. If something changed - a button moved, a label was updated - the runner tries to find it anyway using fuzzy matching against the current page structure.
Export it. When you need the test in your CI pipeline, export to Playwright or Cypress code with one click. The generated code uses the selectors your CI tool understands.
No IDE required to create a test. No test framework to configure. No code to write. The person who knows the user flow best - often a QA engineer, a product manager, or the developer who just built the feature - can create the test directly.
Why do CSS selectors keep breaking?
CSS and XPath selectors target the DOM - the raw HTML structure of a page. The DOM is a build artifact. It changes when you update a dependency, when you switch component libraries, when you rename CSS classes, when your build tool updates its chunk hashing.
Consider a login button. In the DOM, it might be <button class="btn-primary-lg mt-4 auth-submit">Sign In</button>. The test selects it with .auth-submit. Then a designer renames the class to .login-cta. The test breaks. The button still says "Sign In" and still does the same thing, but the selector is dead.
The accessibility tree doesn't have this problem. Chrome's AX tree represents what users see and interact with: a button labeled "Sign In." That's it. No class names, no generated IDs, no framework-specific wrapper divs. A React app with 2,000 DOM nodes might have 200 AX tree nodes - the ones that actually matter for interaction.
AX selectors target what the user experiences, not how the developer built it. A CSS refactor that changes every class name on the page doesn't touch the accessibility tree. A migration from React to Vue doesn't touch it either, as long as the UI looks and behaves the same. Tests break when behavior changes, which is exactly when they should break.
What is PiperTest?
PiperTest is ToolPiper's visual testing format. It's built on the accessibility tree from the ground up: AX selectors aren't a fallback behind CSS - they are the only selector strategy.
Every test is a sequence of typed steps. Each step has an action (click, fill, navigate, assert), a selector in AX format, and metadata about the target element. Here's what a recorded login test looks like:
navigate https://example.com/login
fill label:Email [email protected]
fill label:Password secret123
click role:button:Sign In
assert text role:heading = "Dashboard"

The selectors use accessibility roles: role:button:Sign In, label:Email, text:Welcome, testid:submit-btn. For ambiguous cases, hierarchical scoping narrows the match: role:form:Login > role:button:Submit finds the Submit button inside the Login form, even if other Submit buttons exist on the page.
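To make the selector grammar concrete, here is a minimal sketch of how it might be parsed. This is not PiperTest's implementation - the function names and the AxSelector type are my own, and it assumes only the formats shown above (role:, label:, text:, testid:, and " > " scoping).

```typescript
// Hypothetical parser for the AX selector format shown above.
// Assumes segments like "role:button:Sign In" or "label:Email",
// joined by " > " for hierarchical scoping (outermost first).

type AxSelector =
  | { kind: "role"; role: string; name?: string }
  | { kind: "label" | "text" | "testid"; value: string };

function parseSegment(segment: string): AxSelector {
  const [kind, ...rest] = segment.split(":");
  if (kind === "role") {
    // "role:button:Sign In" -> role "button", accessible name "Sign In"
    const [role, ...name] = rest;
    return { kind: "role", role, name: name.length ? name.join(":") : undefined };
  }
  if (kind === "label" || kind === "text" || kind === "testid") {
    return { kind, value: rest.join(":") };
  }
  throw new Error(`Unknown selector kind: ${kind}`);
}

// "role:form:Login > role:button:Submit" -> list of segments, outer to inner
function parseSelector(selector: string): AxSelector[] {
  return selector.split(">").map((s) => parseSegment(s.trim()));
}
```

A scoped selector parses into a chain the runner can match top-down: the first segment restricts the subtree, the last names the target.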
How does recording work?
Browse your app normally. PiperTest captures every interaction as an AX-enriched step. You don't need to annotate anything or switch between a recorder panel and the app. Just use it.
Each captured step includes an axPath - the full accessibility-tree path from the document root to the target element - and elementMeta with the element's role, name, description, tag, and bounding box. This metadata serves two purposes: it makes the test human-readable ("clicked the Sign In button in the Login form") and it provides the structural context the self-healing engine needs when something changes.
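A recorded step might look like the sketch below. The axPath and elementMeta fields follow the description above; the remaining field names and shapes are assumptions for illustration, not PiperTest's actual schema.

```typescript
// Sketch of a recorded step. Only axPath and elementMeta (role, name,
// description, tag, bounding box) come from the article; the rest is assumed.

interface ElementMeta {
  role: string;          // AX role, e.g. "button"
  name: string;          // accessible name, e.g. "Sign In"
  description?: string;
  tag: string;           // underlying HTML tag
  bounds: { x: number; y: number; width: number; height: number };
}

interface RecordedStep {
  action: "navigate" | "click" | "fill" | "assert";
  selector: string;      // AX-format selector, e.g. "role:button:Sign In"
  value?: string;        // fill text, expected value, or URL
  axPath: string[];      // accessibility-tree path from document root to target
  elementMeta?: ElementMeta;
}

// The metadata is what makes a step human-readable.
function describe(step: RecordedStep): string {
  if (!step.elementMeta) return `${step.action} ${step.selector}`;
  return `${step.action} the ${step.elementMeta.name} ${step.elementMeta.role}`;
}
```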
The recording runs through Chrome DevTools Protocol (CDP) directly. No browser driver binary, no WebDriver middleware, no Selenium. ToolPiper holds a persistent WebSocket connection to Chrome and captures interactions in real time.
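The CDP wire format itself is plain JSON over that WebSocket: each command carries an id, a method, and params. The sketch below shows the message framing only (the actual connection to Chrome's ws://localhost:9222 endpoint is omitted); the two method names are real CDP methods, but how PiperTest uses them is an assumption.

```typescript
// Minimal sketch of CDP message framing: one JSON command per message,
// correlated to its response by a monotonically increasing id.

let nextId = 0;

function cdpCommand(method: string, params: object = {}): string {
  nextId += 1;
  return JSON.stringify({ id: nextId, method, params });
}

// e.g. fetching the accessibility tree, then dispatching a click:
const getTree = cdpCommand("Accessibility.getFullAXTree");
const click = cdpCommand("Input.dispatchMouseEvent", {
  type: "mousePressed", x: 100, y: 200, button: "left", clickCount: 1,
});
```

One persistent connection and this framing is the whole protocol stack - there is no driver binary translating in the middle.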
How does self-healing work?
This is the feature that changes how you think about test maintenance. When a test step's selector no longer matches the page, PiperTest doesn't fail immediately. Instead, the test runner:
- Takes a fresh AX tree snapshot of the current page
- Searches for nodes that match the original selector's role and approximate name
- Scores candidates by similarity - role match, name edit distance, tree position relative to the original
- If a high-confidence match is found, executes the action on the healed selector and records the mapping
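The scoring step can be sketched as role match plus name similarity via edit distance. This is illustrative only: the weights, the 0.7 confidence threshold, and the hard role requirement are my assumptions, and the article says real scoring also weighs tree position relative to the original, which this sketch omits.

```typescript
// Sketch of selector healing: score candidate AX nodes against the original
// target. Threshold and scoring are assumed, not PiperTest's actual values.

interface AxNode { role: string; name: string }

// Standard Levenshtein edit distance between two strings.
function editDistance(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)));
  for (let i = 1; i <= a.length; i++)
    for (let j = 1; j <= b.length; j++)
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                  // deletion
        dp[i][j - 1] + 1,                                  // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution
      );
  return dp[a.length][b.length];
}

function score(original: AxNode, candidate: AxNode): number {
  if (original.role !== candidate.role) return 0; // wrong role: never heal
  const maxLen = Math.max(original.name.length, candidate.name.length, 1);
  return 1 - editDistance(original.name, candidate.name) / maxLen;
}

function heal(original: AxNode, candidates: AxNode[]): AxNode | null {
  let best: AxNode | null = null;
  let bestScore = 0.7; // assumed confidence threshold
  for (const c of candidates) {
    const s = score(original, c);
    if (s > bestScore) { best = c; bestScore = s; }
  }
  return best; // null -> no confident match, the step fails honestly
}
```

A relabeled button heals; a page with no plausible counterpart returns null and fails the step, which is the behavior you want.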
The heal history is persisted with the test session. Subsequent runs use the healed selector directly. A button renamed from "Submit" to "Save" is healed automatically. A fundamentally different page structure correctly fails the test.
This means minor UI updates - label changes, element reordering, component renames - don't require manual test maintenance. The test repairs itself and keeps running. You review the heal log after the run to confirm the mappings are correct.
What assertion types are available?
PiperTest includes seven built-in assertion types that cover the most common verification needs:
- visible - confirm an element is present and visible on the page
- hidden - confirm an element is not visible (useful for testing dismissals and hide/show logic)
- text - verify an element's text content matches an expected value
- url - check the current page URL (exact match or pattern)
- count - verify the number of elements matching a selector (e.g., "3 items in the cart")
- attribute - check a specific attribute value on an element
- console - verify that a console message was (or wasn't) logged
Assertions use polling with configurable timeouts. They retry until the condition is met or the timeout expires, then capture an AX snapshot on failure for debugging. This handles the most common source of flaky tests - timing issues where the UI hasn't finished updating when the assertion runs.
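The polling behavior reduces to a small retry loop. The default timeout and interval below are assumptions for illustration; the failure-path AX snapshot is left to the caller.

```typescript
// Sketch of a polling assertion: retry a check until it passes or the
// timeout expires. Defaults (5s / 100ms) are assumed, not PiperTest's.

async function pollUntil(
  check: () => boolean | Promise<boolean>,
  { timeoutMs = 5000, intervalMs = 100 } = {},
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await check()) return true;           // condition met
    if (Date.now() >= deadline) return false; // timed out: caller captures AX snapshot
    await new Promise((r) => setTimeout(r, intervalMs));
  }
}
```

Because every assertion goes through a loop like this, a UI that settles a few hundred milliseconds late passes on a retry instead of flaking.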
How fast is test execution?
Each step executes in 10-50ms. PiperTest talks to Chrome via CDP WebSocket directly, with no browser driver binary in the middle. There's no Selenium server, no WebDriver protocol translation, no process spawning per action. One persistent WebSocket connection, direct protocol messages, immediate responses.
A 20-step login-and-navigate test completes in under a second. A 100-step comprehensive flow finishes in a few seconds. The bottleneck is your application's response time, not the test runner.
How does export to Playwright and Cypress work?
PiperTest stores tests in its own format because that format is what enables self-healing, AX enrichment, and the visual editor. But when you need tests in CI, you need Playwright or Cypress code.
Export is one click. The export renderer maps AX selectors to each framework's native selector format:
- role:button:Sign In becomes page.getByRole('button', { name: 'Sign In' }) in Playwright
- role:button:Sign In becomes cy.contains('button', 'Sign In') in Cypress
- label:Email becomes page.getByLabel('Email') in Playwright
- testid:submit-btn becomes cy.get('[data-testid="submit-btn"]') in Cypress
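A renderer for these mappings can be sketched in a few lines. The function names are mine, and the sketch covers only the selector kinds exemplified above (the text-to-Playwright mapping is my assumption; Cypress kinds not shown in the article are left unsupported).

```typescript
// Sketch of AX-selector -> framework-code rendering. The Playwright
// locator APIs (getByRole/getByLabel/getByText/getByTestId) and
// cy.contains/cy.get are real; the mapping function itself is illustrative.

function splitKind(selector: string): [string, string] {
  const [kind, ...rest] = selector.split(":");
  return [kind, rest.join(":")];
}

function toPlaywright(selector: string): string {
  const [kind, value] = splitKind(selector);
  switch (kind) {
    case "role": {
      const [role, ...name] = value.split(":");
      return `page.getByRole('${role}', { name: '${name.join(":")}' })`;
    }
    case "label": return `page.getByLabel('${value}')`;
    case "text": return `page.getByText('${value}')`;   // assumed mapping
    case "testid": return `page.getByTestId('${value}')`;
    default: throw new Error(`Unsupported selector: ${selector}`);
  }
}

function toCypress(selector: string): string {
  const [kind, value] = splitKind(selector);
  switch (kind) {
    case "role": {
      const [role, ...name] = value.split(":");
      return `cy.contains('${role}', '${name.join(":")}')`;
    }
    case "testid": return `cy.get('[data-testid="${value}"]')`;
    default: throw new Error(`Unsupported selector: ${selector}`);
  }
}
```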
The exported code is clean, idiomatic, and ready to paste into your test suite. It's not a code dump with comments saying "auto-generated" - it's the code you would have written by hand, generated from the recording.
What does the test editor look like?
PiperTest has a hierarchical tree UI for managing test steps. Each step is a row showing the action type, selector, and value. You can click any step to edit it inline, drag steps to reorder them, and right-click for a context menu with options to duplicate, delete, or insert steps above or below.
The smart fill feature auto-detects the input type when you record a fill action. <select> elements get programmatic option selection instead of text input. Date and time inputs use native value setters that bypass the browser's date picker UI. Range sliders, color pickers - each input type gets the right interaction strategy automatically.
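The dispatch behind smart fill can be sketched as a mapping from element type to interaction strategy. The strategy names below are illustrative labels I chose; the article describes the behavior, not this API.

```typescript
// Sketch of smart-fill dispatch: choose an interaction strategy from the
// target's tag and input type. Strategy names are assumed, not PiperTest's.

type FillStrategy =
  | "type-text"            // default: simulate keystrokes
  | "select-option"        // <select>: programmatic option selection
  | "set-native-value"     // date/time inputs: bypass the picker UI
  | "set-value-directly";  // range sliders, color pickers: direct value setter

function fillStrategy(tag: string, inputType?: string): FillStrategy {
  if (tag === "select") return "select-option";
  if (tag === "input") {
    switch (inputType) {
      case "date":
      case "time":
      case "datetime-local":
        return "set-native-value";
      case "range":
      case "color":
        return "set-value-directly";
    }
  }
  return "type-text";
}
```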
Can AI write tests for me?
PiperTest exposes six MCP tools: test_list, test_get, test_save, test_delete, test_run, and test_export. Any MCP-capable AI client - Claude Code, Cursor, Windsurf, or your own integration - can create, modify, run, and export tests programmatically.
The workflow looks like this: the AI takes a browser snapshot (AX tree), reasons about what should be tested, generates PiperTest steps, saves them, runs them, and reports the results. Because the AX tree is plain text, any model can participate - it doesn't need special browser automation capabilities. It just reads the accessibility tree and generates structured steps.
This isn't a replacement for thinking about what to test. The AI is good at generating coverage for visible UI flows. It's not good at knowing which edge cases matter to your business. Use AI for the tedious parts - recording happy paths, generating assertion steps, expanding a 5-step test to 20 steps with error handling - and apply your judgment for the test strategy.
What are the current limitations?
PiperTest is honest about what it doesn't do yet.
Chrome only. PiperTest uses Chrome DevTools Protocol for everything - recording, execution, assertions, AX tree access. Firefox and Safari have different debugging protocols and different accessibility tree implementations. Multi-browser support is on the roadmap but not shipping today.
Recording captures what you do. It doesn't infer what you should test. If you forget to check that an error message appears after invalid input, the recorder won't add that assertion for you. You still need to think about coverage.
Complex auth flows may need editing. Multi-page authentication with OAuth redirects, CAPTCHA challenges, or MFA codes often requires manual step editing after recording. The recorder captures the flow, but tokens and codes change between runs.
These are real limitations. PiperTest is best suited for testing application flows where you control the environment - staging servers, seeded test data, predictable auth. It's not a full replacement for Playwright's configuration depth on a large CI matrix.
Try It
Download ToolPiper from the Mac App Store. Open Chrome, navigate to your app, and start recording. Creating your first test takes less time than writing the equivalent Playwright boilerplate.
When you're ready for CI, export to Playwright or Cypress and add the generated file to your repo. The visual recorder creates the test. The export puts it where your pipeline expects it.
This is part of a series on local-first AI workflows on macOS. For the technical architecture behind PiperTest's AX-native selectors and CDP engine, see AX-Native Browser Automation.