Why does accessibility always come last?

Accessibility audits happen after the feature is built, after QA signs off, and usually after launch. A team builds a settings page. QA tests it manually. It ships. Three months later, someone runs an axe-core scan and discovers 12 WCAG violations: missing labels on form fields, low-contrast text on secondary buttons, a custom dropdown that isn't keyboard navigable.

The violations get filed as bugs. A developer who didn't write the original code picks them up in a future sprint. They fix the ARIA labels, adjust the contrast ratios, and refactor the dropdown. The fix takes longer than the original build because the context is gone.

This pattern repeats across the industry. WebAIM's 2025 analysis of the top 1,000,000 home pages found that 94.8% had detectable WCAG 2 failures, averaging 51 errors per page. The number improved slightly from 95.9% in 2024, but the scale of the problem is staggering. Nearly every website has accessibility issues, and they persist because the testing happens too late in the development cycle to prevent them.

The reason is structural. Accessibility tools are separate from testing tools. You write Playwright tests to verify functionality, then run axe-core to check accessibility. Two separate activities, two separate tools, two separate parts of the workflow. Teams under deadline pressure cut the second activity because it doesn't block the deploy pipeline.

What if the testing tool itself enforced accessibility? Not as an add-on scan, but as a fundamental property of how selectors work. If your test selectors could only resolve against accessible elements, every test failure would be either a functional bug or an accessibility violation. You'd never ship an inaccessible interactive element because your tests would catch it before the deploy.

How do AX selectors work as accessibility checks?

PiperTest selectors target Chrome's accessibility tree directly via the Chrome DevTools Protocol (CDP) method Accessibility.queryAXTree. The selector role:button:Sign In asks Chrome: "Is there an element with the accessibility role 'button' and the accessible name 'Sign In'?"

If the answer is no, the test step fails. And the reason the answer might be no tells you something important about your application's accessibility.
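To make the mechanics concrete, here is a minimal sketch of how such a selector could resolve, assuming a heavily simplified view of the nodes that Accessibility.queryAXTree returns. The names AXNode, parseSelector, and matchAXNode are illustrative, not PiperTest's actual API, and only the role: selector form is handled:

```typescript
// Simplified stand-in for an accessibility-tree node: real AX nodes carry
// much more (states, properties, children), but role + name is what the
// selector match turns on.
interface AXNode {
  role: string; // e.g. "button", "textbox", "generic"
  name: string; // computed accessible name; empty if none exists
}

// Parse "role:button:Sign In" into its role and accessible-name parts.
function parseSelector(selector: string): { role: string; name: string } {
  const [prefix, role, ...rest] = selector.split(":");
  if (prefix !== "role") throw new Error(`unsupported selector: ${selector}`);
  return { role, name: rest.join(":") };
}

// A node matches only if BOTH its role and its accessible name line up.
function matchAXNode(tree: AXNode[], selector: string): AXNode | undefined {
  const { role, name } = parseSelector(selector);
  return tree.find((n) => n.role === role && n.name === name);
}

// Two inaccessible variants of a sign-in control:
const tree: AXNode[] = [
  { role: "button", name: "" },  // <button><svg/></button> with no label
  { role: "generic", name: "" }, // <div onclick=...>Sign In</div>
];

matchAXNode(tree, "role:button:Sign In"); // → undefined: the test step fails
```

The point of the sketch is the failure mode: both elements exist in the tree, but neither satisfies role and name together, so the selector cannot resolve and the test fails for an accessibility reason.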

Missing accessible name. A <button><svg>...</svg></button> without an aria-label or visible text has no accessible name. The AX tree node has role "button" but an empty name. role:button:Sign In won't match it. The test fails. The failure message says "Element not found: role:button:Sign In." A developer investigates, discovers the button has no label, and adds aria-label="Sign In". The test passes. The screen reader user can now identify the button. The test failure was the accessibility audit.

Wrong ARIA role. A <div onclick="submit()">Sign In</div> looks like a button visually, but it has the AX role "generic" (a div with no semantic role), not "button." role:button:Sign In won't match it. The fix is either role="button" on the div or, better, using a real <button> element. Either way, the accessibility improves because screen readers now announce it correctly.

Hidden interactive elements. An element with aria-hidden="true" or display: none is excluded from the accessibility tree. If a button is visually present via CSS tricks but marked as hidden from assistive technology, PiperTest won't find it. The test fails. The developer realizes the element is invisible to screen readers and fixes the visibility.

Unlabeled form fields. label:Email resolves against the accessible name of form inputs, which comes from associated <label> elements, aria-label, or aria-labelledby. A text input without any of these has no accessible name. The selector won't resolve. The test fails. Adding a proper label fixes both the test and the WCAG 1.3.1 (Info and Relationships) violation.

None of these checks require a separate accessibility scanning tool. They emerge naturally from the selector strategy. If you can't select it by its accessible properties, it doesn't have accessible properties.
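The unlabeled-form-field case comes down to accessible-name computation. Here is a loose sketch of that computation for a form input, following (in simplified form) the precedence order in the WAI-ARIA accessible name algorithm: aria-labelledby, then aria-label, then an associated <label>. The InputFacts shape and field names are illustrative assumptions, not a real DOM API:

```typescript
// Illustrative summary of the labeling sources available for one input.
interface InputFacts {
  ariaLabelledbyText?: string;  // text of elements referenced by aria-labelledby
  ariaLabel?: string;           // the aria-label attribute value
  associatedLabelText?: string; // text of a <label for=...> or wrapping <label>
}

// Simplified accessible-name computation: first non-empty source wins.
function accessibleName(input: InputFacts): string {
  return (
    input.ariaLabelledbyText?.trim() ||
    input.ariaLabel?.trim() ||
    input.associatedLabelText?.trim() ||
    "" // no labeling source at all: the AX node gets an empty name
  );
}

accessibleName({ associatedLabelText: "Email address" }); // → "Email address"
accessibleName({}); // → "" — a selector like label:Email can never resolve here
```

An input whose every labeling source is absent ends up with an empty accessible name, which is exactly why the selector fails and exactly what a screen reader user experiences.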

What does the accessibility tree actually show?

PiperTest's browser_snapshot command returns the full AX tree of the current page. This is the same data that screen readers like VoiceOver, JAWS, and NVDA consume. It's the ground truth of your page's accessibility.

Here's what a well-structured form looks like in the AX tree:

form "Login"
  heading "Sign in to your account" (level 2)
  textbox "Email address" (required)
  textbox "Password" (required)
  checkbox "Remember me"
  button "Sign In"
  link "Forgot password?"

And here's what the same form looks like when it has accessibility issues:

generic
  generic
    StaticText "Sign in to your account"
  textbox (required)         ← no label
  textbox (required)         ← no label
  generic                    ← div with onclick, not a checkbox
    StaticText "Remember me"
  generic                    ← div with onclick, not a button
    StaticText "Sign In"
  StaticText "Forgot password?"  ← not a link

The second version renders identically in a browser. Users with a mouse see the same form. But screen reader users experience a completely different page: unlabeled inputs they can't identify, fake buttons they can't activate with the keyboard, and a text string they can't follow as a link.

Every PiperTest selector that targets the first version's accessible names and roles would fail against the second version. The test suite becomes a regression safety net for accessibility. A future developer who accidentally removes a label or swaps a <button> for a styled <div> breaks both the test and the accessibility in one action. The test failure catches it before deployment.

What accessibility issues do automated tools miss?

Understanding what AX-native testing catches requires understanding what existing tools don't catch.

Automated accessibility tools like axe-core, Pa11y, Lighthouse, and WAVE are excellent at detecting certain classes of violations: low contrast text, missing alt text on images, missing form labels, duplicate IDs, and incorrect heading hierarchy. Deque (the company behind axe-core) reports that its engine detects up to 57% of accessibility issues while maintaining zero false positives.

But the broader research consistently shows that automated tools catch only 30-40% of WCAG violations. The UK Government Digital Service ran a revealing experiment: it intentionally built a webpage with 142 accessibility barriers and tested it with 13 automated tools. The best-performing tool found 40% of the barriers. The worst found 13%.

What do they miss? Issues that require contextual judgment:

  • Meaningful alt text. A tool can detect a missing alt attribute. It can't determine whether alt="image" is meaningful or useless.
  • Logical reading order. A tool can check that headings exist. It can't determine whether the heading hierarchy makes semantic sense for the content.
  • Keyboard navigation flow. A tool can check that elements are focusable. It can't determine whether the tab order follows a logical path through the interface.
  • Error identification and recovery. A tool can check that aria-invalid exists. It can't determine whether the error message actually helps the user fix the problem.
  • Consistent navigation. A tool can check individual pages. It can't determine whether navigation patterns are consistent across the site.

AX-native testing catches a different category. It doesn't replace axe-core for contrast ratios or heading hierarchy. But it catches functional accessibility failures that axe-core doesn't flag: interactive elements that aren't accessible to assistive technology. A button that visually works but has no semantic role. A form field that visually has a label but isn't programmatically associated. A custom component that handles click events but isn't keyboard-operable.

These are the violations that break the user experience for screen reader users, not just the violations that appear in a compliance audit.

How does this compare to dedicated accessibility tools?

PiperTest isn't an accessibility testing tool. It's a functional testing tool that uses accessibility as its selector mechanism, which produces accessibility coverage as a side effect. Understanding the difference matters for choosing the right combination of tools.

axe-core is the gold standard for automated WCAG rule checking. It tests 100+ WCAG rules across levels A, AA, and AAA, detects up to 57% of issues with zero false positives, and integrates into every major testing framework and CI pipeline. What it doesn't do: verify that interactive elements actually work through assistive technology. It checks that ARIA attributes are valid. It doesn't check that the element behind those attributes behaves correctly when a screen reader user activates it.

Pa11y is a free, open-source Node.js tool for CI pipeline integration. It runs headlessly against URLs or sitemaps, supports WCAG 2.0, 2.1, and Section 508 rule sets, and outputs JSON, CSV, or HTML reports. Excellent for regression detection across large sites. Same automation coverage gap as axe-core: 30-40% of violations.

WAVE (by WebAIM) provides visual overlay feedback in the browser, showing accessibility issues in context. The 2025 release, version 3.3, aligns its checks with WCAG 2.2 failure conditions. It runs privately (no data sent to servers), and it's good for developer education because the visual feedback helps you understand why something is a violation. The free version is limited to one page at a time.

Lighthouse uses axe-core as its accessibility engine, so it has the same rule coverage. The accessibility score (0-100) is a weighted average of passing audits. Useful as a high-level signal. Not a comprehensive audit tool.

BrowserStack Accessibility offers automated WCAG testing on real devices, CI/CD integration, and screen reader testing with VoiceOver, TalkBack, JAWS, and NVDA. It's the most comprehensive cloud-based option, at enterprise pricing.

PiperTest sits alongside these tools, not in place of them. The optimal setup is: PiperTest for functional testing with implicit accessibility coverage on interactive elements, axe-core in CI for WCAG rule checking across all elements, and periodic manual audits for the 60-70% of issues that no automated tool catches.
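One way to wire the first two layers together in CI is to combine both signals into a single gate. The sketch below assumes axe-core's documented results shape (violations with an id, an impact of "minor" | "moderate" | "serious" | "critical", and affected nodes) plus a hypothetical list of AX selectors that failed to resolve during the functional run; the ciSummary function and its gating policy are illustrative assumptions, not a real integration:

```typescript
// Matches the shape of entries in axe-core's results.violations array.
interface AxeViolation {
  id: string; // e.g. "color-contrast", "label"
  impact: "minor" | "moderate" | "serious" | "critical";
  nodes: unknown[]; // affected elements (details omitted here)
}

// Combine rule-based findings with functional accessibility failures
// (selectors that never resolved) into one pass/fail decision.
function ciSummary(axeViolations: AxeViolation[], failedSelectors: string[]) {
  const blocking = axeViolations.filter(
    (v) => v.impact === "serious" || v.impact === "critical"
  );
  return {
    ruleViolations: axeViolations.length,
    blockingRuleViolations: blocking.length,
    functionalA11yFailures: failedSelectors.length,
    // Example policy: block the deploy on serious/critical rule violations
    // or on any interactive element the tests could not reach.
    pass: blocking.length === 0 && failedSelectors.length === 0,
  };
}
```

The policy itself (which impact levels block, whether minor issues only warn) is a team decision; the value is that both classes of failure now gate the same pipeline instead of one being optional.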

What does the coverage report reveal about accessibility?

PiperProbe's combined coverage report scans the AX tree to find every interactive element on each page, then maps test steps against those elements. The coverage percentage tells you what proportion of interactive elements are exercised by tests.

But there's a second, implicit meaning. Every element that appears in the coverage report is accessible. It has a role, a name, and a position in the AX tree. It's visible to screen readers. It's targetable by assistive technology.

Elements that aren't accessible don't appear in the AX tree at all. A custom dropdown built with <div> elements and no ARIA roles shows up in the DOM but not in the AX tree. PiperProbe won't list it as an interactive element. It won't appear in the coverage report. The absence is the signal.

If your team builds a page with 20 interactive elements and PiperProbe finds only 14, the 6 missing elements aren't just untested. They're inaccessible. They're invisible to assistive technology. The coverage report doubles as an accessibility surface audit without any additional configuration.

This approach catches a class of issue that rule-based tools miss entirely. axe-core checks whether existing ARIA attributes are valid. PiperProbe reveals whether interactive elements exist in the accessibility tree at all. An element with no ARIA attributes and no semantic HTML role doesn't trigger an axe-core violation because there's nothing invalid. But it's invisible to screen readers, and PiperProbe's coverage gap makes that visible.
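The coverage arithmetic above can be sketched as a small pure function. Everything here is an illustrative assumption rather than PiperProbe's actual implementation: a hand-picked set of interactive roles, role-plus-name matching against tested elements, and a separately supplied count of widgets built in the DOM so the AX-tree gap can be surfaced:

```typescript
// Roles treated as interactive for coverage purposes (illustrative subset).
const INTERACTIVE_ROLES = new Set([
  "button", "link", "textbox", "checkbox", "switch", "combobox",
]);

interface AXNode {
  role: string;
  name: string;
}

function coverageReport(
  axTree: AXNode[],     // every node found in the accessibility tree
  tested: AXNode[],     // role+name pairs exercised by test steps
  domWidgetCount: number // interactive widgets the team actually built
) {
  const interactive = axTree.filter((n) => INTERACTIVE_ROLES.has(n.role));
  const covered = interactive.filter((n) =>
    tested.some((t) => t.role === n.role && t.name === n.name)
  );
  return {
    interactive: interactive.length,
    covered: covered.length,
    coveragePct: interactive.length
      ? Math.round((100 * covered.length) / interactive.length)
      : 0,
    // Widgets that exist in the DOM but never reached the AX tree:
    // not just untested — invisible to assistive technology.
    missingFromAXTree: domWidgetCount - interactive.length,
  };
}
```

For the page described above (20 widgets built, 14 in the AX tree), missingFromAXTree comes out to 6, which is the accessibility-surface signal hiding inside the coverage report.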

Does this replace accessibility audits?

No. AX-native testing catches a specific and important category of accessibility issues: missing roles, missing labels, hidden interactive elements, and broken semantic structure. It catches them early (during development, not after launch) and automatically (as a side effect of testing, not as a separate activity).

It doesn't catch contrast ratios. It doesn't check heading hierarchy. It doesn't evaluate whether alt text is meaningful. It doesn't test keyboard navigation flow across the full page. It doesn't assess cognitive accessibility. These require dedicated tools (axe-core, Pa11y, WAVE) and manual testing (keyboard navigation, screen reader walkthroughs).

The value is in the collapse of two activities into one. Instead of running functional tests and then running accessibility scans, AX-native testing gives you both simultaneously. The functional test that verifies "clicking Sign In logs the user in" also verifies "the Sign In button is accessible to screen readers." The form test that verifies "submitting with an empty email shows an error" also verifies "the email field has a programmatic label."

For teams that currently do no accessibility testing (which, given the 94.8% failure rate, is most teams), AX-native testing via PiperTest is a zero-cost entry point. You're not adding accessibility work to your sprint. You're getting accessibility coverage as a byproduct of the testing work you're already doing.

What about the ADA Title II deadline?

The ADA Title II deadline of April 24, 2026 requires state and local government entities serving populations of 50,000 or more to conform to WCAG 2.1 Level AA for their websites and mobile apps. This applies to thousands of government websites that have never been formally audited.

AX-native testing doesn't produce a compliance certificate. It doesn't check every WCAG 2.1 AA success criterion. But it establishes a baseline of functional accessibility that catches the most user-impacting violations: buttons that screen readers can't find, form fields without labels, interactive elements hidden from assistive technology.

For government teams scrambling to meet the deadline, the practical approach is layered:

  1. PiperTest for functional testing with AX selectors - catches missing roles, labels, and hidden elements during normal test authoring
  2. axe-core in CI - catches the 57% of WCAG violations it can automate (contrast, headings, alt text, ARIA validity)
  3. Manual audit of critical flows - keyboard navigation, screen reader walkthrough, cognitive assessment for the 60-70% that automation misses

This layered approach covers more ground than any single tool because each layer catches a different class of issue. AX-native testing catches functional accessibility failures. axe-core catches rule-based violations. Manual testing catches contextual issues. Together, they approach comprehensive WCAG coverage.

What does the daily workflow look like?

Here's how AX-native testing changes the development cycle for a team that currently does no accessibility testing.

Day 1: Record a test. Open your app in Chrome. Start PiperTest recording. Interact with the login page. PiperTest captures selectors like role:button:Sign In, label:Email, label:Password. If these selectors resolve, your login page's interactive elements are accessible. If they don't, you've found your first accessibility issues.

Day 2: A test fails on a new feature. A developer builds a settings page with custom toggle switches implemented as styled <div> elements. The test tries role:switch:Notifications. It fails. The div has no ARIA role. The developer adds role="switch" and aria-checked. The test passes. The toggle is now accessible. Time to fix: 2 minutes, during development, not three months later in a separate sprint.

Day 5: Coverage reveals gaps. PiperProbe scans the dashboard. It finds 25 interactive elements in the AX tree. Your tests cover 18. But you built 30 interactive elements. The 5 missing from the AX tree are custom components without ARIA roles. They work with a mouse but they're invisible to assistive technology. The coverage report showed you something axe-core wouldn't have flagged.

Week 2: The pattern is established. Tests use AX selectors by default. Every test that passes confirms both functionality and accessibility of the elements it touches. Accessibility issues surface at the earliest possible moment: during test authoring and execution, not during post-launch audits.

Try it

Download ToolPiper from the Mac App Store. Record a test on any page of your app. Look at the AX selectors PiperTest generates. If they resolve cleanly, your interactive elements are accessible. If they don't, you've found the issues that matter most: the ones that prevent assistive technology users from using your application.

Run browser_snapshot on any page. Read the AX tree output. That's what screen readers see. If something important is missing from that tree, it's missing from the experience of every user who depends on assistive technology.

Combine PiperTest with axe-core in CI for rule-based WCAG checking. The two approaches are complementary: PiperTest catches functional accessibility failures (missing roles, missing labels, hidden elements) while axe-core catches rule-based violations (contrast, heading hierarchy, ARIA validity). Together, they cover more ground than either alone.

This is part of the AI-powered testing series. Next: Self-Healing Test Selectors - how PiperTest's three healing modes work under the hood. For the business case, see Reduce Test Maintenance Cost. For AI-assisted test generation, see AI Test Generation.