Code coverage tells you which lines of code your tests execute. It says nothing about whether users can actually click every button, fill every form, or navigate every flow. A page can have 100% code coverage and still have interactive elements that no test ever touches. What you really want to know is: which user interactions are tested, and which are not?

This is the gap between code coverage and interaction coverage. And until now, no tool has measured it.

What is the difference between code coverage and interaction coverage?

Code coverage tools - Istanbul, nyc, c8 - instrument your JavaScript and track which lines execute during a test run. You get a percentage: 80% of your code was touched. This is useful for finding dead code and untested branches. But it measures the wrong thing for UI testing.

A line of code can execute during a test that never actually interacts with the UI element that line renders. Your React component's render function runs, your event handler is defined, your CSS-in-JS styles are computed - and the coverage tool marks all of that as "covered." But no test ever clicked the button that render function produced. No test ever submitted the form. No test ever triggered the dropdown.

Interaction coverage measures something fundamentally different: which buttons, links, inputs, checkboxes, and navigation paths are exercised by actual test steps. It answers the question code coverage cannot: did a test actually use this element the way a user would?

Why does this gap matter?

The elements your tests miss are the elements that break in production. An untested dropdown. An untested modal close button. An untested form validation error state. An untested pagination control. These are where bugs hide - in the interactions nobody verified.

Every QA team has experienced this. The test suite passes. Code coverage is at 85%. A customer reports that the "Cancel" button on the edit dialog does nothing. Nobody wrote a test that clicks it. The code coverage tool shows the component as covered because the render path executed. But no test ever interacted with that button.

Nobody measures interaction coverage because no tool tracks it. Code coverage tooling is mature, well-integrated, and runs in CI. Interaction coverage has historically been a manual audit - someone opens the app, clicks around, and tries to remember what was and was not tested. That does not scale, and it does not persist.

What would interaction coverage actually look like?

Imagine you could point a tool at any web page and instantly get a list of every interactive element - every button, every link, every input, every checkbox. Then compare that list against your test steps and highlight the ones nothing touches. That is interaction coverage.

It is not a replacement for code coverage. It is the other half. Code coverage tells you whether your logic is tested. Interaction coverage tells you whether your UI is tested. You want both.

What is PiperProbe?

PiperProbe is ToolPiper's interaction coverage system for PiperTest. It scans the current page's accessibility tree to build a page surface - every interactive element on the page - and compares your test steps against that surface to compute coverage.

The result is a percentage score and a list of uncovered elements, per page. Not per file, not per function - per page. Because users interact with pages, not source files.

Here is how it works step by step.

How does PiperProbe scan a page?

PiperProbe uses Chrome DevTools Protocol to scan the accessibility tree. The buildPageSurface() function walks the AX tree and identifies every interactive element - buttons, links, inputs, checkboxes, dropdowns, sliders, tabs, and similar elements. These are the elements a user can interact with. Decorative text, layout containers, and non-interactive elements are filtered out.

The result is a structured list of interactive elements with their roles, accessible names, and positions in the tree. This is the page surface - the complete set of things a user could do on this page.
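As a minimal sketch, the filtering step looks something like this in TypeScript. The real buildPageSurface() runs in Swift against the CDP accessibility tree, so the node shape and the role list here are simplified stand-ins:

```typescript
// Illustrative sketch of the surface-building step: filter AX nodes down to
// the interactive ones. The AXNode shape and role set are assumptions; the
// real implementation walks the full CDP accessibility tree.
interface AXNode {
  role: string;     // e.g. "button", "link", "StaticText"
  name: string;     // accessible name, may be empty
  ignored: boolean; // AX nodes marked ignored are not exposed to users
}

interface SurfaceElement {
  role: string;
  name: string;
}

// Roles treated as interactive. The real role set is larger than this.
const INTERACTIVE_ROLES = new Set([
  "button", "link", "textbox", "checkbox", "radio",
  "combobox", "listbox", "slider", "tab", "switch", "menuitem",
]);

function buildPageSurface(nodes: AXNode[]): SurfaceElement[] {
  return nodes
    .filter((n) => !n.ignored && INTERACTIVE_ROLES.has(n.role))
    .map((n) => ({ role: n.role, name: n.name }));
}
```

Decorative text (role "StaticText") and ignored nodes fall out of the filter, which is exactly what makes the surface a list of things a user can do rather than things a user can see.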

But scanning at the right moment matters. Modern web apps built with React, Vue, Angular, or Svelte do not render synchronously. If you scan before the page finishes loading, you get an incomplete surface.

How does PiperProbe know when a page is ready?

PiperProbe injects a JavaScript enricher script via CDP. This script handles the timing problem - detecting when the page is stable enough to scan. It uses three mechanisms:

Framework detection. The script detects 16 frontend frameworks - React, Vue, Angular, Svelte, Next.js, Nuxt, SvelteKit, and others. Each framework has specific readiness signals. A React app is ready when hydration completes. An Angular app is ready when Zone.js stabilizes.

MutationObserver quiescence. The script watches for DOM mutations and waits until the page stops changing. This handles framework-agnostic rendering - any page that finishes rendering and settles gets detected.

Navigation API for SPA route changes. Single-page applications change routes without full page loads. The script detects these navigation events and triggers a re-scan, so coverage stays accurate as users navigate.
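The three mechanisms can be sketched together in TypeScript. The framework globals checked here (Next.js's __NEXT_DATA__, Nuxt's __NUXT__, and so on) are common detection heuristics, not necessarily the enricher's actual signals, and the quiet window is a placeholder; browser APIs are kept behind injectable parameters so the core logic stays visible:

```typescript
// Illustrative sketch of the three readiness mechanisms. Globals and timings
// are assumptions; the real enricher's signals are not shown here.
type Framework = "nextjs" | "nuxt" | "react" | "angular" | "unknown";

// 1. Framework detection via well-known globals. Order matters: Next.js also
//    ships React, so check the more specific meta-framework first.
function detectFramework(win: Record<string, unknown>): Framework {
  if ("__NEXT_DATA__" in win) return "nextjs";
  if ("__NUXT__" in win) return "nuxt";
  if ("__REACT_DEVTOOLS_GLOBAL_HOOK__" in win) return "react";
  if ("ng" in win) return "angular"; // Angular dev mode exposes a global `ng`
  return "unknown";
}

// 2. MutationObserver quiescence, with an injectable clock so the logic is
//    testable without a DOM. The page is "settled" once no mutation has
//    arrived for quietMs.
class QuiescenceTracker {
  private lastMutation: number;
  constructor(private quietMs: number, start: number) {
    this.lastMutation = start;
  }
  markMutation(now: number): void {
    this.lastMutation = now; // called from the MutationObserver callback
  }
  isQuiet(now: number): boolean {
    return now - this.lastMutation >= this.quietMs;
  }
}

// 3. SPA route changes: prefer the Navigation API where available, fall back
//    to wrapping history.pushState elsewhere.
function watchRouteChanges(win: any, rescan: () => void): void {
  if (win.navigation?.addEventListener) {
    win.navigation.addEventListener("navigatesuccess", rescan);
  } else {
    const original = win.history.pushState.bind(win.history);
    win.history.pushState = (...args: unknown[]) => {
      original(...args);
      rescan();
    };
  }
}

// Browser wiring (not executed here):
// new MutationObserver(() => tracker.markMutation(performance.now()))
//   .observe(document.documentElement, { childList: true, subtree: true });
```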

The split is intentional: JavaScript handles WHEN to scan (it can observe the DOM and framework state from inside the page), and Swift handles WHAT to extract (it uses DOM.getDocument(depth:-1) to walk the full document tree and pull inputType, testId, and other metadata by backendNodeId). Each layer does what it is best at.

How does PiperProbe compute coverage?

Once PiperProbe has the page surface (all interactive elements) and your test session (all test steps), it matches them. Each test step's AX selector is compared against the surface elements. A step with selector role:button:Sign In matches the surface element with role "button" and name "Sign In."

The computeCoverage() function produces a per-page coverage percentage: the number of surface elements matched by at least one test step, divided by the total number of surface elements. If your page has 40 interactive elements and your tests touch 30 of them, coverage is 75%.

More importantly, it produces the list of uncovered elements - the 10 things no test interacts with. This is the actionable output. You do not need to guess what is missing. You can see it.
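The matching and scoring logic can be sketched as a pure function, assuming the simplified selector format from the example above. The real computeCoverage() lives in Swift; this TypeScript version is illustrative:

```typescript
// Illustrative sketch of coverage computation: match each test step's
// selector against the page surface, then score. Types are assumptions.
interface SurfaceElement {
  role: string;
  name: string;
}

interface TestStep {
  selector: string; // e.g. "role:button:Sign In"
}

interface CoverageResult {
  percent: number;
  uncovered: SurfaceElement[];
}

// "role:button:Sign In" -> role "button", accessible name "Sign In".
function matchesSelector(selector: string, el: SurfaceElement): boolean {
  const [prefix, role, ...nameParts] = selector.split(":");
  if (prefix !== "role") return false;
  return el.role === role && el.name === nameParts.join(":");
}

function computeCoverage(
  surface: SurfaceElement[],
  steps: TestStep[],
): CoverageResult {
  // An element is covered if at least one step matches it.
  const uncovered = surface.filter(
    (el) => !steps.some((s) => matchesSelector(s.selector, el)),
  );
  const covered = surface.length - uncovered.length;
  const percent =
    surface.length === 0 ? 0 : Math.round((covered / surface.length) * 100);
  return { percent, uncovered };
}
```

A login form with four interactive elements and steps touching three of them would score 75%, with the untouched element surfaced in the uncovered list.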

What does the coverage UI look like?

PiperProbe surfaces coverage data in three places in the ModelPiper web app.

Live coverage bar. A horizontal bar that fills as coverage increases, color-coded: green for high coverage, yellow for moderate, red for low. This appears in the test session view and updates in real time as you add test steps or run tests.

Uncovered elements list. An expandable section below the coverage bar that shows every interactive element on the page that no test step touches. Each entry shows the element's role, accessible name, and position. This is the to-do list for improving coverage.

Sidebar badge. The test session sidebar shows overall coverage percentage as a badge, so you can compare coverage across multiple test sessions at a glance.

When does coverage recalculate?

PiperProbe auto-triggers in two situations: after recording stops (so you immediately see how much of the page your new recording covers) and after a test run completes (so you see updated coverage reflecting any added or modified steps).

Coverage percentage is persisted on TestSessionMeta.coveragePercent, which means it survives app restarts and is available for historical tracking. You can see whether your test coverage is improving or declining over time.

You can also trigger a scan manually via the HTTP API: POST /v1/browser/probe/scan initiates an AX scan, runs the enricher, and computes coverage. GET /v1/test-sessions/:id/probe returns the saved interaction map for a test session.
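As a usage sketch, a minimal TypeScript client for those two endpoints might look like this - the base URL is a placeholder, and only the paths come from the API above:

```typescript
// Hypothetical client for the probe endpoints. The base URL is a placeholder;
// response shapes are not specified here, so both calls return unknown.
async function triggerProbeScan(base: string): Promise<unknown> {
  // Initiates an AX scan, runs the enricher, and computes coverage.
  const res = await fetch(`${base}/v1/browser/probe/scan`, { method: "POST" });
  if (!res.ok) throw new Error(`scan failed: ${res.status}`);
  return res.json();
}

async function getInteractionMap(
  base: string,
  sessionId: string,
): Promise<unknown> {
  // Returns the saved interaction map for a test session.
  const res = await fetch(`${base}/v1/test-sessions/${sessionId}/probe`);
  if (!res.ok) throw new Error(`fetch failed: ${res.status}`);
  return res.json();
}
```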

What are the honest limitations?

Interaction coverage complements code coverage - it does not replace it. You still want both. Code coverage catches untested logic branches. Interaction coverage catches untested UI elements. They measure different things.

Dynamic elements may be missed. Elements that appear only after specific user actions - a modal triggered by a button click, an error message triggered by invalid input, a dropdown that opens on hover - may not appear in a single scan. The scan captures what is on the page at that moment, not what could appear.

Coverage is per-page, not per-app. PiperProbe scans one page at a time. Multi-page coverage requires running tests that navigate between pages, with scans triggered at each route. There is no single "app coverage" number that aggregates all pages automatically.

Some features are not yet shipped. Phases 4 through 8 of PiperProbe - including re-audit diff (detecting what changed between scans), AI gap-filler (suggesting tests for uncovered elements), and dedicated MCP tools for probe data - are planned but not available today. What ships today is the core: surface scanning, coverage computation, and the coverage UI.

How is this different from code coverage?

The comparison is worth spelling out, because the two sound similar but measure fundamentally different things.

Code coverage answers: "Did this line of code run?" Interaction coverage answers: "Did a test actually click this button?" A manual QA audit answers: "Does this feel like it was tested?" All three have a place. PiperProbe fills the gap that has been empty until now - the automated, persistent measurement of which UI elements your tests exercise.

How does PiperProbe fit into a testing workflow?

PiperProbe is not a standalone tool. It is built into PiperTest, ToolPiper's visual testing system. The typical workflow looks like this:

Record a test by browsing your app normally. PiperTest captures each interaction as an AX-enriched step. When you stop recording, PiperProbe automatically scans the page and computes coverage against the steps you just recorded. You see the coverage bar and the uncovered elements list immediately.

Review the uncovered elements. Maybe your recording covered the login flow but missed the "Forgot Password" link, the "Show Password" toggle, and the "Sign up" link at the bottom of the form. These are now visible. Add steps for the ones that matter - or make a deliberate decision to leave them untested.

Run the test. After execution completes, coverage recalculates. If you added steps that touch previously uncovered elements, the percentage goes up and those elements disappear from the uncovered list.

Over time, the persisted coverage percentage on each test session lets you track trends. If a new feature adds 15 interactive elements to a page and your existing tests cover none of them, the coverage percentage drops - giving you a signal that your test suite needs updating.

Try It

Download ToolPiper from the Mac App Store. Create a PiperTest session, record a test, and watch the coverage bar populate. The uncovered elements list tells you exactly what your test missed.

No configuration. No instrumentation step. No CI integration required to get started. Just record, scan, and read the list.

This is part of a series on local-first AI workflows on macOS. For the visual testing system that PiperProbe measures coverage for, see Visual Testing on Mac. For the CDP engine underneath, see AX-Native Browser Automation.