The Accessibility API Is Underrated

When I started building Browser Notes, I needed to know which web page the user was looking at. That’s it. Just the URL in the address bar.

My first instinct was JavaScript injection via AppleScript — controlling the browser, executing scripts, reading the DOM. I wrote about how that went. Short version: it was a security nightmare and I deleted it all.

The replacement was the macOS Accessibility API. It reads a single text field — the address bar — and that’s enough. When I built Browser Commander, the same API let me scrape every link on a page, detect whether the user was typing in a text field, and figure out which browser was in focus. No browser extensions. No AppleScript controlling anything. No injected code.

I’ve become a bit evangelical about this API, because it solves problems that most developers reach for much heavier tools to handle. And almost nobody talks about it.

What it actually is

The Accessibility API [1] is a C-based framework that exposes every running application’s UI as a tree of elements. Every window, button, text field, menu item, and label becomes a queryable node with attributes you can read and, in some cases, write.

The core primitive is AXUIElement. You create one for any running application using its process ID, and from there you can walk the entire UI tree. Every element has a role (AXTextField, AXButton, AXLink), a value, a position, a size, children, and a parent. You can read them. You can observe changes. For some attributes, you can set them — which is how window managers move and resize windows from other applications.
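
As a rough sketch of what that looks like in Swift: the helper names `copyAttribute` and `children(of:)` below are made up for illustration, and the snippet assumes the process already holds Accessibility permission. Without it, every read fails and the helpers fall back to nil or an empty array.

```swift
import AppKit
import ApplicationServices

/// Reads one attribute from an element, returning nil on any AXError.
/// (Hypothetical helper, not part of the API itself.)
func copyAttribute(_ element: AXUIElement, _ name: String) -> CFTypeRef? {
    var value: CFTypeRef?
    let error = AXUIElementCopyAttributeValue(element, name as CFString, &value)
    return error == .success ? value : nil
}

/// Children of an element, or an empty array if there are none or the read fails.
func children(of element: AXUIElement) -> [AXUIElement] {
    copyAttribute(element, kAXChildrenAttribute) as? [AXUIElement] ?? []
}

// Create the root element for the frontmost app from its process ID.
if let app = NSWorkspace.shared.frontmostApplication {
    let root = AXUIElementCreateApplication(app.processIdentifier)
    let role = copyAttribute(root, kAXRoleAttribute) as? String
    print("root role:", role ?? "unreadable (is Accessibility permission granted?)")
    print("top-level children:", children(of: root).count)
}
```

Everything else in the tree is reached the same way: read the children attribute, then read whatever attributes you care about on each child.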

It was designed for assistive technology. Screen readers like VoiceOver use it to describe what’s on screen. But the API doesn’t care why you’re reading the UI tree. It just gives you access to it.

What I use it for

In Browser Notes, the entire data flow is:

  1. Get the frontmost application’s PID
  2. Walk the accessibility tree to find an AXTextField or AXComboBox containing something that looks like a URL
  3. Read the value
  4. Look up notes for that URL in a local SQLite database

That’s it. The tree walk is depth-limited to 12 levels, which is more than enough to find an address bar in any browser I’ve tested. The polling runs every 0.75 seconds. The entire accessibility interaction is maybe 20 lines of meaningful code.
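
The steps above can be sketched roughly as follows. This is not the code from Browser Notes: the function names are hypothetical, and `looksLikeURL` in particular is a stand-in heuristic, since the real check isn't shown here. Only the roles and the depth limit of 12 come from the description above.

```swift
import AppKit
import ApplicationServices

let maxDepth = 12  // depth limit mentioned in the text

/// Stand-in heuristic (an assumption): no spaces, and either a scheme or a dot.
func looksLikeURL(_ text: String) -> Bool {
    let trimmed = text.trimmingCharacters(in: .whitespaces)
    guard !trimmed.isEmpty, !trimmed.contains(" ") else { return false }
    return trimmed.contains("://") || trimmed.contains(".")
}

/// Reads one attribute, returning nil on any AXError.
func attribute(_ element: AXUIElement, _ name: String) -> CFTypeRef? {
    var value: CFTypeRef?
    let error = AXUIElementCopyAttributeValue(element, name as CFString, &value)
    return error == .success ? value : nil
}

/// Depth-first search for the first text field or combo box whose value looks like a URL.
func findAddressBarValue(in element: AXUIElement, depth: Int = 0) -> String? {
    guard depth <= maxDepth else { return nil }
    if let role = attribute(element, kAXRoleAttribute) as? String,
       role == "AXTextField" || role == "AXComboBox",
       let value = attribute(element, kAXValueAttribute) as? String,
       looksLikeURL(value) {
        return value
    }
    let kids = attribute(element, kAXChildrenAttribute) as? [AXUIElement] ?? []
    for child in kids {
        if let found = findAddressBarValue(in: child, depth: depth + 1) {
            return found
        }
    }
    return nil
}

/// Steps 1 to 3: frontmost PID, tree walk, read the value.
func currentBrowserURL() -> String? {
    guard let app = NSWorkspace.shared.frontmostApplication else { return nil }
    return findAddressBarValue(in: AXUIElementCreateApplication(app.processIdentifier))
}
```

Calling `currentBrowserURL()` from a repeating timer with a 0.75 second interval would give the polling behaviour; the SQLite lookup in step 4 then keys off whatever string comes back.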

Browser Commander goes further. Its link scraper walks the page’s accessibility tree looking for elements with the role AXLink, pulling their URL from kAXURLAttribute (or kAXValueAttribute as a fallback, because browsers aren’t consistent). It collects up to 1,000 links from up to 25,000 visited nodes per page. Different browsers expose link text differently — some use the title attribute, some use description, some bury it in child AXStaticText elements — so the scraper tries each source in order.
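
A minimal sketch of that kind of scraper, under some assumptions: the type and function names are invented for illustration, the walk is an iterative depth-first traversal (the real traversal order isn't specified), and role names are written as string literals exactly as they appear in the text.

```swift
import ApplicationServices
import Foundation

struct ScrapedLink { let url: String; let text: String }

let maxLinks = 1_000   // caps taken from the text
let maxNodes = 25_000

func attr(_ element: AXUIElement, _ name: String) -> CFTypeRef? {
    var value: CFTypeRef?
    let error = AXUIElementCopyAttributeValue(element, name as CFString, &value)
    return error == .success ? value : nil
}

/// Returns the first candidate that is non-empty after trimming.
func firstNonEmpty(_ candidates: [String?]) -> String? {
    for case let text? in candidates {
        let trimmed = text.trimmingCharacters(in: .whitespacesAndNewlines)
        if !trimmed.isEmpty { return trimmed }
    }
    return nil
}

/// URL attribute first, value attribute as the fallback.
func linkURL(for element: AXUIElement) -> String? {
    if let url = attr(element, kAXURLAttribute) as? URL { return url.absoluteString }
    return attr(element, kAXValueAttribute) as? String
}

/// Title, then description, then the first child AXStaticText.
func linkText(for element: AXUIElement) -> String {
    let kids = attr(element, kAXChildrenAttribute) as? [AXUIElement] ?? []
    let staticChild = kids.first { attr($0, kAXRoleAttribute) as? String == "AXStaticText" }
    let staticText = staticChild.flatMap { attr($0, kAXValueAttribute) as? String }
    return firstNonEmpty([
        attr(element, kAXTitleAttribute) as? String,
        attr(element, kAXDescriptionAttribute) as? String,
        staticText,
    ]) ?? ""
}

/// Walks the tree until either cap is reached.
func scrapeLinks(from root: AXUIElement) -> [ScrapedLink] {
    var links: [ScrapedLink] = []
    var stack = [root]
    var visited = 0
    while let element = stack.popLast(), visited < maxNodes, links.count < maxLinks {
        visited += 1
        if attr(element, kAXRoleAttribute) as? String == "AXLink",
           let url = linkURL(for: element) {
            links.append(ScrapedLink(url: url, text: linkText(for: element)))
        }
        let kids = attr(element, kAXChildrenAttribute) as? [AXUIElement] ?? []
        stack.append(contentsOf: kids.reversed())  // reversed so children pop in document order
    }
    return links
}
```

The node cap matters more than the link cap in practice: it bounds the number of IPC calls even on pages that contain very few links.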

It also detects text input focus. Before handling a backspace press as “navigate back,” it checks whether the focused element is an AXTextField, AXTextArea, AXSearchField, or AXComboBox. If you’re typing, backspace is backspace. If you’re not, backspace goes back. This is the kind of context-aware behaviour that would be painful to achieve with any other approach.
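
Here is one way that check could look, as a sketch rather than Browser Commander's actual code. The function names are hypothetical, and the focused element is read from the system-wide element, which is one of a couple of ways to get it.

```swift
import ApplicationServices

// Roles from the text, written as the plain strings the API reports.
let textInputRoles: Set<String> = ["AXTextField", "AXTextArea", "AXSearchField", "AXComboBox"]

func isTextInputRole(_ role: String?) -> Bool {
    guard let role = role else { return false }
    return textInputRoles.contains(role)
}

/// Role of whatever element currently has keyboard focus, or nil if it cannot be read.
func focusedElementRole() -> String? {
    let systemWide = AXUIElementCreateSystemWide()
    var focused: CFTypeRef?
    guard AXUIElementCopyAttributeValue(
              systemWide, kAXFocusedUIElementAttribute as CFString, &focused) == .success,
          let value = focused,
          CFGetTypeID(value) == AXUIElementGetTypeID() else { return nil }
    let element = value as! AXUIElement  // safe: type ID checked above

    var role: CFTypeRef?
    guard AXUIElementCopyAttributeValue(
              element, kAXRoleAttribute as CFString, &role) == .success else { return nil }
    return role as? String
}

/// True when a backspace press should be treated as "navigate back".
func shouldTreatBackspaceAsBack() -> Bool {
    !isTextInputRole(focusedElementRole())
}
```

Note that this sketch treats an unreadable focus as "not typing". A more conservative variant would pass the key through untouched whenever `focusedElementRole()` returns nil.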

Why not the alternatives?

I tried AppleScript first. Here’s the thing about AppleScript’s UI scripting: it uses the Accessibility API under the hood. When you write tell application "System Events" to get value of text field 1 of window 1 of process "Safari", that’s walking the same AXUIElement tree I’m walking directly. AppleScript just adds a layer of string-based, error-prone abstraction on top.

AppleScript also requires controlling the target application, which means NSAppleEventsUsageDescription in your Info.plist and a separate permission prompt. The Accessibility API needs its own permission, but it’s one permission for all apps — not per-target.

Browser extensions would work for the URL reading, but they only work inside the browser. They can’t detect which browser is in focus. They can’t tell you whether the user is typing in a text field in a native app. They require installation in every browser. And they need updating when browser extension APIs change. Pit that against the Accessibility API, which lets Browser Commander support sixteen browsers with zero extensions installed. [2] It’s a no-brainer.

The permission model

The Accessibility API requires explicit user permission, granted in System Settings → Privacy & Security → Accessibility. There’s no way around this, and there shouldn’t be — an API that can read every text field and click every button in every running application should require consent.

The permission check is straightforward:

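import ApplicationServices

// kAXTrustedCheckOptionPrompt is a global constant we don't own, so take it unretained.
let trusted = AXIsProcessTrustedWithOptions(
    [kAXTrustedCheckOptionPrompt.takeUnretainedValue() as String: true] as CFDictionary
)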

Pass true for the prompt option and macOS will show a dialog directing the user to System Settings. After that, I poll AXIsProcessTrusted() every two seconds until it returns true, then start the event tap and begin work.
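
The polling half can be sketched like this. `waitForAccessibility` is a hypothetical helper, and the injectable `isTrusted` closure is an addition of mine so the logic can be exercised without the real permission; by default it just calls AXIsProcessTrusted().

```swift
import ApplicationServices
import Foundation

/// Calls `onGranted` once, as soon as `isTrusted` reports true.
/// Checks immediately, then every `interval` seconds on the current run loop.
func waitForAccessibility(interval: TimeInterval = 2.0,
                          isTrusted: @escaping () -> Bool = { AXIsProcessTrusted() },
                          onGranted: @escaping () -> Void) {
    if isTrusted() {
        onGranted()
        return
    }
    Timer.scheduledTimer(withTimeInterval: interval, repeats: true) { timer in
        guard isTrusted() else { return }
        timer.invalidate()  // stop polling once the permission shows up
        onGranted()
    }
}
```

In an app you would call this from the main thread after showing the prompt, with `onGranted` starting the event tap. The timer only fires while a run loop is running, which is the normal case in an AppKit app.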

One thing that tripped me up: the permission is tied to your code signature. Rebuild with a different signing identity, or with none, and macOS stops honouring the grant without telling you. The toggle still shows as enabled in System Settings, AXIsProcessTrusted() might even return true, but the API calls quietly return nothing. I’ve learned this the hard way more than once, and I’ve written about code signing for exactly this reason.

Sandboxed apps, which include everything on the Mac App Store, cannot receive Accessibility permission at all. AXIsProcessTrusted() always returns false. The prompt never appears. This is why so many window managers, clipboard managers, and automation tools are distributed outside the App Store.

Who else uses it

Some of the most useful macOS utilities are built on this API, the window manager Rectangle among them.

These aren’t niche toys. Rectangle alone has millions of users. They all work because every standard macOS application automatically exposes its UI through this tree. You don’t need the target app to opt in, support a scripting dictionary, or install a plugin.

The rough edges

I won’t pretend the API is pleasant to work with. It’s a C API from the early 2000s. In Swift, you’re dealing with CFTypeRef return values, Unmanaged pointer bridging, and manual memory management patterns that feel like time travel.

Every call is an IPC round-trip to the accessibility server, so walking deep trees is slow. Batch reads with AXUIElementCopyMultipleAttributeValues() help, but the API’s design encourages the slow pattern of reading one attribute at a time.
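
For comparison, a batched read might look like the sketch below. `copyAttributes` is a hypothetical wrapper, and the handling of failed attributes reflects my reading of the header comments (failures come back as placeholder values rather than failing the whole call), so treat that part as an assumption to verify.

```swift
import ApplicationServices
import Foundation

/// Reads several attributes in one round-trip.
/// Attributes that could not be read are left out of the result.
func copyAttributes(_ element: AXUIElement, _ names: [String]) -> [String: CFTypeRef] {
    var values: CFArray?
    let error = AXUIElementCopyMultipleAttributeValues(
        element,
        names as CFArray,
        AXCopyMultipleAttributeOptions(rawValue: 0),  // 0 = don't stop at the first failure
        &values)
    guard error == .success,
          let results = values as? [AnyObject],
          results.count == names.count else { return [:] }

    var output: [String: CFTypeRef] = [:]
    for (name, value) in zip(names, results) {
        // Assumption: a failed attribute is reported as an AXValue of type .axError.
        if CFGetTypeID(value) == AXValueGetTypeID(),
           AXValueGetType(value as! AXValue) == .axError {
            continue
        }
        if value is NSNull { continue }
        output[name] = value
    }
    return output
}
```

Asking for role, title, and value together this way costs one round-trip per element instead of three, which adds up quickly on a walk that visits thousands of nodes.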

Apple’s documentation is sparse — function signatures with minimal explanation. Most practical knowledge lives in GitHub issues and Stack Overflow posts. Wrapper libraries like AXSwift and the newer AXorcist exist to smooth over the worst of it, but even with wrappers, you’re working with an API that Apple maintains but clearly doesn’t love.

Electron apps are a particular frustration. They disable their accessibility tree by default for performance. Unless VoiceOver is running, tools that walk the tree will see a mostly empty structure for apps like Slack, VS Code, and Discord. This isn’t a limitation of the API itself, but it means your app might work perfectly with Safari and Chrome, then appear broken with Electron-based apps.

And the permission model, while correct in principle, has practical sharp edges. Code signing changes silently invalidating permissions. No way to detect why an API call returned nothing. The TCC database occasionally getting into inconsistent states. These are solvable problems, but they eat debugging time.

The point

The Accessibility API is not glamorous. It’s a 20-year-old C framework with poor documentation, IPC overhead, and a permission model that requires manual user interaction. It will never be the subject of a WWDC keynote.

But it lets you read the UI of any running application without that application’s cooperation or knowledge. It lets you observe changes, detect focus, enumerate elements, and interact with windows — across every app on the system, from a single permission grant. No extensions. No scripting dictionaries. No injected code.

For Browser Notes, it replaced several hundred lines of security-questionable JavaScript injection with a clean, universal, 20-line tree walk. For Browser Commander, it enabled link scraping and context-aware keyboard handling that would have required per-browser extensions otherwise.

If you’re building a macOS utility that needs to know what’s happening in other applications, look at the Accessibility API before you reach for AppleScript or browser extensions. It’s not pretty, but it’s powerful, and it’s right there.

Footnotes

  1. Part of the ApplicationServices framework, specifically HIServices. The header file AXUIElement.h is the starting point. Apple’s official documentation exists but is minimal — you’ll learn more from reading the header comments and other developers’ code. 

  2. Safari, Safari Technology Preview, Chrome, Chrome Canary, Firefox, Firefox Developer Edition, Firefox Nightly, Arc, Edge, Brave, Opera, Vivaldi, Orion, Chromium, Zen Browser, and Nicegram. If it has an address bar, the Accessibility API can read it.