
Last week I needed to copy a column out of an HTML table in an email.
That sentence describes about ten seconds of intention and fifteen seconds of work. The intention: take this column of values, paste it into a spreadsheet, get on with my day. The work, as anyone who has tried it will know: select the whole table (because Mail doesn’t do columns), paste it into a scratch document, manually trim it back to just the column I wanted, and then paste from the scratch document. Easy enough. Daft enough.
The problem isn’t the tax of any one of those steps. It’s that the steps exist at all — that the obvious operation, “I want this column”, has to be expressed as a circuitous series of operations because the only verb available is select-the-text-flow. Text flows in reading order. Reading order doesn’t know about columns. A column, visually, is a rectangle. The verb that matches a rectangle is “draw a rectangle around it.”
That’s CopyLens.
The idea is small enough to fit in one sentence. Hit a hotkey; drag a rectangle around the thing you want; release; the thing is on your clipboard. The hotkey defaults to Hyper+\, which pairs with HyperCaps and lives on the same hand as Command+V. Drawing a rectangle has the property that block-select doesn’t: it doesn’t care about reading order, document structure, or where the text “wants” to be. It cares about the rectangle.
For a column out of a table, that’s perfect. Draw a thin rectangle from the top of the column to the bottom; release; paste. The column appears on the clipboard, top to bottom, one line per row, plain text. Mail had no idea what just happened. The browser tab next to it had no idea either. Whatever app is showing the table is uninvolved; CopyLens is reading pixels, not poking at the DOM.
For a column out of a PDF, same gesture, same result. For a column out of an image that’s a screenshot of a PDF, also same gesture, same result. The thing on screen doesn’t need to be selectable text in any conventional sense. It just needs to be visible.
Here’s the part I’m pleased with.
Sometimes the rectangle you draw doesn’t contain text. You wanted a chart, a piece of a diagram, a panel of a UI mockup. The naive design has two hotkeys: one for “copy text from this region”, one for “copy image from this region”. The user picks the right one each time. That’s how every cropping tool I’ve ever used works.
CopyLens doesn’t ask. You draw the rectangle. If there’s text, the text lands on the clipboard. If there isn’t, the cropped image does — as PNG and TIFF, ready to paste into Notes or Keynote or wherever else takes images. The gesture is the same. The output adapts to the content.
Under the hood, the rectangle is captured at native pixel density via ScreenCaptureKit, Apple’s Vision framework runs over the image, and the result either is or isn’t a list of recognised text strings. If it is, they’re joined in reading order and written to the pasteboard as text. If it isn’t, the image is written to the pasteboard. A small HUD confirms which path ran — “Copied 247 chars” or “Copied image 320x180” — and you can paste immediately.
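The branch at the end of that pipeline is small enough to sketch. What follows is a minimal illustration in Swift, not CopyLens’s actual source: `ClipboardPayload`, `choosePayload`, and `hudMessage` are hypothetical names, the recognised strings are assumed to arrive already sorted, and treating whitespace-only results as “no text” is an assumption of the sketch.

```swift
import Foundation

// Hypothetical type: what ends up on the clipboard after a capture.
enum ClipboardPayload: Equatable {
    case text(String)
    case image(width: Int, height: Int)
}

// Decide which path runs. `recognized` stands in for the strings an OCR pass
// returned, assumed to be in reading order already.
func choosePayload(recognized: [String], imageWidth: Int, imageHeight: Int) -> ClipboardPayload {
    // Assumption: strings that are empty after trimming do not count as text.
    let lines = recognized
        .map { $0.trimmingCharacters(in: .whitespacesAndNewlines) }
        .filter { !$0.isEmpty }
    if lines.isEmpty {
        // Nothing readable in the rectangle: fall back to the cropped image.
        return .image(width: imageWidth, height: imageHeight)
    }
    // One recognised string per line, joined as plain text.
    return .text(lines.joined(separator: "\n"))
}

// Build the confirmation string shown in the HUD for either path.
func hudMessage(for payload: ClipboardPayload) -> String {
    switch payload {
    case .text(let s):
        return "Copied \(s.count) chars"
    case .image(let w, let h):
        return "Copied image \(w)x\(h)"
    }
}
```

Calling `choosePayload(recognized: [], imageWidth: 320, imageHeight: 180)` yields the image case, and `hudMessage` then reports “Copied image 320x180”. The caller never names a mode; the content of the rectangle picks the branch.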
The win isn’t the OCR; it’s the missing mode switch. You don’t have to know, ahead of time, whether the region you’re looking at is text or pixels. You just draw, and the right thing happens.
Here’s the part I should be honest about.
I built CopyLens in an afternoon. Most of the credit doesn’t belong to me. The engine — ScreenCaptureKit one-shot capture, Vision OCR with language picking, reading-order sort, the image-fallback decision, the multi-display overlay, the lot — I had Claude Code write. One reasonably plain prompt: “build a macOS app that lets the user draw a rectangle on the screen with a hotkey, captures it via ScreenCaptureKit, runs Vision over it, copies text to the clipboard if Vision finds any, otherwise copies the cropped image”. A few back-and-forths to nail the multi-display behaviour and the reading-order sort. A few minutes, give or take.
What I added is the chrome:
If I’d built the engine myself, it would’ve taken me half a day. Maybe longer. I’d have spent an hour finding the right ScreenCaptureKit incantation for a one-shot capture (the API is built around streams; one-shot is a sub-case you have to coax it into). I’d have spent another hour discovering that Vision’s text observations come back in a coordinate system I’d need to flip before sorting. I’d have spent an unknown amount of time deciding what “reading order” actually means for multi-column text, and getting the sort right. None of it is hard. All of it is fiddly. Claude got the whole pipeline working in a single pass, and I am genuinely impressed with the result.
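To make the coordinate flip and the sort concrete, here is a rough sketch in Swift. It is not the code Claude produced. It assumes each observation has been reduced to a string plus a normalised bounding box whose origin is the bottom-left corner (which is how Vision reports `boundingBox`), and it handles only the single-column case. `TextBox`, `readingOrder`, and the `lineTolerance` default are illustrative assumptions.

```swift
import Foundation

// Hypothetical stand-in for one OCR observation: text plus a normalised
// bounding box (0...1) with the origin at the bottom-left corner.
struct TextBox {
    let text: String
    let x: Double       // left edge
    let y: Double       // bottom edge, measured up from the bottom of the image
    let width: Double
    let height: Double

    // The flip: distance of the box's top edge from the top of the image.
    var top: Double { 1.0 - (y + height) }
}

// Sort boxes top-to-bottom, then left-to-right within a line. A box joins the
// current line when its top is within `lineTolerance` of that line's first box.
// Single-column only; multi-column text would need column detection first.
func readingOrder(_ boxes: [TextBox], lineTolerance: Double = 0.01) -> [String] {
    let byTop = boxes.sorted { $0.top < $1.top }
    var lines: [[TextBox]] = []
    for box in byTop {
        if let last = lines.last, let anchor = last.first,
           abs(box.top - anchor.top) < lineTolerance {
            lines[lines.count - 1].append(box)
        } else {
            lines.append([box])
        }
    }
    // Within each line, order by left edge, then flatten to plain strings.
    return lines.flatMap { line in
        line.sorted { $0.x < $1.x }.map { $0.text }
    }
}
```

Grouping into lines first, rather than sorting once with a “roughly equal y” comparator, is a deliberate choice in this sketch: a comparator with a tolerance is not a consistent ordering, so the sort could return different results for the same boxes.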
I reviewed the code. It’s clean, idiomatic, well-factored, the way I’d have written it if I’d had the patience to. I ran it through the standard Jorvik test suite (the one I’ve built up over the last few months to validate every utility before release) and it passed. The shape of the code is the shape I would have produced. What I didn’t have to invest was the time and the patience.
That’s the trade I’m interested in. Not the trade of “AI does my work for me.” The trade of “I get to spend my attention on the part of the project that actually needs me.” The engine doesn’t need me — it’s well-understood Vision plus well-understood ScreenCaptureKit, glued together in the obvious way. The product needs me. The UX, the settings, the where-does-it-live-in-the-menu-bar, the does-the-hotkey-actually-feel-right — those are the parts that take a finished mechanism and turn it into a thing somebody would use. Those I wrote. Those I wanted to write.
The lens metaphor in the name turned out to be a happy accident. You hold up a viewport over a region of the screen; whatever’s inside the lens becomes portable. The pun on “copy” is fine for product-page purposes. The honest version is that I named it before I’d thought too hard about the metaphor, and it landed.
I’ve been using CopyLens for a week. The use cases that have come up, ranked by frequency:
There’s a sixth category that I expected to use but haven’t: copying text out of running video. The capture is fast enough in principle, but in practice if I want text out of a video I’d rather scrub to a still frame and use a screenshot tool. CopyLens probably could do it. I haven’t needed it to.
Standard Jorvik shape: macOS 14 or later, universal binary, free, EdDSA-signed updates via Sparkle, no telemetry, no network traffic beyond the appcast poll. The only permission it asks for is Screen Recording, which it has to have in order to read pixels off your screen. Accessibility isn’t required; the hotkey uses Carbon’s RegisterEventHotKey, which doesn’t need it.
If you’ve installed any other Jorvik utility, you know the drill. Installer or .zip, drag to Applications, launch, grant permission, set your hotkey, done. The whole onboarding fits in three minutes.
Writing this up, I had a thought about the kind of small utility CopyLens is. It’s the sort of thing that probably exists in a hundred slightly different forms scattered across the App Store and various AI-flavoured startup websites. I didn’t go looking. The whole thing is small enough that finding the right existing one would have taken longer than building this one. And mine is free, source-available, has no subscription, doesn’t ship my screen contents anywhere, and was built to scratch one specific itch.
That last point is the one I keep coming back to. The reason this utility is small and pleasant to use is that I knew exactly what I wanted it to do before I started, because I’d run into the problem the day before. The reason it only took an afternoon is that the engine wasn’t the interesting part, and I got to skip it. The reason it exists at all is that I had a column I couldn’t copy.
Now I can.