The Last Mile of AI

The distance between a working demo and a deployed system is where most AI projects die. Notes from shipping AI into power plants, refineries, and banks.

Every AI demo works. That’s what makes them demos.

The distance between that demo and a system a power-plant operator trusts at 2am is the last mile, and it’s where most AI projects die. I’ve spent my career in that mile: threat detection scanning a million enterprise mailboxes, agents reasoning over inspection data from robots crawling boiler walls, and now customer-facing agents at Sierra for financial institutions. The industries change. The lessons don’t.

The demo is the easy 80%

A RAG pipeline that answers questions about your documents takes an afternoon. A RAG pipeline that answers questions about inspection reports from a 40-year-old boiler, where a wrong answer means someone defers maintenance on a component that fails, takes months, and most of those months aren’t spent on the model.

They’re spent on the boring things: what happens when the source data is missing, or contradictory, or formatted the way a field technician formats things at the end of a twelve-hour shift. The model is rarely the bottleneck. The world it has to operate in is.

Evals are the product spec

In traditional software, tests verify behavior you designed. In AI systems, evaluations define behavior you can’t fully design. When we built agent systems for industrial customers, the evaluation pipeline wasn’t a quality gate at the end. It was the specification. If you can’t write down what a good answer looks like for your hundred ugliest real-world cases, you don’t yet know what you’re building.
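One way to picture "evals as the spec" is a small harness where each ugly real-world case carries its own written-down definition of a good answer. This is a minimal sketch, not the pipeline described above: `EvalCase`, `check`, `run_suite`, the substring-matching rule, and the boiler example are all hypothetical, and a real suite would likely need richer grading than string checks.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    """One real-world case plus a written-down definition of a good answer."""
    name: str
    question: str
    must_include: tuple[str, ...] = ()      # facts a good answer has to state
    must_not_include: tuple[str, ...] = ()  # claims that would be dangerous


def check(case: EvalCase, answer: str) -> list[str]:
    """Return the ways `answer` violates the case; an empty list means pass."""
    text = answer.lower()
    failures = [f"missing: {s}" for s in case.must_include if s.lower() not in text]
    failures += [f"forbidden: {s}" for s in case.must_not_include if s.lower() in text]
    return failures


def run_suite(cases: list[EvalCase], system: Callable[[str], str]) -> dict[str, list[str]]:
    """Run every case through `system` and collect failures per case name."""
    return {case.name: check(case, system(case.question)) for case in cases}


if __name__ == "__main__":
    # Hypothetical case: the source report has no reading, so the only
    # acceptable answer says so instead of guessing.
    cases = [
        EvalCase(
            name="missing-thickness-reading",
            question="Can maintenance on tube panel 14 be deferred?",
            must_include=("no thickness reading",),
            must_not_include=("safe to defer",),
        )
    ]

    def stub_system(question: str) -> str:
        # Stand-in for the real agent; assumed for illustration only.
        return "There is no thickness reading for panel 14 in the latest report."

    for name, failures in run_suite(cases, stub_system).items():
        print(name, "PASS" if not failures else failures)
```

The point of the shape is that writing `must_include` and `must_not_include` for each case forces the conversation about what "good" means before any model work starts.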

Automated evals, A/B frameworks, drift monitoring. These sound like MLOps hygiene. In practice they’re the only way to iterate fast without breaking the trust you’ve built, because trust is the actual product.

Deployment is a social problem wearing a technical costume

The hardest part of putting AI into a high-stakes environment isn’t latency or context windows. It’s that a person who has done their job well for twenty years now has to decide whether to believe a machine. That decision doesn’t happen in your UI. It happens over weeks, in the gap between what the system claims and what they later verify with their own eyes.

You earn that trust the same way a new hire does: be right about small things repeatedly, be transparent about uncertainty, and never be confidently wrong about the same thing twice. Systems that show their work, with sources cited, confidence expressed, and escalation paths honored, get adopted. Systems that present answers as oracles get quietly ignored, no matter how good the benchmark numbers were.
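In code, "showing your work" can be as simple as refusing to render an answer that lacks sources or falls below a confidence floor. The sketch below is an illustration under stated assumptions: the `Answer` type, the `present` function, the 0.7 threshold, and the escalation message are hypothetical, and how a system estimates confidence is left open.

```python
from dataclasses import dataclass, field


@dataclass
class Answer:
    text: str
    sources: list[str] = field(default_factory=list)  # where the claim came from
    confidence: float = 0.0  # 0.0 to 1.0, however the system estimates it


def present(answer: Answer, min_confidence: float = 0.7) -> str:
    """Render an answer with its evidence, or escalate instead of guessing.

    The 0.7 default is an arbitrary placeholder, not a recommended value.
    """
    if not answer.sources or answer.confidence < min_confidence:
        return "ESCALATE: not confident enough to answer; routing to a human reviewer."
    cited = "; ".join(answer.sources)
    return f"{answer.text} (confidence {answer.confidence:.0%}; sources: {cited})"


if __name__ == "__main__":
    print(present(Answer("Wall loss is within limits.", ["inspection report A"], 0.92)))
    print(present(Answer("Probably fine.", [], 0.95)))  # no sources, so it escalates
```

An unsourced answer escalates even at high confidence, which is the oracle case the paragraph above warns about.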

Why I keep choosing the last mile

There’s a version of AI engineering that optimizes benchmarks, and a version that changes how a refinery schedules maintenance or how a bank serves a customer at midnight. The second one is slower, messier, and involves a lot more meetings in rooms without whiteboards. It’s also the only version where you find out whether any of it was real.

The last mile is undefeated. Respect it, staff for it, and design for it from day one, or watch your demo stay a demo.