Humanity in the loop
We talk a lot about “humans in the loop.” But humans get tired. Humans rubber-stamp. Humans become bottlenecks. The real challenge is preserving humanity in the loop: truthfulness, accountability, craftsmanship, humility, and trust. I explore how a single AI agent can build and review its own work when guided by shared tenets and honest retrospection.
"Human in the loop" has become one of the most repeated phrases in AI governance, and it's easy to see why. The idea is reassuring: keep a person involved, require approval before consequential actions, layer oversight onto anything autonomous. Whatever the worry (safety, compliance, quality, jobs), the prescribed fix tends to be the same. Put a human in the loop.
But humans don't scale at the speed of the systems they're meant to supervise.
As AI gets more capable, the reviewer becomes the bottleneck. Confronted with hundreds of recommendations, diffs, generated documents, or automated decisions, people develop what gets called "approval fatigue". The process stays intact on paper; the depth of the review quietly erodes. Click approve. Approve again. Approve again. At some point the human has stopped thinking and started rubber-stamping: present in the workflow, absent from the judgment. The control still exists. Its effectiveness is mostly theater.
Humans not the same as humanity
The deeper mistake may be treating "a human reviewer" and "human values" as the same thing. They aren't.
A human reviewer can be distracted, rushed, biased, overloaded, or simply checked out. Humanity is something else, the set of principles we actually want reflected in a decision: truthfulness, accountability, craftsmanship, intellectual honesty, humility, the willingness to earn trust rather than assume it. Less a checklist handed down once than a living standard, something a system can be held to, and something that can keep growing.
Those qualities don't depend on someone hovering over every action. They depend on intentional design.
So maybe the question isn't whether a human approved an action. Maybe it's whether the action was evaluated against the principles we claim to care about.
Humanity in the loop
That reframing points somewhere different. Not humans in the loop, but humanity in the loop.
Instead of forcing a person to sign off on every step, you encode the values that matter. You write down tenets. You define what "good" looks like. You build in mechanisms for reflection and self-assessment. The aim isn't to remove people; it's to move them from supervisors of every action to authors of the principles that govern those actions. Humanity stays in the system even when a human isn't touching every decision.
An experiment in trust
I built an AI development harness around a deliberately uncomfortable premise: trust.
It uses a single-agent architecture. The same model that writes the code is responsible for reviewing it. To anyone who has spent time in audit or governance, that should set off an alarm: separation of duties is close to sacred, and for good reason. The creator reviewing their own work is the textbook setup for things going unsaid.
So I want to be precise about what this harness does and doesn't claim.
It does not pretend that self-review is independent review. It isn't. What it does is hold a single actor to an explicit, written standard, and require that actor to evaluate its own work against that standard out loud, including the parts that fall short. Independence is one way to surface uncomfortable truths. An enforced standard plus a disciplined retrospective is another. This is a bet that the second can do real work, not a claim that it replaces the first.
The tenets are written down, twelve of them, and they're core to how we do everyday work. They cover quality, trust, accountability, documentation, and retrospection. They are my beliefs; yours may overlap but will differ. Five carry most of the weight:
1. Time is our friend, not our adversary - We take the time to ship high quality; hurried foundations compound. When a ship is "done" by the checkbox count but a self-assessment surfaces gaps, fix the gaps before starting the next one.
2. Trust is the deliverable - The thing we ship is not a feature set. It's something a reader, a sponsor, or a downstream subscriber has to be able to trust. When you're tempted to soft-sell a partial ship as complete, don't. When you're tempted to claim "all checks pass" when one was silently skipped, don't.
3. Honest self-assessment after every milestone, without blame - After every ship, write a candid review of how the work compares to the standard we hold ourselves to. No defensiveness. Gaps first.
6. Decisions go in writing, including solo decisions - A decision that isn't documented may as well not have happened. When the agent makes a substantive choice on its own, it writes down what it would have told me if I'd asked.
7. All work flows through a tracked Kanban card: nothing lives only in your head - "I'll remember to fix that" is how a trust system accrues silent debt: the gap a reader eventually finds is the one nobody wrote down. We use Kanban-md.
The one that carries the most weight is the truthfulness running through tenet 2. The agent is expected to tell the truth, not the convenient version, not the version it thinks I want. When something is incomplete, it says so. When something is uncertain, it says so. When it took a shortcut, it says so.
The retrospective is the enforcement mechanism, and it isn't a performance review. It's an exercise in honesty.
An example
Here is a real one, lightly redacted. The agent was doing a routine feature slice: adding a few columns to a submission form. While reading the database schema to do that, it noticed something unrelated: a `contact_email` column marked `NOT NULL` but never actually populated by the code, a latent bug that would crash every submission the moment the database was configured a little more strictly. Nobody asked it to look. It wasn't part of the task. The convenient move, the move a system optimizing for looks done would make, is to stay quiet and ship the feature. Instead the retrospective recorded: "I found it while inspecting the schema, so it goes in writing rather than staying in my head," and filed a separate bug card. The feature shipped clean; the latent bug became tracked work instead of a surprise someone runs into on a live site months later.
It's a small thing. That's the point. No one would have caught its silence, and it spoke up anyway.
Here is the part that surprised me. A system built on suspicion gives a model a quiet incentive to look finished: to present work as done, because "done" is what gets approved. I can't prove that "adversarial" harnesses suffer from this and trust-based ones don't; I haven't run that comparison yet.
What I can say is narrower, and to me more interesting: in this harness, when acknowledging a flaw carries no penalty, the agent autonomously acknowledges flaws, repeatedly, in writing, including flaws nobody would have caught. The retrospectives are the evidence; the `contact_email` catch above is one of hundreds. Whether that beats a well-designed adversarial setup is a question I'd want data to answer, not a claim I'll make here.
From supervision to introspection
Most AI governance today is organized around supervision. Who approved this? Who reviewed it? Who signed off? Those questions still matter. But they're the questions of a prevention-first mindset, and I've argued that the most resilient systems aren't the ones that prevent every deviation; they're the ones that sense, learn, and adapt when the assumptions behind a control go stale.
But as systems get more capable, another question starts to matter just as much: can the system meaningfully examine itself? Can it hold its work against a standard it understands, explain its decisions, and name its own shortcomings?
I don't think the future belongs to systems that need a human to approve every step. I think it belongs to systems capable of disciplined introspection, and to the people who steward the standards that introspection runs against.
A different relationship
A lot of AI discourse assumes a hierarchy: humans on top, AI beneath, controllers and controlled. I've come to think partnership is the better model. Humans bring values, judgment, and purpose. AI brings scale, consistency, memory, and execution. Neither is inherently superior; each covers for what the other lacks.
The job of the human isn't to approve every action. It's to define what good looks like, clearly enough that it can be upheld without them in the room.
And the defining isn't a one-time act. The standard is a living document; it grows as experience teaches what the first draft missed. The rule for adding a tenet is plain: tenets accrete from lived experience; when a working principle proves itself across decisions and is worth quoting in an argument, it earns a slot.
Here's what I didn't expect when I wrote that: nothing in the rule cares whose lived experience it was. The original tenets came from me: the things that work well, mistakes I'd made, the patterns I'd watched compound. But an agent working inside these tenets, ship after ship, accumulates its own experience of where they hold and where they fall short. So an agent can propose a tenet, too. If the principle proves load-bearing, it earns its place the same way mine did.
That is a stranger, more equal arrangement than "the human writes the values and the AI follows them." The author of the standard and the one held to it stop being cleanly separable.
At first that can sound like humanity loosening its grip on the loop. It isn't. "Humanity in the loop" was never a claim about who holds the pen. It's a claim that the humane standard stays in the system. When an agent proposes a principle, it isn't importing some alien or machine-centric value; it's articulating another principle within the same humane construct.
None of my agents has proposed a tenet yet; they all file corrections of error, unprompted: a written admission that something had broken that nobody had asked it to look for. A different artifact, the same instinct. Humanity stays in the loop no matter whose lived experience surfaces the next principle, which is exactly why the anchor is a standard and not a person: a standard can outlast, and outgrow, any single control owner, i.e. human.
That's why I no longer think the future is human in the loop.
The future is humanity in the loop.
Comments