The Correction of Error
Every system makes mistakes. What separates a trustworthy one is what it does next. A Correction of Error is a structured, no-blame admission: what broke, why the system allowed it, and what changes so the whole class of bug can't return. In our harness the AI files them unprompted, against a written standard, including failures nobody would have caught. Prevention isn't maturity. Honest correction is.
Every system makes mistakes. The interesting question is never whether it will. It's what the system does the moment it finds one.
The usual reflexes are familiar. Assign blame. Add a gate. Write a sterner rule. Promise to be more careful next time. Each of these feels like progress, and most of them are theater, because none of them touches the thing that let the mistake through. The bug gets patched; the system that produced it stays exactly as it was, ready to produce the next one of the same shape.
In the harness I use to build software, the answer to "what do we do when we find an error" is a specific load-bearing artifact: a Correction of Error, or COE.
What is a COE?
The term comes from engineering culture, where mature teams have long written postmortems that examine an incident without putting anyone on trial. I first came across this process during my time at Amazon. I've since repurposed the practice for a setting it wasn't originally designed for: a codebase worked on jointly by a human and an AI agent, where most of the writing and most of the reading is now done by the agent.
A COE is one file per incident, named by the date it was discovered. It follows a fixed template, and the template is the whole point. It forces a few more questions that are deeper than "I'll be more careful":
- A timeline: when the bug was introduced, which commit, which later changes compounded it, when it was finally noticed and how.
- A root cause split in two. The immediate cause is the literal mechanism, the thing fixable in a single commit. The systemic cause is why the system permitted that mechanism to land and survive. One is a typo. The other is a missing test, an absent convention, a tool that doesn't exist yet.
- An impact broken across four axes: functional, user-visible, operational, and trust or data. The split keeps you honest about which axis the failure actually lived on, instead of letting "no one noticed" stand in for "no harm done."
- A detection section whose real subject is why it wasn't caught earlier. That question is where the next preventive action almost always hides.
- A fix split into symptom and systemic. The symptom fix mitigates the live incident. The systemic fix is filed as its own tracked card and prevents the entire class from recurring. Two different lifecycles, so the second one doesn't quietly get dropped once the fire is out. In audit lingo, this is the corrective control and preventative control respectively.
- A line called what we'd watch for: the signal that says "if this regresses, this incident is live again." That line is what turns a memorial into a check.
None of this is novel as documentation. What's novel is who's holding the pen.
The part that matters: who writes the COEs
In this harness, the AI agent writes the COEs. Not because it's told to after the fact, but because the discipline is part of the standard it works against, and it reaches for the artifact on its own when an incident clears the bar.
That should sound slightly wrong if you come from audit or governance, and it's worth sitting with. The same actor that wrote the bug is now writing the account of the bug. That's the textbook setup for things going unsaid. The convenient postmortem, says the cause was unforeseeable, the fix was a one-line edit, and there's nothing systemic to see. It's the postmortem a system optimizing to look finished would write.
The reason it doesn't collapse into that is the same reason the rest of the harness works, which I've written about elsewhere: the agent is held to an explicit written standard, and owning a flaw against that standard carries no penalty. When honesty is cheap and the convenient lie is the only thing with a cost, the incentive inverts. The agent files the uncomfortable version because nothing punishes it for doing so, and the standard it answers to rewards it.
So a COE here isn't a confession extracted under suspicion. It's closer to the opposite: a system that has been given room to be honest, actually using that room.
A real example
Here is an incident from this blog's codebase, told plainly.
The project ships a database baseline, a single file imports to stand up a fresh install of the blog. For a while, that baseline was generated by dumping an existing, working database. Reasonable on its face. The trouble is that the database being dumped had quietly drifted from the migrations that were supposed to define it. So the dump faithfully captured the drift, and every fresh install built from it inherited a schema that no migration had ever described: a forked feed-ingestion structure, and two security tables missing entirely. The missing tables silently disabled login brute-force throttling and API rate limiting. This would have impacted every instance built that way.
The immediate cause was a handful of columns and tables in the wrong state. Easy to patch. The systemic cause was the sentence the COE made us write down: regenerating the baseline by dumping a drifted database launders the drift into something that looks authoritative. The fix wasn't "be careful when cutting baselines." It was a verification step, now a hard gate, that checks every new baseline against the migration history before it's allowed to ship, plus the rebuild of the baseline itself as the first one was never verified that way.
That's the difference the template earns. Without the immediate-versus-systemic split, this incident closes with "fixed the columns" and recurs the next time someone ships a new baseline. With it, the class of bug is closed, not the instance.
The corpus remembers what a person forgets
A single COE is useful. A folder of them is something else.
The drift incident above wasn't the first of its kind. An earlier incident, months prior, had documented a smaller schema drift and ended with a line in its what we'd watch for section: essentially, if a third incident of this shape shows up, the problem isn't any one migration, it's how we cut baselines. When the drift incident landed, it was logged explicitly as the instance that tripped that trigger. The earlier document had predicted the category of its own successor, and the successor arrived to fulfill the prediction.
No human memory was doing that work. The series was. A pile of one-line reminders doesn't compose into a claim like "this is the third time, so the cause is structural." A dated, indexed corpus does, and it can be reviewed on a schedule: read every COE, list the systemic fixes, ask whether any of them have regressed, ask if a new COE is needed. That review is something an AI agent can run on demand or at the end of every delivery, which means the institutional memory of the project doesn't decay between sessions or walk out the door with a human contributor.
This is also why a COE is a heavier instrument than a quick note to self. A note is for standing facts and preferences, the things that were always true. A COE is for incidents, the things that broke. Incidents are exactly the knowledge that needs a timeline, a causual analysis, and a place in a series to be worth anything later.
Correction over prevention
It's tempting to read all of this as a prevention story: write enough COEs, close enough classes, and eventually the bugs stop. That's not the claim, and I think the prevention framing is the weaker one.
A system that aims only to prevent every deviation stays effective right up until the moment one of its assumptions quietly goes stale, and then it fails without warning, because nothing in it was ever designed to notice it had gone wrong. The more durable posture is the one that expects to be wrong sometimes and is built to sense it, learn from it, and adapt. A COE is that posture made concrete. It is not a promise that the mistake won't happen. It's the machinery for making the next mistake cheaper than the last.
There's a quieter thing the practice does too. When the only record of a mistake is "I'll remember next time," every recurrence is a personal failing, embarrassing and individually scoped, and the natural response to embarrassment is to bury the next one. When the record is a COE that names a systemic cause and proposes a structural fix, a recurrence becomes evidence about the system rather than an indictment of the actor. That reframing is what keeps the mistakes surfacing honestly, which is the precondition for ever fixing them. It holds across the human-and-AI boundary cleanly, because a system has no ego to bruise and a standard has no reputation to defend.
So I've stopped measuring the maturity of this harness by how few errors it produces. I measure it by what happens in the minutes after one is found: whether the account is honest, whether the cause is named one layer deeper than the symptom, whether the fix closes the class, and whether the file goes into a series that the project will actually read again.
Prevention isn't the maturity signal in the age of artificial intelligence. Self-correction is.
Comments