I pull up our PR dashboard most mornings before my first meeting. Not to approve syntax. Not because I distrust my team. I read pull requests to understand how my engineers are thinking across three continents ... what architectural decisions they're making, where patterns are starting to diverge between regions, what small fault lines are opening up before they harden into permanent inconsistency. Without that reading, my judgment about what good looks like in our codebase goes stale. And stale judgment, in an era when your team can ship ten times the code they could three years ago, is an entirely different problem than it used to be.
That last part matters. The volume changed everything.
Stack Overflow's 2024 Developer Survey found that 76% of developers were already using or planning to use AI coding tools. Meanwhile, technical leaders at many organizations were drafting AI governance policies from conference slides and vendor briefings rather than from any direct contact with their own codebase. The output doubled. The ability to evaluate it did not.
If you've already lost the thread on how your team writes code, AI makes that gap permanent. The output volume doubles. The signal you're qualified to evaluate halves.
I've been inside that gap. Not as a deliberate choice. As a slow slide I almost missed while it was happening.
The Slide Looks Like Competence
When I drift from the technical work, it never feels like abandonment. It feels like prioritization. The calendar fills with cross-functional alignment calls. The 1:1s that used to start with code now end with sprint status. I hear a technical explanation in a design review that I don't fully follow, and instead of stopping to ask the question, I nod through it and tell myself I'll catch up later. The codebase goes weeks without surprising me.
Those are the signals. I recognize them because I've felt every one of them.
The nodding is the one that gets me most. When you've built credibility over years, people assume your nod means understanding. They're not trying to deceive you. You're not trying to deceive them. But you've quietly stopped asking the question that would expose the gap. And the gap grows, week by week, until the thing you're being asked to govern is a system you can no longer see clearly.
The drift is survivable in a normal engineering environment. A leader a few steps behind can catch up in a strong retrospective, ask the right questions, and close most of the gap before it costs anything. That calculus does not hold in an AI-accelerated one. A team using AI coding tools is shipping code at a pace I've never managed before. If I've lost the signal on how they make decisions ... which abstractions they reach for, which patterns they default to, where the AI output is strong and where it needs a human hand ... I'm not a governing force. I'm a signature on a policy document.
Governance Is Not Judgment
There's a distinction that gets blurred in most conversations about AI standards, and it matters enough to draw clearly.
Governance is setting the policy. Defining what's acceptable. Writing the guidelines. Building the approval flow. You can do this from a meeting agenda. You can attend the workshops, read the vendor documentation, benchmark against what other organizations are doing, and produce a governance framework that looks thorough.
Judgment is knowing whether the policy is working. Whether the output your team is shipping actually meets the standard you wrote. Whether the AI-generated code that passed the automated linter has the structural coherence that won't collapse under load six months from now. Whether the pattern your senior engineer merged last week is something you'd be comfortable defending in a postmortem.
You cannot develop judgment from a policy document. You develop it by staying close enough to the work that you know what you're evaluating. The governance framework is only as useful as your ability to see whether it's being applied.
You can write the AI governance policy. You can't know whether it's working if you've stopped being able to read the code it governs.
This is the specific failure mode I watch for in myself. Not whether I've shipped a framework. Whether my judgment is still sharp enough that the framework reflects something real.
What Reading PRs Actually Does
There's a version of "leaders should stay technical" that becomes its own kind of theater. I want to be specific about what I'm not describing.
I'm not in every PR. I'm not blocking deploys waiting for my review. My team doesn't need my sign-off before anything ships. What I need is enough signal to keep my model of how we build current and accurate.
When I'm reading PRs, I'm asking different questions than the reviewer checking the logic. I'm asking whether this service boundary reflects the architectural direction we set six months ago. Whether the error handling approach matches what the team in Europe is doing, or whether we're accumulating inconsistency that will cost someone two days of debugging on a production incident a year from now. Whether the AI-generated output is getting accepted wholesale, or whether engineers are pushing back and improving it.
That pattern recognition is what makes my judgment useful when it actually matters. A bug came through our system not long ago. Instead of routing it through the normal support path, I got into the logs, traced a data issue, and had the customer experience resolved in thirty minutes. That was only possible because I still knew where to look.
When I drifted ... when the reading stopped ... what I noticed first wasn't that I was wrong about things. It was that I was slower. The questions I asked in architectural conversations had a lag that came from not having current context. I was reconstructing from memory instead of observing from live knowledge. You can maintain credibility for a while on memory. But memory doesn't update. In an environment where your team's output is changing every week, memory is the wrong instrument.
The Accountability That Doesn't Shift
The most uncomfortable part of this is what stays constant.
You are still accountable for the quality. Whether you've read a PR in six weeks or six months, the production incidents belong to you. The architectural decisions that decay over time belong to you. The standards that looked good in a policy document and didn't survive contact with the codebase belong to you.
What you've lost isn't accountability. You've lost the ability to see what you're accountable for.
The output volume doubled. The signal you're qualified to evaluate halved. The accountability didn't move.
That's the version of this problem that's specific to the AI era. Teams working at human pace could be led by someone a few steps behind. The gap was manageable. A strong sprint retrospective could close most of it.
AI output doesn't wait for the retrospective. It compounds weekly. And the leader who's trying to govern AI judgment without maintaining their own isn't catching up in a sprint review. They're reading a policy document in a system they no longer understand.
I still read pull requests. I still get into the logs when something goes sideways. I make sure the codebase surprises me at least once a week ... because the day it stops is the day I've lost the thread.
If you're being asked to set AI standards for your team, start there. Not with the framework. Not with the vendor.
When did the codebase last surprise you?
