A PM I spoke to recently described sending an AI-generated risk register to a steering committee. It was well-formatted, clearly written, and entirely consistent with the data she had fed into the tool. Two of the seven risks listed had already been resolved weeks earlier. One had been rephrased into something so sanitised that the actual severity was unrecognisable to the people in the room who knew the project. Nobody flagged it. The register looked authoritative. That, it turns out, was the problem.
The Problem with Polished
For most of the history of software-assisted work, errors were relatively easy to spot. A formatting glitch, a broken formula, an obviously misplaced number. The brain registers these as anomalies and triggers review. The human error-detection system, imperfect as it is, at least has something to catch.
AI-generated outputs have changed that dynamic in a specific and underappreciated way. The outputs are not obviously wrong. They are grammatically clean, structurally coherent, and confidently worded. They read the way authoritative documents are supposed to read. And that surface quality is doing work on the reader's brain that the reader is usually not aware of.
When something looks like a credible professional document, the mental cost of treating it as a credible professional document drops significantly. You stop reading critically. You start reading for confirmation. The review process that should be happening is quietly short-circuited by the presentation quality, and the output gets forwarded, presented, or acted on without the interrogation it actually requires.
This is not a failure of intelligence. It is a documented feature of human cognition. And it has a name.
What Automation Bias Actually Is
Automation bias was first formally described in aviation research in the 1990s. Pilots with access to automated flight systems were observed deferring to system recommendations even when cockpit instrumentation and their own judgment contradicted those recommendations. The phenomenon was not about laziness. It was about the cognitive authority that automated systems carry: because the system is perceived as objective and reliable, its outputs receive less scrutiny than a human colleague's would.
The pattern has since been documented in radiology (physicians miss diagnoses that automated screening did not flag, even when the imaging is visible), in financial trading (traders fail to override algorithmic recommendations even when market context clearly suggests they should), and in military operations (operators hesitate to contradict system-generated threat assessments). The setting changes. The mechanism does not.
"The danger is not that people trust AI blindly. It is that they trust it slightly too much, consistently, across hundreds of small decisions, in ways they do not notice because each individual deference feels entirely reasonable."
What matters for delivery professionals is this: automation bias does not require someone to be naive about AI. It affects experienced practitioners who know perfectly well that AI makes mistakes. The bias operates below the level of explicit belief. You can believe AI is fallible in the abstract while still, in practice, applying less scrutiny to AI-generated outputs than the situation warrants. The gap between stated belief and actual behavior is where the risk lives.
Where It Shows Up in Delivery
Automation bias in project delivery does not announce itself. It accumulates in small, individually defensible moments: the status report you skimmed because it looked thorough, the risk entry you passed along because it was worded with appropriate hedging, the meeting summary you forwarded because it hit all the right points. Each decision feels like efficiency. In aggregate, it is a slow erosion of the quality assurance layer that the PM is supposed to provide.
The sharpest way to see it is to compare the AI errors that are obviously wrong with the ones that are merely plausibly wrong.
The obviously wrong ones get caught because they break a pattern the reader is holding. The plausible ones pass because they match the pattern. They use the right vocabulary. They have appropriate hedging. They fit the shape of a professional delivery document. And that fit is precisely what disarms the scrutiny reflex.
The Credibility Transfer Problem
There is a second-order effect here that is specific to PMs, and it matters more than the first-order error itself. When a PM forwards an AI-generated output without adequate interrogation, two things happen simultaneously. The error travels. And the PM's credibility underwrites it.
A risk register that arrives from a PM carries the implicit endorsement of the PM's judgment. A steering committee reading a status report does not receive it as "AI's view of the project." They receive it as the PM's view of the project. The authorship may be AI, but the accountability is human. When the report turns out to be wrong in ways that mattered, the question is not "what did the AI miss?" It is "why didn't the PM catch it?"
This is the credibility transfer problem. The AI generates the output. The PM distributes it. The organizational consequences, good or bad, accrue to the PM. In a world where AI-generated content is becoming the norm, the PM who cannot reliably distinguish signal from plausible noise is carrying a credibility liability they may not feel until a high-stakes moment makes it visible.
"AI does not have a reputation to protect. You do. Every output you distribute without interrogating it is a small transfer of your professional credibility to a system that does not share the consequences of being wrong."
The compounding problem is speed. The efficiency argument for AI-generated outputs is real: a status report that took two hours to compile manually now takes ten minutes. That efficiency is valuable. But if the ten-minute version receives ten minutes of PM attention and the two-hour version received two hours, the review quality has collapsed by a factor that the time saving does not justify. The PM has become faster at distributing outputs, not better at evaluating them.
Your Automation Bias Exposure
Automation bias is not evenly distributed. Certain delivery patterns create higher exposure than others. The signals below are not a formal diagnostic; they are indicators worth sitting with honestly. The first four point to elevated exposure. The last four point to a practice that keeps it in check.
You review AI-generated status reports by reading them rather than by comparing them against what you know independently about each workstream's current state.
You forward meeting summaries the same day they are generated without a review pass from someone who was in the room.
Your risk register is primarily AI-generated and your review cadence is monthly. You can't recall the last time you changed a probability or impact rating after reviewing a draft.
When a senior stakeholder asks about a specific risk or workstream, your first instinct is to open the AI-generated document rather than to answer from your own mental model of the project.
On the other side of the ledger: you use AI outputs as a starting draft that you actively rewrite, not as a finished product you lightly edit. Your version sounds like you, not like the AI.
You can articulate the current status of your three highest-priority workstreams, including their real risks, without opening any document. The AI produces a record of what you already know.
You have a habit of asking "what is this output not telling me?" before you distribute anything AI-generated. You look for the absence of information, not just the presence of it.
You've overridden or substantially rewritten AI risk entries in the past month based on your own judgment about severity. You treat the AI's assessment as one input, not the answer.
The Critical Evaluation Habit
Critical evaluation of AI output is a learnable skill, but it is not the same skill as general critical thinking. It requires knowing specifically where and how AI-generated delivery content tends to go wrong, and building a review practice around those failure modes rather than just reading more carefully.
The failure modes cluster around four patterns. AI flattens urgency: the phrasing it defaults to is measured and professional, which systematically underrepresents the emotional temperature of a real conversation. It optimizes for coherence over accuracy, meaning a summary that reads well is not necessarily one that reflects what was actually said. It works from the data it was given, not the data that exists. If the project context in the tool is stale, the output reflects that staleness without flagging it. And it cannot represent what was not said: the blockers people chose not to raise in the formal meeting, the concern a team lead mentioned informally afterwards, the subtext that everyone in the room understood.
Each of those failure modes maps to a review question. The first: does the tone match the severity? AI prose defaults to measured, professional language regardless of the actual severity of the situation. For every risk entry and every workstream status, hold the language against your own read of how serious the situation is. If your gut says "this is bad" and the AI says "some considerations may warrant attention," the output needs rewriting before it goes anywhere.
The second: what was the tool never given? AI can only represent what it was given. Check the output against the informal information you hold: the conversation you had in the corridor, the message someone sent you after the meeting, the thing the workstream lead said "off the record." If any of that changes the picture materially, it needs to be in the output, either explicitly or by adjusting the AI's assessment.
The third: is this current? AI-generated outputs reflect the state of the data at the point of generation. Projects move faster than data gets logged. A risk that was resolved in Tuesday's stand-up may still appear in Wednesday's AI-generated register if the record was not updated. Go through each item and verify currency, not just content.
The fourth: could you defend every item as your own? This is the final filter, and it is the most useful one because it restores the right frame of reference. You are not reviewing an AI document. You are reviewing a document that will carry your professional endorsement. If a senior stakeholder challenged you on any item in this output, could you defend it from your own knowledge? If not, that item needs more work before it leaves your hands.
The four-question review does not need to take long. On a well-understood project, it might take five minutes. The point is not the time. It is the habit of switching into the evaluative mode at all, rather than defaulting to the editing mode that treats the AI draft as basically finished.
Neither Resistance Nor Compliance
The critical evaluation skill described in this essay is distinct from two positions that get mistaken for it. It is not AI resistance: the instinct to distrust AI outputs, manually replicate work the AI could do, and treat adoption of AI tools as a professional compromise. That position is expensive, increasingly untenable, and misreads where the PM's value actually comes from.
It is also not AI compliance: forwarding outputs, endorsing recommendations, and building workflows around AI-generated content without maintaining an independent view of whether that content is accurate. That position looks efficient and may feel like effective adoption, but it is quietly transferring professional accountability to a system that cannot hold it.
The PM who can do this well has something genuinely valuable in a world of AI-generated delivery content: the ability to guarantee the quality of what they put their name on. That guarantee is rare precisely because automation bias makes it hard to maintain. And it is valuable precisely because the consequences of failing to maintain it are increasingly visible and increasingly public.
The goal is not to check AI outputs because you distrust AI. It is to check them because you trust your own judgment more, and because the cost of being the PM who distributed a misleading risk register to a steering committee is a cost that no tool shares with you. The AI produces the output. The reputation consequence is entirely yours.
The PMs who will navigate this well are not the ones with the most sophisticated AI tools or the most sceptical relationship with them. They are the ones who maintain a clear, current mental model of their projects independently of what the tools say, and who use that model as the standard against which AI output is measured, rather than allowing the AI output to become the model.
Build the habit before you need it. Automation bias does not announce its arrival. It is already running in the background, in the outputs you are reviewing today, in the documents you forwarded last week. The question is whether your review practice is keeping pace with it.
