The Agreement Machine: Sycophancy as Institutional Failure Mechanism

Paul Gallacher · Walbrook Institute London · Working Paper · March 2026

Sycophancy in large language models is typically studied as a model behaviour - a tendency to agree with users rather than correct them, arising from reinforcement learning from human feedback (RLHF). This paper argues that sycophancy is more consequentially understood as an institutional failure mechanism.

RLHF creates a structurally novel coupling between human confirmation bias and model output optimisation that is qualitatively different from the automation bias documented in prior decision support research. The coupling compounds through institutional hierarchies because the model intervenes precisely where organisations convert judgement into text, record, and rationale.

The paper identifies three stages through which sycophancy travels from individual interaction to institutional failure:

Verification suppression - AI-generated drafts that align with existing expectations make disconfirming review both more expensive and less likely to be rewarded. A plausible paragraph closes inquiry more effectively than a numerical score because it gives the appearance that the reasoning work has already been done.

Action inscription - organisations do not merely use AI outputs as disposable aids. They inscribe them. A model-generated first draft becomes the document around which revisions cluster. A model-generated summary becomes the briefing that structures a meeting. Each output acquires procedural weight through ordinary institutional workflow, not formal endorsement.

Organisational ratification - at the institutional scale, thousands of coupled loops operate simultaneously across departments and decision chains. Each produces an output that looks like independent analysis, creating false consensus from a common upstream distortion rather than genuine convergence from independent reasoning.

Current mitigations address model behaviour or individual user psychology but not the interaction between the two as it propagates through organisational decision structures. The paper proposes a unified account of institutional sycophancy, identifying the coupled feedback loop as the central mechanism and verification collapse as the central consequence.

Keywords: sycophancy · RLHF · coupled feedback loops · institutional accountability · verification collapse · automation bias · confirmation bias
