We all knew AI would eventually generate fake citations. That was almost the boring part. The more interesting question is why so many of them passed through systems designed to evaluate knowledge in the first place.
A recent study audited 111 million references across 2.5 million papers and preprints. Its estimate: nearly 147,000 hallucinated citations entered the scientific literature in 2025 alone, many surviving peer review and later appearing in published journal articles. The numbers are striking. But that was not the part that stayed with me.
The Weak Point
What stayed with me was how little friction a plausible-looking citation can encounter once a system is already operating near capacity. Science has always depended partly on trust. Organizations do, too. Peer reviewers are overloaded, researchers publish under pressure, managers skim presentations between meetings. Very few people can independently verify the assumptions behind a market forecast, an AI roadmap or a strategy paper. So credibility often gets assessed indirectly: institutional reputation, internal alignment, familiar language, confidence.
Large language models fit remarkably well into environments like these. They produce communication already shaped into plausibility. Correct structure. Familiar cadence. Convincing synthesis. A citation formatted exactly the way a citation should look. Often enough to pass. Not necessarily enough to be true.
Plausibility Scales Differently
For years, the dominant assumption was that organizations mainly suffered from too much information. Too many emails, reports, dashboards, PDFs, Slack messages. I’m no longer sure quantity was ever the hardest part. The harder part may have been orientation all along. And systems built around plausibility behave differently once plausible communication becomes nearly free.
I notice this changing the way I read. Fluency used to be a fairly reliable proxy for competence. Now I sometimes become more attentive when a text feels slightly too complete. Too frictionless. Too perfectly balanced. Not because AI-generated writing is inherently bad. Often it is useful. Sometimes excellent. But increasingly, I pay attention to other signals: specificity, selective emphasis, traces of constraint, moments where a text reveals trade-offs or lived familiarity with a subject.
The Familiar Becomes Easier to Reproduce
One detail from the study keeps bothering me. The hallucinated citations disproportionately assigned credit to already prominent scholars. That feels less like a bug than a structural tendency. Large language models learn the statistical patterns of existing text, so the names that already appear most often are the ones most likely to be generated. The familiar becomes easier to reproduce. Visibility compounds. Canonical language becomes even more canonical.
And underneath all this, a feedback loop is beginning to form. Fake citations enter papers. Papers enter databases. Databases enter future model training. Future models generate new text from contaminated corpora.
Fluency Gains Power
Science will not collapse because of this. But our interfaces with knowledge are changing. More and more people encounter expertise indirectly now: through summaries, generated syntheses, assistants, briefings, AI-mediated search. Under those conditions, fluency gains power.
Which may explain why proximity to people with proven judgment suddenly feels more valuable again. Experts who still know where a number came from. People willing to say “I don’t know” before producing another plausible synthesis. Communities where reputation depends not only on visibility, but on traceability.
That feels less like nostalgia than adaptation. Plausibility is becoming abundant. Grounded interpretation may not scale as easily.
