I asked AI one simple question this week: how often does AI hallucinate? It pulled a real study from a real university showing one popular model hallucinated 94% of the time. Dramatic. Authoritative. But something felt off, so I asked for more details.
The AI model in that study has gone through two major upgrades since it was published. Its hallucination rate today? Roughly 4%. The answer could hardly have been more wrong. I was one lazy follow-up question away from repeating a stat that was accurate a year ago but is useless today.
Here's what most people get wrong about AI mistakes.
Everyone talks about "hallucinations," the term for when AI fabricates something out of thin air. Invents a citation. Makes up a statistic. To make matters worse, research has found that when AI hallucinates, it actually sounds more confident than when it's telling the truth: 34% more likely to use words like "definitely" and "certainly" when the information is wrong. That's a real problem, but it's also the easy one to catch.
The harder problem is what happened to me. The AI didn't make anything up. The Columbia Journalism Review study is real. The 94% number is real. It's just completely outdated and misleading. The AI pulled the most dramatic data point from its search results and presented it like it was the final word. Accurate citation, misleading conclusion. And the AI had no idea there was a difference.
The same week, AI saved me two hours of work.
I also used AI this week to analyze employee survey responses from my team. I uploaded all eight responses at once and asked for patterns. In about 30 seconds, it identified something I probably would have missed: three different people described the same frustration with our marketing strategy using completely different language. One called it "confusing." Another blamed our "pricing model." The third just said the process "needs work."
A human reading those surveys on a Friday afternoon connects maybe two of those. The AI treated every response with equal weight and caught the linguistic pattern across all three. We improved our marketing plan the next day.
So in the span of five days, AI gave me a dangerously outdated statistic about AI accuracy and also helped my management team with a challenge we've been talking about for months. Same tool. Same week. Very different results. And that's exactly the point.
The skill nobody is teaching.
Right now the conversation about AI falls into two camps. Camp one says AI is transforming everything and you're falling behind if you're not using it. Camp two says AI is unreliable and you can't trust anything it produces. They're both right, which is the most annoying possible answer.
The actual skill is knowing which mode you're in. When AI is summarizing data you provide, like those employee surveys, it's extremely good. You gave it the inputs. It's pattern-matching across a closed set. The answers are grounded in real material you control.
When AI is pulling information from the open internet, citing studies, quoting stats, telling you what's current in a fast-moving field, it's doing something fundamentally different. It's making editorial choices about which data to present. And it has no concept of "this was true last year but isn't anymore." It doesn't know the difference between fresh and stale. It just knows what sounds authoritative.
The question to ask isn't "Is this real?" It's "Is this still real?"
Frankly, it's the same issue we've always had with the internet, the news, and gossip. We all need to dig deeper, push for clarification, and question the "facts." AI is not the altruistic, all-knowing, trustworthy guru some people make it out to be, and that shouldn't surprise anyone.
This week's takeaway:
Next time you use AI to research anything that changes fast (markets, technology, competitors, regulations), add four words to the end of your question: "Is this still current?" Force it to check its own work. I do this now, and it has already caught two cases where the first answer relied on outdated sources. It takes five seconds, and it's the difference between a useful tool and a confident liar.
⚡ ONE MORE THING
The AI I used to analyze those employee surveys costs me $20 a month. That's less than most people spend on streaming services. And it caught a pattern across eight documents that would have taken me an hour to find manually, if I found it at all. The tool isn't expensive. Using it without thinking is.
— Matt
P.S. Curious which model went from worst to first? Grok is actually one of the best models for accuracy now. A year ago it was the worst. If that doesn't convince you to always fact-check AI-generated research, nothing will. Well, maybe AI helping find a cure for a dog's cancer will.
Forwarded this email? Sign up here.
