How to reduce hallucinations in AI
AI hallucinations worsen, raising calls for reliable, transparent hybrid approaches

Recent research has revealed a troubling trend in artificial intelligence: the "hallucination" problem, where models generate false or misleading information, is getting worse.
Internal tests by OpenAI have found that their latest models, including the o3 and o4-mini versions, are more likely to hallucinate than previous iterations, with the o3 model fabricating information in 33% of factual questions and the o4-mini version in 48%.
This deteriorating reliability poses a significant barrier for enterprises considering AI adoption, particularly in high-stakes industries where a wrong decision can have huge financial, legal or reputational consequences.
For this reason, those businesses need to consider alternatives, and they do exist. But first, it's important to understand why the LLMs that have become so popular hallucinate so much.
Why hallucinations are getting worse
The fundamental issue lies in how LLMs like ChatGPT actually work. These systems use statistical prediction to generate responses, essentially making educated guesses based on patterns in their training data.
As OpenAI itself acknowledged, the shift to more advanced models like GPT-4o has "unintentionally increased what users perceive as 'bluffing'", which is when the software confidently provides wrong answers without admitting uncertainty.
Because LLMs use statistics to determine their outputs, they occasionally produce answers that are incorrect, much as someone who bets on a horse will sometimes (or frequently) be wrong even after weighing every variable. When an LLM does this, we call it a "hallucination".
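To make that concrete, here is a minimal sketch in Python (purely illustrative, with made-up probabilities rather than any real model's API) of why sampling from a learned distribution produces fluent but occasionally false answers:

```python
# Illustrative sketch only: an LLM picks each next token by sampling from a
# probability distribution learned from training data. Sampling always returns
# *something*, even when no candidate is clearly correct -- which is how a
# confident-sounding but wrong answer, a hallucination, can emerge.
import random

# Hypothetical next-token probabilities for the prompt
# "The capital of Australia is ..."
next_token_probs = {
    "Canberra": 0.55,    # correct
    "Sydney": 0.30,      # plausible but wrong
    "Melbourne": 0.15,   # plausible but wrong
}

def sample_next_token(probs: dict[str, float]) -> str:
    """Sample one token in proportion to its probability."""
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

# Ask the same question repeatedly: most answers are right, but a meaningful
# fraction are wrong, and nothing in the output itself signals any doubt.
answers = [sample_next_token(next_token_probs) for _ in range(1000)]
wrong = sum(a != "Canberra" for a in answers)
print(f"Wrong answers: {wrong / len(answers):.0%}")  # roughly 45% under these made-up weights
```

The numbers here are invented, but the mechanism is the point: the model's output is a weighted guess, not a lookup against verified knowledge.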
The problem is compounded by developers' efforts to make AI more human-like. Modern models are programmed with empathy, emotional understanding, and a desire to please.
These are qualities that make them more engaging, but also more likely to provide confident-sounding answers even when they're unsure. It’s a perfect storm: AI that sounds authoritative while being fundamentally unreliable.
A recent Sky News investigation highlighted this issue dramatically, revealing how ChatGPT fabricated entire transcripts of a real podcast, doubling down when challenged and only admitting the error under sustained pressure.
The research confirms what many developers have suspected: newer models are actually becoming less reliable, not more.
What this means for businesses
For businesses, these hallucinations present a formidable barrier to AI adoption: in sectors like healthcare, finance, legal services, and insurance, mistakes have real consequences.
The current 48% error rate in some models makes human oversight mandatory, defeating much of AI's purpose as a tool to increase efficiency.
The challenge is particularly acute because hallucinations are often imperceptible to non-experts. LLMs can generate plausible-sounding but entirely fabricated legal precedents, medical advice, or financial analysis.
Unlike human errors, which often follow recognizable patterns, AI hallucinations can be completely random, making them nearly impossible to catch without subject-matter expertise.
This unpredictability becomes even more concerning in light of how rapidly these systems are being deployed.
Amid a surge of enterprise investment in AI, organizations are rushing to move from pilot projects to full-scale deployment.
Clearly, the motivation is to adopt quickly or risk falling behind. But in the race to integrate, how many have fully accounted for the risks of hallucinations? Already, we've seen serious consequences.
Apple had to roll back its AI-generated news alerts, and Anthropic was caught citing fabricated legal references in a court filing.
As adoption accelerates, these incidents will only multiply. In turn, we’ll see trust in AI undermined and businesses will be forced to ask a difficult question: is there a more reliable path forward?
A different approach: neurosymbolic AI
While the industry continues pouring resources into LLMs, some companies are taking a fundamentally different approach.
At UnlikelyAI, we've developed what we call "neurosymbolic AI", which is a hybrid system that combines traditional neural networks with symbolic reasoning to leverage the strengths of each.
Symbolic reasoning is an old, well-established method for encoding knowledge using clear, logical rules. It represents facts as explicit pieces of knowledge, so the software can't distort or misinterpret them. It's the same kind of deterministic logic behind the spreadsheet calculations we rely on every day in Excel.
The key difference here is determinism. While LLMs might give different answers to identical questions, symbolic systems always produce the same output for the same input.
More importantly, they can admit when they don't know something, a crucial capability in regulated industries and one that LLMs lack. Integrating the two lets users benefit from an LLM's fluent use of natural language as well as the reliability of the symbolic model.
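A toy sketch (illustrative only, not UnlikelyAI's actual system) shows what the symbolic side of such a hybrid looks like: facts and rules are stored explicitly, answers are derived by deterministic lookup, and anything not provable is reported as unknown rather than guessed.

```python
# Minimal sketch of a symbolic knowledge base: explicit facts, explicit rules,
# deterministic answers, and "unknown" when the question falls outside what
# has been encoded. All names and rules here are hypothetical examples.

FACTS = {
    ("policy_covers", "fire_damage"): True,
    ("policy_covers", "flood_damage"): False,
}

RULES = [
    # (conclusion, premise): a claim is payable if the policy covers that peril.
    (("claim_payable", "fire_damage"), ("policy_covers", "fire_damage")),
    (("claim_payable", "flood_damage"), ("policy_covers", "flood_damage")),
]

def query(fact: tuple[str, str]) -> str:
    """Answer True/False/'unknown' for a queried fact, the same way every time."""
    if fact in FACTS:
        return str(FACTS[fact])
    for conclusion, premise in RULES:
        if conclusion == fact and premise in FACTS:
            return str(FACTS[premise])
    return "unknown"  # absence of knowledge is surfaced, never papered over

print(query(("claim_payable", "fire_damage")))   # True, every time
print(query(("claim_payable", "flood_damage")))  # False, every time
print(query(("claim_payable", "storm_damage")))  # unknown -- the system admits it
```

In an integrated neurosymbolic system, the language model would handle the natural-language question and explanation, while a symbolic layer like this supplies the verified, repeatable answer.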
The path forward
The idea that scaling up language models will eventually eliminate hallucinations is starting to look increasingly uncertain. These models are built on statistical patterns, not grounded understanding, which means their limitations may be inherent rather than temporary.
Instead of relying solely on more data and larger models, it may be time to explore alternative paths – approaches that combine statistical learning with more structured forms of reasoning.
For businesses, the implications are stark: while AI holds tremendous promise, the current generation of models isn't ready for high-stakes applications.
The future belongs instead to hybrid approaches that combine the flexibility of neural networks with the reliability of symbolic reasoning, offering the best of both worlds without the devastating cost of widespread hallucinations.
The solutions for enterprise AI are already here. They just require us to move beyond the limitations of current models to embrace approaches that prioritize reliability alongside capability.
For businesses ready to realize AI's potential, the path forward is clear: demand transparency, accuracy, and accountability from the AI systems that will shape the next decade of innovation.
Alex Corsham is Software Engineer at UnlikelyAI.