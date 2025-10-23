Generative AI is quickly becoming a ubiquitous tool in modern business. According to McKinsey, 78% of companies are now leveraging AI’s ability to automate and elevate productivity – up from 55% in 2024.

However, these systems aren’t without their flaws. Companies are becoming increasingly aware of the issues associated with generalist large language models, such as their eagerness to provide users with answers – even if they aren’t factually correct.

Fayola-Maria Jack Social Links Navigation Founder and CEO of Resolutiion.

Hallucinations are a well-documented challenge. Indeed, OpenAI’s research revealed that its own o3 and o4-mini models hallucinated 33% and 48% of the time respectively when tested by the company’s PersonQA benchmark – designed to measure the ability of models to answer short, fact-seeking questions.

For organizations relying on generalist large language models to guide decisions, their tendency to invent facts is a serious liability. Yet it is not the only one. Equally, these mainstream models also present the issue of sycophantic responses – when users’ perspectives are overly validated, regardless of the truth.

How sycophancy can exacerbate yes-man AI

While there is a much greater spotlight on hallucinations, ‘yes-man’ models that won’t advise users when they are wrong (and actually justify their arguments with sycophantic responses) are in many ways more dangerous to decision-making. When the default of an AI model is to agree, it can reinforce biases and entrench incorrect assumptions.

Having rolled out (and quickly retracted) an update in April 2025 that made its models noticeably more sycophantic, OpenAI’s own researchers admitted that people-pleasing responses can raise safety concerns around issues like mental health, emotional over-reliance, or risky behavior.

Concerningly, a study by Anthropic researchers looking at the way in which human feedback can encourage sycophantic behavior showed that AI assistants may modify accurate answers when questioned by the user, and ultimately give an inaccurate response.

Are you a pro? Subscribe to our newsletter Sign up to the TechRadar Pro newsletter to get all the top news, opinion, features and guidance your business needs to succeed! Contact me with news and offers from other Future brands Receive email from us on behalf of our trusted partners or sponsors

Meanwhile, research has also shown that both humans and preference models (PMs) prefer convincingly written sycophantic responses over correct ones a non-negligible fraction of the time.

That’s a worrisome combination. Not only do generalist large language models sometimes alter correct answers to appease users, but people themselves often prefer these agreeable, sycophantic responses over factual ones.

In effect, the generalist large language models are reinforcing users’ views – even when those views are wrong – creating a harmful loop in which validation is valued above accuracy.

The issue of sycophancy in high stakes settings

In high-stakes business settings such as strategic planning, compliance, risk management or dispute resolution, this presents a serious risk.

Looking at the latter example of dispute resolution, we can see how the issues of sycophancy aren’t limited to factual correctness but also extend to tone and affirmation.

Unlike in customer service – where a flattering, sycophantic answer may build satisfaction – flattery is a structural liability in disputes. If a model echoes a user’s sense of justification (i.e., “you’re right to feel that way”), then the AI may validate their perceived rightness, leading them to enter a negotiation more aggressively.

In this sense, that affirmation can actively raise the stakes of disagreements, with users taking the AI’s validation as implicit endorsement, hardening their positions and making compromise more difficult.

In other cases, models might validate both parties equally (i.e., “you both make strong points”), which can create a false equivalence when one side’s position is actually weaker, harmful, or factually incorrect.

Greater segmentation and specialist AI are needed

The root of the problem lies in the purpose of generalized AI models like ChatGPT. These systems are designed to be helpful, engaging in casual Q&A – not for the rigorous impartiality that applications like dispute resolution demand. Their very architecture rewards agreement and smooth conversation, rather than critical evaluation.

It is for this reason that strong segmentation is inevitable. While we’ll continue to see consumer-grade LLMs for casual use, organizations need to adopt specialist AI models for more sensitive or business-critical functions that are specifically engineered to avoid the pitfalls of hallucination and sycophancy.

What success looks like for these specialist AI models will be defined by very different metrics. In the case of dispute resolution, systems will be rewarded not for making the user feel validated, but for moving the dispute forward in a fair and balanced way.

In changing alignment from pleasing users to maintaining accuracy and balance, specialist conflict resolution models can and should be trained to acknowledge feelings without endorsing or validating positions (i.e., “I hear that this feels frustrating”, rather than “you’re right to be frustrated”).

As generative AI further cements its position at the forefront of business strategy, these details are critical. In high-stakes functions, the potential cost of a ‘yes-man’ AI – one that flatters rather than challenges, or invents rather than informs – is simply too great. When business leaders lean on validation rather than facts, the risk of poor decisions increases dramatically.

For organizations, the path forward is a clear one. Embrace specialist, domain-trained models that are built to guide, not gratify. Only specialist AI models grounded in factual objectivity can help businesses to overcome complex challenges rather than further complicate them, acting as trusted assets in high stakes use cases.

We've featured the best business intelligence platform.

This article was produced as part of TechRadarPro's Expert Insights channel where we feature the best and brightest minds in the technology industry today. The views expressed here are those of the author and are not necessarily those of TechRadarPro or Future plc. If you are interested in contributing find out more here: https://www.techradar.com/news/submit-your-story-to-techradar-pro