New OpenAI GPT-4 service will help spot errors in ChatGPT coding suggestions

In a bid to increase the usefulness of generative AI tools to developers, OpenAI has introduced CriticGPT, a new model it says can help identify errors in ChatGPT code outputs.

OpenAI claims that people who review ChatGPT code with help from CriticGPT, which is based on GPT-4, outperform unaided reviewers 60% of the time. The company presents this as evidence that the model enhances human performance in code review tasks rather than replacing human workers.

OpenAI’s initiative aims to refine the ‘Reinforcement Learning from Human Feedback’ (RLHF) process in order to ensure higher quality and greater reliability in AI systems.

OpenAI launches new code-checking model

OpenAI’s latest GPT-4 series, which powers publicly available versions of ChatGPT, relies heavily on RLHF to ensure that its outputs are both reliable and interactive. Until now, this process has been manual, relying on human AI trainers who rate ChatGPT responses to improve the model’s performance.

With the launch of CriticGPT, OpenAI can now generate critiques of ChatGPT’s answers automatically, which addresses concerns that the chatbot’s output is becoming too sophisticated for many human trainers to evaluate.

To train CriticGPT, AI trainers inserted intentional mistakes into ChatGPT-generated code and then wrote feedback on those mistakes. The results were promising, with trainers preferring CriticGPT’s critiques around two-thirds (63%) of the time, thanks to the tool’s ability to reduce nitpicks and hallucinations.
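
For developers wondering what an intentionally inserted mistake might look like, here is a minimal, purely hypothetical Python sketch. The function names, the bug, and the record format are assumptions made for illustration and do not come from OpenAI's announcement or training data.

```python
# Hypothetical illustration only: these names and this record format are
# assumptions for the example, not OpenAI's actual tooling or data.


def mean_original(values):
    """A model-written answer as a trainer might first receive it."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)


def mean_tampered(values):
    """The same answer after a trainer deliberately inserts a subtle bug."""
    if not values:
        raise ValueError("values must not be empty")
    # Inserted mistake: off-by-one in the divisor.
    return sum(values) / (len(values) - 1)


# The trainer also writes down what was changed, so that a critique of the
# tampered code can later be compared against a known mistake.
tamper_record = {
    "inserted_bug": "divisor is len(values) - 1 instead of len(values)",
    "trainer_feedback": (
        "The function divides by one fewer than the number of items, "
        "so averages come out wrong and a single-item list raises "
        "ZeroDivisionError."
    ),
}

if __name__ == "__main__":
    sample = [2, 4, 6]
    print("original:", mean_original(sample))
    print("tampered:", mean_tampered(sample))
```

The two functions look almost identical at a glance, which is the point: a critique is useful if it flags the changed divisor rather than commenting on style.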

However, the project isn’t without its limitations, and AI-human collaboration continues to prove more effective than AI alone.

In its announcement, OpenAI summarized: “CriticGPT’s suggestions are not always correct, but we find that they can help trainers to catch many more problems with model-written answers than they would without AI help.”

The company also acknowledged that “mistakes can be spread across many parts of an answer,” which makes it more complex for an AI tool to identify the cause.

Looking ahead, OpenAI has confirmed plans to scale its work on CriticGPT and to put it into practice.

