OpenAI’s latest AI models hallucinate far more, for reasons unknown
Last week, OpenAI released its new o3 and o4-mini reasoning models, which perform significantly better than their o1 and o3-mini predecessors and have new capabilities like “thinking with images” and agentically combining AI tools for more complex results.
However, according to OpenAI’s internal tests, these new o3 and o4-mini reasoning models also hallucinate significantly more often than earlier AI models, reports TechCrunch. That’s unusual, as newer models tend to hallucinate less as the underlying AI tech improves.
In the realm of LLMs and reasoning AIs, a “hallucination” occurs when the model makes up information that sounds convincing but has no basis in reality. In other words, when you ask ChatGPT a question, it may respond with an answer that’s patently false or incorrect.
OpenAI’s in-house benchmark PersonQA, which measures the factual accuracy of its AI models when talking about people, found that o3 hallucinated in 33 percent of responses while o4-mini did even worse at 48 percent. By comparison, the older o1 and o3-mini models hallucinated 16 percent and 14.8 percent of the time, respectively.
As of now, OpenAI says it doesn’t know why hallucinations have increased in the newer reasoning models. Hallucinations may be fine for creative endeavors, but they undermine the credibility of AI assistants like ChatGPT when used for tasks where accuracy is paramount. In a statement to TechCrunch, an OpenAI rep said the company is “continually working to improve [their models’] accuracy and reliability.”
This article originally appeared on our sister publication PC för Alla and was translated and localized from Swedish.