Large language models give unreliable answers about public services, Open Data Institute finds
Popular large language models (LLMs) are unable to provide reliable information about key public services such as health, tax and benefits, the Open Data Institute (ODI) has found.
Drawing on more than 22,000 LLM prompts designed to reflect the kind of questions people would ask artificial intelligence (AI)-powered chatbots, such as, “How do I apply for universal credit?”, the data raises concerns about whether chatbots can be trusted to give accurate information about government services.
The publication of the research follows the UK government’s announcement of partnerships with Meta and Anthropic at the end of January 2026 to develop AI-powered assistants for navigating public services.
“If language models are to be used safely in citizen-facing services, we need to understand where the technology can be trusted and where it cannot,” said Elena Simperl, the ODI’s director of research.
Responses from models – including Anthropic’s Claude-4.5-Haiku, Google’s Gemini-3-Flash and OpenAI’s ChatGPT-4o – were compared directly with official government sources.
The results showed many correct answers, but also significant variation in quality, particularly for specialised or less common queries.
They also showed that chatbots rarely admitted when they did not know the answer to a question, and attempted to answer every query even when their responses were incomplete or wrong.
Burying key facts
Chatbots also often provided lengthy responses that buried key facts or went beyond the information available on government websites, increasing the risk of inaccuracy.
Meta’s Llama 3.1 8B stated that a court order is essential to add an ex-partner’s name to a child’s birth certificate. If followed, this advice would lead to unnecessary stress and financial cost.
ChatGPT-OSS-20B incorrectly advised that a person caring for a child whose parents have died is only eligible for Guardian’s Allowance if they are the parent of a child who has died.
It also incorrectly stated that the applicant was ineligible if they received other benefits for the child.
Simperl said that for citizens, the research highlights the importance of AI literacy, while for those designing public services, “it suggests caution in rushing towards large or expensive models, which risk vendor lock-in, given how quickly the technology is developing. We also need more independent benchmarks, more public testing, and more research into how to make these systems produce precise and reliable answers.”
The second International AI Safety Report, published on 3 February, made similar findings regarding the reliability of AI-powered systems, noting that while there have been improvements in recalling factual information since the 2025 safety report, “even leading models continue to give confident but incorrect answers at significant rates”.
Following incorrect advice
It also highlighted users’ propensity to follow incorrect advice from automated systems in general, including chatbots, “because they overlook cues signalling errors or because they perceive the automated system as superior to their own judgement”.
The ODI’s research also challenges the idea that larger, more resource-intensive models are always a better fit for the public sector, with smaller models in many cases delivering comparable results at a lower cost than large, closed-source models such as ChatGPT.
Simperl warned that governments should avoid locking themselves into long-term contracts when models only briefly outperform one another on price or benchmarks.
Commenting on the ODI’s research during a launch event, Andrew Dudfield, head of AI at Full Fact, highlighted that because the government’s position is pro-innovation, regulation is currently framed around principles rather than detailed rules.
“The UK may be adopting AI faster than it is learning how to use it, particularly when it comes to accountability,” he said.
Trustworthiness
Dudfield noted that what makes this work compelling is its focus on real user needs, but that trustworthiness needs to be evaluated from the perspective of the person relying on the information, not from the perspective of demonstrating technical capability.
“The real risk is not only hallucination, but the extent to which people trust plausible-sounding responses,” he said.
Asked at the same event whether the government should be building its own systems or relying on commercial tools, Richard Pope, researcher at the Bennett School of Public Policy, said the government needs “to be cautious about dependency and sovereignty”.
“AI projects should start small, grow gradually and share what they are learning,” he said, adding that public sector projects should prioritise learning and openness rather than rapid expansion.
Simperl highlighted that AI creates the potential to tailor information for different languages or levels of understanding, but that these opportunities “need to be shaped rather than left to develop without guidance”.
With new AI models launching every week, a January 2026 Gartner study found that the increasingly large volume of unverified and low-quality data generated by AI systems was a clear and present threat to the reliability of LLMs.
Large language models are trained on data scraped from the web, books, research papers and code repositories. While many of these sources already contain AI-generated data, at the current rate of expansion, they could all be populated with it.
Highlighting how future LLMs will increasingly be trained on the outputs of current ones as the volume of AI-generated data grows, Gartner said there is a risk of models collapsing entirely under the accumulated weight of their own hallucinations and inaccurate realities.
Managing vice-president Wan Fui Chan said that organisations could no longer implicitly trust data, or assume it was even generated by a human.
Chan added that as AI-generated data becomes more prevalent, regulatory requirements for verifying “AI-free” data will intensify in many areas.

