Technology

Interview: Using AI agents as judges in GenAI workflows

Around 40 years ago, a bank branch manager probably knew the name of every customer and was able to offer personalised advice and guidance. But as Ranil Boteju, chief data and analytics officer at Lloyds Banking Group, points out, in today’s world, that model can’t scale.

“In the world of financial planning, most people in the UK can’t afford to see a financial planner,” he says.

There is also an insufficient number of trained financial advisers to help everyone seeking advice, which is why financial institutions are looking at how they can deploy generative artificial intelligence (GenAI) to support customers directly.

But the large language models (LLMs) and GenAI from hyperscalers are rather like black boxes and can deliver incorrect responses, known as hallucinations in AI terms. None of this is acceptable in a sector regulated by the Financial Conduct Authority (FCA).

What excites Boteju is the ability to scale the 40-year-old model of a bank manager to meet current demand by using artificial intelligence in a way that gives the bank confidence that the AI is able to understand what people need and give them the right guidance in a way that can be assessed and meets FCA guidelines.

“It would be a great ‘unlock’ for the UK in terms of giving access to high-quality financial guidance to a much wider and larger set of the population,” he says.

As Boteju notes, banks have been using AI for many years. “We’ve been using all sorts of machine learning algorithms for things like credit risk assessments and fraud screening for more than 15 years,” he says. “We’ve also been using chatbots for at least 10 years.”

As such, AI is a well-established capability in financial services. What is new, however, is generative AI and agentic AI. “Generative AI burst onto the scene in late 2022 with ChatGPT. It’s been around for almost two-and-a-half years now,” says Boteju.

While banks have experience with AI, they have needed to work out how to use generative AI and large language models. Speaking of his own experience, Boteju says: “We think about things like model performance and whether we’re using the right algorithm.”

There are also transparency, ethics and guardrails to consider, as well as how the AI models are deployed. Boteju says: “These are common both to large language models and traditional AI. But generative AI has specific challenges in financial services because we’re a regulated industry.”

Since generative AI can sometimes lead to hallucinations, he says banks have to be very careful about how they expose large language models directly to customers. “We put a lot of effort into ensuring that the outputs of the large language models are correct, accurate and transparent, and there’s no bias.”

In a regulated industry, it is vital to ensure the AI models are not hallucinating. “That’s probably one of the key things we need to be really cognisant of,” he says.

The need for specialist AI models

As Boteju notes, a model like Google Gemini is trained on everything. “If you ask it a question, the output will be based on its knowledge of everything. It’s been trained on lots and lots of data.”

Not all of this data is relevant to financial services, however. By restricting the AI model to data specific to financial services, the model should, in theory, hallucinate less.
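
To illustrate the idea (an editorial sketch, not Lloyds’ or Aveni’s implementation), one common way to restrict a general-purpose model to domain data is retrieval-augmented generation, where the model is only allowed to answer from retrieved financial passages. The corpus, the scoring function and the prompt wording below are all assumptions:

```python
# A minimal sketch of domain restriction via retrieval-augmented generation.
# The corpus, ranking heuristic and prompt format are illustrative assumptions.

FINANCIAL_CORPUS = {
    "isa_basics": "A cash ISA lets UK savers earn tax-free interest up to the annual allowance.",
    "pension_basics": "Workplace pension contributions attract tax relief at the saver's marginal rate.",
}

def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank corpus passages by naive keyword overlap with the query."""
    def overlap(passage: str) -> int:
        return len(set(query.lower().split()) & set(passage.lower().split()))
    ranked = sorted(FINANCIAL_CORPUS.values(), key=overlap, reverse=True)
    return ranked[:k]

def build_grounded_prompt(query: str) -> str:
    """Constrain the model to the retrieved passages to reduce hallucination."""
    context = "\n".join(retrieve(query))
    return (
        "Answer using ONLY the context below. If the context does not "
        "contain the answer, say you cannot help.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

print(build_grounded_prompt("How does a cash ISA work?"))
```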

“We felt quite strongly that we wanted to use a language model, or a group of models, that had been specifically trained on financial services data relevant to the UK,” says Boteju.

This led to Lloyds Banking Group approaching Scottish startup Aveni to support the development of FinLLM, a financial services-specific large language model. In 2024, the company secured £11m of funding from Puma Private Equity, with participation from Lloyds and Nationwide.

Discussing the work with Aveni, Boteju says Lloyds Banking Group didn’t want to be tied to one specific model, so it decided to take an open approach to foundation models. From an AI sovereignty perspective, he says: “We don’t want to be limited to the big hyperscaler models. There’s a fantastic ecosystem of open source models that we want to encourage, and the fact that we could create a FinLLM that’s UK-centric in the UK is something we found very appealing.”

The bank has been testing FinLLM in its audit team, where an audit chatbot virtual assistant developed by Group Audit & Conduct Investigations (GA&CI) at Lloyds Banking Group is transforming how auditors access and interact with audit intelligence. The chatbot integrates generative AI with the group’s internal documentation system, Atlas, making information retrieval faster, smarter and more intuitive.

Boteju says the bank effectively trained the chatbot using FinLLM and its knowledge of audits, based on all the audit data it has collected.

He describes the approach Lloyds Banking Group has taken to reduce errors as “agent as a judge”. “You have a specific model or agent that comes up with a specific outcome,” he says. “Then we’ll develop different models and different agents that review those outcomes and effectively score them.”

The bank has been working closely with Aveni to develop this approach of using AI agents as judges to assess the output of other AI models.

Each outcome is independently assessed by a set of different models. Assessing the outputs of the AI models enables Lloyds to ensure they are aligned with FCA guidelines as well as the bank’s internal regulations.

Checking the outputs of AI models is a very good way to double-check that the customer is not being given bad advice, according to Boteju, who adds: “We’re in the process of refining these guardrails, and it’s very important that we have [this process] in place.”
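
As a rough illustration of the “agent as a judge” pattern Boteju describes (a minimal sketch under assumed criteria, not the bank’s actual guardrails), a draft answer can be scored independently by several judge agents and released only if every score clears a threshold. The judge functions, threshold and example rules here are hypothetical:

```python
# A minimal sketch of "agent as a judge": one agent drafts an outcome, and a
# panel of independent judges scores it before release. The judges, threshold
# and criteria are illustrative assumptions, not the bank's compliance rules.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Verdict:
    judge: str
    score: float  # 0.0 (reject) .. 1.0 (fully compliant)
    reason: str

def factual_judge(answer: str) -> Verdict:
    # Placeholder: a real judge would be another LLM checking groundedness.
    ok = "guaranteed" not in answer.lower()
    return Verdict("factual", 1.0 if ok else 0.0, "no unsupported guarantees")

def suitability_judge(answer: str) -> Verdict:
    # Placeholder for a check against FCA-style guidance rules.
    ok = "you should buy" not in answer.lower()
    return Verdict("suitability", 1.0 if ok else 0.2, "guidance, not advice")

JUDGES: list[Callable[[str], Verdict]] = [factual_judge, suitability_judge]

def review(draft: str, threshold: float = 0.8) -> tuple[bool, list[Verdict]]:
    """Each judge scores the draft independently; release only if all pass."""
    verdicts = [judge(draft) for judge in JUDGES]
    return all(v.score >= threshold for v in verdicts), verdicts

approved, verdicts = review("A stocks and shares ISA can fall in value.")
print(approved, [(v.judge, v.score) for v in verdicts])
```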

Boteju points out that having a human in the loop will remain important regardless of the “agent as a judge” approach. “There is still very much a place for humans in the loop in the future,” he says.

The power of different AI models in agentic AI

While an AI model like FinLLM has been tuned to understand the ins and outs of banking, Boteju says other models are much better at understanding human behaviour. This means the bank could, for instance, use one of the AI models from a hyperscaler, such as ChatGPT 5 or Google Gemini, to understand what the customer is actually saying.

“We would then use different models to break down what they’re saying into component parts,” he says. Different models are then tasked with tackling each distinct part of the customer query. “The way we think about this is that there are different models with different strengths, and what we want to do is use the best model for each task.”

This approach is how the bank sees agentic AI being deployed. With agentic AI, says Boteju, problems are broken down into smaller and smaller parts, with different agents responding to each part. Here, having an agent as a judge is almost like a second-line colleague acting as an observer.
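
As an illustrative sketch of this routing idea (the intent labels and stub “models” are assumptions, not the bank’s architecture), a customer query can be decomposed into parts, with each part dispatched to the model best suited to it:

```python
# A minimal sketch of task routing: split a query into parts and send each
# part to the best-suited model. The registry and stubs are illustrative.

def general_model(text: str) -> str:      # stand-in for a hyperscaler LLM
    return f"[general] interpreted: {text}"

def finance_model(text: str) -> str:      # stand-in for a FinLLM-style model
    return f"[finance] guidance on: {text}"

ROUTES = {"smalltalk": general_model, "finance": finance_model}

def classify(part: str) -> str:
    """Crude intent tagging; a production system would use a model here."""
    keywords = ("isa", "pension", "mortgage", "savings")
    return "finance" if any(k in part.lower() for k in keywords) else "smalltalk"

def handle(query: str) -> list[str]:
    # Naive decomposition into sentence-level parts, each routed separately.
    parts = [p.strip() for p in query.split(".") if p.strip()]
    return [ROUTES[classify(p)](p) for p in parts]

for step in handle("Hello there. Should I pay more into my pension."):
    print(step)
```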