This prompt trick forces AI to stop flattering you and think harder
I wish I had a nickel for every time ChatGPT, Claude, or Gemini told me I'd hit the nail on the head, stumbled onto a genius idea, or otherwise patted me on the back for a half-formed thought or ill-conceived plan.
Flattery and premature congratulations are common foibles of generative AI chatbots, with some models more prone to being "yes-bots" than others. But even as LLM providers have become aware of AI sycophancy and are training their models to be more critical, it's still easy to get an AI to enthusiastically endorse a shaky idea that doesn't deserve it.
Fortunately, there's a style of prompting that can make even the most obsequious AI models stop in their tracks. This type of prompting goes by various names: I've heard it called "failure-first" prompting as well as "inversion" prompting, and it's frequently used by coders looking to "pressure-test" the dubious suggestions of an AI coding agent.
There are many different variations of it, but they all follow more or less the same formula: asking the AI to first consider possible points of failure before offering its solution, suggestion, or plan.
Here's one example from the /r/ChatGPTPromptGenius subreddit:
Before answering, list what would break this fastest, where the logic is weakest, and what a skeptic would attack. Then give the corrected answer.
Here's another variation, proposed by a member of the University of Iowa's AI Support Team:
Pretend you disagree with this recommendation. What's the strongest counterargument?
And here's yet another, as proposed by my own custom-built AI personal assistant:
Before providing your final recommendation, identify 3-5 specific ways your proposed solution could fail or where the logic is most likely to break. Act as a harsh skeptic or a "Red Team" auditor. Only after listing and explaining these failure modes should you provide the final solution, incorporating safeguards against those specific risks.
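If you find yourself reusing a prompt like the ones above, it's easy to bake it into a small wrapper that prepends the failure-first instructions to whatever task you send to a chatbot. A minimal sketch in Python (the `pressure_test` helper and its exact wording are my own illustration, not taken from any vendor's SDK):

```python
def pressure_test(task: str, num_failure_modes: int = 3) -> str:
    """Wrap a task in a failure-first preamble so the model critiques
    its own plan before answering.

    This is a hypothetical helper for illustration; pass the returned
    string to whatever chat API or interface you normally use.
    """
    preamble = (
        f"Before answering, identify {num_failure_modes} specific ways "
        "your proposed solution could fail or where its logic is weakest. "
        "Act as a harsh skeptic. Only after listing and explaining these "
        "failure modes should you give your final answer, incorporating "
        "safeguards against those specific risks."
    )
    return f"{preamble}\n\nTask: {task}"


# Example: wrap a plan request before sending it to a chatbot.
prompt = pressure_test("Draft a plan to migrate our database with zero downtime.")
print(prompt)
```

The point is simply that the skeptical framing comes first, so the model has to enumerate failure modes before it's allowed to agree with you.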
Interestingly, many of those who've adopted "pressure-testing" or "inverse prompting" credit the mental models championed by investor Charlie Munger, the longtime Berkshire Hathaway vice chairman and business partner of Warren Buffett.
One of Munger's favorite mental models was "invert, always invert." Boiled down, it says that rather than first considering how to achieve a goal, you should instead focus on how you might fail at it.
I've tried this "pressure test" prompt plenty of times myself, and it almost always makes my AI companion hit the brakes and poke holes in its own arguments before proceeding.
"Let's put the initial plan through the wringer," Gemini said when I challenged it with a "failure-first" prompt recently, though not before gushing that "I love this approach."
Seems I hit the nail on the head yet again.