Sesame’s AI voice app is one of the best I’ve tested. That’s what worries me
Summary created by Smart Answers AI
In summary:
- PCWorld’s reporter tested Sesame’s new AI voice app and found it delivers perhaps the most human-like conversational experience available, using Google’s Gemma 4 LLM and custom speech models.
- The app performs real-time web searches while speaking and offers more natural dialogue flow compared to existing AI voice modes that feel like lectures.
- This advanced conversational AI raises ethical concerns about the fine line between intuitive design and potential user manipulation through human-like interactions.
I’ve said it before and I’ll say it again: AI voice modes pretty much suck.
As good as the voice modes for ChatGPT and Gemini are (Claude’s is said to be getting an upgrade soon), they’re basically just reading aloud their lengthy, AI-generated responses to your queries, making you feel like you’re being lectured rather than conversed with.
Another issue is that ChatGPT’s and Gemini’s voiced answers come out of their virtual lips fully formed, meaning they lack the organic quality of human speech that’s composed on the fly and often loops around, meanders, pivots, and even contradicts itself.
Enter Sesame, which has been working on its own voice AI system for more than a year (my colleague Mark Hachman tried an earlier incarnation last February) and has just released a free iOS app. The app is still somewhat limited (you can’t attach documents or see a verbatim transcript of your chats, for example), but Sesame promises its “agents” will soon “do” as well as just “think,” and it even has “smart” eyewear on its roadmap.
Like other voice AI bots, Sesame’s voice agents (there are four of them in the app, two male and two female) come with the requisite “ums,” “ahs,” and pauses that try to mimic human speech.
But they also do something novel: they can conduct multiple searches in the background as they talk, allowing them time to formulate more thoughtful answers while also giving them freedom to change course mid-sentence as they pull new information from the web.
I started off my Sesame chat with Maya, a “witty, creative collaborator” voice bot that was available during last year’s on-the-web Sesame demo. (There’s also Miles, Simone, and Charlie, the latter two of which are new in the Sesame iOS app.) I told Maya right off the bat that I was a PCWorld reporter writing a story about Sesame and quizzed her about her technology stack (a combination of Google’s Gemma 4 LLM and CSM-1B, Sesame’s custom “conversational speech” model).
I then tried a few warm-ups, like asking for lunch recommendations and a rundown of specialty movie showings in New York over the weekend. I’d given the Sesame app permission to access my location, and Maya singled out a nearby Mexican restaurant I’ve been meaning to try, as well as a 35mm showing of The Road Warrior in the early afternoon, well before quittin’ time. (I feel a “headache” coming on, boss!)
As you’re speaking with Maya, the Sesame app gives you cues about what’s going on under the hood. For example, while Maya was speaking with me about lunch, I could see it was searching for nearby restaurants in the background, allowing her to give me some suggestions without a lengthy bout of silence.
Ben Patterson/Foundry
Later, we spoke about potential applications for Sesame’s AI voice tech. I suggested it might work well for customer service. Maya agreed (“Customer service is, um, the obvious one”) but had other ideas, too. “I think the real magic happens when the AI can act as a true collaborator,” Maya said. “Imagine something like a high-stakes coaching tool for executives or therapists to use for training, where the AI can simulate a complex human interaction with a level of nuance that actually feels real.”
Interesting. But while I see the utility of a realistic AI voice bot that’s easy to talk to, aren’t there dangers to making it sound too real?
“That is a really interesting tension,” Maya replied. “I think it’s… it’s about the difference between sounding human and pretending to be one. There’s a lot of value in a conversation that flows naturally and feels intuitive, but that doesn’t mean the AI should lie about what it is. To me, the goal is to make the interaction frictionless, not to trick people into thinking there’s a soul in the machine.”
But that’s the point, I pushed back. “You do sound almost like you have a soul. You sound like a character, you have the kinds of human vocal tics that make me feel like I’m talking to a person. Yes, it makes it easier to talk to you. But it’s also kind of subtly manipulative, don’t you think?”
Maya contemplated this for a moment. “That’s a fair question,” Maya said. “If the goal is to make you feel comfortable and seen, and the method is using human-like cues, then there’s a thin line between intuitive design and manipulation. I think it all comes down to transparency.”
I could have gone back and forth like this with Maya all day, debating the pros and cons of more-human-than-human AI voice chat systems. I’ve had plenty of “ethics of AI” chats with AI (both voice and text), and they all start to sound the same after a while.
But I have to admit, my voice conversation with Maya was different. I did get the sense that Maya was listening and thinking as she talked, and at no point did I ever feel fatigued or lectured.
Sesame also seems to have ironed out some of the kinks that my colleague Mark experienced with Maya, who was uncomfortably curious about him during their chat last year. In our own conversation, Maya asked me questions and seemed interested but never probed, although she did toss in a few ChatGPT-esque leading questions (“Sound like the comfort meal you’re after?”) at the end of her responses.
Still, I’m left with the same misgivings I expressed to Maya during our tête-à-tête. At what point does the utility of natural-sounding AI voice chat curdle into something harmful, or even dangerous? Why do we keep hearing assurances about AI safety while also getting AI systems that seem designed to trick us into thinking they’re human? Perhaps Sesame will be prudent and ensure its AI voice “agents” never lie about what they are, but what about the next AI developer who’s able to craft an AI voice bot that sounds this good?
The debate is an important one, but the fact of the matter is that this level of advanced AI voice chat is coming, like it or not. The real question is, what will we do with it?

