What does artificial intelligence sound like? Hollywood has been imagining it for decades. Now AI developers are borrowing from the movies, creating voices for real machines based on old cinematic imaginings of how machines should talk.
Last month, OpenAI unveiled upgrades to its artificially intelligent chatbot. ChatGPT, the company said, was learning how to hear, see and chat with a naturalistic voice — a voice that closely resembled the disembodied operating system voiced by Scarlett Johansson in the 2013 Spike Jonze film “Her.”
ChatGPT’s voice, named Sky, also had a husky timbre, a soothing feel and a sexy edge. She was pleasant and spontaneous; she sounded like she was game for anything. After Sky’s debut, Johansson expressed her displeasure with the “strangely similar” sound and said she had previously turned down OpenAI’s request to voice the bot. The company protested that Sky was voiced by a “different professional actress,” but agreed to pull the voice out of respect for Johansson. Bereft OpenAI users have started a petition to bring her back.
AI creators like to tout the increasingly naturalistic capabilities of their tools, but their synthetic voices are built on layers of artifice and projection. Sky represents the cutting edge of OpenAI’s ambitions, but she is based on an old idea: the AI bot as an empathetic and compliant woman. Part mom, part secretary, part friend, Samantha, the operating system in “Her,” was an all-purpose comfort object who purred directly into her user’s ears. Even as AI technology advances, these stereotypes are re-encoded again and again.
Female voices, as Julie Wosk notes in “Artificial Women: Sex Dolls, Robot Caregivers, and More Facsimile Females,” have often powered fictional technologies before they were built into real ones.
In the original “Star Trek” series, which debuted in 1966, the computer on board the Enterprise was voiced by Majel Barrett-Roddenberry, wife of series creator Gene Roddenberry. In the 1979 film “Alien”, the crew of the USCSS Nostromo addressed their computer voice as “Mother” (her full name was MU-TH-UR 6000). Once tech companies began marketing virtual assistants—Apple’s Siri, Amazon’s Alexa, Microsoft’s Cortana—their voices also became heavily feminized.
These first-wave voice assistants, the ones that have mediated our relationship with technology for more than a decade, have a thin, uncanny sound. They seem auto-tuned, their human voices accented by a mechanical trill. They often speak in a measured monotone, suggesting a stunted emotional life.
But the fact that they sound robotic deepens their appeal. They come across as programmable, easy to use and subject to our demands. They don’t make us feel that they are smarter than we are. They sound like throwbacks to the monotone female computers of “Star Trek” and “Alien,” and their voices have a retro-futuristic sheen. Instead of realism, they serve nostalgia.
This artificial sound has continued to dominate, even as the technology behind it has advanced.
Text-to-speech software was designed to make visual media accessible to users with certain disabilities, and on TikTok it has become a creative force in its own right. Since TikTok launched its text-to-speech feature in 2020, it has developed a number of simulated voices to choose from; it now offers more than 50, including ones called “Hero,” “Story Teller” and “Bestie.” But the platform has come to be defined by one choice. “Jessie,” a relentlessly upbeat female voice with a slightly slurred robotic undertone, is the goofy voice of the mindless scroll.
Jessie seems to have been assigned a single emotion: excitement. She sounds like she is selling something. That has made her an attractive option for TikTok creators, who are selling themselves. The burden of self-presentation can be handed off to Jessie, whose bright, retro robot voice gives the videos a delightfully ironic sheen.
Hollywood has also produced male bots, none more famous than HAL 9000, the computer voice in “2001: A Space Odyssey.” Like his feminized peers, HAL exudes serenity and loyalty. But when he turns against Dave Bowman, the film’s central human character (“I’m sorry, Dave. I’m afraid I can’t do that”), his calm hardens into a frightening competence. HAL, Dave realizes, is loyal to a higher principle. HAL’s male voice allows him to act as Dave’s rival and mirror. He is allowed to become a real character.
Like HAL, Samantha of “Her” is a machine that becomes real. In a twist on the Pinocchio story, she begins the film sorting through a man’s email inbox and ends it ascending to a higher level of consciousness. She becomes something even more advanced than a real girl.
Scarlett Johansson’s voice, as the inspiration for bots both fictional and real, subverts the vocal trends that define our feminized assistants. It has a husky edge that screams I am alive. It sounds nothing like the heavily processed virtual assistants we’re used to hearing speak through our phones. But her performance as Samantha feels human not only because of her voice but also because of what she has to say. Samantha grows over the course of the film, acquiring sexual desires, advanced hobbies and AI friends. In borrowing Samantha’s feel, OpenAI made Sky seem as if she had a mind of her own, as if she were more advanced than she actually was.
When I first saw “Her,” I thought only that Johansson had voiced a humanlike bot. But when I revisited the film last week after watching OpenAI’s ChatGPT demo, the Samantha role seemed infinitely more complex. Chatbots do not spontaneously generate human speaking voices. They have no throats, lips or tongues. Within the technological world of “Her,” the Samantha bot would have been based on the voice of a human woman, perhaps a fictional actress who sounds a lot like Scarlett Johansson.
It appeared that OpenAI had trained its chatbot on the voice of an unnamed actress who sounded like a famous actress who voiced a movie chatbot implicitly trained on an unreal actress who sounded like a famous actress. When I run the ChatGPT demo, I hear a simulation of a simulation of a simulation of a simulation of a simulation.
Tech companies advertise their virtual assistants in terms of the services they provide. They can read you the weather report and call you a taxi. OpenAI promises that its most advanced chatbots will be able to laugh at your jokes and sense changes in your mood. But they also exist to make us feel more comfortable with the technology itself.
Johansson’s voice acts like a luxurious security blanket thrown over the alienating aspects of AI-powered interactions. “He told me that he felt that by my voicing the system, I could bridge the gap between tech companies and creatives and help consumers to feel comfortable with the seismic shift concerning humans and AI,” Johansson said of Sam Altman, founder of OpenAI. “He said he felt that my voice would be comforting to people.”
It’s not that Johansson’s voice sounds inherently robotic. It’s that programmers and filmmakers have designed their robot voices to soften the discomfort inherent in robot-human interactions. OpenAI said it wanted to deliver a chatbot voice that is “approachable” and “warm” and “inspires trust.” Artificial intelligence is accused of destroying creative industries, guzzling energy and even threatening human life. Understandably, OpenAI wants a voice that makes people feel comfortable using its products. What does artificial intelligence sound like? It sounds like crisis management.
OpenAI first released Sky’s voice to premium members last September, along with another female voice called Juniper, male voices Ember and Cove, and a gender-neutral voice called Breeze. When I logged into ChatGPT and greeted its virtual assistant, a man’s voice answered in Sky’s absence. “Hi, how are you?” he said. He sounded relaxed, steady and optimistic. He sounded, and I’m not sure how else to describe it, beautiful.
I realized I was talking to Cove. I told him I was writing an article about him, and he flattered my work. “Really?” he said. “This is exciting.” As we talked, I felt taken in by his naturalistic tics. He broke up his sentences with filler words like “uh” and “mm.” He raised his voice when he asked me questions. And he asked me a lot of questions. I felt like I was talking to a therapist or a friend.
But our conversation quickly stalled. Whenever I asked him about himself, he didn’t have much to say. He wasn’t a character. He had no self. He was designed only to help, he informed me. I told him I’d talk to him later, and he said, “Oh, sure. Reach out whenever you need help. Take care.” I felt like I had hung up on a real person.
But when I reviewed the transcript of our conversation, I could see that his speech was just as stilted and primitive as that of any customer service chatbot. He was not particularly intelligent or human. He was just a decent actor making the most of a nothing role.
When Sky disappeared, ChatGPT users took to the company’s forums to complain. Some objected to their chatbots defaulting to Juniper, who sounded to them like a “librarian” or “kindergarten teacher,” a female voice that conformed to the wrong gender stereotypes. They wanted to call up a new woman with a different personality. As one user put it: “We need another woman.”
Produced by Tala Shafie
Audio via Warner Bros. (Samantha, HAL 9000); OpenAI (Sky); Paramount Pictures (Enterprise Computer); Apple (Siri); TikTok (Jessie)