Stomachs gurgle. The sound of muscles in the digestive system moving. The human body doing its thing. Sometimes, if there’s a mic nearby, those burbles and gurgles get picked up.
AI audiobook narrators don’t have to worry about strange gastrointestinal noises, but Leah Allers and engineer Craig Hinkle aren’t bots. They’re human beings, recording for Nashville Audiobook Productions in mid-January, fretting about gurgles, discussing where to put the emphasis on the word “increase,” and tending to the detailed work of giving a “real” voice to a book about how couples communicate.
NAP’s studio is at The Rukkus Room in Nashville, Tennessee, the same place Taylor Swift recorded her seven-time platinum self-titled debut album. The smell of coffee permeates the waiting room. Hinkle is tuned in to every word coming out of Allers’ mouth, glancing from an iPad with the book’s text to a large monitor sitting on the soundboard in the studio.
“I want to get some more emotions in these questions,” Allers tells Hinkle before restarting a section of a chapter.
Audiobooks are booming. The market is expected to hit $33.5 billion by 2030, up from about $4.2 billion in 2021, according to Acumen Research and Consulting. Whether this is an offshoot of the rise in popularity of podcasts, a matter of listening convenience, or a byproduct of the pandemic, it hasn’t escaped the attention of tech companies and the inevitable creep of artificial intelligence.
In 2023, the excitement around AI’s potential is high, but so is anxiety about it stealing jobs from struggling creatives. ChatGPT can write anything from insurance pre-authorization letters to dating app bios, with varying degrees of success. AI platforms like Lensa AI and OpenAI’s Dall-E spit out AI-generated art, leaving many who earn a living creating digital art worrying about their future.
Tech companies including Apple and Google have been working on AI audiobook narration for a while now. In 2022, Google rolled out its services to publishers in six countries, including the US and Canada. Google’s AI narrators have names like Archie, who sounds British, and Santiago, who speaks Spanish. In early January, Apple introduced a stable of AI voices, with names like Madison and Jackson, which authors and indie publishers selling their books on Apple Books can tap to read genres from nonfiction to romance.
The increasing presence of AI in audiobook narration has human narrators like Tanya Eby in various stages of stress.
“I don’t know if in five years, this will be my full-time gig anymore,” said Eby, a Grand Rapids, Michigan-based narrator who’s recorded more than 1,000 books in the last 21 years.
Narrators like Eby say their humanity is exactly what helps them do their jobs. Particularly with fiction, narrators make decisions about everything from a character’s voice to how to communicate nuance and emotion in a way that mirrors the story.
“If a character is sobbing after the death of their father, I have to convey those tears and gasps in her speech,” said Kathleen Li, an Austin, Texas-based narrator.
Narrators describe the intimacy of being a voice in a listener’s ear, and wonder if even the most lifelike AI will fall into the uncanny valley. The danger, they worry, is disrupting the experience.
AI voices can range from stilted to quite convincing. But even the most fluid can set off those uncanny valley tripwires with a delivery or pacing that sounds off.
“The whole thing about consuming media is we want to be enveloped in it,” said Jonathan Sleep, a narrator who lives outside Atlanta, Georgia.
Audiobook diehards might have a hard time understanding why anyone would opt for a synthetic voice over a human one. But for small publishers and authors, time and money can make a more powerful argument than the sanctity of a creative performance.
Audiobooks don’t make much money for the University of Michigan Press. The publisher puts out about 100 academic books a year — by scholars for scholars or students.
It could cost as much as $6,000 to hire a narrator for a book that may earn back only a few hundred dollars. And that’s to say nothing of the intensive production process. It can take about six hours to produce one finished hour of an audiobook, according to ACX, Amazon’s Audiobook Creation Exchange.
“The reality is that unless you have a kind of a best-seller, the economics don’t work out,” said Charles Watkinson, director of the University of Michigan Press and associate university librarian for publishing at the University of Michigan Library. He’s also president of the Association of University Presses, a professional organization of publishers in the academic space.
For smaller authors and publishers, the time and cost of producing an audiobook may be out of reach. AI could change that.
About two years ago, Google approached the University of Michigan Press about participating in a pilot program. The press was able to use Google’s tool to create about 100 digitally produced audiobooks. There’s still a degree of human intervention required. Watkinson said some professors who’ve used Google’s tool will have students listen to the recording to check it against the text. Even with AI expediting the recording process, smaller presses may still have staffing issues.
Watkinson said the University of Michigan was interested in how AI could potentially increase the accessibility of books that otherwise might not be available in audio form.
In the early days of the pilot, they reached out to about 900 authors with a sample of the narration, and the general response was that the AI narration was only a bit better than what a screen reader could offer someone who’s visually impaired. However, for those with vision issues who may not have screen readers or the like, perhaps AI could help fill a gap in access.
In other cases, listeners may just be happy to have a recorded book in any form. An intern of Watkinson’s would use audiobooks to keep studying in moments when she couldn’t have an open book in front of her, like on the bus or walking to class. She called it “interstitial listening.”
The rise of digital voices
In addition to big names like Apple and Google, there’s a burgeoning group of smaller companies getting into the AI voice space.
DeepZen is one of them. Founded in 2018 and inspired by the 2013 movie Her, about a man who falls in love with his AI virtual assistant, DeepZen built a natural language processing system that can take cues from text. Its AI voices are built from the licensed voices of human narrators and labeled pseudonymously.
One of the biggest challenges was creating a platform that wouldn’t flatly parrot text but would instead infuse it with tone, said CEO and co-founder Taylan Kamis.
It took a few years to get on the market, but now DeepZen lets clients upload a manuscript and, depending on their pricing plan, select an automated or managed service. Both come with levels of quality control, like a pronunciation check, but the managed option features a proofing check by human editors and two rounds of corrections.
The automated service will run a customer $69 per finished hour versus $129 for the managed option. DeepZen has produced almost 3,000 books so far, both fiction and nonfiction.
On its website, you can listen to samples of 10 voices, with names like Todd, Dahlia and Alice.
Somewhere in the world, Todd, Dahlia and Alice are real people. Kamis thinks voice licensing could be a way for narrators to co-exist with AI in narration.
“That narrator will be making money in his or her sleep and his voice will be earning royalties in Japan [or] China or South Africa,” he said.
DeepZen is also working on a way to get AI voices to speak other languages, to increase market reach.
And never mind overcoming the challenges of speaking only one language: death doesn’t even have to get in the way. DeepZen approached the family of noted voice actor and narrator Edward Herrmann, who died in 2014, about licensing his voice. They signed on. In a sense, Herrmann is still working, posthumously.
Kamis isn’t the only one who thinks there’s a way for AI and humans to get along in voice narration.
Watkinson, from the University of Michigan, wants to use AI as a way to test which books would be worth hiring a human to record. If one is selling particularly well, the success could justify the cost. He’s a fan of audiobooks himself.
“This is an on-ramp for us to get human narrators,” he said.
Not everyone is optimistic. Some in the industry worry there will be fewer jobs for narrators who aren’t famous or don’t have followings of their own.
“All those mid-tier, really solid narrators … do an excellent job and it’s their livelihood — but they’re not necessarily going to be a draw,” said Andrea Fleck-Nisbet, CEO of the Independent Book Publishers Association.
After two decades in the business, Eby said she’s wondering what happens if she eventually can’t find the work to narrate full-time.
“What skills do I have that are competitive? And how would I go into an office, and what would I offer?” she asked.
Narrator Jonathan Sleep said he knows he’s got homework to do — and he’s getting extra eagle-eyed about the contracts he signs, and what rights he’s handing over regarding his voice.
Others, like narrator Andy Garcia-Ruse, want to play to their strengths: “All we could do is make them fall in love with our performances and continue to work.”
Some authors refuse to use a digital voice.
“I feel like the purpose of fiction is to evoke the emotions of the reader or the listener, and fiction is about what it means to be human. And a machine can’t replicate that,” said author Elizabeth Bell.
Author Chris Stokel-Walker used Google’s AI narration for his 2021 nonfiction book TikTok Boom, about the popular video app, and wrote about the result in Inverse.
“What came back was an audiobook that, while lacking some of the emotion and drama you’d hope for, sounded decent,” Stokel-Walker wrote.
Still, plenty of questions remain. In a world where people already hear digital voices like Siri and Alexa every day, will humans stop caring if a digital voice doesn’t sound perfectly human? For Fleck-Nisbet, AI narration is only one of many questions the publishing industry will face. There are other uncertainties about AI and copyright or intellectual property.
In other words, this is only the beginning.
None of this is to say narrators will be in the unemployment line next week.
John Behrens, who owns Nashville Audiobook Productions, has worked with two AI-generated books in the last few years, essentially providing quality control. The AI still ran into issues. It couldn’t pronounce Bible verses, and struggled with rhetorical questions in the text.
A bad audiobook might produce 50 to 100 entries for issues that need to be fixed, Behrens said. The AI produced hundreds. That leads him to believe human narrators aren’t going anywhere — for a while at least. He advises against panicking.
“If you’re going to live in fear… why would you keep investing in this career if you think it’s going to dry up?” he said.
Back at The Rukkus Room, Allers and Hinkle take a break to chat about the robots.
It’s Allers’ first time narrating an audiobook, though she’s done plenty of voice-over work and dubbing, including for Netflix.
Hinkle is unimpressed by AI.
“A robot reading a book,” he said. “I still think it’s going to take a long time before it sounds natural and gifted.”
Just don’t tell Madison and Jackson.
Editors’ note: CNET is using an AI engine to create some personal finance explainers that are edited and fact-checked by our editors.