Episode 314 - Harnessing AI for Affordable Audio with Phil Marshall
Phil Marshall discusses HARNESSING AI FOR AFFORDABLE AUDIO, including how AI narration is transforming audiobook creation for indie authors, how tools like Spoken make professional-quality audio faster and more affordable, and how authors can customize voices, experiment with multi-voice storytelling, and bring their books to new audiences through AI-powered audio production.
Phil Marshall is a technologist, entrepreneur, and storyteller who thrives at the intersection of imagination and execution. He is the founder and CEO of Spoken, a platform transforming how authors and readers connect through AI-powered audio storytelling. Spoken empowers authors to create immersive single, dual, and multi-voice audiobooks.
Episode Links
https://www.youtube.com/@Spoken-Press
Referenced in interview:
Episode 307 - Crafting Stories for the Ear with Cindy Gunderson
Audio of two of Matty’s Writer’s Digest articles, narrated using her Spoken-generated voice clone: https://www.theindyauthor.com/publications
Summary
In this episode of The Indy Author Podcast, Matty Dalrymple talks with Phil Marshall about harnessing AI for affordable audio, exploring how artificial intelligence is transforming audiobook creation for authors. Phil is the founder and CEO of Spoken, an AI-powered platform designed to help authors bring their stories to life through high-quality, cost-effective audio narration.
ORIGIN STORY AND MOTIVATION
Phil explains that his journey to founding Spoken began with writing. Although trained as a surgeon, he spent over 25 years in technology, including leading a conversational AI company in healthcare. After selling that company in 2021, he turned his attention to finishing his debut science fiction thriller, TAMING THE PERILOUS SKIES, which launched in September. As an “audio-only reader,” Phil was naturally drawn to audio storytelling and began exploring AI text-to-speech tools.
During a speculative fiction workshop at Taos Toolbox with Nancy Kress and Walter Jon Williams, Phil wrote a short story filled with dialogue and multiple accents. When he tried to create an AI-narrated version, he found that “it was horrible—not quality-wise necessarily, but the tools weren’t there, the workflow wasn’t there.” Recognizing the potential for AI to help authors produce vivid, affordable audio, he created Spoken to make professional-grade narration accessible to all writers.
AI NARRATION AS AN ACCESSIBLE OPTION FOR AUTHORS
Matty notes that for most authors, “paying for human narration is the most expensive part” of producing a book—often more costly than editing or cover design. Many books, she says, “are never gonna get into audio unless there’s some option other than the time that a human narrator would put into it.” Phil agrees and describes Spoken as a platform “built by authors for authors,” emphasizing affordability and creative control.
Spoken’s workflow allows users to upload manuscripts, analyze text for genre, style, and tone, and then select narration options, including single narrator, duet, or multi-voice formats. Phil demonstrates how the system parses dialogue, assigns voices to characters, and uses emotional cues to create dynamic narration.
AI VOICES, EMOTION, AND PERFORMANCE
One of the key advances Phil highlights is the automation of emotional cues and dialogue pacing. Spoken’s AI analyzes story rhythm, emotional tone, and character interactions to apply effects such as whispering, laughing, or shouting only when appropriate. Earlier versions, Phil admits, were “overzealous” with emotional assignments, but now “we only use them sparingly if it’s something dramatic that is needed.”
Phil gives an example using a short story in which an American man and his British female friend lie on the Oregon coast looking at the stars. Spoken automatically identifies the characters, assigns distinct voices, and even generates a personalized introduction.
COST STRUCTURE AND VALUE
Phil explains that Spoken’s pricing is transparent and scalable. Users can narrate 5,000 words for $10, or $5 if they subscribe for $50 per month. “If you do enough volume, which is 50,000 words in a month, you’re well justified in subscribing,” he says. “For a 100,000-word novel, if you’re a subscriber, that’s $100. And I don’t care if that’s single narrator, duet, or multi-voice—it’s still going to be $100.” He calls this “a whole new world” for indie authors who previously couldn’t afford audio production.
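The arithmetic behind those figures can be sketched in a few lines of Python. This is only an illustration built from the numbers quoted above. The function names are hypothetical, and it assumes cost scales linearly with word count; how Spoken actually rounds or bills partial blocks is not stated in the episode. It also treats the $50 monthly subscription as a separate fee, which suggests the quoted $100 for a 100,000-word novel covers narration only.

```python
# Figures quoted in the episode; everything else here is an assumption.
BLOCK_WORDS = 5_000          # words per pricing block
PAY_AS_YOU_GO_RATE = 10.0    # dollars per block without a subscription
SUBSCRIBER_RATE = 5.0        # dollars per block with a subscription
SUBSCRIPTION_FEE = 50.0      # dollars per month


def narration_cost(words, subscriber=False, include_fee=False):
    """Estimate narration cost, assuming linear pro-rating by word count."""
    rate = SUBSCRIBER_RATE if subscriber else PAY_AS_YOU_GO_RATE
    cost = words / BLOCK_WORDS * rate
    if subscriber and include_fee:
        cost += SUBSCRIPTION_FEE  # one month of subscription
    return cost


def break_even_words():
    """Monthly word count at which subscribing costs the same as paying per block."""
    savings_per_block = PAY_AS_YOU_GO_RATE - SUBSCRIBER_RATE
    return SUBSCRIPTION_FEE / savings_per_block * BLOCK_WORDS


if __name__ == "__main__":
    print(narration_cost(100_000, subscriber=True))                    # narration only
    print(narration_cost(100_000, subscriber=True, include_fee=True))  # plus one month's fee
    print(narration_cost(100_000))                                     # no subscription
    print(break_even_words())
```

Under these assumptions, the break-even point matches the 50,000 words per month Phil mentions: at that volume, paying per block and subscribing both come to $100.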
AI INFLUENCING WRITING STYLE
Matty observes that hearing how AI tools interpret dialogue has made her rethink how she writes. Phil agrees, noting that he “actually write[s] differently for this medium” and often removes dialogue tags, since “you know by listening who it is that is speaking.” He emphasizes that AI audio doesn’t just change how authors publish; it changes how they compose stories, pushing them to think in sound and performance.
For writers using Spoken, dialogue tags can be retained or deleted within the tool. Phil says he often keeps about two-thirds of them: “You want your reader to do the least amount of work necessary—that’s the main thing.”
CONSISTENCY AND CONTROL IN SERIES PRODUCTION
When Matty asks about using consistent voices for characters across a series, Phil confirms that this is easy to achieve. Once a user assigns a specific AI voice to a character, it can be reused across multiple projects. The system also excels at attributing dialogue correctly.
AUTHORS’ VOICES AND PERSONALIZATION
Spoken also allows authors to use their own voices. Phil demonstrates how users can record a short script to create a personal AI model that’s “available to you exclusively.” Authors can then use that voice to narrate entire books or combine it with AI or voice actor voices for collaborative narration.
Matty mentions her interest in co-authoring audiobooks, alternating narration with her co-authors. Phil says Spoken doesn’t yet allow real-time pairing of custom voices, but that functionality is planned due to user demand.
AI LIMITATIONS AND EVOLUTION
Phil and Matty discuss areas where AI narration still struggles, such as question intonation and homographs like “read” and “content.” Phil says these issues have improved dramatically and can often be resolved with small adjustments.
He also recounts a humorous example from his own novel, where an AI unexpectedly sang a line of dialogue. “They didn’t know it could do that,” he laughs, describing how the model generated a melody after interpreting the phrase “then began singing to himself.”
Phil emphasizes that AI narration is improving constantly. “This is the worst it will ever be. Today’s the worst it will ever be. Tomorrow is the worst it will ever be,” he says. With each iteration, AI tools gain more contextual awareness and natural delivery.
COMPARING SERVICES: HUME VS. ELEVENLABS
Spoken integrates with both Hume and ElevenLabs, two leading AI voice services. Phil explains that ElevenLabs offers “a really nicely reliable service that gets it pretty well right out of the gate,” while Hume delivers a “more natural, emotive sound” but requires more fine-tuning. He often uses both, depending on the project: “If you want a nice delivery where most every reader is gonna be happy, choose single or dual narration with ElevenLabs. If you want more emotive, natural-sounding performances, Hume is a great choice.”
PUBLISHING AND DISTRIBUTION
Once an audiobook is complete, authors retain full rights to their work. They can download the files in multiple formats, including chapter-level MP3s or the LPF format required by distributors like Voices by INaudio, and upload them to platforms that accept AI narration. “Spotify for Authors just opened it up to all digital narration on August 1st,” Phil notes, describing it as “a big validation of the AI narration market.”
Spoken also allows authors to stream and share their work directly from the platform, embedding playable samples on their websites or social media. While Spoken doesn’t currently monetize content, Phil says it’s focused on empowering creators: “We don’t claim any rights over it whatsoever. It is entirely yours.”
EXPERIMENTATION AND FUTURE DIRECTIONS
Phil advises authors new to AI narration to start small. Spoken gives users the first 5,000 words free, and there’s a “sandbox story” that allows them to practice editing and narrating. The company also plans to adjust when payment is required, allowing users to refine projects before purchase. “Sometimes we’re the biggest problem—we get in our own way. So we need to get out of our way,” he jokes.
Matty notes that she plans to experiment further with Spoken after seeing its capabilities in action, especially the ability to choose between Hume and ElevenLabs. Phil emphasizes that the system gives authors unprecedented control: they can adjust pacing, emotional tone, and pronunciation (“you can change a word once phonetically and it’ll apply across the board”).
THE ROLE OF AI IN THE FUTURE OF AUDIOBOOKS
Both Matty and Phil agree that AI narration doesn’t replace human narrators but instead expands what’s possible. Phil summarizes it as “not really a zero-sum game.” He sees AI as a way to bring backlist titles and shorter works into audio that might otherwise never be produced. “Audio has always been an afterthought just because of the cost,” he says. “Now those can be brought to life—and be brought to life in new and exciting ways that weren’t possible before.”
Phil concludes that the mission of Spoken is simple: “It’s really about bringing stories to readers.” AI, he says, is making that easier and more affordable than ever before.