Can’t find the right voice? Design one from a text description

AI Create Index 550 (meaning that although the copy is original, AI was used in the research for the content, and the copy was refined and improved using AI. AI was also used in the generation of some graphics.) THIS IS 50% ORIGINAL CONTENT.

#AI #eLearning #AIinLearning #TextToSpeech #AIvoicegeneration #InstructionalDesign #OnlineLearning #ElevenLabs #CustomVoices

If you’ve ever spent hours scrolling through AI voice libraries trying to find the “perfect” voice for your project, you’re not alone. The options are often good — but not quite right. Too formal when you need warm. Too young when you need authoritative. Too flat when you need expressive.

For instructional designers, that gap can mean settling for a voice that doesn’t fully fit the tone of your course — or spending extra time and money sourcing a human voiceover.

That’s why tools like ElevenLabs and Hume AI are so exciting. Both now let you describe the voice you want in plain language — and the AI generates it for you. Instead of picking the “least wrong” option from a library, you can create exactly the voice your project needs, whether it’s empathetic for healthcare training, upbeat for onboarding, or even… a pirate for International Talk Like a Pirate Day.

What’s New: Voice Design by Description

ElevenLabs’ Voice Design feature lets you generate a custom voice model simply by typing what you have in mind. Want a “warm, calming female voice with a gentle Irish lilt for mindfulness training”? Or a “confident, upbeat male voice with a West Coast American accent for onboarding”? Just describe it — and the AI synthesizes a consistent voice.

And because you’re not limited to whatever happens to be in a preset library, you can get creative. For example: let’s say it’s International Talk Like a Pirate Day and you need a “gruff, West Country pirate voice, sinister but with a mischievous laugh” for a light-hearted learning challenge. Other platforms may not stock pirate voices in their libraries — but with description-based design, you can generate one in seconds.

Hume AI offers a similar capability with its Octave engine, where you can enter a prompt like “a grizzled, charismatic pirate captain with a booming laugh and stereotypical accent” and hear it instantly. Like ElevenLabs, it emphasizes emotional tone and context, so you can design characters as well as narrators.

Use Cases in eLearning

For learning professionals, the ability to design voices from scratch is more than a novelty — it’s a powerful storytelling tool. In role-play and simulation, for instance, you can instantly create distinct characters that feel believable without the cost or logistics of hiring multiple actors. A mentor, a peer, and even a challenging customer can each have their own authentic-sounding voice, adding depth and realism to the learner’s experience.

This flexibility also opens the door to more inclusive narration. Rather than defaulting to a handful of generic accents, designers can represent regional and cultural diversity more accurately, helping learners connect with content in ways that feel familiar and relatable.

Scenario-based learning benefits too. A compliance module might call for a firm, authoritative delivery, while a healthcare training course might need a softer, more empathetic voice. Onboarding materials, by contrast, can be lifted by an upbeat, energetic tone. Being able to design these voices directly means the narration can match the intent of the content, not the other way around.
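One way to keep these choices consistent across a course is to write them down as a small lookup table before generating anything. The sketch below is only an illustration: the module keys, the fallback tone, and the function name are assumptions for the example, not part of either platform.

```python
# Hypothetical lookup table: module type -> tone words for the voice brief.
# The tone words follow the examples in this article; the keys are illustrative.
VOICE_BRIEFS = {
    "compliance": "firm, authoritative",
    "healthcare": "soft, empathetic",
    "onboarding": "upbeat, energetic",
}

DEFAULT_BRIEF = "clear, neutral"  # assumed fallback, not from the article


def voice_brief(module_type: str) -> str:
    """Return a plain-language voice brief for a module type."""
    tone = VOICE_BRIEFS.get(module_type.lower(), DEFAULT_BRIEF)
    return f"A {tone} narrator voice for {module_type} training."


print(voice_brief("compliance"))
```

The resulting sentence can then be pasted into the description field of whichever voice design tool you use.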

Finally, there’s the matter of personalization and accessibility. Learners don’t all respond to the same style of narration. Some prefer calm, steady delivery, while others engage better with lively, conversational tones. With text-to-voice design, you can provide alternatives that allow learners to choose the voice they’re most comfortable with.

In short, these tools make it possible to create entire “casts” of voices tailored precisely to your course material, adding richness and authenticity that keeps learners engaged. Now you don’t have to settle for “generic corporate narrator #3” — you can make your module sound like a kindly mentor, a stern compliance officer, or even a West Country pirate.

How It Works in Practice

The workflow is simple on both platforms:

1. Enter a descriptive text prompt (tone, gender, age, accent, mood, etc.).
2. Preview the generated samples.
3. Refine the description until you get the right fit.
4. Save the voice and use it like any other voice model for narration or dialogue.
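If you find yourself rewriting descriptions by hand during the refine step, it can help to treat the prompt as structured data. The following is a minimal sketch under that assumption: `VoiceSpec` and its fields are hypothetical names made up for this example, and the code only builds the text you would paste into the tool. It does not call ElevenLabs or Hume.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VoiceSpec:
    """Hypothetical container for the attributes a voice prompt describes."""
    tone: str
    gender: str
    age: str
    accent: str
    mood: str
    use_case: str = ""

    def to_prompt(self) -> str:
        """Join the attributes into one plain-language description."""
        prompt = (
            f"A {self.tone}, {self.mood} {self.age} {self.gender} voice "
            f"with a {self.accent} accent"
        )
        if self.use_case:
            prompt += f" for {self.use_case}"
        return prompt + "."


# First attempt at a description.
draft = VoiceSpec(
    tone="warm", gender="female", age="middle-aged",
    accent="gentle Irish", mood="calming", use_case="mindfulness training",
)

# Refinement: change one attribute and keep the rest unchanged.
refined = replace(draft, mood="soothing")

print(draft.to_prompt())
print(refined.to_prompt())
```

Changing one attribute at a time makes it easier to hear which word in the description actually moved the result.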

ElevenLabs also lets you layer this with its emotional control features (e.g. serious, playful, calm) and access a growing voice library of community-generated voices. Hume leans heavily into empathy and expressiveness, encouraging prompts that specify emotional tone (“whisper fearfully”, “speak sarcastically”).

How They Compare to Other Tools

Most mainstream TTS platforms, including Microsoft Azure Neural TTS, Google WaveNet, Amazon Polly, and Murf, rely on:

Selecting from a large voice library, and/or
Voice cloning from an audio sample.

While Murf offers strong fine-tuning (pitch, pace, emotion), and Azure allows custom voices with more technical effort, they generally don’t provide “describe it in plain English and build it” workflows.

That’s where ElevenLabs and Hume stand out: they’re pushing text-description-based voice creation into production use.

Feature	ElevenLabs	Hume AI	Murf	Azure / Google / Polly
Pre-set Voice Library	✅	✅	✅	✅
Voice Cloning	✅	❌	✅	Azure Only
Emotion/Style Controls	✅	✅ (very strong)	✅	Limited
Voice from Text Description	✅	✅	❌	❌
Character / Fun Voices (e.g. Pirate)	✅ (library + prompts)	✅ (prompt-driven)	❌	❌

Why It Matters for Learning Designers

For years, creating diverse, authentic voiceovers meant juggling actors, recording sessions, and editing — all costly and time-consuming. With ElevenLabs and Hume’s design-by-description workflow:

You reduce production costs without sacrificing quality.
You can localize or personalize faster than ever.
You can experiment creatively with voices that fit your storylines and scenarios.

The result: more engaging, relatable learning experiences at scale.

Final Thoughts

Just as ElevenLabs revolutionized AI narration with realism, its Voice Design feature shifts the focus to creative control. Hume is also innovating here, particularly in emotional nuance and expressive character voices.

For us, ElevenLabs remains a go-to tool in eLearning production because of its balance of realism, ease of use, and growing voice library. But it’s worth keeping an eye on Hume — especially if your projects lean heavily on conversational agents or expressive, character-driven scenarios.

And yes, sometimes it’s simply about adding a pirate voice to your module for fun — and now, you can.

Want expert advice on integrating AI-generated speech into your eLearning content? Profile Learning Technologies specialises in AI-driven learning solutions tailored to your business needs.

📩 Contact us today to explore how AI speech can enhance your eLearning strategy.

If you’re curious to explore more of what ElevenLabs can do, we encourage you to try it for yourself. Disclosure: This article contains an affiliate link. If you decide to subscribe after clicking, we may earn a small commission — at no extra cost to you. It helps support future research and articles like this.