Craft amazing voices in Eleven v3 (Alpha) with these five tips
How to add emotion, nuance, and multi-voice control to your generated voices with the new advanced audio tags
Let’s face it: generating voices with AI used to be a monotonous task.
But Eleven v3? It’s more like directing your own audio drama, if you know how to prompt it right.
If you’ve recently started experimenting with ElevenLabs’ Eleven v3 (Alpha) voice synthesis model, congratulations: you’ve just been handed a much more expressive, powerful, and, frankly, unpredictable tool.
And like any powerful tool, it can either produce something wonderful… or leave you wondering how to make your character sing in a Russian accent.
This guide is here to help you avoid the latter.
The Elephant (or Eleven) in the Room: V3 Is Still in Alpha
Let’s get the caveats out of the way: Eleven v3 is still in alpha. That means short prompts can be inconsistent, and some voices might behave like they’ve had too much digital coffee.
The solution? Think long. Prompts over 250 characters perform better and give the model room to shine. Think of it like improv acting: your voice needs a full script to really perform.
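If you build prompts in a script, a quick length check can catch the short ones before you spend credits on them. Here is a minimal Python sketch. The helper names are hypothetical, and the 250-character threshold is simply the rule of thumb from this guide, not a limit enforced by the API.

```python
# Rule of thumb from this guide, not an official API limit.
MIN_PROMPT_CHARS = 250


def is_long_enough(prompt: str, minimum: int = MIN_PROMPT_CHARS) -> bool:
    """Return True when the prompt has more than `minimum` characters."""
    return len(prompt.strip()) > minimum


def chars_missing(prompt: str, minimum: int = MIN_PROMPT_CHARS) -> int:
    """Return how many more characters are needed to pass the threshold."""
    return max(0, minimum + 1 - len(prompt.strip()))


if __name__ == "__main__":
    draft = "[whispers] I never knew it could be this way..."
    if not is_long_enough(draft):
        print(f"Too short: add about {chars_missing(draft)} more characters.")
```

Audio tags count toward the length here, which is a simplification. You may prefer to count only the spoken text.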
Tip 1 - Voice Selection is your model’s soul
The voice you choose is arguably the most important decision you’ll make. Each voice comes with its own emotional range and limitations. If the voice is naturally high-energy and expressive, don’t expect it to whisper convincingly. And if it’s monotone, adding [laughs hysterically] won’t turn it into a stand-up comedian.
💡 Pro tip: PVCs (Professional Voice Clones) aren’t fully optimized for V3 yet. Use IVCs or stock voices for best results in this alpha stage. The gang from ElevenLabs have compiled over 22 excellent voices for V3 here.
Tip 2 - Finding the sweet spot with Stability
The stability slider controls how closely the output sticks to the voice’s original tone. You’ve got three options:
Creative – Expressive and emotional, but might hallucinate. Fun, but a bit unhinged.
Natural – Balanced, neutral, best for general use.
Robust – Super consistent, but less responsive to tone changes.
✨ Want emotional expression? Stick with Creative or Natural.
Tip 3 - Audio Tags are your secret weapon
Now to the goodies: Eleven v3 introduces audio tags that control tone, sound effects, accents, and more. These tags act like stage directions. Use them wisely.
Emotional & Vocal Tags:
Use the following:
[whispers], [laughs], [sarcastic], [curious], [crying], [sighs], [excited]
Example using voice of Nichalia Schwartz:
[whispers] I never knew it could be this way... but I’m glad we’re here.
Sound Effects:
[gunshot], [applause], [swallows], [explosion]
Example using voice of Nichalia Schwartz:
[applause] Thank you all for coming. [gunshot] Wait—what was that?!
Experimental & Fun:
[strong French accent], [sings], [woo], [fart]
Example using voice of Nichalia Schwartz:
[strong French accent] I had a wonderful evening, my dear [fart] Oh!
Warning: Results may vary. Especially with [fart]. But I had to try.
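Since tags are just bracketed text inside the prompt, it can help to list them before you generate, for example to see which ones you have already tried with a given voice. The sketch below is one way to do that in Python. It assumes tags never nest and always use square brackets, and the function names are made up for illustration.

```python
import re
from typing import List

# Matches anything between square brackets, e.g. "[strong French accent]".
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def extract_tags(prompt: str) -> List[str]:
    """Return the audio tags in the order they appear, without brackets."""
    return [tag.strip() for tag in TAG_PATTERN.findall(prompt)]


def strip_tags(prompt: str) -> str:
    """Return only the spoken text, with tags removed and spacing tidied."""
    without_tags = TAG_PATTERN.sub("", prompt)
    return re.sub(r"\s{2,}", " ", without_tags).strip()


if __name__ == "__main__":
    prompt = "[strong French accent] I had a wonderful evening, my dear [fart] Oh!"
    print(extract_tags(prompt))
    print(strip_tags(prompt))
```

The stripped version is also handy for proofreading the script itself, without the stage directions getting in the way.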
Tip 4 - Punctuation is not just grammar, it's drama
Ellipses (...) = pause for effect
ALL CAPS = emphasis
Proper punctuation = natural rhythm
Example:
"It was a VERY long day [sighs]... NOBODY listens anymore."
Tip 5 - Create multi-speaker prompts
Assign a different voice to each speaker and structure your script like this:
Example using voices of Liam and Nichalia Schwartz:
Speaker 1: [curious] Wait, are you a robot?
Speaker 2: [robotic voice] Of course not. Why would you think that? [binary beeping] 010010001!
Yes, Eleven v3 will try doing even this. Whether or not it should is another matter.
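If you write longer dialogues, keeping the lines as data and rendering the "Speaker N:" layout at the end makes it easier to reorder lines or swap tags. This is a rough Python sketch of that idea. The Line class and render_script function are hypothetical helpers, and the output format simply mirrors the example above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Line:
    speaker: str                # label such as "Speaker 1"
    text: str                   # what the speaker says
    tag: Optional[str] = None   # optional audio tag, without brackets


def render_script(lines: List[Line]) -> str:
    """Render lines as 'Speaker: [tag] text', one speaker turn per line."""
    rendered = []
    for line in lines:
        prefix = f"[{line.tag}] " if line.tag else ""
        rendered.append(f"{line.speaker}: {prefix}{line.text}")
    return "\n".join(rendered)


if __name__ == "__main__":
    script = [
        Line("Speaker 1", "Wait, are you a robot?", tag="curious"),
        Line("Speaker 2", "Of course not. Why would you think that?", tag="robotic voice"),
    ]
    print(render_script(script))
```

Tags placed in the middle of a line, like [binary beeping] in the example, can simply be typed inside the text field.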
Extra tips
Mix tags: Try mixing [laughs][sarcastic] for complex tones.
Match tone to voice: A somber narrator won’t do well with [giggles].
Structure matters: Write like a screenwriter, not a programmer.
Test thoroughly: Tags don’t behave the same across voices. Run a few pilots.
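To run those pilots systematically, you can generate every voice and tag combination up front and work through the list. Below is a small Python sketch. The voice names come from the examples in this post, the function name is hypothetical, and the actual generation step is left out on purpose.

```python
from itertools import product
from typing import List, Tuple


def build_pilots(voices: List[str], tags: List[str], sentence: str) -> List[Tuple[str, str]]:
    """Pair every voice with every tag, returning (voice, prompt) tuples."""
    return [(voice, f"[{tag}] {sentence}") for voice, tag in product(voices, tags)]


if __name__ == "__main__":
    pilots = build_pilots(
        voices=["Liam", "Nichalia Schwartz"],
        tags=["whispers", "laughs", "sarcastic"],
        sentence="I never knew it could be this way... but I'm glad we're here.",
    )
    for voice, prompt in pilots:
        print(f"{voice}: {prompt}")
```

Keep the test sentence reasonably long, since short prompts tend to be inconsistent in the alpha.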
Here is a more elaborate prompt, with a Midjourney image animated with HeyGen:
Prompting Eleven v3 is a highly creative task. Think of it as directing your virtual cast. With the right tags, the right structure, and the right voice, you’re crafting a performance.
So take a breath, grab your [coffee], and start experimenting.
Just… maybe don’t prompt it to sing in French and Russian accents and whisper while laughing during a gunshot. Unless you’re building the world’s weirdest podcast.
In that case: carry on.
Disclosure: This post may contain affiliate links. If you click on a link and make a purchase, I may earn a small commission at no extra cost to you.