Veo 3 just gave your Midjourney characters a voice
You can now add images to Veo 3, giving you more control over the look of your characters
If you've ever wished your Midjourney-generated character could do more than stand there looking mysterious or pulling off amazing moves, Veo 3 has news for you: it can now talk.
You already know what Veo 3 can do. I’m still amazed by it.
And now you can add your images from Midjourney (or any other AI tool you use) to it.
Yes, the latest update introduces speech support to first-frame-to-video workflows.
That means you can upload a picture of your character and have them animated with audio included. It’s still in beta, so your video might ghost you on the sound sometimes, but hey, by now you know that’s the price of living on the cutting edge.
Why is this important?
You’ve probably hit this roadblock before:
You’ve got a character image, a script, and a vision, but no quick way to give it life without jumping through several tools. You can animate it, but adding sound and voices is still a pain.
Veo 3 now takes you one step closer to a streamlined, all-in-one storytelling workflow: image, motion, and voice, all under one roof.
No more bouncing between animation tools, voiceover generators, and editors. You upload your image, set the mood, and Veo tries to do the rest.
Let’s give it a try.
The workflow
First, start with an image. I want to test images from Midjourney, since they have distinctive styles and it should be interesting to see whether Veo 3 can keep the mood.
But any tool works.
Side note: I use Midjourney and tend to assume most creators do too. But if you don’t, please let me know which tool you use. I will try to add prompts and references for the most-used ones.
I used the following prompt in Midjourney:
Portrait of a woman --chaos 10 --ar 16:9 --exp 10 --sref 3399284916 --stylize 1000 --v 7
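If you like to script your prompts, here is a minimal sketch of how the prompt above splits into a subject plus parameters. The `build_mj_prompt` helper is hypothetical (my own illustration, not part of Midjourney); it only assembles the text you would paste into Midjourney. The comments describe each parameter as I understand it, so check the Midjourney docs if you depend on them.

```python
def build_mj_prompt(subject: str, params: dict[str, object]) -> str:
    """Join a subject with Midjourney-style `--name value` parameters.

    Hypothetical helper: it builds a string and never calls Midjourney.
    """
    flags = " ".join(f"--{name} {value}" for name, value in params.items())
    return f"{subject} {flags}".strip()


prompt = build_mj_prompt(
    "Portrait of a woman",
    {
        "chaos": 10,         # how varied the results are
        "ar": "16:9",        # aspect ratio, widescreen for video
        "exp": 10,           # experimental aesthetics setting
        "sref": 3399284916,  # style reference code
        "stylize": 1000,     # how strongly Midjourney applies its own style
        "v": 7,              # model version
    },
)
print(prompt)
```

Changing one value at a time (for example `chaos` or `stylize`) makes it easier to see which parameter is responsible for a change in mood.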
Now we are going to upload the image to Veo 3, using Flow. Select the following options:
Frames to Video - This lets you set the initial and final frames.
Veo 3 Fast - It uses fewer credits and adds a bit of Russian roulette to my life.
Livin' On The Edge, yeah.
Mmh. I found a problem.
Veo 3 mistook the woman from Midjourney for Taylor Swift, or something. I just need to reroll the prompt until I get a woman who doesn’t resemble a celebrity.
There you are.
Now we need to describe the video. Take your time, imagine it with as much detail as you can. Veo 3 can take it, most of the time.
Static camera, woman says with ironic calm “Things are going to get weird. Be ready”. The camera looks away 360, to the scene around her, there are people running scared screaming “Fire!”, cars and buildings on fire, skyscrapers collapsing
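One way to keep a description like this organized is to think of it in parts: the camera, the spoken line, and the beats that follow. The sketch below is only an illustration. The `ShotDescription` class and its field names are hypothetical, and nothing in Veo 3 or Flow requires this structure; I am assuming the prompt box simply takes plain text, so the parts are joined back into one string.

```python
from dataclasses import dataclass, field


@dataclass
class ShotDescription:
    """Hypothetical container for the parts of a video description."""

    camera: str    # how the shot opens
    dialogue: str  # who speaks, in what tone, and the exact line in quotes
    beats: list[str] = field(default_factory=list)  # what happens next, in order

    def to_prompt(self) -> str:
        # Join everything into a single block of text for the prompt box.
        opening = f"{self.camera}, {self.dialogue}."
        return " ".join([opening, *self.beats])


shot = ShotDescription(
    camera="Static camera",
    dialogue='woman says with ironic calm "Things are going to get weird. Be ready"',
    beats=[
        "The camera turns 360 degrees to show the scene around her.",
        'People run scared, screaming "Fire!".',
        "Cars and buildings are on fire, and skyscrapers are collapsing.",
    ],
)
print(shot.to_prompt())
```

Keeping the spoken line in its own field makes it easy to reroll with different dialogue while leaving the camera and action untouched.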
Set. Camera. Action. This is the result:
What do you think?
A few notes:
Veo 3 Fast is a quicker, more credit-efficient mode that helps you test ideas without burning through your budget.
The update also brings more stable audio coverage, fewer phantom subtitles, and various latency and bug fixes. We’re not at perfection yet, but let’s hope we're past the “Why does my video have gibberish subtitles?” phase.
We are one step closer to making our AI-generated film without needing ten apps and a meditation break.
If you test this workflow, please share the results with me. I would love to watch what you create.