I dunno. They both do a pretty crappy job following instructions in my experience. Just a few minutes ago I tried two different prompts for the same scene in Gemini video. It was supposed to be a basset hound dog chasing a rabbit. The first one had the rabbit chasing the dog and the dog's ears kept changing between dog and rabbit ears. The second video just created two dogs.
In my experience, relationships between two elements are the hardest thing for AI models to get right. Are you using text-to-video, or are you using frames/ingredients?
Text to video. There are some other things I've tried, but they're a bit more complicated and only got worse the more I tried, even using the same prompt with different models. Meta AI (Meta Vibes) is probably the worst of the few I've toyed with.
Another experiment was using an image for reference: using my photo to create a plush doll version of myself (thanks, Snapchat). But the models struggle with using the reference image. Sora2 insisted on using the image itself as the first frame before continuing with the scene. So I swapped the background out for a green screen and told it to exclude the green screen. Apparently negative prompts just make it worse.
For the plush doll, I'd suggest nano banana or wan 2.5 as the best tools. Then maybe you can use A-to-B frames to animate your photo into the result from nano banana.
Thanks for the suggestion. I'll give it a try later.
I also tried the image with the background completely removed and saved as a PNG. I guess Sora cached the original image in my session: no matter how many times I uploaded it with a new prompt, it stuck with the green screen. Even renaming the file didn't help.
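If you want to strip the background locally before uploading (instead of relying on a green screen the model might latch onto), here's a minimal sketch using the rembg Python package; the filenames are placeholders, and you'd need rembg and Pillow installed.

```python
from rembg import remove
from PIL import Image

# Load the original photo (placeholder filename)
source = Image.open("selfie.jpg")

# rembg returns an RGBA image with the background made transparent
cutout = remove(source)

# Save as PNG so the transparency is preserved for upload
cutout.save("selfie_transparent.png")
```

Uploading a transparent PNG like this sidesteps negative prompts entirely, since there's no background left for the model to "exclude."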