Improving Character Consistency in AI Video with Kling AI's Elements Upgrade
I pushed the new 'Elements' feature to its limits with a three-scene cinematic test. Here are the unfiltered results.
For anyone creating stories with AI video, one challenge has towered above all others: character consistency.
Getting the same face, outfit, and style to appear across multiple shots has been the holy grail. But that might be changing. Kling AI just rolled out a major upgrade to its 'Elements' feature, promising to solve this very problem. I decided to put it to the test.
What is Kling AI’s Elements?
Elements is a feature in Kling AI, one of the leading platforms for creating AI videos, that allows users to combine multiple images to generate a single, consistent video.
It enables users to upload up to four images as "elements" and specify how they interact within a prompt to create a cohesive video. This helps maintain consistency in characters, objects, and backgrounds across different frames of the video.
To learn more: Kling's Elements: Now you can achieve Character Consistency in AI Video
So it’s testing time. To see if Kling can really handle a multi-scene narrative, I designed a simple three-act test.
It's not a scientific benchmark, but a practical test: can I direct my AI actors through a simple story? We'll see them arrive, have a confrontation, and encounter a mystery. If Elements can handle this, it's a game-changer.
Defining the Characters
The first thing we need to do in any scene is to cast our characters. I’m using Midjourney for this. With the proper sref codes, it delivers awesome characters.
photography, full body sci-fi character, flat colored background, cinematic still --no helmet --chaos 10 --ar 3:4 --exp 10 --raw --stylize 1000
Add your favorite sref code / style to this prompt.
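If you iterate on many variations, it can help to assemble the prompt text programmatically. The following is a minimal sketch, not part of Midjourney or Kling: `build_prompt` and `with_sref` are hypothetical helper names, and the code only produces a string to paste into Midjourney. The sref code in the example is borrowed from the scene prompt used later in this article, purely as a placeholder for your own.

```python
from typing import List, Optional, Sequence, Tuple

# A flag is (name, value); value is None for switches such as --raw.
Flag = Tuple[str, Optional[str]]


def build_prompt(description: str, flags: Sequence[Flag]) -> str:
    """Join a text description and an ordered list of flags into one prompt string."""
    parts = [description.strip()]
    for name, value in flags:
        parts.append(f"--{name}" if value is None else f"--{name} {value}")
    return " ".join(parts)


def with_sref(flags: Sequence[Flag], codes: Sequence[str]) -> List[Flag]:
    """Return a copy of the flags with an --sref flag appended (one or more codes)."""
    return list(flags) + [("sref", " ".join(codes))]


# Flags copied from the character prompt above.
CHARACTER_FLAGS: List[Flag] = [
    ("no", "helmet"),
    ("chaos", "10"),
    ("ar", "3:4"),
    ("exp", "10"),
    ("raw", None),
    ("stylize", "1000"),
]

DESCRIPTION = (
    "photography, full body sci-fi character, "
    "flat colored background, cinematic still"
)

character_prompt = build_prompt(DESCRIPTION, CHARACTER_FLAGS)

# Placeholder style code; swap in your own favorite sref code.
styled_prompt = build_prompt(DESCRIPTION, with_sref(CHARACTER_FLAGS, ["691694565"]))

print(character_prompt)
print(styled_prompt)
```

Keeping the flags in an ordered list makes it easy to reuse the same look across several characters and change only the description or the style code.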
One cool feature of Elements is that you can select the part you want to use: the face, the costume, or the whole character.
In our case, we are using the full characters.
Next, let’s craft a setting for the scene:
photography, epic scene setting --chaos 30 --ar 16:9 --exp 10 --sref 691694565 2738280702 --stylize 1000 --weird 30
And we are set.
The First Scene: Walking into the setting
The goal here is to have the characters arrive at the scene together. With the help of DeepSeek (included in Kling), I crafted the following prompt:
The woman in a sleek black turtleneck and fitted jacket, gripping a pistol firmly, and the man in a trench coat with a guarded stance, walk cautiously up the red-carpeted stone steps flanked by weathered pillars, their eyes sharp and movements deliberate under a star-dotted night sky, as the camera orbits them in a slow, suspenseful arc.
I would have expected a little more camera movement, but otherwise it’s very good. The characters hold up well.
The Second Scene: Face-off
The goal is to have a more complicated camera movement. Let’s see if it can be done.
Shot starts with a extreme close up on the sleek black-clad woman in a fitted turtleneck, then zooms out to reveal a face off against a man in a trench coat with crimson accents, both gripping pistols with sharp-eyed intensity on weathered stone steps draped in crimson carpet. The camera circles dynamically around their tense standoff under a star-speckled night sky, capturing deliberate movements against ancient pillars and shadowy rock formations.
Mmhh. This is where we hit a snag.
In my experience, AI video generators struggle with complex camera movements that also involve character interaction. A 'face-off' implies a specific spatial relationship, and the dynamic camera move I asked for was probably too ambitious. The output lost the characters' likeness and the composition fell apart.
It's a great reminder that while these tools are powerful, they still have clear limitations. We're still the directors, and we need to learn what shots the 'AI camera' can actually pull off.
The Third Scene: Finding a strange being
The goal here is to have them both interact with an alien robot.
The backs of the woman in a black high-collar outfit and the man in a military-style coat can be seen, gripping their weapons tightly as the towering alien robot with concentric light rings around its lens-like core methodically advances toward them across the rocky terrain, its mechanical limbs casting jagged shadows under the crimson glow of the pillar-lined cavern.
That robot stole the legs and shoes of our male character. Now THAT’s Evil.
I used four Elements in this scene: our two characters, the floating robot, and the setting. I would suggest sticking to three Elements for better results.
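If you plan your shots in a script or spreadsheet before generating them, a small pre-flight check can flag scenes that are likely to cause trouble. This is only a sketch under my own assumptions: `check_elements` is a hypothetical helper, not a Kling API, the limit of four comes from the feature description earlier in this article, and the threshold of three is simply my rule of thumb from this test.

```python
from typing import Sequence

MAX_ELEMENTS = 4          # upper limit described earlier in this article
RECOMMENDED_ELEMENTS = 3  # personal rule of thumb, not an official limit


def check_elements(elements: Sequence[str]) -> str:
    """Classify a planned scene as 'ok' or 'risky' based on its element count."""
    count = len(elements)
    if count == 0:
        raise ValueError("A scene needs at least one element.")
    if count > MAX_ELEMENTS:
        raise ValueError(f"{count} elements exceeds the limit of {MAX_ELEMENTS}.")
    # Within the limit, but above the count that held up well in this test.
    if count > RECOMMENDED_ELEMENTS:
        return "risky"
    return "ok"


# The third scene from this article: two characters, a robot, and the setting.
scene_three = ["woman", "man", "robot", "setting"]

print(check_elements(scene_three))      # four elements
print(check_elements(scene_three[:3]))  # three elements
```

A scene flagged as risky is a candidate for trimming, for example by describing the setting in the prompt instead of uploading it as an element.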
Let’s wrap up.
What worked? Character consistency held up well in simpler shots (walking, reacting). The ability to use the entire character from an image is powerful.
What didn't? Complex interactions and elaborate camera choreography are still a major hurdle. The AI struggles to maintain relationships between two characters in a dynamic shot.
My Recommendation: While this improvement is welcome, it’s not yet ready for prime time. My guess is that it is a necessary step toward bringing Elements into Kling 2.0, which is in a category of its own.
For now, Kling AI’s Elements is excellent for creating consistent establishing shots and character intros, but you'll need to be clever with your prompts to simulate complex scenes.