OpenAI’s Sora is a giant leap in AI Video tech, here’s why.

The whole AI industry has been shaken by the announcement of Sora and its amazing AI Video capabilities.

Feb 20, 2024

On Thursday, OpenAI unveiled Sora, its cutting-edge text-to-video generator, with stunningly realistic videos that highlight the AI model’s capabilities. “Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background,” said OpenAI in a blog post.

“The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.”

Read that last line again.

You all know OpenAI already has AI-powered generators for text and images, ChatGPT and DALL-E. So we think we know what to expect from its products.

But Sora is unique because it’s less of a creative tool and more of a “data-driven physics engine,” as pointed out by Senior Nvidia Researcher Dr. Jim Fan. Sora does not just generate an image; it determines the physics of an object in its environment and renders a video based on these calculations.

Sora builds the whole scene, places the camera, and films it.

Let’s take a look at the videos to explore what this means exactly. First, let’s run the same prompt used by OpenAI through one of the existing AI Video tools:

Prompt: A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.

This video is just 8 seconds long. Please keep that number in mind. In that time, it shows two of the biggest problems common to current AI Video tools:

The people walk slowly and unnaturally.
The items morph.

Women walking in Tokyo, two scenes. In the 8-second video, both women change, as do their glasses, clothes, and hats.

Current tooling slowly loses one hat because it focuses on building frame after frame, pixel after pixel, without keeping track of the context.

Now, let’s see the video from Sora:

I want to point out a very important detail:

The whole sequence is ONE MINUTE long.

That’s just impossible with the existing tooling open to users. (Yes, Sora is not yet open to users. That is important too.)

Although OpenAI did not specify a public release date for Sora, its introduction marks the company’s foray into AI-generated video content, complementing existing tools like ChatGPT and DALL-E for text and image generation.

Woman in Tokyo, two scenes. Two stills of the same sequence.

After 20 seconds (!), Sora’s video still shows the following details:

Clothing remains the same.
Reflections are not random. There is a logic in them.
Bystanders have their own life.

This is what sets Sora apart: its emphasis on data-driven physics simulation. Unlike traditional creative tools, Sora goes beyond merely generating visuals; it simulates the physical behavior of objects within its environment to produce lifelike videos based on these calculations.

Creating videos with Sora (and any AI Video tool) is a straightforward process: users input a few sentences as prompts, akin to how AI-image generators operate. Within minutes, users can witness astonishing results in either a photorealistic or animated style.

In the second half of the video, there is also a close-up sequence of more than 20 seconds (!). Note the following details:

The reflections in the glasses follow an internal logic.
The facial details are sustained, even the moles on the woman’s face as she turns.

Sora’s generated videos surpass those of its competitors in duration, liveliness, and continuity. The output from Sora truly resembles genuine video, in contrast to competitor models whose output frequently looks like a series of AI-generated still images pieced together. OpenAI continues to shake up the AI landscape by unveiling a video generator that clearly outperforms its rivals.

Thanks for reading!

Hi👋 I’m Erik, a product designer by day, and AI educator by night.
I’m exploring this new world of AI tools for creative work. I will be sharing my learnings here, and it would be amazing if you could join me on this journey.
