"China News Weekly" reporter/Li Jing

  Published in the 1132nd issue of "China News Weekly" magazine on March 18, 2024

  By inputting just a few words, a coherent and stable 60-second video can be generated.

With the release of OpenAI's new model Sora, another bombshell has dropped in the field of AI. Some time has passed since its release in the early morning of February 16 (February 15, US time), yet the discussion remains at fever pitch.

Although Sora has not yet been opened to outside users, practitioners inside and outside the technology industry have begun to pay attention and investigate. Some are worried and some are excited. Cultural industries such as film and television, games, and animation have felt the most direct impact.

After all, in the past, even a simple video required writing a script, setting up the scene, spending time filming, and then doing post-production and editing before it finally took shape. Now all of this seems to have been overturned.

In the 60-second video released by OpenAI, the scenes, the characters, and even their actions and interactions are rendered with astonishing realism and detail. Even across jump cuts, Sora behaves like an experienced film director, keeping every frame smoothly connected.

It appears able to handle the continuity of objects and characters. Across the different shots of a single video, a character who temporarily leaves the frame returns in the same, stable state as before.

It can also show the same character from multiple perspectives, just as people in real life view the same thing from different angles.

Some say that at this rate, Sora will soon be able to "take in a script and spit out a movie." If any ordinary person can easily make a movie with Sora, will directing still exist as a profession?

If directing no longer requires a professional background, what about lighting, cinematography, set design, makeup, post-production, and even acting?

Sora "understands human speech"

  In the early morning of February 16, Yu Linfeng, CEO of Hangzhou Zhiliao Siyuan Internet Technology Co., Ltd., was preparing to go to bed when he suddenly received a Sora-related report from a friend. His first reaction was that he was "slapped in the face."

Yu Linfeng's company has been working in generative AI for a year. It developed "Decisive Battle·New Year's Eve," a generative AI game released for this year's Spring Festival that targets the social pain points of young people.

In Yu Linfeng's view, until the end of last year, text was the most mature modality in generative AI (AIGC), followed by text-to-image, while video had remained largely a matter of academic research. Runway, Google, and Pika announced news every few days, but the results were "simply not fit to be seen," let alone ready for industrial use.

So Yu Linfeng had kept saying that the "GPT moment" for text-to-video would not arrive so soon. Unexpectedly, Sora appeared just over a month into 2024.

  The speed of Sora's appearance exceeded almost everyone's expectations.

Before Sora, Runway had been regarded as the default choice for AI video generation, especially after it launched its second-generation model in November last year, and it was even called the "Midjourney of AI video."

The second-generation model, Gen-2, not only solved the poor frame-to-frame coherence of first-generation AI video but also gave good results when generating video from images. Runway can generate 4-second videos, which users can extend to as long as 16 seconds, the longest any AI-generated video reached in 2023.

Stable Video also offers 4-second videos, while Pika offers 3-second videos.

Li Dongdong, founder of Geek Movies and deputy director of the Special Committee for the Integration of Science and Film, told China News Weekly that the committee brings together the people most interested in the integration of science, technology, and film. In exchanges and discussions between filmmakers and the scientific community, members had talked about what artificial intelligence might develop into. Text-to-video was something everyone could imagine, but no one expected it to arrive so fast.

Li Dongdong is in several AI-related WeChat groups. After Sora's videos were released, everyone in every group was exclaiming that they were "shocked."

"It's two years faster than we thought." Li Dongdong said.

  After seeing the Sora report sent by a friend, Yu Linfeng, who was about to go to bed, studied the samples and technical reports released by OpenAI overnight.

Compared with "predecessors" such as Runway and Pika, the samples Sora showed are not only longer but also more stable and controllable. In layman's terms, Sora's videos "make sense."

OpenAI's technical report describes features that make Sora easy to use. For example, when users type in an idea, GPT-4, the company's large language model, helps expand that simple idea into a more detailed prompt. Sora also draws on DALL·E 3, OpenAI's text-to-image model.

"DALL·E is in its third generation, and its ability to understand natural language is already very strong. Add GPT-4 to help organize the input, and people won't need to carefully craft prompts when using Sora in the future. Simply put, it 'understands human speech,'" Yu Linfeng explained to China News Weekly. "You can imagine that the barrier to using it will be very low."

Li Dongdong finds this frightening on reflection. GPT-4 has become an everyday tool for people around the world, and it gathers their understanding of the world; that data is the "crude oil" of AI. Given GPT-4's unique advantage among large language models of having enough data, Sora's intelligence is likely to grow not linearly but at an accelerating, exponential rate.

Sora can do more than generate video from text. It also accepts images, videos, or a combination of the two: a user can, for example, upload an image with a prompt and specify it as the first or last frame of the video. It can also edit existing videos; the desert background of a video of a car racing across the desert can be changed to a tropical rainforest.

Sora can also create smooth transitions between two videos, which OpenAI calls "connecting videos." By gradually interpolating between two input videos, it can create seamless transitions between videos with entirely different subjects and scenes.

Judging from the test videos released, the results are magical and dreamlike.

Film blogger "Movie Fan Xinsheng" remarked: "Just think about it. If you were given only the two source videos and asked to use your imagination to get from video A to video B, you would need to think it over carefully for quite a while (setting aside the time and cost of actually 'implementing' it)."

As a result, many say the film and television industry is about to change, especially science fiction and fantasy. After all, such works test imagination, spectacle, and the ability to create another world.

If you can't beat it, join it

Yu Gang was an assistant director on the first "The Wandering Earth." In recent years, while preparing his own science fiction film, he has been following the development of generative artificial intelligence.

As soon as he got up on February 16, he saw that his AI groups had "exploded."

What impressed him most among Sora's official samples was the 60-second video of a woman walking down a Tokyo street. Not only was the subject coherent and stable, there were also multiple shots, including a slow cut from the street scene to a close-up of her face in which skin blemishes were clearly visible, while the wet street reflected the glow of neon lights.

From a director's point of view, he feels that Sora has made a qualitative leap in motion and stability over commonly used video generation AIs such as Runway. Some shots come close to professional film and television production; without careful inspection, it is no longer possible to tell whether they were actually filmed.

  This makes many content creators anxious.

A YouTuber named Paddy Galloway commented on Sora: "The content creation industry has changed forever, and that's absolutely no exaggeration... anyone can make incredible products without barriers, so the 'idea' and story behind the content will become more important." Dong Runnian, director of this year's New Year's Day hit "The Annual Party Can't Stop," wrote on social media: "The traditional film and television industry is basically coming to an end. Let's think about what to do after changing careers."

"The emergence of Sora is of course shocking, but in fact the most anxious moment has passed. The time everyone felt most anxious was at the turn of summer and autumn last year," Li Dongdong told China News Weekly. At that time, the film and television industry had not yet fully recovered from the pandemic, and after OpenAI released ChatGPT, generative artificial intelligence advanced by leaps and bounds. Facing the unknown, people instinctively feel fear.

Seeing the anxiety and debate in the industry, the Special Committee for the Integration of Science and Film simply held an outdoor seminar in the summer of 2023, attended by many decision-makers and creators from film and television. Li Dongdong remembers that the venue happened to have its back to a lake, and everyone joked: "This is a fight with our backs to the water."

At the time, everyone in the industry knew that the most anxious person was director Guo Fan.

At the Pingyao International Film Festival in October 2023, director Ning Hao said: "Every time Guo Fan sees me, he tells me AI has improved again."

"People are most anxious when they don't know what to do," Guo Fan told China News Weekly. For nearly half of last year, apart from a few hours of sleep each day, he spent almost all of his time outside work thinking about artificial intelligence.

New technologies emerge every day. Shortly after the publicity campaign for "The Wandering Earth 2" wrapped up, Guo Fan led his team on a global study tour.

They went from the Shanghai Artificial Intelligence Laboratory, SenseTime, Huawei, and Xiaomi to Weta Studio in New Zealand, and then to Apple, Google, Meta, Intel, and Pika in Silicon Valley... Besides high-tech companies, they also visited universities such as Stanford and Zhejiang University.

After returning to China, Guo Fan mapped out film-level research and development directions based on what he had learned from 19 technology companies and universities in China and abroad, and began establishing a range of strategic partnerships.

The anxiety visibly receded.

"Now at least I have found a direction. I have a strategy, I know where to start, and I know what to do," Guo Fan told China News Weekly. "Of course, I don't know whether my 'how' is right or whether it will succeed in the end, but at least I have to move first."

The Special Committee for the Integration of Science and Film has provided science consultants and introduced new technologies for productions such as "The Wandering Earth" and "The Three-Body Problem." In the eyes of Li Dongdong, who has known Guo Fan for many years, he is a typical geek creator, with a strong interest in and curiosity about science and technology.

"It's not only Guo Fan. There is now a new generation of directors who live in an era when technology is changing the world, and they are very willing to embrace new technologies."

Today these geek creators are no longer so anxious. If you can't beat it, join it: they are putting all their effort into understanding, learning, and using new technologies.

Yu Gang is one of these directors, and he understands Guo Fan's anxiety especially well: "After all, Director Guo is racing to create China's best science fiction film IP, so he is under a lot of pressure." As a new science fiction director trying to break through from scratch, Yu Gang has always been optimistic about AI developments such as Sora. He believes they mean that talented science fiction filmmakers will finally no longer be constrained by huge visual effects costs.

"The most expensive part of our science fiction films is the visual effects," Yu Gang said, his voice rising noticeably. "If it really were a shot on a Tokyo street like this one, I wouldn't need Sora at all; I would just take a camera and shoot on the streets of Tokyo. But what if you want a cyberpunk city? What about Mars? Places that simply can't be filmed cost us a lot of money." Every new director knows how limited the resources they can mobilize are while they are still unknown, and how hard it is to attract investment. If artificial intelligence can generate such scenes "stably and accurately," it will be a huge boon.

Yu Gang can't help imagining that one day, when he shoots a science fiction film, Sora will directly generate a sci-fi scene complete with camera movement, environment, and extras. "Leave me some space in the middle, I put my actors in, and that's it," he said.

Another Sora effect that excited directors is its extremely realistic cats and dogs. Across the global visual effects industry, creatures are the most troublesome: animals can't be directed, and realistic, textured fur sits at the very top of the effects production pyramid.

When "Crazy Alien" was released in 2019, director Ning Hao revealed in an interview that the film spent 200 million yuan on special effects alone, almost all of it on the creature effects for the alien and the monkey, because these are among the hardest effects to do anywhere in the world.

"Creature effects have always been a weak spot of the domestic effects industry. Of course, some companies can do them, but only the top few, and they are expensive," Yu Gang told China News Weekly. If Sora can solve this problem, any small or medium-sized visual effects company could take on this kind of work efficiently, and prices would drop immediately.

"If Sora is 'obedient' enough, we could have it generate what I want against a green screen, including how the animals move, and then just key it out. Or we could prepare the scene, feed it to Sora, and tell it to add a bunch of animals. That would be so convenient."

However, all of this is still only imagination. OpenAI has not fully opened Sora for use, and judging from the current samples, it is still quite far from the film production workflow people are talking about. What's more, just as with a movie trailer, the producer always shows the best and most exciting footage first.

"A fuzzy input gets a fuzzy result"

Since so many people want to ride this wave of attention, all kinds of "Sora videos" have appeared online since the day Sora was released.

Some are outright fakes that pass off real footage as Sora-generated, because anything linked to Sora can easily rack up nearly a million views.

Some are purely for fun, such as the viral "Will Smith Eating Spaghetti" video.

Those who follow AI video probably know that "Will Smith Eating Spaghetti" is nicknamed "the Turing test of video generation," because the character's hands, the noodles, and the way the noodles deform as they are eaten all pose huge challenges for AI. Well-known video generation AIs such as Runway and Pika have produced plenty of bizarre meme footage on this theme.

Someone put this challenge to OpenAI CEO Sam Altman, who was "taking orders online," but Altman did not take it up.

Not long afterward, a "Will Smith Eating Spaghetti" video supposedly generated by Sora began spreading all over the internet. Judging from the video, Sora had completely overcome the problem, with every detail accurate and realistic.

It didn't take long for people to discover that the video had been recorded by Will Smith himself, playing an AI-generated version of himself and fooling netizens...

So, amid videos whose authenticity is hard to judge and people's often overly idealistic imaginings, Sora has become a buzzing trend that almost everyone acknowledges and no one wants to miss.

On Taobao, some sellers have begun offering Sora tutorials and help applying for beta-testing access, priced from 9.9 yuan to 199 yuan.

A few days after Sora came out, a group calling itself the "Sora First Movie Co-Creation Organizing Committee" appeared in Hengdian. Many screenwriters' WeChat groups received a flyer titled "Sora's First AI Movie Co-Creation Plan." In a Shuyun document, the publisher said it was soliciting scripts from the public, "preparing to jointly achieve the feat of the world's first AI movie."

In fact, although videos produced by Sora have been released, Sora itself has not been made publicly available or opened for public testing. It has been shared only with a select group of researchers and scholars.

Ordinary people who want to use Sora have only two options. One is to leave a request on CEO Altman's social media: not long ago he began "taking orders online," generating videos for netizens who make requests. The other is to apply to join OpenAI's red team. The red team can be understood as simulated hackers who help OpenAI test Sora's safety. The requirements are very strict: applicants must have a US credit card, a US residential address, a US-registered computer, and a non-datacenter IP address.

Science fiction director and screenwriter Zhang Xiaobei has been following the videos OpenAI releases. They are certainly amazing, but from the standpoint of film-level applications he thinks there is still a long way to go. As for "taking in a script and spitting out a movie," Zhang Xiaobei feels this is purely an outsider's fantasy: it treats filmmaking, where every stage requires complex and highly specialized skills, far too simply.

Even Sora's released samples have many bugs: a woman who can turn her head 180 degrees, candle flames that won't blow out, characters with six fingers... Even in the most popular one, the video of a fashionable woman on a Tokyo street, her left and right legs strangely swap places twice.

Sora may fix these bugs quickly given how fast it is developing, but professional film and television creators are still unlikely to use it in their creative process in the short term. In Zhang Xiaobei's view, the biggest problem is that AI cannot be precise.

Li Dongdong offered a vivid example: "You give Sora a prompt, 'a little bear sitting at a desk doing homework,' and it shows you a video. Enter the same prompt again and it shows a different video: a different bear, a different desk. Its rules are a black box. You have no idea what will come out, and no control over it."

For all content creators, this is AI's most fatal shortcoming.

"You can only get a fuzzy result from a fuzzy input," Zhang Xiaobei told China News Weekly. "In the final analysis, it is an algorithm, and current algorithms cannot deliver precise expression. Videos like Sora's are fine for ordinary people to share on social media, but if you want to use them in serious, formal artistic creation, there are too many errors." After all, an industrial process can't rely on opening blind boxes: every frame must be consistent in style, and every action must connect smoothly. On the big screen especially, every small flaw is magnified many times over.

In Zhang Xiaobei's view, "It is very likely that for the next year or two, video AI will remain at the gimmick stage."

Li Dongdong still remembers that when Midjourney first came out, it was considered a disruptive force in the design world. Later, Li Dongdong found that the heaviest Midjourney users among acquaintances were creative directors at 4A advertising agencies, and that Midjourney could not produce final deliverables; it could only generate ideas to assist the creative phase.

  "Because AI has no aesthetics." Design aesthetics blogger and designer Awen told China News Weekly.

He began using AI in his design work last year and has also used it to make videos. In the design industry, generating images or videos with AI is called "drawing cards," because you have no idea what the AI will give you.

Its main advantage is that, given the style, theme, and elements a designer supplies, AI can offer creative inspiration and references that help designers explore different ideas.

Awen briefly described how he and his partners made videos with tools such as Runway earlier this year. It was far from "one-click generation." First, they found the music, set the video's rhythm to it, and estimated the shots and edit points they would need.

Second, they cast a wide net for existing still-frame references to settle on a style.

Third, once they had collected enough reference images, they built a timeline scene by scene.

Fourth, they used Midjourney to "draw cards" for still frames, writing prompts based on the reference images whose style they liked and on their specific needs.

Fifth, they used the Midjourney stills to "draw cards" for video. Awen usually doesn't stick to one platform: besides the mainstream Runway, he often uses Morph Studio and Stable Video.

Finally, they edited the footage to the chosen music and did post-production.

Awen has no qualms about publishing his exact AI workflow. Anyone can make videos by following the same process; the difference lies in the quality of the editing, the combination of scenes, the soundtrack, the post-production, and so on.

Simply put, it comes down to aesthetics and creativity, and so far no AI possesses either.

If every step of an artist's production of a short video depends on professional aesthetics and expression, that is all the more true of a feature film at least 180 minutes long.

Zhang Xiaobei gave a simple example: "Autofocus technology has been highly developed for a long time, and almost all consumer cameras come with autofocus as standard. Yet on film shoots, cinema cameras are still focused by hand. Why? Because where the focus is placed, and how it shifts, including its speed and manner, are all part of artistic creation, including aesthetics, and all are expressions of the creator."

Turing Award winner and Meta chief AI scientist Yann LeCun believes that generating large numbers of realistic-looking videos from prompts does not show that an AI system understands the physical world.

What's more, in this physical world, people's intricate and unpredictable emotions may be the very parts of literature and art that move us most.

In "Ponyo on the Cliff," little Sosuke staggers back several steps but still catches Ponyo as she runs toward him, as steady as a grown man; in "Interstellar," Cooper, having said goodbye to his daughter, still subconsciously lifts the blanket she once hid under; and there is the romance of Iron Man snapping his fingers in "Avengers: Endgame"... Without these moments, everything would be just a transition, an empty scene.

Human aesthetics, creativity, and emotion are still beyond what computation can solve.

"Strive to be a survivor"

For the foreseeable future, all generative AI, including Sora, will remain an efficiency tool and an aid to creativity. That is the near-unanimous answer of geek creators who have started, or plan to start, using the new technology.

Although it will not completely overturn the film and television entertainment industry, that does not mean it will have no profound impact.

"Some professional aerial photography teams may lose their jobs," said Li Dongdong, noting that among the videos Sora released, the long-range aerial shots are the most realistic and have almost no bugs. "Sora's aerial transitions also have real technical sophistication, especially the kind of footage sold on stock-video platforms that you have to pay for."

Aerial shots are relatively hard to handle in crime-themed film and television dramas.

Yu Gang explained to China News Weekly: "These dramas usually take place in a fictional city, because in reality no city wants to be the setting for such a story. Since the city is fictional, the camera can't reveal where it really is. If Sora can generate aerial footage of a fictional city, we won't need to hire a drone pilot."

The efficiency gains are significant. Take the many props in science fiction and fantasy works. "In the past, artists spent a lot of time on design, but now AI can produce images very quickly, a hundred at a time. The director picks the one with the right style, the designer makes slight modifications directly on the image, and the modified image is then used to build a 3D model or go straight into production," Yu Gang said. Although the process itself has not changed, manual labor has been greatly reduced and efficiency has improved, so production costs will naturally come down.

  Communication costs are also greatly reduced.

A film or TV drama, especially a science fiction one, is full of complicated and tedious communication from the start of the project, because what people imagine is always hard to put into words. Li Dongdong feels AI makes this much easier: "Whatever effect you want, you can now generate images at almost zero cost. With Sora in the future, when you meet investors you won't even need a slideshow; you can just bring the video. It doesn't matter if there are some bugs; what they see is the visual effect." Work that used to require ten designers may soon require only one.

Awen is a partner in a company whose main business is slide presentation design. As one of the first designers in China to learn and master AI, he feels that not only has the company's existing business been unaffected so far, but AI has also sped up its work, cut overtime, and expanded its business lines.

In April last year, he wrote in a social media post: "AI just saved my life again, smoothly handling an outrageous revision request that would otherwise have kept me up all night." During this year's Spring Festival, he and his friend Haixin produced an AI animation that transformed a pas de deux performance into ceramic artwork; it was featured at the 2024 CCTV Spring Festival Gala as the animated backdrop for the song "Her Pillow in the Light."

Awen feels that in any industry, those who embrace new technologies will not be replaced.

Zhang Xiaobei holds a similar view: "Since its birth, the film industry has gone through five or six major technological changes. Each time, some people fell behind, but some survived. What we have to do is strive to be survivors." During the Industrial Revolution, British workers angrily smashed spinning jennies, yet when textile machines gave rise to large factories, technological progress created more demand for labor and new occupations.

Although Sora's real capabilities still cannot be seen firsthand, that does not stop people from preparing to "get on board" as soon as possible.

Yu Linfeng recently held a team meeting to discuss and predict what business niches Sora's arrival may open up for long-term development. They already have some ideas and plans.

OpenAI's technical report said, "Sora can deeply understand the physical world in motion and can be called a true 'world model.'"

"In high tech, people always tend to overestimate short-term breakthroughs and overlook long-term change," Zhang Xiaobei said. "That's how it is with AI. We won't see great changes in the world in a few months or a year or two. But in the long run, the great changes have already begun."

No one can predict where these great changes will end. Perhaps after several iterations, Sora will finally develop its own aesthetics, creativity, and emotions.

If that day really comes, perhaps humanity will sleep forever as in "The Matrix"; or perhaps, as one science fiction novel describes, people will shed all worldly affairs and devote themselves solely to philosophy, art, and the ascent of the spiritual world. Or perhaps that future is simply unimaginable with today's human understanding.

In that case, perhaps all we need to do is embrace the new technology and the changes in the industry. Why be anxious?

  "China News Weekly" Issue 10, 2024

Statement: The use of articles from China News Weekly must be authorized in writing.