China News Weekly reporter Wang Shihan

  Published in the 1132nd issue of "China News Weekly" magazine on March 18, 2024

  The night Sora was born, the AI community suffered from insomnia.

Some people think this is a "dimensionality reduction attack", while others are extremely happy.

  "We are very happy. In the past year, we have always believed that video generation is a big thing and thought it would have a huge impact on the entire world. However, not many people believed in us. We spent a lot of time educating the market." Aishi Technology founder and CEO Wang Changhu told China News Weekly.

  It has been nearly a month since OpenAI released Sora. How are Chinese players currently performing?

According to incomplete statistics, at least 15 major Chinese manufacturers and startups have successively launched AI video tools, many of which have been released to the global market.

  Papers on Sora's core technology were published as early as 2022.

Before the emergence of Sora, the investment environment in the field of video generation had been cold for a long time; after the emergence of Sora, entrepreneurs will also face more brutal market competition while verifying their technical beliefs.

At the same time, the debate over large models continues in the AI venture capital circle, with excitement and caution always intertwined.

  But obviously, the financing period for AI video has arrived.

On March 11, Aishi Technology announced the completion of an A1 round of financing worth RMB 100 million, led by Dachen Caizhi; at the same time, the domestic version of its text-to-video product, the "Aishi Video Large Model," opened for internal testing.

  "As a cutting-edge explorer, we have to try many technical routes. The emergence of Sora has reduced our trial and error costs." Wang Changhu said.

  In the eyes of the interviewees, now that the scaling law has become an open secret in the industry, the competition over text-to-video large models has also become an "open game." In the future, whether a player can keep securing sufficient resources will be an important chip at the poker table.

The biggest unknown left for Chinese companies is this: among China's major manufacturers and innovative companies, who will produce China's Sora?

Who is most likely to become the leading company on this track?

Entrepreneurs’ technological beliefs have been verified

  In the eyes of many entrepreneurs, Sora, following ChatGPT, has once again opened a door for humanity toward AGI (artificial general intelligence).

  "AI is essentially creating another form of life and civilization, and humans are on the eve of creating it. Sora means that humans may have just found its eyes and are trying to put them in place," entrepreneur Max told China News Weekly. He explained that AGI with self-awareness requires multimodality. At present, compared with the reading and writing capabilities of language models, AI's development in "eyes" (images and video) and "ears" (audio) lags behind.

  In the early morning of February 16, Beijing time, OpenAI released the text-to-video model Sora. The samples show Sora's outstanding achievements in capabilities such as basic video generation, multimodal generation, video editing, and world simulation. They also validate the technical routes of spatiotemporal encoding (spacetime patches) and the DiT model (a combination of Diffusion and Transformer models).

  Sora, with its "dimensionality reduction impact," has set a technical benchmark in the field of text-to-video large models. While attracting market attention, it will also accelerate technological development and product progress on the video generation track.

For the entrepreneurial team, it means that the technical beliefs that they have worked hard to adhere to have been verified.

  "Previously, the rapid growth in the number of users had already made us feel that our judgment was correct. Sora added fuel to the fire and further verified our initial belief," Wang Changhu said with feeling.

  On the same night that Sora was released, Google also released Gemini 1.5, a large multimodal AI model with a context window of up to 1 million tokens.

"I was very excited at the time. The two pieces of news came out together, indicating that the scaling law is still in effect. We don't know what large models will look like next year, but as long as they continue to grow, it will be good for investors. The projects we have invested in will become more capable and able to do more things," recalled Chen Shi, investment partner of Fengrui Capital.

  The scaling law is OpenAI's "aesthetics of brute force."

From language models to multimodal models, OpenAI believes in the power of scale: it holds that by following the rule of "big models, big data, and big computing power," model performance will continue to improve and, given specific algorithms, "emergent" intelligent abilities will appear.

  In the past, the threshold and difficulty of starting a business in AI were higher than in other tracks. Countless entrepreneurs at home and abroad have been relentlessly stacking technology and inspiration in this field, looking for the intersection of technical ideals and business possibilities, and at any moment one of them might drop a bomb on the world and once again detonate a disruptive technological revolution.

  Now, these "bombs" are coming more intensively and violently.

  "One of the outstanding features of this round of the AI boom is that 'there are no secrets in the world,'" Chen Shi said. From GPT-3 and GPT-3.5 to GPT-4 and Sora, shortly after each product was released, its principles and algorithms were quickly either roughly guessed or proactively disclosed, and in the end everything came down to the scaling law, which does not present a strong barrier. The premise is that you have the money to buy GPUs, and it also requires accumulated talent, data, and engineering.

  Previously, major manufacturers and startups at home and abroad have launched many large-scale AI video model products.

Overseas, there are Runway's Gen-1 and Gen-2, Pika Labs' Pika, Stability AI's Stable Video Diffusion, Meta's Emu Video, and W.A.L.T from Li Feifei's (Fei-Fei Li's) team and Google, among others.

Among domestic startups, there are Aishi Technology's PixVerse, MewXAI's Yiying AI, Right Brain Technology's Vega AI, and others; the major domestic companies involved include Alibaba, Baidu, Tencent, and ByteDance.

  It is foreseeable that in the next "big game", the competition among AI video models will become more and more "cruel."

"Everyone realizes that more resources are needed, and the price tag for staying on the poker table is getting higher and higher. Teams that failed to obtain financing in time in the previous stage may not have the opportunity to stay." Wang Changhu said.

  Chen Shi believes that companies developing large models now need to rethink their positioning: if they continue to explore foundation engines, they need to find an entry point different from Sora's while following the technical route; if they turn to vertical models or commercial application scenarios, they need to build their own technical or business barriers.

"Do you want to invest in large models?"

  “When the team was established last year, the investment environment as a whole did not have much confidence in video generation. We tried our best to convince investors and found that it was not easy; very few recognized us and invested. After Sora came out, we also have regrets. If we had received more resources in the past year, maybe we would have made Sora ourselves,” Wang Changhu said frankly.

  “Should we invest in large models?” This may be the question that investors have been pondering since 2023 or even 2022.

The division of investment opinions has not changed after the emergence of Sora.

Recently, two profile interviews published by Tencent News "Perspective" have resonated widely in the AI venture capital circle. Yang Zhilin, founder and CEO of Dark Side of the Moon (Moonshot AI), and Zhu Xiaohu, managing partner of Jinshajiang Venture Capital, are regarded as representatives of the "technology believers" and the "market believers" respectively.

  Several large model startups have announced financing information completed since the second quarter of last year.

In April 2023, HiDream.ai completed its seed round of financing, and in December it completed an angel round led by iFlytek.

In May, Morph Studio completed a multi-million dollar seed round of financing, solely invested by BV Baidu Ventures.

In June, Right Brain Technology completed tens of millions of yuan in angel round financing, led by Lightspeed Photosynthetic, followed by Qiji Chuangtan and others.

  New rounds of financing have accelerated this year.

Aishi Technology completed tens of millions of yuan in angel round financing in August last year; on March 11 this year, it completed the A1 round of financing in the amount of 100 million yuan, led by Dachen Caizhi.

Shengshu Technology completed its angel and angel+ rounds of financing in June and August last year respectively; on March 12 this year, it completed a new round of several hundred million yuan, led by Qiming Venture Partners, with Datai Capital, Zhipu AI, BV Baidu Ventures, and others continuing to invest.

  “When this AI wave emerged a year ago, we believed that AI video generation would become a huge new opportunity and concluded that it would have a disruptive impact on every industry related to content production and consumption. China will also see the emergence of capable, foundational AI companies,” said Zheng Xuangle, founder and CEO of Light Source Capital.

The agency participated in the early incubation of Aishi Technology.

  "When the investment community did not think this was a big deal or something that could be done in the short term, we had something we believed in and held on to. If we had been swayed by different voices a year ago, we would not have been able to make PixVerse, and we would not have gotten where we are now." Wang Changhu believes that "entrepreneurship requires a long-term perspective. Entrepreneurs must do things that are correct but not yet consensus."

  But investors have their own perspective on issues.

Because each fund's investment direction is shaped by factors such as its funding sources, exit and return requirements, and actual conditions, investors have their own positions: only a few invest heavily in large models, while most are relatively cautious about model-level projects and more inclined to look for application-side projects that have found, or can already see, real deployment scenarios.

"Companies that make general-purpose large models may find it difficult to find a reasonable path for commercialization. Even OpenAI itself has not solved this problem," said entrepreneur Max.

  To this end, startups are looking for their own path.

Taking Aishi Technology as an example, the company plans to proceed in two stages. In the first stage, it will provide high-quality video generation services to creators in order to better understand their motivations, while facing users directly and iterating on their feedback. In the second stage, aimed at consumers, it will build on the tools to connect the entire process of creation and consumption and provide AI-native consumable content.

  In fact, the fundamental difference between Sora-class general-purpose foundation models and application-layer vertical models or applications is that the former are "0~1": they must cross the high early threshold of technological progress and focus on breakthroughs in core technologies. The latter are "1~10": they find specific applications on top of general large models and pay more attention to identifying and building business scenarios.

  "Last year we looked at a lot of text-to-image, text-to-video, and text-to-3D projects, but we did not invest in general large models. Multimodal input and output does not necessarily look like something startups can do, because it is the path that large-model companies are bound to take," Chen Shi, an investment partner at Fengrui Capital, recalled.

  Zhou Xinhua, a partner at Morning Trail Investment, believes that large models face fierce competition, low user stickiness, low monopoly potential, and excessively high costs. Building one also means reinventing the wheel, and the result may be superseded as soon as it is built.

"This is not the first time that giants have dropped a bomb and start-ups have been wiped out overnight, which often deals a fatal blow to entrepreneurial projects and their investors." She believes that the emergence of Sora makes Pika look useless, and that Google's Gemini 1.5, the V-JEPA architecture from Meta's Yang Likun (Yann LeCun), and Stability AI's Stable Diffusion 3 are also forces that could potentially encircle Sora.

  The high risk of being superseded after "reinventing the wheel" is a common concern among investors about the model side.

"Perhaps the biggest tragedy is that the closed-source model we created is no better than other people's open-source model." Chen Shi added.

  "To see foundation-layer investment through to the end, you really need huge funds, talent, and resources. Resources include computing power, data, and scenarios, which is why the major companies in Silicon Valley are the ones investing in large-model companies. In the current domestic capital environment, big Internet companies or market-oriented VC funds may not have the confidence to play the big spender, and it is difficult to carry large-model investments through to the end when the business model is unclear and the probability of investment success is low," Zhou Xinhua said.

  From the perspective of the overall environment, capital has been cautious for a long time.

CVSource investment data shows that as of February 2024, the total investment scale of China's VC/PE market reached US$6.774 billion, a year-on-year decrease of 28.83%; the number and scale of investment cases in the past three months have shown an overall downward trend.

  However, in the cold winter of venture capital, the scale of investment in the AI field is still the most prominent among the sub-sectors, reaching as high as US$1.106 billion in February.

"The topic of Sora has attracted attention, and text-to-video teams are generally sought after, but overall, financing for other AI projects may still be about as difficult as before," entrepreneur Max said.

  For ordinary entrepreneurs, improving their ability to generate their own revenue and surviving are the current primary goals.

"First find some commercial certainty among the uncertainties, and then pursue the long-term value of the product. During periods of change, getting on board first is the most important thing." Fimmo, who is currently working on an AI video startup project, shared.

  The above-mentioned investors are more likely to be optimistic about application layer projects with clear business models and clear implementation scenarios.

Chen Shi said that the application projects invested by the team are closely integrated with business practices and have their own business depth.

  "However, many current application-layer projects mostly use AI to show off their skills, which may touch users' itch points, but not pain points." Zhou Xinhua pointed out that when it is not possible to achieve the goal in one step, there may be opportunities in some transitional states.

She summarized several types of application-layer projects that currently hold an advantage: first, projects that embed AI into the workflows of business scenarios so that AI takes part in the process, which makes them easier to implement; second, vertical applications that combine large and small models, drawing on large models' strengths in user interaction and small models' strengths in privacy and vertical-domain know-how; third, projects that use the capabilities of AI to make overseas expansion more efficient and feasible.

  "Another advantage of application-layer projects is that the leading large models are still chasing the distant goal of AGI and will not, for the time being, spend much time customizing for many business scenarios," Chen Shi analyzed. "Therefore, application-layer entrepreneurs still need to find their own ecological niche, keep a 'safe distance' from large language models, stay off the path they are bound to take, and find depth in their own technology or business."

"Low-key" layout of major manufacturers

  "This matter is quite expensive. Big manufacturers have raised the valuations of large model projects, and they can eventually find someone to pay for it. If we invest, who will pay for it is the biggest problem." said investor Li Tong.

  The main force in the model competition must be large companies with advantages in computing power, capital, data and manpower.

Industry insiders generally believe that foundation-layer investment requires a huge amount of funds and resources, and that major manufacturers have both the strength and the responsibility to pay attention to this competition.

  Since last year, major Chinese manufacturers have been making frequent moves to develop video generation model business while promoting language models.

Around the turn of the year in particular, the pace accelerated significantly.

On January 17, 2024, Tencent AI Lab launched VideoCrafter2, which supports text-to-video and image-to-video generation.

On January 19, Baidu launched the video generation model UniVG, which supports combined text and image input and adopts different generation methods for high-freedom and low-freedom tasks. The project is led by Xiao Xinyan, chief architect of Baidu Wenxin Yige.

Alibaba Tongyi Lab has developed an open source video generation model and code series VGen.

In November 2023, Alibaba announced the open-source image-to-video model I2VGen-XL in a paper; in December, it launched the open-source text-to-video large model ModelScopeT2V, with both the model and the code fully open source.

  ByteDance is also keeping pace.

In January this year, ByteDance released the text-to-video model MagicVideo-V2. Around February 20, ByteDance quietly launched the video model Boximator, which can control the movements of characters or objects in a generated video through text. ByteDance soon responded, however, that "Boximator is not yet available as a complete product, and there is still a long way to go compared to leading foreign video generation models."

  On February 7, Douyin Group CEO Zhang Nan resigned, saying that in the future she would focus on the Jianying video-editing business and bet on generative AI.

On February 23, ByteDance launched an AI video generation feature with its own homepage in CapCut, the overseas version of Jianying; it was briefly live and then taken offline.

Recently, the video generation function of Dreamina, an AI creation platform under Jianying, has also opened for invitation-only internal testing.

  In addition, products launched by listed companies include Wondershare’s “Sky Curtain” large model, Meitu’s MiracleVision large smart model, etc.

  Now that the scaling law has become industry consensus, the priority major manufacturers give to this effort, along with their ability to invest and stack resources in computing power, models, and data, has become an important measure of their chances of success.

  Based on this, some AI entrepreneurs are optimistic about ByteDance.

According to the Financial Associated Press, as of September last year, ByteDance had built a cluster of more than 10,000 Nvidia Ampere-architecture GPUs and is currently building a Hopper-architecture cluster.

"ByteDance is one of the few companies in China with the computing power advantage of a '10,000-GPU cluster.'" Entrepreneur Max believes that ByteDance's volume of video data is in a leading position in the world and that it is also a relatively young manufacturer. He is optimistic about its sensitivity in strategic layout.

"With the backing of Jianying and its overseas version, ByteDance has an advantageous position in short videos and personal productivity tools. At the very least, it will not be the worst among major domestic manufacturers," entrepreneur Fimmo added.

  Unlike in the Internet era, when it found its ecological niche and created phenomenally successful applications, ByteDance is still in the position of a follower in the era of large models.

At the end of January, ByteDance CEO Liang Rubo mentioned a "sense of crisis" many times in his speech at the all-hands meeting. "ByteDance's current business has very large inertia. Even if the team does not make extra efforts, the company can still rely on inertia to glide for a long time, but that is dangerous," he said.

  Chen Shi believes that making such judgments lightly is somewhat presumptuous.

Although every major manufacturer currently attaches great importance to this and each has a different strategy, it is after all a process that starts with "copying" and may eventually converge. "The differences among China's major manufacturers may not be large in nature. The main gap and limit is how many GPU cards you can buy."

According to his prediction, in late 2024 or early 2025, we may see major manufacturers reproduce Sora.

  But he also emphasized that large manufacturers “must follow and surpass.”

"Major manufacturers with the determination and strength need to pay enough attention. A leading company must first have closed-source capabilities of its own and cannot rely on open source. On that basis, it can build up an ecosystem. It can hold that position and wait; as the cost of computing power keeps falling, it will be better placed to continue challenging and moving up," Chen Shi said.

  Regarding generative AI, the thinking of major mobile phone manufacturers is different from that of major Internet manufacturers.

"As a smart terminal manufacturer, we cannot take the path of Internet application service providers as our own direction, but must use AI to rebuild the operating system. In the future, various large models can run on mobile phones, and mobile phones will provide the computing power interfaces to help more 'Soras' run efficiently," Honor CEO Zhao Ming told China News Weekly.

Who will win Sora in China?

  So, if Chinese major manufacturers and start-ups are trying their best to catch up to or surpass Sora, who can be the first to do so?

  "China's large models are still in a follower position at this stage, and there are many participants. Future differentiation and progress are not easy to predict. However, it will still be somewhat harder for start-ups. Not everyone needs to build large models, but the will of big manufacturers, or of central state-owned enterprises and the government, still needs to be there," Chen Shi said.

  In Li Tong’s view, in addition to resource advantages in capital, computing power, data, and so on, “big manufacturers have also invested in many large-model and computing power companies, and they themselves serve all downstream applications, which is in line with their overall strategic layout. What they earn is money from the entire industry chain."

  "In the future, across the algorithm, platform, and computing power layers of AIGC video generation, large manufacturers are suited to a layout spanning the entire industry chain, start-ups are suited to entering a particular segment of the application layer or middle layer, and central state-owned enterprises are suited to starting their layout from infrastructure," Jiazi Guangnian Think Tank believes.

  So startups are not without opportunities.

Li Tong believes that “in terms of creativity, everyone competes on the same starting line.” Wang Changhu said, “Large companies have advantages in resources, data, and traffic, so startups must think about innovation and seek differentiation. This is an important way for startups to succeed." Judging from the February data compiled by the AI product list, PixVerse's user visits are already on the same level as those of the leading domestic AI language models and application tools.

"This is due to the first-mover advantage brought by our judgment and understanding a year ago. When everyone was piling into language models, we chose a different path with video models and accumulated technology in advance."

  "The relationship between start-ups and big manufacturers is by no means an either-or relationship." Wang Changhu believes that the cooperation between OpenAI and Microsoft is a typical example of "win-win," and that start-ups and big companies should cooperate for mutual benefit and develop in differentiated ways.

“Not only can we see this possibility in the existing market, but when all users can play video generation, there will be a huge incremental market.”

  Looking at the world as a whole, what does the future hold for China's large video generation models?

Chen Shi proposed the idea of "model following + application ecology".

He believes that the current opportunities for Chinese companies in the field of AI mainly lie in the application layer. "China is an outstanding student in applications in the digital economy era. Many applications rank first in the world in practicality, development capability, and ease of use." In his view, vigorously developing AI applications and ultimately using the advantages of the application ecosystem to drive technological breakthroughs in return is one of the solutions for China's AI technology.

  As a player in the game, Wang Changhu believes that, compared with language and image generation, China's large video generation models are on a par with their overseas counterparts when it comes to globalization.

On the one hand, teams born in the early days of video generation have seized the opportunities of globalization; on the other hand, Chinese companies created world-class video applications in the UGC era, which means that Chinese teams have richer product experience and a better understanding of usage scenarios, and these advantages can feed back into the development of video generation technology itself.

  Currently, Sora has not yet been opened to the public for testing, and it is still unknown whether the actual user experience will match the officially released video samples.

Whether the AI video large model has reached its GPT-2 or GPT-3 moment will only be known once the world witnesses the official launch of Sora.

  “We can boldly imagine that when video generation technology matures, and real-time video generation within seconds, video editing, and video interaction become a reality, the production model of all video creators and the consumption model of video content will undergo earth-shaking changes,” Wang Changhu predicted.

  (Li Tong, Max and Fimmo are pseudonyms in the article)

  "China News Weekly" Issue 10, 2024

Statement: The use of articles from China News Weekly must be authorized in writing.