[Explanation] An AI anchor extracts a human anchor's voice, facial expressions, lip shapes and other characteristics and, combining them through a series of intelligent simulations, releases a synthesized broadcast in sync with the human anchor, achieving the same broadcasting effect as a real person.

On December 29th, reporters visited iFLYTEK. The AI anchor the company independently developed is not only lifelike but also able to broadcast in multiple languages.

  [Concurrent] AI virtual anchor Xiaoqing

  Hello everyone, I am Xiaoqing, the AI virtual anchor of iFLYTEK. I can broadcast in multiple languages and dialects. Now I will broadcast for you in Cantonese; now in English; now in Russian; now in Japanese; now in Korean; now in French.

I wish you all good health and success in your work.

  [Explanation] It is understood that the earliest AI virtual anchors offered only single-dimensional voice broadcasting. To achieve fluent speech, a rich range of languages, pleasant voices, and natural, vivid body movements and expressions, the content to be broadcast must first be turned into speech through personalized multi-language synthesis technology.

  [Concurrent] Gao Jingwen, head of the virtual anchor R&D team

  At present, our virtual anchor supports broadcasting in more than 30 languages.

This is achieved with several technologies. First of all, we need to collect some material from a real person; about half an hour of data is enough to model this virtual person. After that, we only need to input text and it can produce a video output.
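In broad strokes, the text-in, video-out pipeline Gao describes might look like the sketch below. This is a minimal illustration under assumed names with placeholder models, not iFLYTEK's actual software: text is first synthesized into speech, the audio timing then drives per-frame lip and expression parameters, and those parameters would finally be rendered into video frames.

```python
# Minimal sketch of a text-in, video-out pipeline like the one described above.
# All class and function names are illustrative assumptions; the models are
# placeholders, not iFLYTEK's software.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class SpeechClip:
    samples: List[float]                            # synthesized waveform (16 kHz assumed)
    phoneme_times: List[Tuple[str, float, float]]   # (unit, start_sec, end_sec)

@dataclass
class FaceFrame:
    mouth_openness: float                           # driven by the audio timing
    expression: str                                 # e.g. "neutral", "smile"

def synthesize_speech(text: str, language: str) -> SpeechClip:
    """Stand-in for a personalized multilingual TTS model (dummy timing here)."""
    words = text.split()
    times = [(w, i * 0.3, (i + 1) * 0.3) for i, w in enumerate(words)]
    n_samples = int(len(words) * 0.3 * 16000)
    return SpeechClip(samples=[0.0] * n_samples, phoneme_times=times)

def drive_face(clip: SpeechClip, fps: int = 25) -> List[FaceFrame]:
    """Stand-in for the lip-sync / expression model: audio timing -> per-frame face parameters."""
    duration = clip.phoneme_times[-1][2] if clip.phoneme_times else 0.0
    frames = []
    for i in range(int(duration * fps)):
        t = i / fps
        speaking = any(start <= t < end for _, start, end in clip.phoneme_times)
        frames.append(FaceFrame(mouth_openness=0.6 if speaking else 0.0,
                                expression="neutral"))
    return frames

def text_to_video(text: str, language: str) -> List[FaceFrame]:
    """Text in, 'video' out (here just a list of per-frame face parameters)."""
    clip = synthesize_speech(text, language)
    return drive_face(clip)

frames = text_to_video("Hello everyone, I am Xiaoqing.", language="en")
print(f"rendered {len(frames)} frames at 25 fps")
```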

  [Explanation] It sounds simple, but the training process is extremely difficult.

After custom-recording a portion of the anchor's synchronized audio and video data, the audio and video are separated: the voice data is used to train the personalized speech synthesis model, while the video is used to extract parameters such as face recognition and expression capture. Training the multimodal synthesis model then draws on many core technologies, including deep learning, machine translation and multilingual speech synthesis.
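A hedged sketch of that data-preparation step, under an assumed structure rather than the team's actual code: each synchronized recording is split into an audio side, paired with its transcript for the personalized TTS training set, and a video side from which per-frame face and expression parameters are extracted.

```python
# Assumed data-preparation sketch: split each synchronized recording into
# a TTS training pair (text, waveform) and per-frame facial parameters.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Recording:
    transcript: str
    audio: List[float]            # waveform samples
    video_frames: List[bytes]     # raw video frames

@dataclass
class TrainingCorpora:
    tts_pairs: List[Tuple[str, List[float]]] = field(default_factory=list)
    face_params: List[List[float]] = field(default_factory=list)

def extract_face_parameters(frame: bytes) -> List[float]:
    """Placeholder for per-frame face detection / expression capture."""
    return [0.0, 0.0, 0.0]        # e.g. jaw opening, lip width, brow height

def prepare(recordings: List[Recording]) -> TrainingCorpora:
    corpora = TrainingCorpora()
    for rec in recordings:
        # Audio side: (text, waveform) pairs for the personalized TTS model.
        corpora.tts_pairs.append((rec.transcript, rec.audio))
        # Video side: one facial-parameter vector per frame for the expression model.
        corpora.face_params.extend(extract_face_parameters(f)
                                   for f in rec.video_frames)
    return corpora

corpora = prepare([Recording("Hello", [0.0] * 16000, [b"frame0", b"frame1"])])
print(len(corpora.tts_pairs), "TTS pairs,", len(corpora.face_params), "face-parameter frames")
```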

  [Concurrent] Gao Jingwen, head of the virtual anchor R&D team

  Then, by combining the virtual human with speech synthesis, we arrive at a multimodal virtual human framework. What we need to present here is speech together with emotional expression: emotion carried in the voice, as well as in the lips, the facial expressions, and the flexible display of body movements. In this process, we have to get the artificial intelligence to analyze the text, analyze the emotion, and combine these multiple dimensions.
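A toy sketch of that last point, using assumed names rather than iFLYTEK's implementation: the emotion inferred from the text is mapped to coordinated cues for voice prosody, facial expression, and body movement.

```python
# Assumed design sketch: text emotion drives several modalities at once.
from dataclasses import dataclass

@dataclass
class MultimodalCues:
    pitch_scale: float    # prosody adjustment for the synthesized voice
    expression: str       # facial-expression label for the face model
    gesture: str          # body-movement label for the gesture model

def analyze_emotion(text: str) -> str:
    """Placeholder text-emotion analysis; a trained classifier in practice."""
    happy_words = {"wish", "success", "congratulations", "happy"}
    return "happy" if any(w in text.lower().split() for w in happy_words) else "neutral"

def emotion_to_cues(emotion: str) -> MultimodalCues:
    table = {
        "happy":   MultimodalCues(pitch_scale=1.1, expression="smile", gesture="open_palm"),
        "neutral": MultimodalCues(pitch_scale=1.0, expression="neutral", gesture="rest"),
    }
    return table[emotion]

print(emotion_to_cues(analyze_emotion("I wish you all good health and success in your work.")))
```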

  [Explanation] Currently, the AI virtual anchor, AI virtual customer service, and AI virtual teacher developed by the R&D team have been widely used in media, finance, customer service and other fields.

  [Concurrent] Gao Jingwen, head of the virtual anchor R&D team

  We also hope to work with our developers and partners, using AI and virtual human technology, to build more and richer applications that bring more convenient services to everyday life.

  Liu Honghe and Zhang Jun report from Hefei, Anhui

Editor in charge: [Lu Yan]