Lip-reading software is far more accurate than humans

  "Move your mouth" and AI will know what you are saying

  Driven by huge potential demand in public welfare, public security, national security and other fields, and by the rapid development of AI technology, AI lip reading can be expected to spread rapidly and widely in the near future, and the industry's prospects are very bright.

  -- Yan Huaizhi, associate professor at the School of Computer Science, Beijing Institute of Technology, and director of the Institute of Network and Security

  ◎Our reporter Zhai Dongdong

  Although the TV series "Hurricane" has ended, its popularity has not diminished in the slightest. Some netizens use characters from the show to create entertainment videos, while others edit together its highlights.

However, some more exacting netizens noticed that the dubbed dialogue of some characters in "Hurricane" does not match their mouth shapes, so they wanted to use AI lip reading to recover the original script.

  AI lip reading, however, is useful for more than deciphering a "hidden plot".

According to statistics, China has more than 20.54 million people with hearing disabilities. Besides sign language, their main means of communication, lip reading is also an important way for them to communicate.

However, human lip reading is easily affected by factors such as personal experience, visual perception, and language comprehension, and its accuracy is unsatisfactory. People have therefore begun trying to use AI to read lips.

Understanding lips better than a lip-reading expert

  "So-called AI lip reading, that is, artificial intelligence lip recognition, has visual recognition and natural language processing as its core technical framework," said Yan Huaizhi, associate professor at the School of Computer Science, Beijing Institute of Technology, and director of the Institute of Network and Security. Specifically, machine vision is used to track the face continuously through the images and extract the sequence of mouth-shape changes; these features are fed into a lip-reading model, which identifies the pronunciation corresponding to the speaker's mouth shapes and then outputs the most likely sentence.

  "Visual recognition and natural language processing each form a large technical system with differing technical routes, but in essence both use large amounts of lip-reading data to train AI models and strive for accurate text output," Yan Huaizhi added.

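  The stages Yan describes (locate the mouth, extract mouth-shape features frame by frame, run a recognition model, output the most likely sentence) can be sketched roughly as follows. This is only an illustration: every name in it (`LipReadingPipeline`, `locate_mouth`, `extract_features`, `recognize`) is hypothetical, and the toy stand-in functions replace what would be trained vision and language models in a real system.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Frame = List[List[float]]        # grayscale image: rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, height, width)
Candidate = Tuple[str, float]    # (sentence, score)


def crop(frame: Frame, box: Box) -> Frame:
    """Cut the mouth region out of a frame."""
    top, left, height, width = box
    return [row[left:left + width] for row in frame[top:top + height]]


@dataclass
class LipReadingPipeline:
    # Hypothetical stage functions supplied by the caller.
    locate_mouth: Callable[[Frame], Box]
    extract_features: Callable[[Frame], List[float]]
    recognize: Callable[[Sequence[List[float]]], List[Candidate]]

    def read(self, frames: Sequence[Frame]) -> str:
        # 1. Per frame: find the mouth, crop it, turn it into a feature vector.
        features = [
            self.extract_features(crop(frame, self.locate_mouth(frame)))
            for frame in frames
        ]
        # 2. The model scores candidate sentences for the whole sequence.
        candidates = self.recognize(features)
        # 3. Output the most likely sentence.
        return max(candidates, key=lambda c: c[1])[0]


# Toy stand-ins (assumptions, not real detectors or models).
def lower_half(frame: Frame) -> Box:
    height, width = len(frame), len(frame[0])
    return (height // 2, 0, height - height // 2, width)


def mean_intensity(mouth: Frame) -> List[float]:
    pixels = [p for row in mouth for p in row]
    return [sum(pixels) / len(pixels)]


def toy_recognizer(features: Sequence[List[float]]) -> List[Candidate]:
    openness = sum(f[0] for f in features) / len(features)
    return [("hello", openness), ("goodbye", 1.0 - openness)]


pipeline = LipReadingPipeline(lower_half, mean_intensity, toy_recognizer)
frames = [
    [[0.0, 0.0], [0.9, 0.9]],
    [[0.0, 0.0], [0.7, 0.7]],
]
result = pipeline.read(frames)
print(result)
```

  Keeping each stage as a replaceable function mirrors the point that vision and language processing follow different technical routes but meet in a single trained model.
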
  In recent years, AI giants have begun to venture into lip reading.

DeepMind, a subsidiary of Google, has worked with the University of Oxford in the United Kingdom to develop AI lip-reading software, training it by having it "watch" thousands of hours of TV programs.

Notably, in a lip-reading test on 200 randomly selected video clips, the AI lip-reading software reached an accuracy of 46.8%, while professionally trained human lip-reading experts reached only 12.4%.

  Why has AI lip reading quietly risen?

Yan Huaizhi offered his own analysis: one reason is strong demand pull, and the other is a powerful technology push.

On the demand side, lip reading can not only help some people with disabilities but also play a major role in fields such as public security; on the technology side, AI has already achieved great success in lip reading.

  Many problems need to be overcome

  However, Yan Huaizhi also said that China's AI lip-reading technology is still in its infancy, and there is a long way to go before AI can read lips accurately.

  From the perspective of language itself, human language is highly complex. Of all the speech sounds humans produce, only about 30% are directly controlled by the lips; the other 70% are dental, lingual, and guttural sounds that are hard to distinguish with the naked eye or even with machine vision.

Moreover, factors such as tone of voice, dialect, linked speech, accent, and even a beard covering the mouth lead to subtle changes in mouth shape across speakers, and it is precisely these subtle changes that seriously affect AI's recognition and judgment of lip movements.
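  Why sounds that are not controlled by the lips are so troublesome can be shown with a small sketch. The grouping below is a deliberately simplified assumption for illustration, not a standard viseme table: it maps a handful of sounds to the mouth shape an observer would see, and then groups words that look identical on the lips.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Simplified, illustrative mapping from sound to visible mouth shape.
# The class names and groupings are assumptions made for this example.
VISIBLE_SHAPE = {
    "p": "lips-closed", "b": "lips-closed", "m": "lips-closed",
    "f": "lip-to-teeth", "v": "lip-to-teeth",
    "t": "tongue-behind-teeth", "d": "tongue-behind-teeth",
    "n": "tongue-behind-teeth",
    "a": "open-vowel",
}


def visible_shape(sounds: List[str]) -> Tuple[str, ...]:
    """Return the sequence of mouth shapes a camera would see."""
    return tuple(VISIBLE_SHAPE[s] for s in sounds)


def group_by_shape(words: Dict[str, List[str]]) -> Dict[Tuple[str, ...], List[str]]:
    """Group words whose mouth-shape sequences are indistinguishable."""
    groups = defaultdict(list)
    for word, sounds in words.items():
        groups[visible_shape(sounds)].append(word)
    return dict(groups)


WORDS = {
    "pat": ["p", "a", "t"],
    "bat": ["b", "a", "t"],
    "mat": ["m", "a", "t"],
    "fan": ["f", "a", "n"],
    "van": ["v", "a", "n"],
}

groups = group_by_shape(WORDS)
for shape, words in groups.items():
    print(" / ".join(words), "->", shape)
```

  Five distinct words collapse into two visible patterns here, so a recognizer has to fall back on context to decide which word was actually spoken.
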

  From a technical point of view, the environments in which AI captures lip movements are usually complex, which makes accurate recognition very difficult.

With current AI technology, recognition of long sentences and complex sentence patterns remains unsatisfactory, not to mention unsolved problems such as multi-scene recognition and multi-speaker lip reading.

  Yan Huaizhi said that only by solving the above problems can AI lip reading achieve a breakthrough and move towards a mature stage of development.

  Human languages differ from one another in countless ways. Can AI read lips in every language?

  According to Yan Huaizhi, most of the previous successful AI lip-reading systems were limited to English models, because most AI models were trained based on English data.

However, in terms of technical framework, the training models for different languages are basically the same, or can be built with the same technical means.

  Of course, adapting lip reading to different languages requires some adjustments: on the one hand, data in the target language must be selected for targeted training; on the other hand, the AI model itself needs adjusting, for example by incorporating time masking, optimizing the language model, and tuning hyperparameters.
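  The article does not say how time masking is applied, so the following is only a sketch under common assumptions: during training, a random contiguous run of frames in each clip is blanked out, which pushes the model to rely on surrounding context rather than on any single mouth shape. The function name `time_mask` and its parameters are hypothetical.

```python
import random
from typing import List, Optional

FeatureSeq = List[List[float]]  # one feature vector per video frame


def time_mask(
    frames: FeatureSeq,
    max_width: int,
    rng: Optional[random.Random] = None,
    fill: float = 0.0,
) -> FeatureSeq:
    """Return a copy of `frames` with one random contiguous span blanked.

    Assumes max_width >= 0. The span length is drawn uniformly from
    0..max_width (capped at the clip length), and the input is not modified.
    """
    rng = rng or random.Random()
    n = len(frames)
    width = rng.randint(0, min(max_width, n))
    start = rng.randint(0, n - width)
    masked = [list(vector) for vector in frames]
    for t in range(start, start + width):
        masked[t] = [fill] * len(masked[t])
    return masked


# Ten frames with two features each; the second feature is never zero,
# so a masked frame always differs from the original.
clip = [[float(t), float(t) + 0.5] for t in range(10)]
augmented = time_mask(clip, max_width=3, rng=random.Random(0))
changed = [t for t in range(len(clip)) if augmented[t] != clip[t]]
print("masked frame indices:", changed)
```

  Passing a seeded `random.Random` keeps the augmentation reproducible for debugging, while leaving it out gives a fresh mask each time a clip is sampled.
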

  In addition, the same language is spoken with different mouth shapes, and even similar mouth shapes may represent completely different meanings.

A mature AI lip-reading system therefore requires a large amount of lip-feature sample data covering as many application scenarios and types of speakers as possible, so as to improve the generalization ability of the trained model and its recognition accuracy across different mouth shapes and different languages.

  A technological double-edged sword in urgent need of regulation

  Despite all the difficulties, more and more AI companies have entered the AI lip-reading field and plan to invest in it for the long term.

At present, the major AI giants are pursuing different directions, which can be divided into lip-reading data, lip-reading video recognition, lip-reading understanding, and so on.

  Yan Huaizhi also said that initial breakthroughs have now been achieved in many areas of AI lip-reading technology, full-chain integration looks promising, and industrial clusters are gradually forming.

  From the perspective of application scenarios, AI lip reading has begun to emerge in fields such as social welfare and public security.

Judging from the current positioning of the major players and the development trend of related technologies, AI lip reading is expected to have broad application prospects in identity recognition, national security, and smart systems.

"Driven by huge potential demand in public welfare, public safety, national security and other fields, and by the rapid development of AI technology, AI lip reading can be expected to spread rapidly and widely in the near future. The industry's prospects are very promising," Yan Huaizhi said.

  In security monitoring, for example, many scenes are noisy or have only a video signal, so sound cannot be captured accurately, and AI lip reading can come in handy. In identity recognition, AI lip reading can be used to enter payment passwords by mouth shape, so that "moving your lips" completes identity verification and payment. In public security, AI lip reading can analyze the lip movements of the people involved in various videos to assist case investigations. In smart systems, AI lip reading can make "silence better than sound": smart devices such as home appliances can be controlled by mouth shape alone.

  Of course, technology application is a double-edged sword.

Many people worry that AI lip reading will expose the private content of their conversations, whether they are speaking publicly, whispering, or talking to themselves.

The idea that merely "opening your mouth" lets others steal what you say is frightening when you think about it.

  Yan Huaizhi said that this worry is not unfounded.

Privacy leaks caused by AI lip reading may result from someone maliciously capturing and recognizing lip movements, or from an AI lip-reading system that is used normally but whose storage and use are inadequately protected, so that the related data is stolen or misused, damaging personal rights and interests.

Moreover, since the content of a conversation points clearly to the parties involved, the harm of this kind of privacy leak may be more serious than that of an ordinary personal information leak.

  Therefore, Yan Huaizhi suggested that from the perspective of privacy and security protection, the formulation of relevant laws and regulations should be strengthened at the management level, the application scenarios, scope and purpose of AI lip reading should be strictly regulated and restricted, and the supervision and punishment of malicious use of technology should be increased.

In addition, at the technical level, it is necessary to strengthen the security protection of AI lip-reading systems, improve their recognition accuracy by technical means, prevent abuse of the technology, and effectively protect the security of users' conversations.

(Science and Technology Daily)