A video or a voice clip may not have been shot or recorded by a real person. Behind a mobile app, a payment interface or an access-control gate, someone may be stealing your face without your knowledge.

As artificial intelligence (AI) deep synthesis technology grows ever more sophisticated, synthetic audio, video and other fabricated content are becoming harder and harder to tell from the real thing.

There is no doubt that the real world we live in is facing the risks and challenges posed by the abuse of this technology.

Stealing faces and faking voices is no longer difficult

  Over the past two years, public security authorities in Zhejiang, Anhui, Jiangsu and other provinces have arrested a number of criminal suspects for stealing personal information.

The suspects' methods were strikingly similar: first they illegally obtained photos of other people, or paid for recordings of other people's voices and similar "raw material"; then they used artificial intelligence technology to "activate" the photos and synthesize dynamic videos. With these they either fooled the facial verification mechanisms of social platforms and Alipay accounts to make illegal profits, or tricked the manual review step in mobile phone card registration and then used phone numbers registered in other people's names to carry out telecom network fraud, online gambling and the like, leaving the people whose information was harvested facing security threats and property losses.

  How is a photo of a stranger "animated" into a video?

  At a demonstration computer in the laboratory of the Institute of Artificial Intelligence at Tsinghua University, a Banyuetan reporter watched as a still photo of a stranger's face, just downloaded from WeChat Moments, was imported into the machine. The face in the photo instantly "came alive", blinking, opening its mouth, frowning and making other fine movements and expression changes on command, and a smooth video was generated in just ten seconds.

  "The technology that completes the driving operation from static to dynamic is called deep synthesis technology, which is a kind of artificial intelligence content synthesis technology." Xiao Zihao, an engineer at the Institute of Artificial Intelligence of Tsinghua University, said that deep synthesis technology has been derived including image synthesis, video synthesis , voice synthesis, text generation and other technologies.

With this technology in hand, stealing a face is no longer difficult.

Such forged synthetic videos can help criminals pass back-end review and verification at steps that require dynamic facial recognition, such as mobile phone card registration, bank card applications and payment app logins.

  Technical staff also demonstrated voice synthesis for the Banyuetan reporter.

From a few 60-second voice samples of a stranger, deep synthesis technology can generate audio such as "No need to clock in, just transfer the money to me on WeChat" or "You don't need to pick up the child today, I'm near the school and will pick them up on the way", and the result sounds just like the real person speaking.
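  To make concrete how little reference audio such few-shot voice cloning needs, here is a minimal sketch built on an open-source text-to-speech toolkit. The specific package (Coqui TTS), the model name and the file paths are assumptions for illustration, not tools named in the article.

```python
# Minimal voice-cloning sketch, assuming the open-source Coqui TTS package
# is installed (pip install TTS). Model name and paths are illustrative.
from TTS.api import TTS

# A multilingual model capable of cloning a voice from a short reference clip.
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# Roughly one minute of a speaker's audio serves as the reference material.
tts.tts_to_file(
    text="No need to clock in, just transfer the money to me on WeChat.",
    speaker_wav="stranger_sample_60s.wav",  # hypothetical reference recording
    language="en",
    file_path="cloned_voice.wav",
)
```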

This kind of voice synthesis is "very scary".

Deep Synthesis is Disrupting 'Seeing Is Believing'

  On content platforms and social platforms at home and abroad, deep synthetic content is rising in both quantity and quality.

Synthesized film and television clips and face-swap videos of topical figures, in particular, have spread widely because of their strong entertainment value.

  According to the "Top Ten Trends in Deep Synthesis (2022)" report jointly released by the Institute of Artificial Intelligence of Tsinghua University, Beijing Relais Smart Technology Co., Ltd., the Tsinghua University ATM Research Center, the National Industrial Information Security Development Research Center and the Beijing Big Data Center, the number of deep synthetic videos on mainstream audio and video sites and social media platforms at home and abroad grew at an average annual rate of more than 77.8% from 2017 to 2021.

The number of newly released deep synthetic videos in 2021 was 11 times that of 2017.

At the same time, the exposure, attention and reach of deep synthetic content have grown exponentially, and newly released deep synthetic videos received more than 300 million likes in 2021.
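  The two growth figures are broadly consistent with each other, as a quick back-of-the-envelope check shows (reading the reported average as a compound annual rate, which is an assumption about how the report defines it): an 11-fold increase over the four years from 2017 to 2021 implies roughly 82% growth per year, which does exceed 77.8%.

```python
# Back-of-the-envelope check on the report's growth figures.
growth_factor = 11       # new deep synthetic videos in 2021 vs 2017
years = 2021 - 2017      # four compounding periods

# 11x over 4 years implies a compound annual growth rate of about 82%,
# which is indeed above the cited 77.8% average annual growth.
cagr = growth_factor ** (1 / years) - 1
print(f"Implied compound annual growth rate: {cagr:.1%}")            # ~82.1%

# Conversely, 4 years at exactly 77.8% per year gives roughly a 10x increase.
print(f"Four years at 77.8% per year: {(1 + 0.778) ** years:.1f}x")  # ~10.0x
```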

  "The videos and voices circulating on the Internet are not necessarily shot or recorded by real people." Ren Kui, Dean of the School of Cyberspace Security at Zhejiang University, said that it is often difficult to distinguish with the human eye whether it is full face synthesis, audio synthesis, or real shooting and recording.

  Zhu Jun, a professor in the Department of Computer Science at Tsinghua University and director of the Basic Theory Research Center of the Institute of Artificial Intelligence, believes that deep synthesis technology is changing the underlying logic and complexity of the trust chain for disseminated content, and that the hidden dangers it brings are growing fast.

On the one hand, the meaning of "seeing is believing" has changed.

Although the public already knows that static information such as photos can easily be tampered with, people still place a high degree of trust in dynamic information such as video and sound; deep synthesis technology now dismantles the trust logic of "seeing is believing" once again.

On the other hand, the wide reach of short video gives abuses of deep synthesis technology broad influence and destructive power.

  Xue Lan, Dean and Professor of Schwarzman College at Tsinghua University, believes that when AI technologies such as deep synthesis are abused, a series of ethical and governance problems follows: from endangering personal property, dignity and privacy all the way to threatening national security and social stability.

Guide technology toward good and improve the AI risk management system

  Technology is a double-edged sword.

Wielding this double-edged sword well means neither letting the technology run wild nor bringing technological innovation to a standstill.

  On making good use of the technology, Wu Hequan, an academician of the Chinese Academy of Engineering and an information technology expert, argued against "one size fits all" bans and interventions on new applications and new developments, which would only hinder innovation.

Instead, the security problems the technology gives rise to should be tackled at the source, using technological innovation and technological confrontation to continuously improve and iterate detection capabilities.

  Zhu Jun believes that detection technology for deep synthesis applications is still at an exploratory stage and that the available methods are not yet mature.

He recommends drawing fully on the strengths of research institutes, technology enterprises and others to build effective and efficient detection capabilities for deep synthesis applications as soon as possible, so as to secure a technological edge in public opinion and information warfare.

  On risk governance, Qiu Huijun, deputy chief engineer of the National Industrial Information Security Development Research Center, pointed out that digital transformation in recent years has pushed many countries to put AI security risk governance into practice.

The EU has taken the lead in legislating on artificial intelligence, adopting a risk-based approach and focusing on clarifying the regulatory framework for high-risk AI systems.

  "AI security includes data security, framework security, algorithm security, model security, operational security and other components. In this regard, we should build an integrated governance rule system of 'regulations + standards + laws', and issue guidelines and standards for risk governance , evaluation norms, and improve legislation when conditions are met." Qiu Huijun suggested that the focus should be on data, algorithms, models, and operation and maintenance. First, build quality specifications for data collection; second, classify artificial intelligence system risks according to application scenarios; The third is to establish a security responsibility system, clarifying the respective responsibilities of design and development units, operation and maintenance units, and data providers.

  Chen Jihong, a partner at Zhong Lun Law Firm, said that combating "face-swap" fraud requires rules on the lawful use of the technology, on security assessment procedures for it, and on the legal treatment of its abuse, so as to raise the cost of abusing the technology illegally.

  Zhu Jun reminded the public to form an accurate understanding of deep synthesis technologies and their applications, to stay alert to malicious uses, to protect personal information such as voiceprints and photos, and not to hand over biometric information such as faces, fingerprints and irises to others lightly.

  Source: "Banyuetan Internal Edition", 2022, Issue No. 6. Original title: "Beware of AI Deep Synthesis Breaking Through the Risk Bottom Line"

  Banyuetan reporters: Zhang Manzi and Zhang Chao