Since OpenAI launched its chatbot ChatGPT last November, people have marveled at its information-retrieval and text-generation capabilities, but questions about the intellectual property rights behind it have followed.

  Internationally renowned linguist Noam Chomsky has publicly stated that ChatGPT is a high-tech plagiarism system: it discovers patterns in massive amounts of data and stitches that data together according to rules to form articles and content that resemble human writing.

In response, some who disagree have pointed out that human learning is likewise a process of inheriting, analyzing, and generalizing from existing knowledge. The real threat is that it could make people stop thinking.

  Are works created by artificial intelligence suspected of plagiarism?

Is the content it generates protected by copyright?

Are there any legal risks?

The popularity of ChatGPT has pushed these questions into public view.

Does AI-generated content (AIGC) carry a risk of infringement?

  As a natural language processing system, ChatGPT is trained through a large text corpus, and then answers questions or generates text based on what it has learned. Its ability to learn largely relies on massive data.

  OpenAI's paper titled "Language Models Are Few-Shot Learners" published in May 2020 shows that the company mainly uses CommonCrawl, WebText, Wikipedia and book corpora for training.

  Does using this data carry a risk of copyright infringement?

He Baohong, director of the Cloud Computing and Big Data Research Institute of the China Academy of Information and Communications Technology, believes that the unrestricted use of ChatGPT may lead to intellectual property disputes.

"The developers of ChatGPT have not disclosed the mechanism by which it generates and synthesizes content, or the sources of its training data. In user-guided question answering, ChatGPT's answers do not cite their sources, so users who reuse the generated content without attribution may end up plagiarizing."

  Fang Chaoqiang, a lawyer at Beijing Yingke (Hangzhou) Law Firm, said that the process of AI training will inevitably involve the copying and use of other people's copyrighted works, and there is a certain risk of copyright infringement.

"Of course, this risk can also be avoided, for example by using public resources that are not under copyright, or literary works that have been licensed."

  He also pointed out that if written work produced by AI is substantially similar to an existing written work, then creating it, or subsequently using it, risks infringing on that existing work.

  According to public reports, foreign news media have accused OpenAI of using their articles to train ChatGPT without paying any fees.

  Previously, Wall Street Journal reporter Francesco Marconi said publicly online that he had asked ChatGPT for a list of the news sources used to train it. The response named 20 media outlets, including Reuters, The New York Times, The Guardian, and BBC News, but it is not clear whether OpenAI has reached agreements with the publishers listed.

  Jason Conti, general counsel of Dow Jones, a unit of News Corp., one of the world's largest media groups, also said in a statement to the media that anyone who wants to use the work of Wall Street Journal reporters to train artificial intelligence should obtain proper authorization from Dow Jones, but "Dow Jones has not reached a related agreement with OpenAI."

He said Dow Jones was reviewing the situation and would take misuse of journalists' work seriously.

  According to Xiao Sa, a partner of Beijing Dacheng Law Firm, this actually involves the issue of whether "text data mining" requires corresponding intellectual property authorization.

ChatGPT needs to mine and train the data in the corpus, and copy the content in the corpus to its own database. The corresponding behavior is usually called "text data mining" in the field of natural language processing.

"On the premise that the corresponding text data may constitute a work, whether text data mining violates the 'right of reproduction' is still controversial." Xiao Sa said.

  She pointed out that, from a comparative-law perspective, both Japan and the European Union have expanded the scope of fair use in their copyright legislation, adding "text data mining" in AI as a new fair use situation.

"At present, China's Copyright Law still maintains a closed list for the fair use system: only the thirteen situations stipulated in Article 24 of the Copyright Law can be recognized as fair use. Text data mining is not included within the scope of fair use, so text data mining still requires the corresponding intellectual property authorization in China."

Can AIGC be Copyrighted?

  Public reports show that, given suitable prompts, ChatGPT can write code, compose poems, and even complete short stories, which makes people wonder: does AIGC count as creation?

Can artificial intelligence replace human authors?

  Tian Taoyuan, an expert in artificial intelligence research, told China Youth Daily reporters that ChatGPT is at present essentially arranging and combining entries, and gives the answers that are closest to human preferences according to the needs of human expression.

"When its strength reaches the level of 'words', it will make people feel as if it is creating, but in fact it cannot jump out of the generalization scope of the training text library, that is, it cannot create new knowledge that humans do not know."

  Fang Chaoqiang also pointed out that AI creation is essentially a derivative of human creation; what needs to be clarified is that humans designed and trained the program, so that AI software can output some works that meet human requirements relatively intelligently.

Ultimately, it is still people who are in control.

  In fact, when asked "Is the content you generate a work?", ChatGPT itself admitted, "I can generate text according to the input prompts, but these generated texts are not considered works, because they do not contain elements such as creativity, originality, and artistry; they are merely generated by the pre-trained model based on the input prompts. Therefore, the content I generate is more like a tool or aid that can help people automatically generate some text, but it is not considered a creative or original work."

  Previously, ChatGPT has been listed as an author in several academic papers.

According to the "Nature" website, at least four published papers and preprints list ChatGPT as a "co-author."

  In response to this situation, a number of academic journals have issued statements a few days ago, completely prohibiting or strictly restricting the use of artificial intelligence robots such as ChatGPT to write academic papers.

"Science" magazine stated that it does not accept submitted papers generated using ChatGPT, and does not allow ChatGPT as a co-author of the paper.

"Cell" and "The Lancet" stated that authors may not use artificial intelligence tools in place of themselves to complete key tasks, and that authors must explain in detail in the paper how they used these tools.

  Zhao Zhan, deputy director of Beijing Yunjia Law Firm and special researcher at the Intellectual Property Research Center of China University of Political Science and Law, said in an interview with China Youth Daily reporters that under the current copyright laws of most countries, content generated with artificial intelligence software is not a work in the legal sense, the artificial intelligence software itself cannot be regarded as an author in the legal sense, and the user is not the copyright owner.

  "But from a commercial point of view, AI companies have invested a great deal of money and technical capital to create highly intelligent AI programs, and if the 'works' derived from these programs are not protected at all, that is also unfair." Fang Chaoqiang pointed out that in domestic judicial practice, copyright protection has been granted to original and artistic "fonts" generated by font library software, which is also a form of protection for AI works.

He believes that how to adjust existing intellectual property theory and the legal system to provide reasonable and necessary rights protection for AI texts and other AI works, including AI art and music, is a pressing question at present.

  "I personally think that AIGC works that are original and reach a certain artistic level should be protected by intellectual property rights, and those that do not should not be; the corresponding intellectual property rights should belong to the AI company. As for the protection of this type of AIGC work, the rules for its use, and whether that protection needs to be reduced or restricted compared with human creations, these remain to be further explored." Fang Chaoqiang also suggested that "AIGC content should receive limited protection. If no restrictions are imposed, AI companies may come to exercise 'creative hegemony' in the future, affecting or even suppressing the creative enthusiasm of human creators, and in the long run this will affect human intellectual property creation."

  But judging from ChatGPT's current performance, the views reflected in its "creations" are usually unclear and lack originality. Although its wording may not be substantially similar to other people's works, it often borrows the mode of expression of one or more existing sources.

At present, it can only replace human work in some fields and in some respects.

But it also brings certain drawbacks: it can easily breed lazy thinking in some groups, and may inhibit innovation to a certain extent.

  (At the request of the interviewee, Tian Taoyuan is a pseudonym)

  China Youth Daily reporters Li Ruoyi and Wang Lin, trainee reporter Jia Jiye. Source: China Youth Daily