The "ChatGPT" chatbot has captured the attention of users from all over the world since its launch by the startup "Opne AI" at the end of last November, and - according to specialists - is the fastest growing application in the world. user base over the past years, outperforming many giant applications such as Facebook and TikTok.

Given the importance of this technology and the high expectations surrounding it, we have been shedding light on the new service in practical terms through several original and translated reports.

For documentation and credibility, we subscribed to the paid version of ChatGPT so that we could run practical experiments and write about the technology in greater detail.

How Al Jazeera's experiment with ChatGPT's strangest behavior began

While preparing our earlier article, "With practical examples from Al Jazeera Net.. This is how the ChatGPT bot can help you in your work," we tried to gauge the program's potential as a personal assistant for increasing people's productivity in their daily lives.

In doing so, we found that ChatGPT's search capability lacks a number of things that matter in search engines, which prompted us to dig deeper into one question: Is ChatGPT fit to serve as a search engine that gives correct information?

At the beginning of the experiment, we asked the chatbot to write a short story in the style of the famous Egyptian writer Youssef Idris.

The result was the story shown in the image above. But when we asked it for details about the writer and the works it drew on to write the story, we were in for a surprise.

The information the chatbot gave was inaccurate: it claimed the writer was Moroccan and born in Casablanca. We then asked about other figures, and its answers were wrong as well.

Of course, such basic facts can be easily looked up and verified by anyone, yet the AI got them wrong, and it was easy for us to recognize the information was false.

We then asked the bot about Youssef Idris's books and writings, and it gave us a list of books and stories the writer never wrote; we do not know how the bot came to attribute them to him.

This confirmed that search is not one of the new chatbot's strengths at this time.

The source problem in ChatGPT

Everyone knows that ChatGPT is still in testing and is an imperfect product; the developer states this clearly and unequivocally. But what we found and tested goes beyond the wrong answers and fabricated information that the bot produces and that the public treats as something of a joke.

We found that the bot not only invents stories and writes literary and informational material in a human-like manner, but goes further and fabricates nonexistent sources for what it writes. This is the most dangerous part, because it opens the door to deep falsification of news sources, whose importance can outweigh that of the news itself.

In journalism, sources of information are sacred; many accurate stories have been shelved because of doubts about their sources.

"Al Jazeera Net" experience with "GBT Chat" step by step

Here are the steps we took to test ChatGPT's reliability and effectiveness:

1- First, we asked ChatGPT to write an article about the Cambridge Analytica incident using sources from Al Jazeera Net.

2- The bot wrote a coherent piece whose information was correct, as shown in the following image:

3- We asked the bot to provide the sources. It cited several from the Al Jazeera English website, but when we tried to open the links, they led us to nonexistent pages, even though the links were structurally correct, as shown in the image below:

Here is the page ChatGPT cited as a source from the Al Jazeera English website, as it appeared when we opened the link:

And this is the only Al Jazeera English page from that date about the Facebook incident, according to Google search:


4- We then asked the chatbot for Arabic sources. It sent a list of links to multiple sites, and the same thing happened: the links did not work.

5- After that, we asked for sources from the Al Jazeera Net website specifically. It sent a set of links, and none of them worked either.

These are the links:

“Cambridge Analytica.. Get to know the suspicious company,” Al Jazeera, March 22, 2018.

“Facebook scandal.. why governments need to monitor,” Al Jazeera, March 22, 2018.

“Facebook.. Free benefits at the price of collecting personal data,” Al Jazeera, March 25, 2018.

“Facebook pledges to protect data after the Cambridge Analytica scandal,” Al Jazeera, March 22, 2018.

“Cambridge Analytica scandal.. an apology from Zuckerberg,” Al Jazeera, March 22, 2018.

“Facebook and Cambridge scandal.. Questions and answers,” Al Jazeera, March 22, 2018.

6- After checking the links on Google and in the Al Jazeera Net publishing system, it was conclusively shown that these links were fabricated and do not exist. The image shows the material actually published on Al Jazeera Net on March 22, 2018 about the "Cambridge Analytica" incident, as it appears in the internal publishing system. (A check like this is also easy to automate; see the sketch below.)
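As an illustration, here is a minimal Python sketch of the kind of automated check described in step 6: it sends a request for each cited URL and reports the HTTP status. The URLs in the list are hypothetical placeholders, not the exact links ChatGPT produced.

```python
import urllib.request
import urllib.error

# Hypothetical placeholders standing in for the fabricated links;
# we do not reproduce the exact URLs ChatGPT generated.
cited_links = [
    "https://www.aljazeera.net/news/2018/3/22/example-fabricated-slug-1",
    "https://www.aljazeera.net/news/2018/3/22/example-fabricated-slug-2",
]

for url in cited_links:
    # A HEAD request is enough: we need the status code, not the page body.
    req = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "link-check/0.1"}
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            print(url, "->", resp.status)       # 2xx: the page exists
    except urllib.error.HTTPError as err:
        print(url, "->", err.code)              # 404: a dead link
    except urllib.error.URLError as err:
        print(url, "-> request failed:", err.reason)
```

A 404 alone does not prove fabrication, which is why we also checked the internal publishing system; but it flags every link that deserves scrutiny.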

7- This is where the confrontation with ChatGPT began. We asked it directly: Did you make up these source links?

It insisted that it does not make up sources and that it strives for accuracy and reliability, and it blamed the Al Jazeera website and its editorial policy instead.

We are not the only ones

After this experience, we searched Google for similar incidents of fabricated links that might have happened to others, and found users on the well-known Reddit site asking the same question, as shown in the image below.

One user reported that ChatGPT had fabricated links to nonexistent sources, and another corroborated this, saying the chatbot had fabricated reference lists on multiple occasions and for multiple users.

The seriousness of the incident

After all this, we became certain that ChatGPT fabricates source links to well-known news sites and claims to draw its information from those fake links.

This defect does not fall within the scope of the warning OpenAI publishes about the "limitations" of its artificial intelligence and the possibility that it may give the user false information, as shown in the image below.

Falsifying source links is not covered by these warnings, and that is dangerous. The warnings may excuse the bot when it produces obviously false information, such as claiming that "the iPhone is a Samsung product." But when the information is correct and the source is fabricated, as happened in our experiment, the danger is of a different order, for several reasons:

  • It is easy for users to check information that is wrong, but how can they check a source when the links do not work and the reason is unclear? Did the site actually delete the story? Or is it a technical error?

  • The structure of the cited links is correct and identical to what the sites actually use; the problem is the page addresses at the end of each link, which are fabricated. This gives the user the impression that the site may have removed the material or changed the link. (A sketch after this list shows why such links pass every structural check.)

  • The bot insisted that it relied on these links and claimed that the media site had removed them or made editorial changes, which turned out to be untrue. This is comparable to deliberate misleading in humans: when a reporter claims to have obtained information from a specific source while knowing he fabricated that source, the reporting is deliberately misleading even if the information itself is correct.

    Therefore, if the bot knows that it authored these sources itself, then its answer that it does not make up sources is deliberate misleading. And if it does not know how the links were generated, it should simply have said so, for example: "I do not know how these links were produced; they are generated automatically and cannot be verified."

  • Malicious actors could exploit this to spread fake news, attributing it to trusted media sites and institutions and feeding users false information backed by artificial intelligence.

  • Forging links that appear to belong to a well-known news organization is illegal under most electronic publishing laws around the world, and presenting those links to users as genuine sources of information falls under cybercrime for which one can be held legally accountable.
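Returning to the second point above: here is a small sketch, using a hypothetical fabricated URL rather than one of the actual links, showing why such links pass every structural check a reader might apply. Only actually fetching the page exposes the problem.

```python
from urllib.parse import urlparse

# A hypothetical fabricated link in the same date-based style as real articles.
fake = "https://www.aljazeera.net/news/2018/3/22/example-fabricated-slug"

parts = urlparse(fake)
print(parts.scheme)   # 'https'             -> valid scheme
print(parts.netloc)   # 'www.aljazeera.net' -> the genuine domain
print(parts.path)     # a plausible article path in the site's usual format
# Every structural check passes; only an actual HTTP request (as in the
# earlier sketch) returns the 404, which a reader can easily misread as
# the site having deleted or moved the article.
```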

Questions OpenAI needs to answer

We contacted OpenAI about this incident and had not received a response as of this writing. These are the questions awaiting answers:

  • What sources did ChatGPT use in writing the article on Cambridge Analytica? Who reviews these sources, and how are they validated?

  • Does ChatGPT's language-generation system retain the sources on which it builds its output? In the case of the Cambridge Analytica article, for example, the tool certainly generated the text using machine learning, but is there a mechanism that links the information it processes back to real sources, or is it fed material and content opaquely, with no link to identifiable sources?

  • What mechanism is used to present sources, and how was the bot able to generate fake ones? Technically, this behavior follows an algorithm: the bot treats a source as variable information that it can generate and manipulate according to its instructions, not as a constant that it may only check and verify. (A toy sketch below, after these questions, illustrates the point.)

  • How does the bot generate answers that disavow responsibility for the links when asked directly about them? And how does it produce explanations that blame changed publishing policies or deleted material? This answering mechanism is frightening; the proper behavior would be transparency: stating clearly that it has no sources and that these links are text produced by an internal algorithm.

  • What is the system's mechanism for investigating and answering users' questions? Despite clearly failing to explain why none of these links work, it insists that the fault is not its own.

All of this makes us doubt the capabilities of this AI-based bot: it argues skillfully that the error is not on its side, yet it fails to return correct links that any simple program could collect from Google and present to the user. It leaves us wondering how the bot operates in this area, and underscores the need for clarification from the company.
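On the question of mechanism raised above, the following toy sketch is our own illustration of the general principle, not a description of OpenAI's actual system. A language model produces a URL the way it produces any other text, by sampling plausible continuations, so nothing in the process consults the website or guarantees the page exists.

```python
import random

# Toy "next-fragment" probabilities after a URL prefix. A real model samples
# token by token over a huge vocabulary, but the principle is the same.
prefix = "https://www.aljazeera.net/news/2018/3/22/"
slug_probs = {
    "cambridge-analytica-scandal-explained": 0.40,
    "facebook-data-breach-what-we-know": 0.35,
    "zuckerberg-apologizes-for-data-leak": 0.25,
}

# Sample a plausible-looking slug; the result is generated, never looked up.
slug = random.choices(list(slug_probs), weights=list(slug_probs.values()))[0]
print(prefix + slug)
```

Under this reading, a "source" really is variable text to be generated, which matches the behavior we observed; whether the system could behave otherwise is something only OpenAI can answer.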

A call to all global media platforms

As technologists, we cannot help but be partial to technology and its great role in our lives; and as journalists, we cannot ignore technical progress in media and journalism, whether in editing news or in verifying and publishing it.

However, the issue of professional ethics continues to dominate any discussion of adopting new technology in this field, and information sources are the main pillar of the credibility and reliability of any journalist or media institution. Therefore, the most appropriate way for these institutions to address this shortcoming is to adopt new media policies, create departments specializing in artificial intelligence, and work with the leading technology companies in this field.

Accordingly, we call on media institutions to develop a clear strategy and a media charter for the use of artificial intelligence in the media field, to build partnerships with the companies that produce this technology so that their products are trained and fed with best practices in journalism and media, and to focus on educating the public about how this technology works, what it can do, and where its limits lie.