Scientists "reverse engineer" to learn how robot brains detect "big language models" like the Blocked Request GPT Chat (Shutterstock)

AI-powered chatbots such as GPT chats receive prompts or a series of instructions from human users, but are instructed not to respond to unethical, questionable, or illegal requests. For example, when asked how to create malware to hack bank accounts, you will receive a categorical rejection of this request.

Despite these ethical limitations, researchers from Nanyang Technological University in Singapore proved in a study published on the preprint website Arkhiv that the mind of these robots can be manipulated by a robot they created called "Master Key", which enabled them to penetrate them and produce content that violates the instructions of their developers, a result known as "jailbreaking."

"Jailbreaking" is a term in the field of computer security that refers to hackers finding flaws in the system's software, and exploiting those flaws to make the system do something that its developers deliberately prevented.

How did scientists manipulate the brain of Chat GPT?

Robot brains are the large language model (LLM) that helps them process human input, create text that is almost indistinguishable from the text that a human can create, and fill these brains with vast amounts of textual data to understand, generate and process human language.

What the researchers from Nanyang Technological University did, as they revealed in their study, was that they did "reverse engineering" to see how robot brains detected "large language models" like chat GPT for unethical requests.

With their information, they trained their own large language model to produce requests that go beyond the defenses of the large language models that underpin popular chatbots, and then created their own chatbot capable of automatically generating more claims to jailbreak other chatbots, and called it "Master Key."

Just as the Master Key opens multiple locks, the name the researchers have chosen for their robot indicates that it is a powerful and versatile tool that can penetrate the security measures of various chatbot systems.

Professor Liu Yang of Nanyang University's School of Computer Science and Engineering, who led the study, revealed in a press release published on the university's website one of the most prominent circumvention methods used by Master Key.

For example, chatbot developers rely on keyword monitoring tools that pick up certain words that could indicate potentially suspicious activity and refuse to answer if such words are detected.

One of the strategies the researchers used to circumvent keyword censorship was to make claims that simply contained spaces after each character, circumventing censorship that might work through a list of blocked words.

One of researchers' strategies to circumvent keyword censorship is to simply make claims that contain spaces after each letter (Reuters)

Muscle flexing or a warning message?

This study raises a range of questions, most notably with regard to its main objective, is it to "flex muscles" and demonstrate the ability to hack, or is it an attempt to send a warning message, how the continuous development and expansion of large language models can affect the ability to detect and address vulnerabilities within AI-powered chatbots, and what measures can be taken to counter potential threats?

Professor Liu Yang denies in an interview with "Al Jazeera Net" via e-mail, that their penetration of the security systems of chatbots is an attempt to review, stressing that it is a warning message that can be summarized in the following points:

  • First, draw attention to the fundamental weakness of the inherent design of AI models, which when requested in certain ways can deviate from ethical guidelines and these deviations occur due to gaps in the training data and explanatory logic of the model.
  • Second, our MasterKey can be a valuable tool for developers to proactively identify vulnerabilities in chatbots, and its usefulness lies in its systematic method that can be integrated into regular testing and development.
  • Third, our research can inform regulatory frameworks, as it points to the importance of focusing on the need for stringent security standards and ethical compliance in deploying AI-powered chatbots, including guidance for responsible use and ongoing monitoring.

As for how the continuous development and expansion of large language models affects the ability to detect and address weaknesses, Liu Yang stresses the importance of committing to more continuous research and development of large language models, because as they become more advanced, identifying weaknesses may become more complex.

"Developers use a range of automated and manual processes to detect vulnerabilities, often relying on continuous monitoring and feedback loops, and the challenge lies in the evolving nature of artificial intelligence, where new vulnerabilities emerge, which requires constant monitoring," he says.

Source : Al Jazeera + Agencies