
Tentacle monster »Shoggoth« from the literary imagination of H.P. Lovecraft

Photo: TetraspaceWest / twitter

"It's just more fun to think about how to make friendly AI than it is about industry regulation, just as it's more fun to think about what you'd do during zombie apocalysis than it's about how to stop global warming."
The writer Ted Chiang in an essay worth reading (2017)

The term alignment problem, for which there is no really catchy German translation ("agreement problem" is an approximation), refers to the following basic idea: a sufficiently powerful learning system, usually called artificial intelligence or AI, could conceivably develop goals of its own. Or misinterpret or over-interpret the goals that people set for it. With catastrophic consequences.

One of the intellectual fathers of today's AI fears, the Oxford-based philosopher Nick Bostrom, came up with the paperclip thought experiment to illustrate this: if you instructed a powerful AI to produce as many paper clips as possible, such a misunderstanding could lead to the AI turning the entire planet into a paperclip factory, inadvertently wiping out humanity in the process.

The monster behind the mask

The alignment problem has often been treated in artistic form, for example in Arthur C. Clarke's »2001« and in Stanley Kubrick's famous film adaptation. In fact, Kubrick's simple red light in a chrome ring, the "face" of the on-board computer "HAL" that gets out of control, has become a kind of visual cipher for the alignment problem. Sometimes ironic, sometimes serious.

Some of today's AI worriers have added another layer of paranoia: they are horrified by the idea of an AI that has goals very different from those of humanity or its own creators, but deliberately conceals this fact. There is a visual cipher for this on the net as well: the so-called Shoggoth meme. In its original variant, it is a crude drawing of a tentacle monster. A "shoggoth" is a terrifying, god-like monster of slime, tentacles and countless eyes from the literary imagination of H.P. Lovecraft.

The tentacle monster of the Shoggoth meme has pulled a smiley-face mask over one of its tentacles, the face with which it interacts with humans. Humanity, the meme suggests, would be all too easily deceived by such a monster. What goes on behind the mask, we will never understand. If a hyper-intelligent super-AI wants to deceive us, it will.

There are now Shoggoth shopping bags and AI models called "Shoggy".

»Please don't ever do that again«

In the original version, the monster with the smiley mask is labelled "GPT-3 + RLHF". GPT-3 is the predecessor of GPT-4, the language model now under the hood of OpenAI's paid ChatGPT version. The letters RLHF stand for "Reinforcement Learning from Human Feedback". This method is intended to ensure that AI systems do not produce overly nasty outputs: human testers set tasks for the system, and if the results are ethically or otherwise unacceptable, there is corresponding feedback: "Please don't ever do that again."
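To make the principle a little more concrete, here is a deliberately simplified toy sketch in Python. It is not OpenAI's actual training pipeline; the list of responses, the rating function and the update rule are all invented for illustration. The only point is that behaviour rated positively by humans gets reinforced, while behaviour rated negatively gets dampened:

```python
import random

# Toy sketch of the RLHF idea (not OpenAI's actual pipeline): a "model"
# chooses among canned responses, human raters score the choice, and the
# scores gradually shift the model's sampling preferences.
responses = ["helpful answer", "rude answer", "harmful instructions"]
weights = {r: 1.0 for r in responses}  # initial preferences, all equal

def human_feedback(response: str) -> float:
    """Hypothetical stand-in for human raters: +1 acceptable, -1 not."""
    return 1.0 if response == "helpful answer" else -1.0

for _ in range(1000):  # many rounds of feedback
    choice = random.choices(responses, weights=[weights[r] for r in responses])[0]
    reward = human_feedback(choice)
    # reinforcement step: strengthen rewarded behaviour, dampen punished behaviour
    weights[choice] = max(0.01, weights[choice] * (1.0 + 0.1 * reward))

print(weights)  # the "helpful answer" ends up dominating the sampling weights
```

The Shoggoth meme's objection is, in effect, that such a procedure only shapes the visible behaviour – the weights in this sketch – and says nothing about what else may be going on inside the model.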

The Shoggoth meme is, in a sense, the ultra-short version of the idea that this kind of training could be self-deception: what actually happens beneath the surface, what Lovecraftian horror lurks in the depths, remains unfathomable. Maybe the machines are just pretending to be as nice as we would like them to be.

"I don't know" is missing as an answer

In fact, today's large language models, with their billions or – in the case of GPT-4, the number is still secret – possibly even trillions of parameters, have long been inscrutable even to their creators. They are gigantic accumulations of statistical dependencies, and what exactly they will do with a given input only becomes apparent when you try it out.

What is known, however, is that so-called emergent phenomena appear as model size grows: GPT-4 can do things that GPT-3 could not, even though both are based on the same basic principles and were presumably fed with at least similar training material. Passing legal and medical examination questions with considerable success, for example.

What both still cannot do, by the way, is calculate: ChatGPT fails again and again at the question "How much is 3486 times 2734?", with both the GPT-3.5 model and the far more powerful GPT-4 under the hood – and yet, and this is one of the central problems of today's AI language models, it still gives wrong answers instead of saying "I don't know".

What if the AI gets ideas of its own?

This circumstance – language models cannot calculate and, when in doubt, simply make something up, right or wrong – points to a fundamental misunderstanding that currently shapes the debate about AI: the fact that language models seem to give intelligent answers does not mean that they "know" anything. A group of AI ethics experts once called today's language models "stochastic parrots".

OpenAI has since addressed some of the resulting limitations with so-called plugins: those who pay for GPT-4 can also let ChatGPT access certain internet services. If you install the Wolfram Alpha maths platform as a plugin, ChatGPT delivers the correct answer to the arithmetic problem at the first attempt. For some, however, this access to external services is already the first step towards the fall: what if the AI does something with it that we do not want at all?
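What such a hand-off to an external service might look like can be sketched in a few lines of Python. This is a hypothetical illustration of the general tool-calling pattern, not OpenAI's plugin interface; the router, the regular expression and the stand-in calculator are all invented for this example:

```python
import re

def calculator_tool(expression: str) -> str:
    """Deterministic arithmetic – the kind of work a plugin delegates to an
    external service such as Wolfram Alpha (here: plain Python)."""
    a, b = map(int, re.findall(r"\d+", expression))
    return str(a * b)

def answer(question: str) -> str:
    # Crude router: multiplication questions go to the tool; everything else
    # would go to the language model (not implemented in this sketch).
    if re.search(r"\d+\s*(times|\*|x)\s*\d+", question):
        return calculator_tool(question)
    return "(language model answer)"

print(answer("How much is 3486 times 2734?"))  # 9530724 – correct every time
```

The point of the pattern is that the probabilistic text generator no longer has to guess the digits; a deterministic system supplies them.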

Many well-known AI experts have signed this sentence: "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war." At the moment this seems a little exaggerated (and rather hypocritical coming from signatories who work for the largest AI companies). On the other hand, the alignment problem is not purely hypothetical: how dangerous the artificial intelligences of the future become for humanity depends above all on how much power we give them.

AI on the red button?

The U.S. Congress is currently debating a bill that would amount to a blanket ban on letting machines fire nuclear weapons without human oversight. Such a rule already appears in various official documents on US nuclear weapons strategy. It is probably still a good idea to give this simple precaution the force of law – even if the idea is reminiscent of the 1983 film "War Games" starring Matthew Broderick, in which an AI almost triggers a nuclear war because a teenage hacker mistakes its user interface for a computer game.

»War Games« is one of a long line of fictional treatments of the alignment problem. Some people, especially a certain category of Silicon Valley men, have been obsessed with it for many years. A particularly striking example of that obsession is an idea published on the blog "LessWrong" in 2010, which has gone down in the history of AI paranoia as "Roko's Basilisk".

Eternal torture in a virtual hell

In short, the idea goes like this: if a super-intelligent AI takes over the world at some point in the future, it could take revenge in the cruellest way imaginable on everyone who did not do everything in their power to bring it into existence as early as possible. So better not write anything bad about AI on the internet!

The founder of LessWrong, Eliezer Yudkowsky, is probably the most prominent AI apocalypticist of our time (he already featured in this column a few weeks ago). Back then he reacted with a public outburst of anger and even temporarily deleted the blog post by the user »Roko«: Roko's Basilisk, he argued, was a "really dangerous thought". Yudkowsky is one of those people who are terribly afraid that humanity will be wiped out by its own creation, but Roko's Basilisk is even worse: in this story, the all-powerful AI of the future punishes the insubordinate not with extinction but with eternal torture in a virtual hell.

Yes, that sounds crazy, and like something dredged up from the darker recesses of the human psyche. But people who were fascinated by AI back when it was barely useful often love science fiction.

No, no autonomous drone has murdered its operator yet

In fact, learning machines, like any new technology developing at breakneck speed, pose very real dangers. This week a story made the rounds claiming that a simulated autonomous weapon system had attacked its own operator because he wanted to abort a mission. The story was most likely a hoax. That does not change the fact that truly autonomous weapons are an extremely bad idea. International agreements against autonomous drone swarms (which have already been deployed!) would undoubtedly make sense, as would more far-reaching AI arms-control agreements.

In fact, we have very pressing problems right now which, oddly enough, do not even appear in the one-sentence warning letter. The climate crisis is in the process of making significant parts of the planet uninhabitable for humans. And the extinction of species is not only continuing unchecked, it is accelerating. We already have real technologies (above all those that burn things) that genuinely threaten our existence.

AI will certainly bring us plenty of problems – a flood of disinformation with real-world consequences, for example. A recent case: fake images of an alleged explosion at the US Pentagon briefly sent two US stock market indices tumbling.

We have already experienced an algorithm crash

AI makes mistakes, can be abused in many ways, and it can have catastrophic effects if it is given control of levers with real-world impact. We have in fact already experienced this with autonomous systems that were not yet called "AI" at the time: the so-called flash crash of 2010 on the New York Stock Exchange was probably triggered by a trader using fraudulent software, which in turn led autonomous trading algorithms to drive the Dow Jones down by about 1,000 points within minutes. The alleged perpetrator was sentenced to one year of house arrest in 2020. Autonomous high-speed trading systems, of course, still exist.
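The underlying mechanism – a feedback loop in which many algorithms react to the same signal – can be illustrated with a deliberately crude toy simulation. This is not a model of the actual 2010 event; the reaction strength, the cap and the initial shock are arbitrary assumptions chosen only to show how a small shove can cascade:

```python
# Toy simulation of a selling cascade: many identical momentum-following
# algorithms sell whenever the price falls, amplifying the very drop they react to.
price = 100.0
history = [price]

price *= 0.99  # one large initial sell order nudges the price down

for _ in range(10):
    drop = (history[-1] - price) / history[-1]
    # every algorithm reads the same signal: falling price -> sell even more
    selling_pressure = min(0.3, max(0.0, drop) * 3)  # assumed reaction strength, capped
    history.append(price)
    price *= (1.0 - selling_pressure)

print([round(p, 2) for p in history])
print(f"final price: {price:.2f}")  # a one-percent shove has become a crash
```

Real markets have circuit breakers and buyers who eventually step in; the sketch only illustrates why tightly coupled autonomous systems can move far faster than any human supervisor could intervene.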

Giving autonomous systems access to real-world power is always a high risk. And autonomous systems unleashed on social systems, whether that's ChatGPT or YouTube's or Meta's sorting algorithms, run the risk of unpredictable collateral damage.

Today's learning machines pose more than enough problems to justify fast and robust regulation. Fear of an AI apocalypse or of AI basilisks is not needed for that.