ChatGPT on mobile: There is now an official app for iPhones

Photo: Jessica Lichetzki / dpa

To be clear from the outset: these appear to be only a few isolated cases. Still, a report from Hamburg is currently making headlines nationwide. According to the report, several examinees in the Hanseatic city are suspected of having secretly and illicitly used chatbots such as ChatGPT during the written Abitur exams. As the broadcaster NDR 90.3 reported, in at least one case a student is even said to have been caught by a teacher opening such software on a mobile phone.

The other suspected cases reported to the school authority's legal department are less clear-cut, according to the report: teachers are said to have become suspicious while grading because some parts of an exam were weak while others were flawless. The schools then used software to check how likely it was that the texts had been created by artificial intelligence (AI). However, suspects who were not caught red-handed could get away without consequences, since the detection software cannot conclusively prove plagiarism either.

So what do the suspected cases from Hamburg reveal about schools and their use of chatbots like ChatGPT? Here are five questions and answers.

1. How can Abitur candidates access apps during exams at all?

Smartphones are off-limits in the written Abitur exams. Usually they have to be left at home or handed in at the start of the exam. However, students can try to smuggle in a device. ChatGPT can be accessed via the browser, and there is now even a dedicated app for iPhones.

However, relying on the software's help is risky. Even the unauthorized use of a cell phone counts as attempted cheating, the Hamburg school authority tells SPIEGEL, as does the secret use of artificial intelligence (AI). If students are caught doing this, they are either ordered to repeat one or more parts of the Abitur examination, or one or more parts of the examination are graded with zero points. The Abitur examination could even be declared failed altogether.

2. How helpful would ChatGPT be in an Abitur exam anyway?

It is not known in which subjects the alleged cheating attempts took place. In principle, however, ChatGPT can help with many school assignments, from retrieving factual knowledge to collecting pros and cons to polishing the wording.

The broadcaster Bayerischer Rundfunk (BR) has already run two experiments, each with considerable human assistance, to test how well or badly the software would perform in the Bavarian Abitur. The result: in February, ChatGPT's performance was still rather sobering. At that time, students would probably have been better off not relying on its suggestions.

Now, in May, in a second test with a newer version of the chatbot, ChatGPT performed much better, although humans again helped, for example by splitting the current Abitur tasks into smaller chunks. "The Abi performance of GPT4 is much better than that of GPT 3.5," BR concluded on its website. "The former problem student has almost become an Abitur overachiever. And so it's probably only a matter of time before the AI writes a top-grade Abitur exam."

3. How prepared are teachers for trickery with chatbots?

The hype around ChatGPT has been so great in recent months that most teachers have probably heard of the software by now. Some have already discussed or used it in class. The Hamburg school authority refers to a so-called technical letter from February, in which the State Institute for Teacher Education and School Development sought to inform teachers about the opportunities and risks of AI and ChatGPT. The letter, which tends to be technology-friendly, stated for example: "Students already use ChatGPT for homework creation, presentation elaboration and summarization of texts, as well as to obtain information about a specific topic, to structure topics, etc. Some students also use ChatGPT to ask supposedly stupid questions that they don't dare to ask others."

In another technical letter from last Wednesday, the Hamburg school authority addresses the oral Abitur examinations. On these, it says that it is not possible to check whether AI is used when "processing the task for the presentation examination in the home environment": "In this respect, the possibilities of using artificial intelligence do not differ from the already existing possibilities of using the help of third parties or having the presentation and documentation made entirely by third parties."

According to the letter, the members of the Abitur examination board would have to try to determine in the subject discussion whether an examinee's work was done independently and only with the approved and specified aids and sources. This discussion usually lasts twice as long as the prepared presentation.

4. Are all exams checked for AI cheating?

No. Hamburg's schools are not obliged to inspect work without cause, according to the school authority. That means there are no spot checks, for example. At the same time, it says: "If reasonable doubts arise about the independence of the performance during the examination or already after the documentation has been submitted, the schools are obliged to investigate these doubts." If plagiarism or the unauthorized use of an AI application is suspected, technical aids could be used, such as the so-called "AI Text Classifier" from OpenAI.

5. How reliable are programs that are meant to expose AI texts?

Virtually all programs of this type are currently considered unreliable. At most, their results can serve as an indication that a text generator was used. In February, an AI newsletter published by the magazine "MIT Technology Review" said it was very unlikely that there would ever be a tool that recognizes AI-generated texts with one hundred percent certainty. A machine learning expert was quoted in the article as saying that it is really hard to recognize AI text as such, since the whole point of AI language models is to generate fluent, human-looking text. New language models are becoming more powerful and ever better at generating fluent language, which is why existing detection tools quickly become obsolete.

The aforementioned AI Text Classifier shows how much the topic challenges even leading development teams. The program comes from OpenAI, the creators of ChatGPT. A professional piece of software, one might think. But to this day, the tool classifies even some Bible texts, for example, as "probably AI-generated". The same is true of certain parts of the United States Declaration of Independence. Meanwhile, in an official experiment before its launch, the AI Text Classifier recognized on average only about one in four AI texts submitted for testing.

OpenAI itself comments on the tool: "In the case of texts written by children and texts that are not written in English, it may well happen that the classifier is wrong, as it was primarily trained with English content written by adults." Another note gives an idea of how difficult it is to expose secret AI support: OpenAI writes that AI-generated texts can easily be revised to trick the classifier. In the context of an Abitur exam, this note is all the more relevant. It may help many students to get ideas for content or wording from ChatGPT. However, copying these ideas one-to-one from a mobile phone onto a handwritten exam paper seems more laborious than doing so for a homework assignment written on a computer.

For teachers, this leaves two main ways to detect cheating attempts. Either they catch students red-handed. Or, as in the suspected cases in Hamburg, they notice irregularities, for example in an examinee's writing style. After all, time is still on the side of many teachers: they have often known their students for years and know roughly how they work and write. Most students, however, have only recently started using programs like ChatGPT as a tool.