Is anyone listening in? Are deleted pictures really gone? Is my smartphone making me addicted? The ZEIT ONLINE series "Digital Fears" explores which of these concerns are justified and which are exaggerated. This article is part of that series.
Anyone who flips through a recipe or skips to the next Spotify song while kneading dough or peeling potatoes quickly ends up with a smudged cookbook or tablet. Fortunately, there are smart assistants we only have to ask.
And they are becoming more popular. Roughly every third German - industry surveys arrive at different figures - uses devices with voice assistants. They are built not only into tablets and smartphones, but also into cars, washing machines and even glasses. Especially popular: smart speakers such as Amazon Echo, Google Home or Apple's HomePod.
But the convenience has its price. With voice assistants, that price is the private data users disclose - they have to, in order to use a voice-controlled device at all. Every question someone asks their phone, speaker or car, every prompt, every command is stored on company servers. And while on Android and newer iOS devices you have to press at least one button to activate the voice feature, with smart speakers it is less obvious whether they are listening or not.
Careful, it's flashing - recording!
Usually, the speakers acoustically scan their surroundings and wait for the wake word - a name or phrase you can choose yourself. Or you keep the factory setting: Amazon speakers respond by default to "Alexa", Apple's to "Siri" and the Google Assistant to "Okay, Google" or "Hey, Google". Once the system recognizes the word, it connects to the cloud. That means the voice recording is forwarded to a corporate data center. The question about the weather or a joke then does not stay in your own living room, but ends up on the companies' servers.
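This gating behavior can be sketched as a simple loop: audio is checked locally, and nothing leaves the device until the wake word has been recognized. This is a minimal illustration, not any vendor's actual implementation - `detect_wake_word` stands in for an on-device acoustic model, and the text matching is purely a placeholder.

```python
# Sketch of wake-word gating: audio stays on the device until the
# locally running detector recognizes the wake word; only then does
# the stream to the cloud begin. All names here are illustrative.

WAKE_WORD = "alexa"

def detect_wake_word(chunk: str) -> bool:
    # Real devices run an acoustic model on raw audio;
    # here we simply match text for demonstration purposes.
    return WAKE_WORD in chunk.lower()

def process(audio_chunks):
    sent_to_cloud = []          # what actually leaves the device
    streaming = False
    for chunk in audio_chunks:
        if not streaming:
            if detect_wake_word(chunk):
                streaming = True        # from now on, audio is streamed
                sent_to_cloud.append(chunk)
            # otherwise: the chunk is discarded locally
        else:
            sent_to_cloud.append(chunk)
    return sent_to_cloud

# Everything said before the wake word never leaves the living room:
chunks = ["background chatter", "Alexa", "what's the weather"]
print(process(chunks))
```

The point of the sketch: the decision about what is transmitted is made locally, before any network connection is involved.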
What happens next is described by Amazon in a white paper (other voice assistants are likely to work similarly): the speech-recognition software converts what someone says into text. Each term is assigned a confidence score that indicates how certain the system is that it understood the command. If the score is high enough, the text is processed using natural-language understanding (NLU). The software interprets what was said and derives from it the presumed intention of the speaker. If someone asks, for example, what the weather is like in Berlin today, the NLU breaks the request down into "weather", "Berlin" and "today". Based on this, it looks for a source that can answer the question: a database with current weather data, for instance. Often the systems also draw on external sources, so-called skills - voice apps mostly developed by third-party providers. Sensitive data is sometimes passed along in the process: if a user orders a pizza by voice, Alexa transmits their address to the external provider. In the final step, a text-to-speech system turns the answer into a voice file - and the speaker in the living room starts talking. All of this happens within fractions of a second.
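The pipeline described in the white paper can be sketched in a few lines of code. Everything here is a hedged stand-in: `transcribe` fakes the speech-recognition step, the confidence threshold is an assumed value, and the keyword-based `understand` function only mimics what a real NLU system does statistically.

```python
# Illustrative sketch of the pipeline: speech recognition with a
# confidence score, NLU intent extraction, source lookup, then a
# text-to-speech step. All thresholds and names are assumptions.

CONFIDENCE_THRESHOLD = 0.8      # assumed cutoff, not a documented value

def transcribe(audio):
    # Stand-in for speech recognition: returns recognized text
    # plus a confidence score between 0 and 1.
    return "what is the weather in Berlin today", 0.93

def understand(text):
    # Stand-in for NLU: break the request down into its parts,
    # e.g. "weather", "Berlin" and "today".
    words = text.lower().split()
    slots = {}
    if "weather" in words:
        slots["topic"] = "weather"
    if "berlin" in words:
        slots["city"] = "Berlin"
    if "today" in words:
        slots["day"] = "today"
    return slots

def answer(slots):
    # Stand-in for the source lookup, e.g. a weather database.
    if slots.get("topic") == "weather":
        return (f"The weather in {slots.get('city', 'your area')} "
                f"{slots.get('day', 'now')}: sunny, 21 degrees.")
    return "Sorry, I did not understand that."

def handle(audio):
    text, confidence = transcribe(audio)
    if confidence < CONFIDENCE_THRESHOLD:
        return "Sorry, could you repeat that?"
    # A text-to-speech system would convert this answer back to audio.
    return answer(understand(text))

print(handle(b"fake-audio-bytes"))
```

Real assistants replace every one of these stand-ins with machine-learned components, but the stages and their order match the description above.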
Because of their constant attentiveness, critics call devices such as Amazon Echo or Google Home "bugs" or even "spies in the living room". The accusation: they listen 24 hours a day.
"In the debate about smart speakers is often said that they were always listening - that's not true," says Stephan Noller. The graduate psychologist and entrepreneur has taken the networked speakers apart to understand the architecture and operation of the devices, and rebuilt its own prototype. The internal recording memory of the devices is constantly overwritten so that nothing is stored on the device itself in the long term. "The streaming of audio into the cloud does not begin until the device is addressed and responsive."