Google data center with new, in-house TPU v5p AI chips: Initially only "for early experiments and feedback"

Photo: Google

Google unveiled its new artificial intelligence Gemini on Wednesday, hoping to catch up in the AI race – or rather, to start pulling ahead. Google first mentioned the new model at its developer conference in the spring, but the release dragged on. Even now, Gemini is far from finished – a bumpy start that fits the picture of a Google lagging behind the competition from OpenAI and Microsoft in AI applications.

An example meant to show how Gemini's skills could help in everyday life seems aimed, at first glance, at overwhelmed parents who understand physics and math homework no better than their children do. The new artificial intelligence understands not only the printed questions from the textbook, but also the handwritten answers – and what is wrong with them.

This may not sound to everyone like the ultimate AI killer application, but Google uses the example to present what it considers an important advance in the field of artificial intelligence, one that can be summed up in a single word: multimodality. Gemini is designed to be multimodal from the ground up, so according to the developers it can handle text, images, audio and video equally well. Google's announcement also returns again and again to the word reasoning – logical thinking and drawing conclusions – at which Gemini is said to be particularly good. The homework example illustrates exactly this chain: understanding text and images, checking them, and reasoning about them.

"This is a significant milestone in the development of AI and the beginning of a new era for us," Google says somewhat grandly.

Not a single app, but a model

The beginning of this era will not become visible all at once. Gemini is not a complete application like ChatGPT, but a model like GPT-4 from competitor OpenAI. It will run in the background of various Google products, sooner or later.

The first generation, Gemini 1.0, comes in three sizes: Nano, Pro and Ultra. The smallest is designed for efficiency and is meant to run directly on mobile devices such as smartphones – more precisely, on those devices' dedicated AI chips. The advantage of this design over a cloud connection to a larger model is that Gemini Nano does not require a connection to Google's servers. That is why it can also work with confidential chats in WhatsApp, for example, to suggest replies or correct grammatical errors. The Google Pixel 8 Pro will be the first smartphone to get Gemini Nano. The AI is likely to arrive on the phone with one of the next software updates, but Google did not initially give an exact date.

Europe still has to wait for Gemini

Gemini Pro, on the other hand, can be used immediately – because it now powers the chatbot Bard, Google's answer to ChatGPT. It is the biggest update for Bard to date, Google said, but it will initially only be available in English "in more than 170 countries and territories." Europe is not among them.

The Ultra version will run in Google's data centers and will be the most powerful model Google has to offer. According to Google, it is the first model to outperform human experts on the MMLU (massive multitask language understanding) benchmark, which tests knowledge and problem-solving in areas including mathematics, physics, history, law, medicine and ethics.

Gemini Ultra is superior in almost all of the comparison tests carried out, Google said. That includes comparisons with GPT-4, which is currently considered the state-of-the-art model – though it was released back in March.

Microsoft is also upgrading

However, it will take some time before Gemini appears in other company products where it could be helpful: in Google Search, in the Chrome browser and in Google's advertising services, that is still months away. The Ultra variant will initially only be available to select customers, developers, partners and safety experts "for early experimentation and feedback" before a wider range of users gets access to the model "at the beginning of the year."

Until then, Google is still busy with safeguards. Internal and external experts, the company said, have already probed Gemini extensively to identify potential security risks, including autonomous actions by the AI. Gemini is also supposed to avoid producing or accepting toxic, biased or factually incorrect content. Fine-tuning and human feedback are meant to make the model more reliable and suitable for everyday use, as is also common practice at OpenAI.

Whether, and for how long, Gemini could give Google a lead over OpenAI therefore cannot be answered for now. The launch of ChatGPT a little over a year ago is sometimes described as the "iPhone moment of AI" – Google, by contrast, keeps stretching its AI moments out over months. And as if by coincidence, one day before Google's presentation, Microsoft announced a major upgrade of its AI "Copilot": "soon" it will be powered by OpenAI's new GPT-4 Turbo model, which among other things will let it work more multimodally.