A year ago, the company OpenAI surprised the whole world with the launch of ChatGPT, a conversational artificial intelligence capable of giving answers almost indistinguishable from those of a person, with a remarkable ability to generate text, write computer code, and summarize information.
Overnight it became an enormously popular tool, with more than 180 million active users, and it has, in a way, positioned itself as the benchmark of a new era in the technology industry, the epicenter of an earthquake that threatens to change society.
With Gemini, Google now has the possibility of competing head-to-head with this type of service, although the first thing to point out is that a direct comparison cannot be made between Gemini and ChatGPT.
Gemini is a language model, while ChatGPT is a conversational application built on top of another language model, GPT-4 or GPT-3.5 depending on which version of ChatGPT is being considered (paid or free, respectively). In Google's case, the equivalent of ChatGPT is Bard, which until now used the PaLM language model but which, as of today, already operates with an adapted version of Gemini for queries in English.
You have to think of these language models as the "engine" of these applications, which are nothing more than an interface to be able to converse with them. Language models can be used in other types of applications that don't necessarily have to have this conversational interface, and both Google and OpenAI offer these models on a subscription basis to businesses and developers.
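The engine-versus-interface split described above can be sketched in a few lines of code. This is a toy illustration, not any vendor's real API: `toy_model` is a hypothetical stand-in for a language model (a function from text to text), and `ChatInterface` plays the role of the conversational app layered on top of it, the way ChatGPT or Bard sit on top of GPT-4 or Gemini.

```python
# Toy sketch of the "engine vs. interface" split: the language model is
# a function from text to text, and the chat application is just a
# wrapper that keeps conversation history around it.
# `toy_model` is a hypothetical stand-in, not a real language model.

def toy_model(prompt: str) -> str:
    """Stand-in for a language model: maps input text to output text."""
    return f"(model reply to: {prompt[:40]})"

class ChatInterface:
    """The 'app' layer, analogous to ChatGPT or Bard."""

    def __init__(self, model):
        self.model = model
        self.history = []  # list of (role, text) turns

    def ask(self, user_message: str) -> str:
        self.history.append(("user", user_message))
        # A real app would serialize the whole history into the prompt,
        # which is why context-window size matters (see below).
        context = " ".join(text for _, text in self.history)
        reply = self.model(context)
        self.history.append(("assistant", reply))
        return reply

chat = ChatInterface(toy_model)
print(chat.ask("What is Gemini?"))
```

The same `toy_model` slot could be filled by any engine, which is the sense in which the models are sold separately to businesses and developers while the chat apps remain just one possible interface.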
Gemini will have three different versions: Ultra, Pro, and Nano. The first is the most advanced and is multimodal (it can understand questions that mix images, video, text, and voice) but will not be available until 2024. Google has, in any case, shown videos of how it works.
The second is more limited but can already be tested in the English version of Bard. It is equivalent to GPT-3.5 in capacity and functions. Nano, finally, is a model designed for devices with less computing power and memory, such as a phone.
The comparisons Google has made in the Gemini announcement are fundamentally between Gemini Ultra and GPT-4. Since both are multimodal models, the most straightforward way to compare them is to use batteries of tests and exams with questions of logic, science, and reading or listening comprehension. In 30 of the 32 tests, Gemini outperformed GPT-4.
Perhaps most remarkably, in one of them, known as the MMMU multimodal reasoning benchmark (a set of 11,500 university-level questions spanning more than 57 disciplines, such as physics and mathematics), Gemini managed to correctly answer nine out of 10 questions, 5% more than GPT-4 and also above the human average.
But outside of these test batteries, it's hard to make direct comparisons without being able to access the Ultra version of Gemini yet. Jeff Dean, chief scientist at Google DeepMind, one of the divisions involved in the development of Gemini, nevertheless gave some specific data.
Gemini can support a context of about 32,000 tokens in its prompts (though it is not a direct equivalence, this can be simplified as the ability to understand questions with roughly 32,000 words of context). That is the same amount as GPT-4, but OpenAI recently announced a version of GPT-4, GPT-4 Turbo, that quadruples that capacity.
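These context figures can be made concrete with a rough calculation. The sketch below uses the common rule of thumb of about four characters of English text per token; that ratio is an assumption for illustration, not a published property of either model's tokenizer, and the constant names are made up for this example.

```python
# Rough illustration of context-window sizes measured in tokens.
# The ~4 characters-per-token ratio is a common rule of thumb for
# English text, not an exact property of any particular tokenizer.

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: about 4 characters per token."""
    return max(1, len(text) // 4)

GEMINI_CONTEXT = 32_000       # tokens, per the figure Jeff Dean gave
GPT4_TURBO_CONTEXT = 128_000  # tokens: four times 32,000

prompt = "Summarize the following report. " * 100
needed = estimate_tokens(prompt)
print(f"Prompt needs ~{needed} tokens")
print(f"Fits in a 32k window: {needed <= GEMINI_CONTEXT}")
```

The practical consequence of a quadrupled window is simply that four times as much text (a longer document, or a longer conversation history) can be considered in a single question.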
Both language models are built on the same underlying technology – which, curiously, was mainly developed by Google, although it was published openly – but the results depend above all on the training they have undergone, the process by which the models learn to reason and articulate their answers, and which basically consists of a complex statistical analysis of millions of texts, images, and videos.
GPT-4, for example, is reportedly trained on a corpus of over 13 trillion tokens (again, a rough equivalence can be made between a token and a word, though it is not an exact comparison). These are documents, books, images, videos, and messages obtained from various sources.
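To get a sense of what 13 trillion tokens means, a back-of-envelope conversion helps. The conversion factors below (roughly 0.75 English words per token, and 100,000 words for a long book) are illustrative assumptions, not published figures.

```python
# Back-of-envelope scale of a 13-trillion-token training corpus.
# The conversion factors are rough assumptions for illustration only.

TOKENS = 13_000_000_000_000  # ~13 trillion tokens, as reported
WORDS_PER_TOKEN = 0.75       # common rule of thumb for English text
WORDS_PER_BOOK = 100_000     # a long novel, purely for scale

words = TOKENS * WORDS_PER_TOKEN
books = words / WORDS_PER_BOOK
print(f"~{words:.2e} words, on the order of {books:,.0f} books")
```

Under those assumptions the corpus comes out on the order of tens of millions of books, which conveys why training such models is a matter of industrial-scale data collection rather than curated reading lists.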
Google hasn't disclosed the size of the dataset used to train Gemini, but says it has used a novel approach focused on Gemini's multimodal capabilities, one that makes the model much more effective at considering questions that mix images with text, such as a physics problem presented alongside a diagram. In a few months we will know whether this new strategy is really an advantage over its direct rival.