Any sufficiently advanced technology is indistinguishable from magic.

(Arthur C. Clarke, science fiction writer)

At the end of each year, we tend to look back and weigh up the most important technical innovations that emerged during it. It is often difficult to choose among competing technologies of equal importance, but 2022 was different. Over the past twelve months, one category of technology made headlines again and again and, most importantly, became widely available to ordinary people. That category was undoubtedly artificial intelligence and its various applications, especially generative AI.

Even if the term is unfamiliar to you, you have probably seen some of the images or paintings that flooded social media during the year, produced by systems based on this type of artificial intelligence.

You may have heard of popular systems such as DALL-E 2, Stable Diffusion, Midjourney, and most recently ChatGPT.

The main advantage of these systems is that they are easy to use, and for the first time they produce striking pictures, art, and answers. Some even see them as a blend of software and magic, in the spirit of Arthur C. Clarke's remark.

The year of artificial intelligence

It began last April with the arrival of DALL-E 2, a text-to-image system and an updated version of the DALL-E system, which the artificial intelligence research organization OpenAI released for public experimentation. At the time, an image of an astronaut riding a horse spread across social media.

“A photo of an astronaut riding a horse” #dalle pic.twitter.com/4UDwErtEbZ

— OpenAI (@OpenAI) April 6, 2022

By the end of September, the organization announced that DALL-E had more than 1.5 million active users creating more than two million images a day, among them artists, creative directors and architects, with more than 100,000 users sharing their images and feedback with the DALL-E team (1).

Before that, in July, Midjourney opened to the public. It offers much the same text-to-image conversion as DALL-E, and within a short time it had two million users; the number has since passed 6.6 million.

Then, in August, came Stable Diffusion, an open-source model that the AI startup Stability AI released to the public for free.

By October, the company's "DreamStudio" web application had more than 1.5 million users, and more than 10 million people a day were using the system across all of its services (2).

Then the rivalry began to heat up. In September, Meta, the company that owns Facebook and Instagram, unveiled an artificial intelligence system for creating short video clips from written texts (3).

A month later, Google joined the race and announced its new system, "Imagen Video", which also generates short video clips from written words (4).

At the end of November, OpenAI announced the launch of ChatGPT, a new chatbot that relies on artificial intelligence to hold conversations with people that feel so natural a person may not realize they are talking to a machine. It is one of the "large language models", which learn by identifying billions of distinct patterns in the way people connect words, numbers and symbols, so that they can compose texts and generate responses on their own (5).

Within a week of its launch, the chatbot had one million users. To appreciate how large that number is, compare it with today's giants in their early days: Facebook needed a full 10 months to reach one million users, Instagram needed two and a half months, and Netflix needed three and a half years.

Time it took to reach 1 million users:

Netflix – 3.5 years

Facebook – 10 months

Spotify – 5 months

Instagram – 2.5 months

ChatGPT – 5 days

— Kate (@whoiskatrin) December 7, 2022
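To put the gap in the figures above on a single scale, here is a small illustrative calculation. It uses only the numbers quoted in the tweet and assumes, for simplicity, that a month is 30 days and a year is 365 days; the function and variable names are invented for this sketch.

```python
# Approximate days to reach 1 million users, from the figures quoted above.
# Assumption: a month is treated as 30 days and a year as 365 days.
DAYS_TO_ONE_MILLION = {
    "Netflix": 3.5 * 365,
    "Facebook": 10 * 30,
    "Spotify": 5 * 30,
    "Instagram": 2.5 * 30,
    "ChatGPT": 5,
}


def times_longer_than_chatgpt(days_by_service):
    """Return how many times longer each service took than ChatGPT."""
    baseline = days_by_service["ChatGPT"]
    return {name: days / baseline for name, days in days_by_service.items()}


ratios = times_longer_than_chatgpt(DAYS_TO_ONE_MILLION)
# Print the slowest service first.
for name, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
    print(f"{name}: {ratio:.1f} times as long as ChatGPT")
```

Under these rounding assumptions, Facebook took about 60 times as long as ChatGPT to reach the same milestone, and Instagram about 15 times as long.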

All this rapid development in so short a time, and such huge usage numbers, provoked sharply divided reactions. Some believe these new tools have artistic talent and will eliminate many creative jobs, such as those of artists and writers. Others believe they are mere tools that imitate what humans produce and may enhance human abilities, but cannot replace people.

But that brings us back to magic: for now, it is as if you have a little magic box.

Of course, this is great if you just want to keep generating images, but not if you need a truly creative partner. If you want to tell artistic stories and build new worlds, your partner needs to be more aware of what it is actually creating. And that is the main problem: these early models still have no idea what they are really doing.

The magic box

When using any generative AI system, all you have to do is write a short description of what you're thinking, then wait a few seconds to get the result.

In Midjourney, for example, the description can name a particular artist whose style you want the AI to imitate, or list other requirements you would like to see in the image. When we tried the system, we typed "Artificial intelligence controls the world", and this was the result!

This is what you see in front of you, but inside that magic box something different and more complicated happens.

The AI models that convert text into images consist of two main parts: one neural network trained to associate an image with written text describing it, and another trained to create images from scratch.

The basic idea here is that the second neural network generates an image that the first neural network accepts as identical to the text entered by the user.

The distinctive achievement behind these new models lies in the way the images are created. The first version of DALL-E, announced in January 2021, used the same technology as the GPT-3 text generator: it produced images by predicting the next pixel in the image, much as GPT-3 predicts the next words in a sentence. It worked, but the results were not impressive and the images fell short of the required quality.
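The idea of producing an image one pixel at a time, each pixel predicted from what came before, can be illustrated with a deliberately tiny sketch. This is not how DALL-E was built: it assumes a simple table of counts in place of a neural network, looks back only one pixel, and uses invented one-bit "images"; all names here are hypothetical.

```python
import random
from collections import Counter, defaultdict


def train_next_pixel_model(images):
    """Count how often each pixel value follows each previous value."""
    counts = defaultdict(Counter)
    for pixels in images:
        for previous, following in zip(pixels, pixels[1:]):
            counts[previous][following] += 1
    return counts


def generate(counts, first_pixel, length, rng):
    """Generate pixels one at a time, each sampled given the one before it."""
    pixels = [first_pixel]
    while len(pixels) < length:
        options = counts[pixels[-1]]
        if not options:  # a value never seen in training: stop early
            break
        values, weights = zip(*options.items())
        pixels.append(rng.choices(values, weights=weights)[0])
    return pixels


# Hypothetical training data: 1-bit "images" flattened into rows of 0s and 1s.
training_images = [
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
]
model = train_next_pixel_model(training_images)
sample = generate(model, first_pixel=0, length=8, rng=random.Random(0))
print(sample)
```

A real system conditions each prediction on far more context than the single previous pixel, which is part of why the approach is slow and why its output quality was limited.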

The second version, DALL-E 2, uses what is known as a "diffusion model": a neural network trained to clean up images by removing the pixel noise that the training process adds.

This process involves collecting images and changing a few pixels in them at a time, through several steps, until the original images are erased and you are left with only random pixels (6).

The neural network is then trained to reverse that process and predict what a cleaner version of the image would look like.

The end result is that if you give the diffusion model a mess of pixels, it will try to give you a cleaner picture.

Feed that cleaner image back into the model, and it will produce an even cleaner one.

Repeat this enough times and you get a high-quality image (7).
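The two halves of this process, gradually burying an image in noise and then cleaning it up step by step, can be sketched in a few lines. This is a toy under strong assumptions: the "images" are four numbers, and the trained neural network is replaced by a hand-written stand-in that simply guesses the closest training image. The function names and parameter values are invented for illustration.

```python
import math
import random


def add_noise(image, beta, rng):
    """One forward step: shrink the signal slightly and mix in Gaussian noise."""
    keep = math.sqrt(1.0 - beta)
    spread = math.sqrt(beta)
    return [keep * p + spread * rng.gauss(0.0, 1.0) for p in image]


def predict_clean(noisy, training_images):
    """Stand-in for the trained network: guess the closest training image."""
    def distance(candidate):
        return sum((a - b) ** 2 for a, b in zip(noisy, candidate))
    return min(training_images, key=distance)


def refine(noisy, training_images, steps, step_size=0.3):
    """Repeatedly nudge the image toward the current clean-image guess."""
    image = list(noisy)
    for _ in range(steps):
        guess = predict_clean(image, training_images)
        image = [p + step_size * (g - p) for p, g in zip(image, guess)]
    return image


rng = random.Random(0)
training_images = [[1.0, 1.0, -1.0, -1.0], [-1.0, 1.0, -1.0, 1.0]]

# Forward process: many small noising steps erase the original image.
noised = training_images[0]
for _ in range(200):
    noised = add_noise(noised, beta=0.05, rng=rng)

# Reverse process: start from the noise and clean it up a little at a time.
result = refine(noised, training_images, steps=30)
print([round(p, 2) for p in result])
```

Because the stand-in can only ever guess images it has already seen, the cleaned-up result always lands on one of the training images. A real diffusion model generalizes far better, but it is still shaped by the data it was trained on.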

The trick with text-to-image models is that this process is guided by the first network, the one that links text and images, which tries to match the user's words to the images the diffusion model is producing. This in turn pushes the diffusion model toward images that the first network considers a good match for the text.
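The interplay between a generator and a matcher that scores its output can be shown with a much simpler stand-in than the guidance used in real systems. The sketch below assumes a made-up setting: an "image" is just three numbers for red, green and blue, the matcher is a hand-written scoring function, and the generator is a random nudge. At each step the candidate the matcher likes best is kept. Every name here is hypothetical.

```python
import random

# Hypothetical "images": three brightness values (red, green, blue) from 0 to 1.
# Hypothetical prompts: a single colour word mapped to its ideal image.
PROMPT_TARGETS = {
    "red": (1.0, 0.0, 0.0),
    "green": (0.0, 1.0, 0.0),
    "blue": (0.0, 0.0, 1.0),
}


def match_score(prompt, image):
    """Stand-in for the text-image matcher: higher means a better match."""
    target = PROMPT_TARGETS[prompt]
    return -sum((a - b) ** 2 for a, b in zip(image, target))


def propose(image, rng, scale=0.1):
    """Stand-in for the generator: a small random change, clipped to [0, 1]."""
    return tuple(min(1.0, max(0.0, p + rng.uniform(-scale, scale))) for p in image)


def guided_generate(prompt, steps, rng, candidates=8):
    """At each step, keep the candidate that the matcher scores highest."""
    image = tuple(rng.random() for _ in range(3))
    for _ in range(steps):
        options = [image] + [propose(image, rng) for _ in range(candidates)]
        image = max(options, key=lambda option: match_score(prompt, option))
    return image


# Same seed for both runs, so they share the same random starting image.
start = guided_generate("red", steps=0, rng=random.Random(0))
final = guided_generate("red", steps=200, rng=random.Random(0))
print(match_score("red", start), match_score("red", final))
```

The score can never get worse from one step to the next, because the current image is always among the candidates. That is the sense in which the matcher "pushes" the generator toward the prompt.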

But these models do not link text and images on their own, or because they understand what those images or texts mean. They do so because they are trained on huge datasets from "LAION", a project that aims to make machine learning models and large-scale datasets available to the public. These datasets contain billions of text and image pairs from the Internet (8).

So what a generative model produces are new images that resemble the billions of images already on the Internet. Machine learning may therefore only ever produce images that echo what the model has already seen.

Real uses

However, these first prototypes may be just the beginning, because generative AI could be used to produce designs for anything in the future, from building designs to new medicines.

For example, the AlphaFold artificial intelligence system from Google-owned DeepMind can predict the three-dimensional structure of proteins, which is key to knowing their function. This opens the way for new kinds of research in molecular biology and helps researchers understand how diseases work and how to produce new medicines to treat them.

Last July, researchers used AlphaFold to predict the structure of more than 220 million proteins from about a million different species, covering almost every known protein on the planet (9).

In November, Meta unveiled ESMFold, a much faster protein structure prediction model, which predicted the structure of about 600 million proteins from bacteria, viruses, and other microorganisms (10).

You can think of it as a kind of protein autocomplete, built on a technique based on large language models such as GPT-3.

Biologists and drug manufacturers are already taking advantage of this important publicly available resource, which has made searching for the structure of new proteins as easy as searching the Internet.

In drug development too, hundreds of start-ups are currently exploring new ways to use artificial intelligence to speed up drug discovery, and even to design new, previously unknown kinds of drugs.

It is important to stress that the AI systems whose magic we see today are the result of decades of steady progress in research and applications, progress that has brought us to the point where we can train neural networks on the huge amounts of data now available.

True, this is not magic, and there is no need for the usual exaggerations that artificial intelligence has evolved to a terrifying degree and will take our place or our jobs. But it is certain to bring great change to societies, to the economy and to every part of our lives, just as the technologies that preceded it did, whether computers, the Internet, smartphones, or social networks.

________________________________________________

Resources:

  • 1) DALL·E Now Available Without Waitlist

  • 2) Stability AI Raises Seed Round at $1 Billion Value

  • 3) Introducing Make-A-Video: An AI system that generates videos from text

  • 4) Imagen Video

  • 5) ChatGPT.. The artificial intelligence revolution emerges from the laboratory into public life

  • 6) How diffusion models work: the math from scratch

  • 7) How DALL-E 2 Actually Works

  • 8) LAION

  • 9) 'The entire protein universe': AI predicts shape of nearly every known protein

  • 10) AlphaFold's new rival? Meta AI predicts shape of 600 million proteins