Zhang Tiankan

  The hottest technology term on the Internet lately is surely "ChatGPT".

ChatGPT (Chat Generative Pre-trained Transformer) is a chatbot program developed by an artificial intelligence research company in the United States. It can not only hold question-and-answer conversations with people but also write polished articles. Some people therefore call it the strongest AI (artificial intelligence) in history, and some are even reminded of science fiction films in which artificial intelligence eventually replaces humans.

The last artificial intelligence event to set off such a wave of public discussion, as I recall, was AlphaGo's 4:1 victory over the world-class Go player Lee Sedol in 2016.

Today we will talk about the impact on society of the artificial intelligence represented by ChatGPT, and about what such AI has already achieved in the biological sciences.

  Generative AI with intensive training:

  Create new content based on user needs

  ChatGPT, now being talked about all over the world, is an artificial intelligence writing and chat tool. After its launch in November last year it quickly became popular on social media, and its monthly active users have exceeded 100 million.

ChatGPT learns and understands human language in order to converse and answer all kinds of questions, and it can also complete writing tasks on request, such as video scripts, advertising copy, papers, and code.

Its success stems from the long-term accumulation of artificial intelligence technology represented by deep learning.

In terms of its nature, ChatGPT is a large language model (LLM): it has been trained on a large amount of text data and can generate human-like answers to a wide variety of questions. It can therefore also be called a content generator.

  AI requires three elements: data, computing power, and algorithms.

Data is the raw material of knowledge, while computing power and algorithms provide "computational intelligence" to learn knowledge and achieve specific goals.

AI can be classified in many ways. Judged by what work it can do and what tasks it can complete, AI can be simply divided into reactive AI (analytical AI) and generative AI.

  Reactive AI responds to different types of stimuli according to preprogrammed rules and cannot learn from new data because it does not use memory.

A classic example of reactive AI is the IBM Deep Blue supercomputer, which defeated the world chess champion Garry Kasparov in 1997.

  Generative AI takes in a large amount of data and information and undergoes intensive training through deep learning, with a feedback and error-correction mechanism of the kind used in neural networks, so it can do many kinds of work and produce many kinds of output.

Its essence can be summed up in one sentence: it creates new content according to the specific needs of users.

  As the full name "Chat Generative Pre-trained Transformer" suggests, ChatGPT is an AI that can generate many kinds of content by itself, including texts and articles of all sorts, conversations with people, translations, code, drawings, videos, and more.

  Constrained by various factors, the content generated by ChatGPT also contains many errors, especially content about society, culture, the humanities, philosophy, politics, economics, and history.

In the natural sciences, however, there are recognized laws and shared knowledge, such as "an atom consists of a positively charged nucleus and negatively charged electrons outside the nucleus", so the content generated by ChatGPT has a relatively low error rate.

  For this reason, although generative AI is useful in every field, generative AI such as ChatGPT is especially popular in the natural sciences.

Biomedical research, healthcare, and the life sciences all need generative AI, and ChatGPT is just one such tool.

  Accurately predict protein structures:

  Can speed up the development of new drugs and vaccines

  Currently, the use of generative AI in biomedicine is in the ascendant.

Generative AI can not only analyze thousands of proteins, but also generate new proteins, even proteins that have never appeared in nature.

  In the past, determining the conformation of a protein accurately took a great deal of time and effort, and the measurements were not always accurate, which held back the development of drugs, vaccines, and disease treatments.

If generative AI can deliver results that are both accurate and fast, the structures of mutated viral proteins, such as mutations in the S (spike) protein of the novel coronavirus, can be worked out, speeding up the development of new drugs and vaccines.

  In 2020, AlphaFold 2, developed by the British company DeepMind, achieved remarkable results.

This generative AI shone at the 14th Critical Assessment of protein Structure Prediction (CASP14) competition, held in 2020.

Most of the protein structures it predicted were very accurate: comparable to structures determined by experimental methods, and far better than other methods at solving new protein structures.

Specifically, AlphaFold 2 can predict the structure of a typical protein within minutes and generate high-precision structures within days.

In 2022, AlphaFold 2 predicted the structures of a further 220 million proteins, covering almost all proteins of known organisms in the DNA databases.

  In November 2022, Meta (formerly Facebook) was catching up: its generative AI software, ESMFold, predicted the structures of some 600 million proteins from bacteria, viruses, and other microbes that have yet to be characterized.

Although the software is not as accurate as AlphaFold 2, it is about 60 times faster at predicting structures.

  ESMFold works on much the same principle as ChatGPT. It too is a large language model, except that it is trained not on natural language but on the "language" of biology: it infers protein structure from the order and regularities of the sequence.

  For example, ESMFold is trained by "feeding" it the amino acid sequences of known proteins, much as ChatGPT is trained by "feeding" it natural-language words arranged according to grammar.

Proteins in nature are chains built from 20 different amino acids, each of which is represented by a letter. This training gives ESMFold an intuitive understanding of protein sequences and of the shape information they contain.

After such deep learning, ESMFold learned to "autocomplete" a protein sequence in which a proportion of the amino acids has been hidden.
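The "autocomplete" training idea can be illustrated with a small Python sketch. It is only a toy under stated assumptions, not ESMFold's actual code: the function name `mask_sequence`, the `?` placeholder, the 15% masking fraction, and the example sequence are all hypothetical choices made for illustration. It shows how a protein written as one-letter amino acid codes can have some residues hidden, leaving a fill-in-the-blank task for a language model to learn from.

```python
import random

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "?"  # hypothetical placeholder for a hidden residue


def mask_sequence(sequence, fraction=0.15, seed=0):
    """Hide a fraction of residues; return the masked string and the answers."""
    if any(residue not in AMINO_ACIDS for residue in sequence):
        raise ValueError("sequence contains a non-standard amino acid letter")
    rng = random.Random(seed)
    # Always hide at least one residue so there is something to predict.
    n_masked = max(1, round(len(sequence) * fraction))
    positions = sorted(rng.sample(range(len(sequence)), n_masked))
    masked = list(sequence)
    answers = {}
    for pos in positions:
        answers[pos] = masked[pos]  # what the model would have to recover
        masked[pos] = MASK
    return "".join(masked), answers


# An arbitrary toy sequence of 20 residues, not a real protein.
toy_sequence = "MKTAYIAKQRQISFVKSHFS"
masked, answers = mask_sequence(toy_sequence)
print(masked)
print(answers)
```

A model trained on millions of such examples is rewarded for guessing the hidden letters correctly, which is how it picks up the regularities of protein sequences.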

  The research team applied ESMFold to a large-scale sequenced database of "metagenomic" DNA from the environment, including soil, seawater, the human gut, skin and other microbial habitats.

ESMFold uses algorithms to combine information on the relationship between protein structures and sequences to generate predicted structures.

In total, it predicted the structures of more than 617 million proteins in just two weeks.

Moreover, more than one third of the 617 million predictions were of high quality, and millions of the predicted structures were completely new.
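To put these figures in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes that "two weeks" means exactly 14 days of continuous running and that "more than one third" is exactly one third; both are simplifications, and the constant and function names are illustrative.

```python
# Figures quoted in the article; "two weeks" is taken as exactly 14 days.
TOTAL_PREDICTIONS = 617_000_000
DAYS = 14
HIGH_QUALITY_SHARE = 1 / 3  # the article says "more than one third"


def predictions_per_second(total, days):
    """Average number of structures predicted per second over the whole run."""
    return total / (days * 24 * 60 * 60)


rate = predictions_per_second(TOTAL_PREDICTIONS, DAYS)
high_quality = TOTAL_PREDICTIONS * HIGH_QUALITY_SHARE
print(f"about {rate:.0f} structures per second")
print(f"roughly {high_quality / 1e6:.0f} million high-quality predictions")
```

Under these assumptions the average works out to about 510 structures per second. That is the combined throughput of whatever hardware the team used, not the speed of a single machine.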

  Designing enzymes not found in nature from scratch:

  Changes in the amino acid sequence of artificial enzymes do not impair activity

  The power of generative AI is also reflected in the fact that it can generate proteins and substances that are not found in nature, and produce and provide new raw materials and products for human beings.

  An artificial intelligence research company in the United States has developed another generative AI called ProGen, a system for designing artificial enzymes.

This AI software is dedicated to studying and generating enzymes (special proteins produced by living cells; almost all biochemical reactions in the human body require the participation of enzymes).

In laboratory tests, some of the artificial enzymes designed by ProGen were as effective as enzymes found in nature and remained biologically active even though their amino acid sequences differed significantly from any known natural protein.

  A particular protein has its own unique sequence of amino acids.

The researchers fed the amino acid sequences of 280 million different proteins from 19,000 protein families into the ProGen machine-learning model, provided the associated protein properties as control labels, and let the system spend weeks "digesting" the information.

The researchers then narrowed the focus, fine-tuning the model on the amino acid sequences of 56,000 proteins from five lysozyme families, along with some information about those proteins.

  According to the study, ProGen quickly generated 1 million protein sequences, from which the research team selected 100 for testing. They found that artificial proteins from all five lysozyme families showed activity and that 73% had antibacterial function, whereas only 59% of the natural proteins did.

  Even more surprisingly, in another round of screening the research team found that enzymes designed by the generative AI still showed biological activity even when their sequences were only 31.4% identical to currently known natural proteins.

In contrast, a natural protein may lose its biological activity after even a single mutation.
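A similarity figure such as 31.4% is a form of sequence identity: the share of positions at which two aligned sequences carry the same amino acid. The Python sketch below shows one simple way to compute it. It assumes the two sequences are already aligned to equal length, with "-" marking gaps; the function name and the two short example sequences are made up for illustration. Definitions of identity differ in which positions they count, and the article does not say which definition the study used.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned columns where both sequences have the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences have a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Two made-up, pre-aligned fragments that differ at 3 of 10 positions.
natural = "MKTAYIAKQR"
designed = "MKSAYLAKER"
print(percent_identity(natural, designed))  # 70.0
```

At 31.4% identity, fewer than one position in three matches the natural protein, which is why the reported activity of such designs drew attention.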

  Taken together, these results are significant in three respects. First, the artificial proteins generated by ProGen are not only expressed correctly but also fold into structures similar to those of natural proteins. Second, AI-generated proteins remain biologically active even when only part of their amino acid sequence resembles that of natural proteins, an advantage natural proteins do not have. Third, artificial intelligence can design new substances and products that have never existed in nature.

  This means that using generative AI to design and produce protein drugs, foods, and biological products (such as products that degrade plastics) will be faster and more effective. Of course, their safety still needs to be tested in further research.

In other words, if AI-generated proteins can function like naturally produced ones, then in the future artificial intelligence could design all kinds of products that human beings need, above all the food and medicine essential to human survival.

  Aiding disease diagnosis and reproductive health:

  The final result is still subject to human review and decision

  Generative AI has now been developed to detect, diagnose, and predict cardiovascular disease, eye disease, diabetes, and various cancers, such as colorectal, lung, breast, and prostate cancer, from images, blood tests, and tissue scans.

  Heart disease is a serious cardiovascular disease.

Electrocardiogram (ECG) signals are the most commonly used tool for screening for heart disease.

Researchers at Nanyang Technological University in Singapore and other institutions used a machine learning algorithm called Gabor-CNN to design a generative AI diagnostic tool that mimics the structure and function of the human brain and uses electrocardiograms to diagnose coronary artery disease, myocardial infarction, and congestive heart failure.

The test results show that this AI can automatically distinguish the ECG signals of healthy people from those of patients with different cardiovascular diseases, with an accuracy of more than 98.5%.
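The article does not describe how Gabor-CNN is built. Assuming the name refers to a convolutional neural network whose early filters are Gabor functions (a wave-shaped pattern that fades out at both ends), the Python sketch below shows what one such filter does to a one-dimensional signal. Everything in it is illustrative: the function names, the filter settings, and the synthetic sine wave, which merely stands in for an ECG trace.

```python
import math


def gabor_kernel_1d(size, sigma, frequency):
    """Gaussian envelope multiplied by a cosine wave, centered on the kernel."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a center sample")
    half = size // 2
    return [
        math.exp(-(t * t) / (2.0 * sigma * sigma))
        * math.cos(2.0 * math.pi * frequency * t)
        for t in range(-half, half + 1)
    ]


def filter_signal(signal, kernel):
    """Slide the kernel over the signal, keeping only fully covered positions."""
    width = len(kernel)
    return [
        sum(signal[start + i] * kernel[i] for i in range(width))
        for start in range(len(signal) - width + 1)
    ]


# A synthetic sine wave standing in for a trace; this is not real ECG data.
signal = [math.sin(2.0 * math.pi * 0.1 * n) for n in range(50)]
kernel = gabor_kernel_1d(size=9, sigma=2.0, frequency=0.1)  # cycles per sample
response = filter_signal(signal, kernel)
print(len(response))
```

The filter responds most strongly where the signal oscillates at the filter's own frequency. A network with many such filters at different frequencies can pick out the characteristic waves in a heartbeat before later layers classify them.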

  Cancer can also be diagnosed and treated using AI.

Colorectal cancer and breast cancer are now generally diagnosed by examining CT images and tissue sections.

Researchers from China's Central South University and other institutions collected more than 13,000 images of colorectal cancer from 8,803 subjects at 13 independent cancer centers in China, Germany, and the United States. Using randomly selected images from this set, they built AI software to identify images of colorectal cancer.

Preliminary tests showed that the AI software could identify most images of colorectal cancer as well as human pathologists, and in many cases even better.

Of course, the final diagnosis still needs to be checked and reviewed by a pathologist.

  Another area of medical attention is infertility.

Modern lifestyles and environmental changes leave about 15% of couples infertile, and poor sperm quality is one of the important causes.

The traditional approach is laboratory testing of sperm samples, but the task is done better by AI.

  Recently, Shanghai First Maternity and Infant Hospital developed AI software that uses deep learning and algorithms to recognize the "faces" and different movement patterns of sperm, much as face recognition does. The operator only needs to watch the computer screen.

In tests on a total of 1,000 samples from three hospitals, the AI system was as accurate as traditional methods.

The AI software greatly shortens the whole examination: it takes only an hour and a half, whereas a report takes about a week with traditional methods.

  Examples of such "intelligence" abound.

It is foreseeable that the rapid development of artificial intelligence will affect many fields, especially jobs that are less creative and can be done on the basis of industry knowledge or training, such as customer service, animation modeling, art, translation, and entry-level programming.

ChatGPT, which has swept the world this time, shows us that artificial intelligence has made a qualitative leap that heralds more possibilities. But this technological innovation is so far limited to the dimension of language; it has no active consciousness and no real capacity for innovation, and it is still far from the science fiction fantasy of "artificial intelligence replacing people".

  In short, no matter what field AI is applied to, the final results or products still need to be reviewed and decided by humans. This is the scientific attitude towards AI.

  "Beijing Daily" February 22, 2023, Page 9