The "Tang Yao" genome achieves a historic breakthrough:

How far is China from establishing its own genome technology system?

  [Produced by Shen Tong Studio]

  Written by: Our reporter Cao Xiuying

  Planning: Liu Shu and Li Kun

  He Zhong (pseudonym) did not expect that less than 20 milliliters of his blood would lead to a result described as "a landmark event for our country and even the world" by Zhang Xue, academician of the Chinese Academy of Engineering and secretary of the Party Committee of Harbin Medical University.

  Using He Zhong's blood sample, the team of Gao Zhancheng, a professor at Peking University People's Hospital, and Kang Yu, a researcher at the Beijing Institute of Genomics, Chinese Academy of Sciences (National Bioinformatics Center), completed a Chinese genome from telomere to telomere for the first time in the world, obtaining a high-quality, complete and gap-free whole-genome reference sequence of a real human diploid, including the Y chromosome (44+XY).

  Because the sampling site is in Linfen City, Shanxi Province, near the ruins of the ancient State of Tang established by Emperor Yao thousands of years ago, the research team named the reference genome "Tang Yao".

  In most people's minds, the human genome map was published long ago, and the genome of an ordinary person can now be sequenced with ease. Why, then, is the "Tang Yao" genome regarded as a "milestone event"? What does this breakthrough in basic research mean? A reporter from Science and Technology Daily conducted interviews to find out.

The existing human reference genome is biased when used for Chinese people

  This is basic research prompted by the need for clinical application.

  In the past few decades, Gao Zhancheng, director of the Department of Respiratory and Critical Care Medicine at Peking University People's Hospital, has mainly treated patients with difficult and complicated respiratory diseases from all over the country. He led the team to diagnose multiple cases of orphan lung diseases for the first time, such as diffuse pulmonary lymphangiomatosis, pulmonary alveolar proteinosis, etc.

  These many cases enriched his medical practice but also left him puzzled about diagnosis and treatment: the clinical manifestations of many diseases differ considerably among ethnic groups.

  "All current sequencing-based diagnostic reports for tumors, genetic diseases and so on rely on the US-led GRCh37/38 human reference genome sequence to determine what is normal and what is a variant." Gao Zhancheng said that GRCh37/38 is a chimeric genome assembled from the genome sequences of multiple individuals, mainly of African and European ancestry. Not only is it incomplete and full of errors, it also struggles to represent Chinese people or Asian populations more broadly.

  Take hereditary cystic fibrosis as an example. In white Europeans and Americans, this disease manifests as a loss of function caused by mutations in a transmembrane chloride ion transport factor. In Chinese patients, however, mutations in this factor are far less common.

  "When predicting disease risk and making diagnosis and treatment decisions for Asians, comparing only against the existing reference genome may produce large deviations." Gao Zhancheng said that this deviation also affects the development of targeted drugs.

  In 2003, AstraZeneca, an internationally renowned pharmaceutical company, became the first in the world to successfully develop an epidermal growth factor receptor tyrosine kinase inhibitor (EGFR-TKI), gefitinib, which is indicated for non-small cell lung cancer patients with mutations in the epidermal growth factor receptor (EGFR) gene.

  Subsequent studies found that EGFR gene mutations are obviously racially specific. Mutation rates in non-smoking lung adenocarcinoma patients of Chinese and East Asian ethnicity are significantly higher than in European and American Caucasian patients.

  "The current mainstream view is that the difference between the genomes of different races is only one thousandth. But from clinical practice, the actual difference may be much greater than this number." Gao Zhancheng said, "Therefore, we need to construct the Chinese reference genome."

  But for a clinician, this is a new and difficult topic to overcome.

  In 2020, a suitable opportunity arrived.

  That year, preparations began for Gao Zhancheng's Respiratory Medicine Shanxi Studio, located at Linfen Central Hospital in Shanxi Province.

  "This studio must not be just a nameplate. It must have specific topics and be able to solve real problems." Gao Zhancheng said that drawing up Chinese people's own reference genome map was put on the agenda.

  He immediately contacted his first doctoral student and long-time collaborator, Kang Yu, a researcher at the Beijing Institute of Genomics, Chinese Academy of Sciences.

  "Of course I was very happy to take part in this work." Kang Yu said, "We judged that, given the current state of technology, now is the best time to construct the Chinese reference genome, which allows us to complete it at lower cost and in less time."

Provide a more accurate coordinate system for Chinese genome research

  Who is He Zhong? Why can He Zhong’s genome be called a reference genome?

  Kang Yu said choosing the right sample is the first step. A long history and diverse geographical and climatic environments have shaped the unique genetic diversity of the Chinese nation. "The 'Tang Yao' genome is the starting point for research, and we decided to start from the Han ethnic group, which has the largest population." Kang Yu said.

  "The purpose of constructing the Chinese people's own reference genome map is to better serve modern medical applications, so the sample needs to represent the genomic characteristics of modern Chinese people well." Kang Yu said that the sample they finally selected came from He Zhong, a healthy young man living in an ancient village in Hongdong County, Shanxi Province.

  This area was the starting point for the Hongdong immigrants of the Ming Dynasty, the famous "Big Sophora Tree" immigrants in history. This migration that took place more than 600 years ago lasted for nearly half a century, with large numbers of immigrants spreading across China, and some into Southeast Asia. "We believe that He Zhong's genome is expected to become a representative of the modern Han population." Gao Zhancheng said.

  According to ancestry analysis, most of the "Tang Yao" genome is characteristic of East Asian populations. "The Y chromosome typing of this sample is widely distributed in China except Xinjiang, Tibet and other places, and it is very representative." Kang Yu said.

  The "Tang Yao" genome suggests significant differences at the genome level between Chinese and Europeans. Compared with T2T-CHM13, the new version of the human reference genome released in 2022 by the international scientific team known as the "Telomere to Telomere (T2T)" Alliance (hereinafter the "T2T" Alliance), "Tang Yao" shows 11% differential sequences and 5% differential genes.

  Chen Runsheng, an academician of the Chinese Academy of Sciences, said that "Tang Yao" has filled the gap in the high-quality genome of the Han people. The release of the complete Chinese genome sequence will also change the previous perception that there is only one thousandth difference between the genomes of different races.

  Zhang Xue believes that the "Tang Yao" genome will provide a more accurate coordinate system for locating genes and mutations in Han Chinese genome research, and at the same time remove the technical obstacle that a reference genome of European ancestry is not well suited to Chinese genome research. This will establish a technical system and quality benchmark for China's medical genomic research, including genetic disease diagnosis, risk prediction for common diseases, tumor genome variation, pharmacogenomics and other fields.

  Cheng Jing, an academician of the Chinese Academy of Engineering, believes that the "Tang Yao" genome sequencing and analysis work not only has very important interdisciplinary basic research significance and application value, but also addresses at the DNA level the important social science question of what makes Chinese people Chinese. It will help answer questions about the origin, migration, historical evolution and exchanges of the Chinese people.

Reaching internationally leading quality standards in two years

  Equipped with the most advanced sequencing instruments and the most capable R&D personnel, the "Tang Yao" project was launched as quickly as possible. In less than two years, in August 2023, the project team obtained He Zhong's complete and gap-free high-quality genome sequence.

  The results exceeded the research team's expectations.

  According to Merqury, an important international tool for assessing genome quality, "Tang Yao" meets the quality standard for a reference genome, with a quality value of Q74.69, compared with Q73.94 for T2T-CHM13.

  "This number shows that our reference genome has fewer errors and a higher assembly quality than T2T-CHM13." Kang Yu said.

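  To put the two quality values in perspective, the short sketch below converts them into estimated per-base error rates. It assumes the values follow the standard Phred scale, QV = -10 x log10(error rate), which is the usual convention for reporting consensus quality; the function and constant names are illustrative and are not taken from the project or from Merqury.

```python
def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value to an estimated per-base error rate.

    Assumes the standard Phred relation: QV = -10 * log10(error_rate).
    """
    return 10 ** (-qv / 10)


# Quality values quoted in the article (hypothetical constant names).
TANG_YAO_QV = 74.69
T2T_CHM13_QV = 73.94

if __name__ == "__main__":
    for name, qv in [("Tang Yao", TANG_YAO_QV), ("T2T-CHM13", T2T_CHM13_QV)]:
        rate = qv_to_error_rate(qv)
        # Express the rate as estimated errors per 100 million bases.
        print(f"{name}: QV {qv} -> about {rate * 1e8:.1f} errors per 100 million bases")

    # Relative error rate of Tang Yao compared with T2T-CHM13.
    relative = qv_to_error_rate(TANG_YAO_QV) / qv_to_error_rate(T2T_CHM13_QV)
    print(f"Tang Yao error rate relative to T2T-CHM13: {relative:.2f}")
```

  Under that assumption, Q74.69 corresponds to roughly 3.4 estimated errors per 100 million bases and Q73.94 to roughly 4.0, so a gap of 0.75 in quality value amounts to about 16% fewer estimated errors.
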
  Turn the clock back more than 30 years. In 1990, the Human Genome Project, known as the "Moon Landing Project" of the life sciences, was launched. Eleven years later, the project released a working draft of the human genome. Two years after that, researchers released what was then called a "complete map" of the human genome.

  In the following years, researchers continued to fill in the blank regions of the human genome, but about 8% of the sequence was still missing.

  It was not until 2022 that the "T2T" Alliance filled in the missing "puzzle" pieces and released the new T2T-CHM13 reference genome. In this achievement, scientists added approximately 200 million bases to the human genome, resolving most of the gaps on chromosomes 1 to 22. The only thing left out was the Y chromosome, the smallest of all human chromosomes.

  In 2023, with the publication of two research papers in the top academic journal "Nature", the complete sequence of the human Y chromosome was finally revealed to the world.

  In other words, it took the International Genome Project more than 30 years to obtain the complete human haploid genome sequence, including the Y chromosome.

  The "Tang Yao" research team also obtained this result. For the first time in the world, they obtained a real human diploid genome sequence (44+XY) covering all 46 chromosomes, in which the two sets of haploid genome sequences inherited from the father and the mother can be distinguished with 99.99% accuracy.

  What the "T2T" Alliance sequenced in 2022 was a haploid genome: the DNA did not come from a natural human tissue sample, but from CHM13, a cell line derived from a hydatidiform mole, an abnormal vesicular mass that forms in the uterus.

  At that time, Evan Eichler, co-chairman of the "T2T" Alliance and a Howard Hughes Medical Institute researcher at the University of Washington, told the media: "We have now completed one human genome, and the next key task is to complete the two sets, paternal and maternal, of the somatic genome."

  The "Tang Yao" research team did it.

  "Just as the 'T2T' Alliance was able to fill in the final 'puzzle' pieces, we were able to achieve this result quickly thanks to the rapid progress of DNA sequencing and assembly technology, as well as the large body of technical and theoretical knowledge accumulated through efforts including the International Genome Project." Kang Yu said, "We achieved results because we stood on the shoulders of our predecessors."

  This is not a job that can be completed simply by having instruments and funds. "Over the past two years, our team has worked day and night and developed a large number of new algorithms and assembly methods. Only in this way could we distinguish highly similar gene fragments with high accuracy, exceeding the accuracy of the NIH reference genome." Gao Zhancheng said.

Avoid the embarrassment of "Westerners understanding Chinese people better than the Chinese do"

  "This is a new starting point for the study of the population genetics of the Chinese nation." said Yu Jun, former deputy director of the Beijing Institute of Genomics, Chinese Academy of Sciences. "Next, we will promote the sequencing of reference genomes from other representative individuals and carry out population sequencing of different ethnic groups, and eventually we hope to launch a nationwide genome sequencing project."

  Looking back on the past, China’s development in the field of genomics technology can be said to have progressed from participation to synchronization.

  Chen Runsheng recalled that in 1994, the National Natural Science Foundation funded a research project on the genetic structure of several sites in the Chinese genome, marking the official launch of China's human genome research.

  In 1999, China secured a 1% share of the work of the International Human Genome Project. A team of scientists, led by researchers from BGI and the Institute of Genomics, Chinese Academy of Sciences, completed this sequencing task to a high standard, driving the rapid development of genomics in China. Over the past 20 years, China's genomic technology and research have advanced by leaps and bounds.

  Chinese scientists have also been working hard to construct the Chinese nation's own reference genome.

  "Yanhuang No. 1" is the world's first standard Chinese genome sequence map, and also the first personal genome sequence map among the world's 2 billion people of East Asian ancestry. The project was completed on October 11, 2007. Having taken on 1% of the International Human Genome Project and 10% of the International Human Haplotype Map project, Chinese scientists used next-generation sequencing technology to complete 100% of this Chinese genome map independently.

  Subsequently, Jinan University, the Beijing Institute of Genomics of the Chinese Academy of Sciences and other institutions carried out similar research. However, limited by the technology of the time, these genomes did not become reference genomes in practical use in China and did not deliver their full value.

  In 2023, 26 units including Fudan University, Xi'an Jiaotong University, and the Chinese Academy of Medical Sciences jointly released the first phase of research progress of the Chinese Population Pan-Genome Consortium. This study has initially constructed the first pan-genome reference map exclusive to the Chinese population, and the results were all completed independently by Chinese scientists.

  On this basis, experts believe that China should speed up the building of the Chinese people's own "coordinate system" for genome research.

  More than 20 years ago, building on the Human Genome Project, the United States formally proposed a new big-science plan, the Precision Medicine Plan, also known as the "All of Us Research Project". Its ultimate goal is to determine the genome of every person. In 2022, the program released its first batch of whole-genome sequencing data, from nearly 100,000 people, for researchers to use. The data include basic measurements such as height, weight and blood pressure, as well as survey data on participants' demographics, lifestyle and general health.

  Gao Zhancheng said that once the U.S. National Genome Project completes the genome sequencing of 5 million Chinese Americans, it is entirely possible that "others understand the Chinese genome better than we do."

  In recent years, international scientists have jointly established the Human Pangenome Reference Consortium (HPRC) in an attempt to build a more accurate and complete reference genome for the world's major populations and to understand the diversity of the world's population. In May last year, the first draft human pangenome reference produced by HPRC was published in Nature. It covers 47 samples from around the world, among them 3 Han samples from southern China.

  Zhang Xue has noticed a phenomenon: in the two most important international alliances in the field of genomics, the International Human Pan-Genome Alliance and the International T2T Genome Alliance, the key members come from universities and research institutes in Europe and the United States. Chinese research institutions and entities are not among them.

  "Under these circumstances, establishing China's own high-quality reference genome is a key step to avoid being caught in a 'stranglehold'." Zhang Xue said.

  "Next, we will further analyze and annotate 'Tang Yao' so that it can be better applied in clinical settings." Kang Yu said the team hopes to develop, based on China's own reference genome, targeted sequencing, genome analysis, and diagnostic and treatment technologies that serve Chinese people, and to promote the development of new drugs in the future.

There is an urgent need to build China’s own genome technology system

  The interviewed experts predict that T2T-CHM13, with its completeness and high quality, is expected to gradually replace the GRCh38 reference genome currently in use.

  Chen Runsheng and Huang Jie, deputy director of the Institute of In Vitro Diagnostic Reagents of the Chinese Academy of Inspection and Quarantine, both suggested that, during the handover between the old and new reference genomes, China should establish national standards and promote the use of "Tang Yao" as the standard material and reference genome for sequencing and analysis in Chinese population genome research and clinical applications, and stop using European reference genomes to define genetic variation in Chinese people. On this basis, a knowledge framework and application technology system for Chinese human genomics should then be established.

  Yu Jun and other scientists believe that to achieve the above goals, China's human genome research urgently needs stronger top-level design and planning. "Who will do the sequencing, who will use the data, and how will data security be ensured? These issues all need to be studied systematically."

  In 1993, Yu Jun participated in the Human Genome Project, a landmark scientific project. With the full support of his mentor Maynard Olson, he facilitated the participation of Chinese scientists in the Human Genome Project.

  Over the years, two questions have lingered in Yu Jun's mind: what China's genome research plan should be, and how to establish independent gene sequencing technology and data systems.

  Yu Jun believes that current research in this area is still relatively fragmented: the population studies carried out are small in scale, and data ownership is scattered among different researchers, making it impossible to share and integrate data for innovation and resulting in a waste of resources.

  The separation of research and application is another prominent problem. Yu Jun said that basic scientific research, clinical access, and application specifications in the field of genomics in China are managed by different departments, and information is not communicated efficiently. This makes it difficult for application needs to drive basic research effectively, and no effective feedback loop or virtuous cycle has formed between basic research and clinical applications. To promote cooperation and exchange between basic research and clinical medicine in the field of genomics, Peking University People's Hospital established the Human Genome Research Center in January this year to further expand research on, and medical applications of, the "Tang Yao" genome.

  Yu Jun believes that, with the Chinese people's own reference genomes being constructed one after another, how to promote larger-scale population sequencing in the future, ultimately achieve nationwide sequencing, and truly advance precision medicine are all issues that must be faced now. "You sequence hundreds of people, I sequence thousands. Apart from producing some papers that look pretty good, most of these data do not advance practical applications such as clinical diagnosis and new drug development."

  In response to this situation, experts believe that there is an urgent need to integrate limited resources, including funds, talent, sample resources and infrastructure, to manage samples and data centrally, and to coordinate resources effectively.

  "We can explore establishing an institution along the lines of a National Human Genome Research and Management Center." Yu Jun suggested that the institution adopt a management model of central decision-making, expert committee supervision and guidance, and execution by the center, in order to coordinate science and technology funding and social resources, unify technical standards, promote technology transfer, and guard against security risks. "In this way, we can achieve the goal of independently establishing China's internationally competitive human genome technology system and knowledge framework." (Science and Technology Daily)