Decoding the last "blank area"


The first complete, gap-free human genome sequence is published

  Science and Technology Daily, Beijing, April 1 (Intern reporter Zhang Jiaxin) Human genome sequencing, known as the "moon landing project" of the life sciences, has made major progress again. The international Telomere-to-Telomere Consortium (T2T) has published the first complete, uninterrupted, gap-free sequence of the human genome, revealing for the first time highly identical segmental duplications in the human genome and their variation.

This is a "significant upgrade" over GRCh38, the standard human reference genome sequence released in 2013.

On March 31 local time, the journal Science published six papers reporting this achievement.

  On February 12, 2001, the International Human Genome Project, a joint effort by scientists from six countries, published the first human genome map and preliminary analysis results; on April 15, 2003, the essentially finished human genome sequence was published.

However, because of technical limitations, the original Human Genome Project left about 8% of the genome as a "blank area".

These regions are difficult to sequence. They consist of highly repetitive, complex stretches of DNA that contain functional genes as well as the centromeres and telomeres located at the middle and ends of chromosomes.

  The new gap-free version, called T2T-CHM13, contains 3.055 billion base pairs and 19,969 protein-coding genes.

Nearly 200 million base pairs of new DNA sequence were added, including 99 potential protein-coding genes and nearly 2,000 candidate genes that need further study.

Most of these candidate genes are inactive, but 115 of them may still be expressed.

The team also found about 2 million additional variants in the human genome, 622 of which lie in medically relevant genes.

In addition, the new sequence corrected thousands of structural errors in GRCh38.

  Specifically, the gaps filled by the new sequence include the entire short arms of five human chromosomes and cover some of the most complex regions in the genome.

These include the highly repetitive DNA sequences found in and around important chromosomal structures, such as telomeres at the ends of chromosomes and centromeres that coordinate the separation of replicated chromosomes during cell division.

The new sequence also revealed previously undiscovered segmental duplications: long stretches of DNA that are duplicated within the genome and are known to play important roles in evolution and disease.

  The new sequence also brings important improvements in identifying and interpreting genetic variation, and reveals never-before-seen details of the pericentromeric regions.

Variation within these regions may provide new evidence about how human ancestors evolved.

  According to the researchers, this complete, gap-free sequence is critical for understanding the full spectrum of variation in the human genome and the genetic contributions to certain diseases.

  The researchers said the next phase of the study will be to sequence the genomes of many different people to fully understand the diversity of human genes, their roles, and how we relate to our close relatives and other primates.

  [Editor-in-Chief's Comment]

  Certain regions of the genome are repeated over and over again, and these repeated regions include some parts that are critically important for cell division, as well as new genes that may help species adapt.

In the past, all this duplication prevented scientists from "assembling the pieces" in the correct order. It was like a difficult puzzle in which nearly every piece looks the same: no one could tell which piece went where, and the result was a huge void in the genome map.

With the latest results, the genome no longer has any hidden or unknown parts. One could say that a whole new genetic treasure trove is slowly opening up before all of humanity.