About two decades ago, researchers unveiled the "first draft" of the human genome sequence, but it was missing about 8% of the genome, a portion that until recently remained one of biology's most stubborn mysteries. Now scientists say they have, finally, completed the full sequence. That opens the door to a deeper understanding of our nature as humans in its most precise and abstract form, and it should improve our understanding of complex diseases and how to deal with them.

Before getting into the report itself, let's clarify what the human genome is. It is the complete set of human genetic information, stored in the DNA sequence found in the nuclei and mitochondria of our cells.

DNA is, simply put, the code of life: a very long line of complex chemical letters, wrapped around each other in elegant harmony. To understand it, picture a reel of old film. On one stretch of tape the hero meets the heroine for the first time; on another he leaves her to travel in search of money; on yet another he returns to her full of regret. Once the reel is placed in the projector, the film appears on TV.

DNA, likewise, is made up of codes: one determines your height, another the color of your eyes, and a third whether your hair is curly, wavy or straight. This happens because the DNA letters (the codes) are translated inside cells into proteins, and those proteins are assembled to build the color of your eyes, define your curl pattern and shape your cheekbones.
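
To make the idea of "letters translated into proteins" concrete, here is a minimal sketch in Python. It uses five entries from the standard genetic code and a made-up gene (`toy_gene` is hypothetical, not a real human sequence), and it skips the intermediate RNA step that real cells go through.

```python
# A tiny excerpt of the standard genetic code: three DNA letters (a codon)
# stand for one amino acid, the building block of a protein.
CODON_TABLE = {
    "ATG": "Met",   # also the usual "start" signal
    "TTT": "Phe",
    "GGC": "Gly",
    "AAA": "Lys",
    "TAA": "STOP",  # tells the cell the protein is finished
}


def translate(dna: str) -> list[str]:
    """Read the sequence three letters at a time until a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein


toy_gene = "ATGTTTGGCAAATAA"  # hypothetical sequence, for illustration only
print(translate(toy_gene))    # ['Met', 'Phe', 'Gly', 'Lys']
```

The full table has 64 codons; only the handful needed for the toy sequence is included here.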

If your genome is the scenes recorded on a long reel of old brown film, your body is the movie itself as it plays on TV.

When the human genome was first declared complete in 2000, the news was met with great international fanfare. The two groups that had competed to finish the work agreed to declare joint success and shook hands at the White House in the presence of Bill Clinton (then US President), while Tony Blair (British Prime Minister at the time) beamed in from London.

One of the two groups was a large government consortium; the other was a private company with limited influence.

One leading molecular biologist said of the discovery: “We are now witnessing an exceptional moment in the history of science, as if we had climbed to the top of the Himalayas (meaning Everest).” But despite all the hype, the human genome was not yet complete, and neither group had reached the true summit. Even contemporary news coverage acknowledged that this version of the human genome was little more than a rough draft, riddled with long stretches where the DNA sequence was still unclear or missing.

Around this time, the private company shifted its focus away from the human genome and ended its project, while the consortium's scientists kept working. In 2003, they once again announced a complete genome, and although the news made headlines, the glow was fainter than the first time.

In fact, the human genome was still not complete. The revised draft was missing about 8% of the genome, because these regions are the hardest to sequence: they are filled with repetitive letters that could not be read with the techniques available at the time.

But this impasse did not last forever. In May of this year (2021), a separate group of scientists posted a preliminary version of their paper online, describing what could be considered the first truly complete human genome: they were able to read all 3.055 billion chemical letters in the 23 human chromosomes.

The team, led by relatively young researchers from all over the world, came together on Slack (a messaging application for work teams and employees within a company) to finish the task that scientists had left behind 20 years ago. Strangely, the White House made no announcement this time, and there was no talk of reaching Himalayan summits. Perhaps the reason for the silence is that the research itself is still under review ahead of official publication.

To complete the human genome, these scientists had to figure out how to map the most obscure and neglected repeat regions in human DNA, which may finally be getting their scientific due. (In some regions of DNA, patterns of chemical letters repeat for no obvious reason, which makes them especially difficult to study.)

“I see this achievement as a milestone,” says Steven Henikoff, a molecular biologist at the Fred Hutchinson Cancer Research Center who was not involved in the project.

Henikoff studies one of those mysterious, hard-to-sequence regions that earlier human genome projects left out: the centromeres, the regions that connect the arms of each chromosome (the chromatids). Chromosomes, of which we have 23 pairs, consist of a long strand of DNA that can be condensed into a rod, and that strand is particularly dense in the centromere region.

On five human chromosomes, the centromere is not in the middle but very close to one end, dividing the chromosome into a long arm and a very short arm. These short arms are also filled with repeats that had never been completely sequenced until now. Centromeres, short arms and other types of DNA repeat regions make up most of the 238 million letters the consortium has added to the human genome.
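
As a rough sanity check, the two figures quoted in this report (238 million added letters and a complete genome of 3.055 billion letters) can be compared with the "about 8%" said to be missing from the earlier draft. This is only a back-of-the-envelope sketch; the variable names are illustrative, and the comparison assumes the added letters correspond roughly to the missing portion.

```python
# Figures as quoted in this report.
total_letters = 3_055_000_000   # letters in the complete genome
added_letters = 238_000_000     # letters added by the consortium

share = added_letters / total_letters
print(f"{share:.1%}")  # 7.8%
```

The result, a little under 8%, is in line with the share the earlier draft was said to be missing.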

Regions of the human genome rich in repeating sequences usually do not contain genes, which is perhaps one reason scientists neglected them for so long. Their focus was largely on genes because the function of genes is clear and simple: they encode proteins. Among the major surprises in earlier drafts of the human genome was how little of our DNA actually encodes proteins: only about 1%.

There are hints that these repetitive regions of DNA also play important roles in how genes are expressed in the body, and abnormalities in them have been linked to cancer and aging.

In addition, the consortium discovered 79 new genes hidden among these repeats, and mapping the repeat regions helps scientists examine their functions more carefully.

The chromosomes of a human male

Adam Phillippy, a computational geneticist at the National Institutes of Health who co-led the Telomere-to-Telomere (T2T) Consortium** (an international collaboration of nearly 30 institutions that seeks to assemble the entire human genome), says the genome-completion efforts "have been drastic and vital." Phillippy and Karen Miga, an American geneticist at the University of California, Santa Cruz, decided to create the T2T Consortium in 2018 after realizing that they shared the ambition of completing the human genome. Miga, who came to the project as a biologist trying to understand what goes on in these repeats, says she is fond of them. Phillippy, the computer scientist, brought the technical skill to the team.

Traditional DNA sequencing techniques break the DNA into small pieces, which computer algorithms then have to put back together like a jigsaw puzzle. The problem is that the repeating regions all look almost the same. Two new "long read" sequencing technologies, from Oxford Nanopore (UK) and PacBio HiFi (US), have emerged to solve this dilemma by reading much longer stretches of the genome. Their reach is still too limited to read an entire centromere or a short arm of a chromosome in one go, but algorithms can make up for that shortfall.
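
A toy illustration of why repeats defeat short pieces, written as a minimal Python sketch. It assumes idealized, error-free reads and a made-up sequence (`make_genome` is hypothetical); real assemblers are far more elaborate. Two "genomes" that differ only in how many times a two-letter unit repeats yield exactly the same set of short reads, so no algorithm could tell them apart, while reads long enough to span the whole repeat do differ.

```python
def reads(genome: str, length: int) -> set[str]:
    """Every substring of the given length: an idealized, error-free read set."""
    return {genome[i:i + length] for i in range(len(genome) - length + 1)}


def make_genome(copies: int) -> str:
    """Hypothetical toy sequence: unique flanks around a tandem repeat."""
    return "GATTACA" + "AT" * copies + "CCGTGA"


genome_a = make_genome(10)  # repeat unit appears 10 times
genome_b = make_genome(14)  # repeat unit appears 14 times

# Short reads (6 letters) never span the whole repeat, so both genomes
# produce identical read sets and the copy number is lost.
short_reads_match = reads(genome_a, 6) == reads(genome_b, 6)

# Long reads (30 letters) cover the repeat plus flanking sequence on both
# sides in genome_a, so the two read sets differ.
long_reads_match = reads(genome_a, 30) == reads(genome_b, 30)

print(short_reads_match)  # True
print(long_reads_match)   # False
```

In a real centromere the repeats run far longer than any single read, which is where the assembly algorithms mentioned above have to fill the gap.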

The role of centromere sequences, like that of many other repeating regions in DNA, is still not fully understood, but we do know that they are key to cell division. When a cell prepares to split in two, a protein structure called the "spindle apparatus" attaches to the centromeres and pulls the chromosomes apart, making sure each new cell gets the correct number of chromosomes.

When eggs or sperm end up with the wrong number of chromosomes, children can be born with chromosomal abnormalities such as Down syndrome (a genetic disorder caused by abnormal cell division that results in a full or partial extra copy of chromosome 21) or Turner syndrome (a genetic disorder that affects only females and disrupts physical and mental development as a result of the loss of an X sex chromosome). Things may also go wrong in other parts of the body, producing, for example, blood cells with too many or too few chromosomes. This is a prominent sign of aging: it is not unusual for men over the age of 70 to lose the Y chromosome in their blood cells.

In one of two research papers, the T2T Consortium demonstrated that Oxford Nanopore's technology for reading long sequences can also be used to determine where the spindle apparatus attaches to the centromere, and examining the sequences in these regions may yield new clues about chromosomal abnormalities (and the diseases that result from them). As for the repeat-rich short arms of chromosomes, although they remain mysterious, they certainly play a role in the process of translating genes into proteins, and knowing their sequence could shed more light on that function.

Brian McStay, a biologist at the National University of Ireland in Galway, likens the whole genome to a "list of parts" that allows scientists to discover the basic components of a chromosome. "Knowing what this list of parts consists of helps us understand exactly what the chromosome looks like, and what effect removing part of the list has on its function," he says.

But as impressive as the technical achievement is, scientists stress that a single genome is just a snapshot. Watching how these repeating regions change over time, from person to person and from one species to another, will be more interesting still.

With these changes in mind, Henikoff asks: "What happens in cancer? What about evolution, and comparing offspring with their parents?"

The consortium has shown that these repeat regions can be sequenced with the new techniques, and applying those techniques to more genomes will let scientists compare one genome with another and begin to answer these questions.

Karen Miga, a geneticist at the University of California, Santa Cruz, says the ultimate dream is to complete every genome that scientists are trying to understand from telomere to telomere, that is, from end to end.

Despite all this, the claim of a "complete genome" can be challenged on one point: it covers only one set of 23 chromosomes, while normal human cells contain 23 pairs. For their DNA, the scientists used cells from a tumor that develops from a fertilized egg and contains only 23 chromosomes. (Ordinary human cells contain 46 chromosomes, or 23 pairs, while sex cells such as the egg or sperm contain half that number.)

Hence, at some point researchers will have to use ordinary body cells, which contain 23 pairs of chromosomes, to complete what is known as a diploid genome.

“The next focus will be on diploid genomes,” says Shilpa Garg, a geneticist at the University of Copenhagen in Denmark.

Garg uses PacBio HiFi technology to assemble human genomes relatively quickly, at a rate of several genomes a day, leaving out only some hard-to-read regions such as centromeres.

That speed could help medical centers too, by making it easier for doctors to regularly diagnose patients using genome sequencing.

By comparison, assembling a genome with the older sequencing technology takes about three weeks.

Yes, sequencing the genome, and reading its repeating regions, is becoming easier and faster by the day, and soon the completion of another human genome will no longer be shocking news at all.

__________________________________________________________________

This report is translated from The Atlantic and does not necessarily represent the views of Meydan.

Margins:

** Telomeres are the regions at the ends of chromosomes. They too consist of a repeating sequence of chemical letters, so "telomere to telomere" means "from one end to the other".