Function and Evolution of Repeated DNA Sequences

Name: Function and Evolution of Repeated DNA Sequences
Brand: Wiley
Price: 142.99 EUR
Availability: OnlineOnly

Guy-Franck Richard (Editor)

Wiley (Publisher)

1st edition

Published on 27 December 2023

400 pages

E-Book

ePUB with Adobe DRM

978-1-394-26489-6 (ISBN)

142.99 € incl. 7% VAT

E-book single license

Available for download

Introduction
About Repeated Genomes

Guy-Franck RICHARD

Instabilités naturelles & synthétiques des génomes, Institut Pasteur, CNRS UMR3525, Paris, France

Genome [ˈdʒiːnəʊm] nm Biol.: Set of hereditary characteristics of a living being, of which a small part is composed of genes providing a function to the organism, and the majority is composed of repeated sequences for which it is unknown whether they have a function.

Pushed a little further, this could be a modern definition of the word "genome", in light of the knowledge gained over three decades of sequencing the DNA of living beings, in particular eukaryotic organisms, whose genomes are more complex than those of their bacterial and archaeal ancestors. Biologists were already aware in the 1960s, long before the invention of the first DNA sequencing methods, that the content of genomes was difficult to comprehend. Denaturation-renaturation experiments showed that the rate of renaturation of the double helix was proportional to the DNA concentration. The Cot parameter, the product of the initial DNA concentration (C0) and the renaturation time (t), was used to characterize a genome by the value at which renaturation of half the genomic DNA was complete under controlled conditions. Each organism could then be defined by the Cot value of its genome. When the Cot values of the genomes of the simplest organisms (phages or bacteria) were compared with those of more complex organisms, such as vertebrates, it transpired that the latter contained three types of sequences with very different Cot values (see Figure I.1).

Figure I.1. Example of Cot curve

It was thus possible to show that the mouse genome, for example, is composed of 70% unique sequences with slow renaturation, 20% moderately repeated sequences present in 1,000 to 100,000 copies per genome, and 10% highly repeated sequences present in at least 1 million copies per genome and showing rapid renaturation (Britten and Kohne 1968). This approach, based on the physicochemical properties of DNA, slightly underestimated the quantity of repeated sequences, because their renaturation rate depends on the degree of identity between them: divergent sequences (such as long terminal repeats (LTRs)) renature more slowly than identical sequences. Nowadays, Cot curves are still sometimes used to separate the highly repetitive fraction of a genome from its unique fraction in order to sequence the DNA of either fraction specifically (Peterson et al. 2008).
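The shape of such a three-component Cot curve can be sketched numerically. The example below is a minimal illustration, assuming ideal second-order renaturation kinetics, in which the fraction of a component still single-stranded at a given Cot is 1 / (1 + Cot / Cot1/2). The genome fractions (70%, 20%, 10%) are those quoted above for the mouse; the three Cot1/2 values and the function name are hypothetical and chosen only to separate the three steps of the curve. They are not measured data.

```python
# Sketch of a three-component Cot curve under ideal second-order kinetics.
# Genome fractions come from the text (mouse genome, Britten and Kohne 1968).
# The Cot1/2 values (mol*s/L) are ILLUSTRATIVE ASSUMPTIONS, not measurements.
COMPONENTS = [
    # (label, fraction of genome, assumed Cot1/2)
    ("highly repeated", 0.10, 1e-3),
    ("moderately repeated", 0.20, 1e0),
    ("unique", 0.70, 1e3),
]


def single_stranded_fraction(cot, components=COMPONENTS):
    """Return the fraction of DNA still single-stranded at a given Cot.

    Each component follows C/C0 = 1 / (1 + Cot / Cot_half), so a component
    is exactly half renatured when Cot equals its Cot_half.
    """
    return sum(weight / (1.0 + cot / cot_half)
               for _, weight, cot_half in components)


if __name__ == "__main__":
    # Scan Cot on a logarithmic scale, as Cot curves are usually plotted.
    for exponent in range(-5, 7):
        cot = 10.0 ** exponent
        fraction = single_stranded_fraction(cot)
        print(f"Cot = 1e{exponent:+d}  single-stranded fraction = {fraction:.3f}")
```

Under these assumptions, the printed values fall in three steps as Cot increases: the highly repeated fraction renatures first, then the moderately repeated fraction, and finally the unique sequences.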

I.1. The "C-value" paradox

From the moment it was proven that DNA was the carrier of heredity, and theoretically contained all the genes necessary for the development of a living being, it seemed logical that the most sophisticated organisms had to contain more genes, and therefore more DNA in their genome (the "C value"), to encode them. This idea was called into question in the 1950s with the discovery that the nuclei of certain amphibians and fish contained 20 times more DNA than the nuclei of mammals. Given that mammals show greater developmental complexity, this appeared highly paradoxical, and was even used as an argument by those who opposed the idea that DNA was the sole carrier of heredity (Thomas 1971). This "C-value paradox" could only be explained decades later, when the first genomes were sequenced. It is now known that the number of genes in an organism has little to do with its size or level of complexity. The brewer's yeast genome contains about 6,000 genes, that of the fruit fly about 14,000, and the human genome (like those of its very close cousins, the great apes) makes do with about 20,000 genes, with which it achieves a very sophisticated level of developmental and behavioral complexity. But what about the paramecium with its 40,000 genes, twice as many as the human genome? Or Trichomonas vaginalis, a parasite of the genital tract, with its 60,000 genes? Or indeed wheat and its 124,000 genes, more than six times as many as we have? Clearly, this so-called complexity cannot be measured by the number of genes in an organism. Comparative genomics studies have shown that the high number of genes in certain organisms in fact conceals ancestral events of partial or total genome duplication, followed by variable amounts of gene loss (Wolfe and Shields 1997; Jaillon et al. 2004). These events contribute to genetic redundancy; their identification and their underlying mechanisms will be addressed in Chapter 1.

If the complexity of an organism has nothing to do with the number of genes it contains, the same is true of the amount of DNA. The human genome, with just over three times as many genes as brewer's yeast, contains 200 times more DNA. The genome of a rotifer - a small animal measuring just a few millimeters that lives in fresh water - contains three times more genes than the human genome in 12 times less DNA (see Figure I.2).

The genomic sequences of all these organisms showed that some of them had evolved a very compact genome, with high gene density, while others contained a multitude of repeated DNA sequences whose function did not appear obvious at first glance, and which some authors did not hesitate to call "junk DNA" (Ohno 1972).
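The contrast in gene density can be made concrete with a short calculation. The sketch below uses only figures quoted in this introduction: the gene counts, the 12.5 million nucleotides of the brewer's yeast genome, and the rounded ratios "200 times more DNA" and "12 times less DNA". The human and rotifer genome sizes are therefore derived approximations rather than measured values, and the variable and function names are illustrative.

```python
# Gene density (genes per megabase) from the rounded figures in the text.
# Human and rotifer genome sizes are DERIVED from the text's ratios
# (200x yeast, then 12x less than human); they are approximations.
YEAST_GENES = 6_000
YEAST_SIZE_MB = 12.5                 # 12.5 million nucleotides

HUMAN_GENES = 20_000
HUMAN_SIZE_MB = YEAST_SIZE_MB * 200  # "200 times more DNA" than yeast

ROTIFER_GENES = 3 * HUMAN_GENES      # "three times more genes" than human
ROTIFER_SIZE_MB = HUMAN_SIZE_MB / 12  # "12 times less DNA" than human


def gene_density(genes, size_mb):
    """Return the number of genes per megabase of genome."""
    return genes / size_mb


DENSITIES = {
    "brewer's yeast": gene_density(YEAST_GENES, YEAST_SIZE_MB),
    "human": gene_density(HUMAN_GENES, HUMAN_SIZE_MB),
    "rotifer": gene_density(ROTIFER_GENES, ROTIFER_SIZE_MB),
}

if __name__ == "__main__":
    for organism, density in DENSITIES.items():
        print(f"{organism}: about {density:.0f} genes per Mb")
```

With these rounded inputs, yeast comes out at roughly 480 genes per megabase and the rotifer at roughly 290, against about 8 for the human genome, a difference of well over an order of magnitude.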

Figure I.2. Comparison of genome sizes and gene numbers

I.2. Recycling junk DNA

About 2% of the human genome is translated into proteins. Even when the untranslated genes (rRNA, tRNA, siRNA, snRNA, etc.) are added, the percentage of "useful" DNA barely increases. So, what is the purpose of the 98% of DNA in our genome that apparently has no function? One conceivable answer is that it has none. The consortium led by Jeff Boeke, professor of genetics at Johns Hopkins University in Baltimore, set out to create the first synthetic yeast genome from synthetic oligonucleotides. The brewer's yeast Saccharomyces cerevisiae is a eukaryotic organism whose genome contains 12.5 million nucleotides distributed across 16 chromosomes. The synthetic chromosomes were reconstructed one by one from 70-nucleotide-long sequences assembled into blocks of 750 base pairs, themselves assembled into mega-blocks of 2-4 kb, which were introduced one after the other, in a hierarchical manner, into the yeast genome in place of the natural sequences (Muller and Koszul 2015). When the synthetic chromosomes were designed, it was decided that all repeated sequences would be removed from the genome. All tRNA-encoding DNAs were grouped on a single circular chromosome, specifically built to carry them. Retrotransposons, microsatellites, minisatellites and other repeated elements not essential to life were removed from the new sequence. These synthetic chromosomes, stripped of their junk DNA, are perfectly able to sustain life in the yeast cells containing them, without any apparent phenotypic defect, at least under laboratory growth conditions (Dymond et al. 2011; Annaluru et al. 2014). One might conclude from the results of this project that junk DNA is useless. However, that would be a mistake.

The human reference genome contains about 443,000 residual elements of past retroviral invasions, covering 8.3% of the total sequence (International Human Genome Sequencing Consortium 2001). These retroviral scars are the remains of successive invasions, over the past hundred million years, of our mammalian ancestors by exogenous elements, which left traces of their passage in the form of LTRs. These retroviral remains are therefore part of our junk DNA. Nevertheless, as we will see, their presence in our genome testifies to their distant but indispensable role in the existence of our lineage. Therian mammals, that is, those possessing a uterus within which the fertilized egg develops, are classified into two groups. Eutherians (or placentals), like humans and mice, have a very elaborate placenta connecting the wall of the uterus to the embryo and allowing it to develop in complete safety throughout the entire gestation period. Marsupials (such as kangaroos and koalas) do not have placentas and the development of their young takes place mainly outside the uterus. Genome sequencing showed that two human genes specifically expressed in the placenta, syncytin-1 and syncytin-2, were derived from genes encoding ancestral viral proteins, acquired when retroviruses infected the primate lineage 25-40 million years ago. Remarkably, the genome of the mouse, another placental mammal, also contains two viral genes with the same function as the human genes, but deriving from a slightly more recent viral infection than that of the human lineage. Thus, the placenta was invented twice, independently, in two lineages of mammals, by capture of genes of retroviral origin (Dupressoir et al. 2009).

Another example is even more striking. Sexual reproduction was invented at the origin of the eukaryotic world. From the first primitive eukaryotic cells, a syngamy system developed that allowed the nuclei of two haploid cells to fuse and give rise to a diploid cell. The protein responsible for the fusion of male and female gametes is the same in plants and animals; it is the product of the HAP2 gene. This protein is of viral origin: in viruses, it allows the viral envelope to fuse with the plasma membrane of the host's cells (Fédry et al. 2017). Thus, a gene essential to sexual reproduction was...
