Chapter 2: Human genome
The human genome is the complete set of nucleic acid sequences for humans. These sequences are stored as DNA in the 23 chromosome pairs found in the cell nucleus, as well as in a small DNA molecule located within individual mitochondria. The two are usually treated as distinct components: the nuclear genome and the mitochondrial genome. Haploid genomes, which are present in germ cells, contain a single set of 23 chromosomes, while diploid genomes, which are present in somatic cells, contain roughly twice as much DNA.
Although the genomes of different human beings differ (on the order of 0.1 percent due to single-nucleotide variants), the vast majority of the sequence is shared between individuals. The size of the genome in base pairs can also vary: telomeres become shorter after each round of DNA replication.
Although the sequence of the human genome was completely determined through DNA sequencing in 2022, it is not yet fully understood. Findings published in recent years suggest that the majority of the enormous quantities of noncoding DNA within the genome have associated biochemical activities. These activities include the regulation of gene expression, the organization of chromosome architecture, and signals controlling epigenetic inheritance. Human DNA also contains many retroviral sequences, at least three of which have been shown to have an important function: the HIV-like HERV-K, HERV-W, and HERV-FRD all play a part in placenta formation by inducing cell-cell fusion.
In February 2001, the Human Genome Project released the first draft sequences of the human genome. These drafts covered most of the genome but still contained gaps.
In 2003, researchers announced that they had sequenced 85 percent of the whole human genome, yet as of 2020 at least 8 percent of the genome was still missing.
These data are used worldwide in biological science, anthropology, forensics, and other branches of science. Such genomic studies have led to advances in the diagnosis and treatment of diseases and to new insights in many areas of biology, including human evolution.
By 2012, functional DNA elements that encode neither RNA nor proteins had been identified.
Although the Human Genome Project was declared complete in 2003, work on the human genome sequence continued afterward. The human reference genome, which does not represent the sequence of any one particular person, is over 3 billion base pairs long. In addition to the 22 pairs of autosomes, the genome contains a 23rd pair of sex chromosomes, denoted XX in females and XY in males. When the X chromosome is included, the haploid genome has a total of 3 054 815 472 base pairs; when the Y chromosome is used in its place, the total is 2 963 015 935 base pairs. All of these chromosomes are long, linear DNA molecules housed in the cell nucleus. The genome also includes the mitochondrial DNA, a very small circular molecule that is present in multiple copies in each mitochondrion.
The original analysis was published in the Ensembl database at the European Bioinformatics Institute (EBI) and the Wellcome Trust Sanger Institute. Chromosome lengths were estimated by multiplying the number of base pairs (of an older reference genome, not CHM13v2.0) by 0.34 nanometers, the distance between base pairs in the most common structure of the DNA double helix. A more recent estimate of human chromosome lengths based on updated data reports 205.00 cm for the diploid male genome and 208.23 cm for the diploid female genome, corresponding to weights of 6.41 and 6.51 picograms (pg), respectively. The number of proteins is based on the number of initial precursor mRNA transcripts and does not include products of alternative pre-mRNA splicing or modifications to protein structure that occur after translation.
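The length estimate described above is a single multiplication, so it can be checked directly. The following Python sketch applies the 0.34 nm spacing to the haploid base-pair counts quoted earlier in this chapter. Those counts may come from a different assembly than the one behind the reported 205.00 cm and 208.23 cm figures, so the results are only expected to land close to them. The function and constant names are illustrative and are not taken from any genomics library.

```python
# Rough check of the base-pairs-to-length conversion described in the text.
# Names are illustrative; the counts are the haploid figures quoted earlier
# in this chapter.

NM_PER_BASE_PAIR = 0.34            # spacing between base pairs, in nanometers
HAPLOID_BP_WITH_X = 3_054_815_472  # haploid genome including the X chromosome
HAPLOID_BP_WITH_Y = 2_963_015_935  # haploid genome with Y in place of X


def length_cm(base_pairs: int) -> float:
    """Convert a base-pair count to end-to-end DNA length in centimeters."""
    nanometers = base_pairs * NM_PER_BASE_PAIR
    return nanometers * 1e-7       # 1 nm = 1e-7 cm


# A diploid female genome carries two X-containing haploid sets; a diploid
# male genome carries one X-containing and one Y-containing set.
female_cm = length_cm(2 * HAPLOID_BP_WITH_X)
male_cm = length_cm(HAPLOID_BP_WITH_X + HAPLOID_BP_WITH_Y)

print(f"female diploid: {female_cm:.2f} cm")
print(f"male diploid:   {male_cm:.2f} cm")
```

Under these assumptions the script gives about 207.7 cm for the female and 204.6 cm for the male diploid genome, within roughly half a centimeter of the reported estimates.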
Variations are unique DNA sequence differences identified in the individual human genome sequences that Ensembl had analyzed as of December 2016. The number of identified variations is expected to rise as further personal genomes are sequenced and analyzed. In addition to the gene content shown in this table, a large number of non-expressed functional sequences have been identified throughout the human genome (see below).
RNAs of up to 200 bases that do not have the capacity to code for proteins are referred to as small non-coding RNAs. They include microRNAs, or miRNAs (post-transcriptional regulators of gene expression), small nuclear RNAs, or snRNAs (the RNA components of spliceosomes), and small nucleolar RNAs, or snoRNAs (involved in guiding chemical modifications to other RNA molecules). RNA molecules that are longer than 200 bases and do not have the capacity to code for proteins are referred to as long non-coding RNAs. Among these are ribosomal RNAs, or rRNAs (the RNA components of ribosomes), as well as a variety of other long RNAs that are involved in the regulation of gene expression, epigenetic modifications of DNA nucleotides and histone proteins, and the regulation of the activity of protein-coding genes. There are small discrepancies between the total number of small ncRNAs and the numbers for particular categories of small ncRNAs because the totals were taken from Ensembl release 87 and the per-category values from Ensembl release 68.
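The size convention above amounts to a single threshold on length. As a minimal sketch (the function name and labels are made up for illustration, and real annotation relies on far more than length), it could be written as:

```python
SMALL_NCRNA_MAX_BASES = 200  # threshold stated in the text


def ncrna_size_class(length_in_bases: int) -> str:
    """Label a non-coding RNA as 'small' or 'long' by length alone."""
    if length_in_bases <= 0:
        raise ValueError("length must be a positive number of bases")
    return "small" if length_in_bases <= SMALL_NCRNA_MAX_BASES else "long"


# The lengths below are arbitrary illustrative values, not measurements.
for n in (22, 200, 201, 5000):
    print(n, ncrna_size_class(n))
```

The boundary case matters here: a 200-base RNA falls in the small class, and the long class starts at 201 bases.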
The exact number of genes in the human genome is not clear, because the functions of a large number of transcripts are not fully understood. This is particularly true for RNA that does not code for proteins. The number of protein-coding genes is better known, but there are still about 1,400 questionable genes that may or may not produce functional proteins. These genes are usually encoded by short open reading frames.
The haploid human genome consists of 23 chromosomes, is around 3 billion base pairs long, and contains approximately 30,000 genes.
The contents of the human genome are commonly divided into coding and noncoding DNA sequences. Coding DNA is defined as those sequences that can be transcribed into messenger RNA and translated into proteins during the human life cycle; these sequences make up only a small fraction of the genome (less than 2 percent). Noncoding DNA consists of all the sequences that are not used to encode proteins, which make up approximately 98 percent of the genome.
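To give these percentages a sense of scale, the sketch below converts them to base pairs. It assumes the X-containing haploid size of 3 054 815 472 base pairs quoted earlier in this chapter and treats 2 percent as an upper bound on coding DNA. The variable names are illustrative.

```python
HAPLOID_BP = 3_054_815_472   # haploid size with X, as quoted in this chapter
CODING_FRACTION_MAX = 0.02   # "less than 2 percent", taken as an upper bound

coding_bp_max = HAPLOID_BP * CODING_FRACTION_MAX
noncoding_bp_min = HAPLOID_BP - coding_bp_max

print(f"coding DNA:    at most about {coding_bp_max / 1e6:.0f} million bp")
print(f"noncoding DNA: at least about {noncoding_bp_min / 1e9:.2f} billion bp")
```

Under these assumptions, coding DNA accounts for at most about 61 million base pairs, leaving roughly 2.99 billion base pairs of noncoding sequence.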
Some noncoding DNA contains genes for RNA molecules that perform essential biological functions (noncoding RNA, for example ribosomal RNA and transfer RNA). Investigating the function and evolutionary origin of noncoding DNA is an important goal of contemporary genome research, including the ENCODE (Encyclopedia of DNA Elements) project, which aims to survey the entire human genome using a variety of experimental tools whose results are indicative of molecular activity.
Because noncoding DNA greatly outnumbers coding DNA, the concept of the sequenced genome has become a more focused analytical concept than the classical concept of the DNA-coding gene.
The protein-coding portion of the human genome has been studied the most and is the best understood. These sequences ultimately lead to the production of all human proteins, although several biological mechanisms (such as DNA rearrangements and alternative pre-mRNA splicing) can lead to the synthesis of many more distinct proteins than there are protein-coding genes. The complete modular protein-coding capacity of the genome is contained in the exome, which consists of DNA sequences that are encoded by exons and can be translated into proteins. Because of the exome's biological significance, and because it makes up less than 2 percent of the genome, sequencing the exome was the first major milestone of the Human Genome Project.
Number of protein-coding genes. About 20,000 human proteins have been annotated in databases such as UniProt. Humans do not have significantly more protein-coding genes than many less complex organisms, such as the roundworm and the fruit fly. The difference in complexity may result from the extensive use of alternative pre-mRNA splicing in humans, which makes it possible to build a very large number of modular proteins through the selective inclusion of exons.
Protein-coding capacity per chromosome. Protein-coding genes are unevenly distributed among the chromosomes, ranging from a few hundred to more than 2000 per chromosome, with chromosomes 1, 11, and 19 having an unusually high gene density. Each chromosome contains gene-rich and gene-poor regions, which may be correlated with chromosome bands and GC content.
The term "noncoding DNA" refers to all of the...