GENES AND GENOMES

This document is licensed under the Attribution-NonCommercial-ShareAlike 2.5 Italy license, available at http://creativecommons.org/licenses/by-nc-sa/2.5/it/

Genetica per Scienze Naturali, academic year 2008-09, prof. S. Presciuttini

1. What is a gene?

Definition: A gene is a discrete unit of DNA (or RNA in some viruses) that encodes a nucleic acid or protein product that contributes to or influences the phenotype of the cell or the organism.

Genes are the functional units of chromosomal DNA. Each gene not only encodes the structure of some cellular product but also carries control elements (short sequences) that determine when, where, and how much of that product is synthesized. Most genes encode protein products; special classes of genes encode RNA molecules.

Genes encode proteins indirectly, through several steps. First, the information in the DNA of the gene is copied (transcribed) into a related but single-stranded molecule called messenger RNA. Then the information in the messenger RNA is translated (decoded) into a string of amino acids called a polypeptide. The polypeptides, on their own or by aggregating with other polypeptides and cell constituents, form the functional proteins of the cell.

2. Introns and exons

Defining a gene precisely is complicated by the fact that many eukaryotic genes contain mysterious segments of DNA, called introns, interspersed in the transcribed region of the gene. Introns do not contain information for a functional gene product such as a protein. They are transcribed together with the coding regions (called exons) but are then excised from the initial transcript.
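The transcription, splicing and translation steps described above can be sketched in a few lines of Python. This is a minimal illustration, not a model of the cellular machinery: it assumes the DNA given is the coding (sense) strand, that the exon coordinates are already known, and that the standard genetic code (NCBI translation table 1) applies. The helper names `transcribe`, `splice` and `translate`, the toy gene and its exon coordinates are all hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# bases in U, C, A, G order at each position; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """Primary transcript: same sequence as the coding (sense) strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Keep only the exon segments (0-based, end-exclusive), dropping the introns between them."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy gene: exon 1, an intron (begins with GT, ends with AG), exon 2.
gene = "ATGGCC" + "GTAAGTCCCTAG" + "TTTGGATAA"
pre_mrna = transcribe(gene)
mature = splice(pre_mrna, [(0, 6), (18, 27)])
print(translate(pre_mrna))  # MAVSP: reading into the intron ends at its UAG
print(translate(mature))    # MAFG
```

Without splicing, translation in this toy model reads into the intron and yields a different polypeptide, which illustrates why intron boundaries must be recognized precisely.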
Since the correct sequence of the introns (as well as of the regulatory regions) is necessary to generate a properly sized transcript at the right time and place, introns, along with the coding and regulatory regions, should be considered part of the overall functional unit, in other words, part of the gene.

4. Schematic gene structure

Figure: Generalized gene structure in prokaryotes and eukaryotes. The coding region (dark green) contains the information for the structure of the gene product (usually a protein). The adjacent regulatory regions (lime green) contain sequences that are recognized and bound by the proteins that make the gene's RNA and by proteins that influence the amount of RNA made.

3. The average length of coding regions

Organism                              Average length of gene product (aa)
Vibrio cholerae (bacterium)           304
Saccharomyces cerevisiae (yeast)      477
Drosophila melanogaster (fruit fly)   492
Caenorhabditis elegans (nematode)     436
Arabidopsis thaliana (weed)           435
Homo sapiens                          497

Estimates of the average length of the polypeptide chains encoded by genes of various organisms. These values must be multiplied by 3 to obtain the length of the corresponding coding DNA; typical values are 1,000 to 1,500 bp.

5. Number of introns and exons per gene

Figure: Distribution of the number of exons among the genes of three organisms.

6. Genomes and genes

Genome                               Group            Size (kb)   Number of genes
Eukaryotic nucleus
  Saccharomyces cerevisiae           Yeast            13,500      6,000
  Caenorhabditis elegans             Nematode         100,000     13,500
  Arabidopsis thaliana               Plant            120,000     25,000
  Homo sapiens                       Human            3,000,000   100,000
Prokaryotes
  Escherichia coli                   Bacterium        4,700       4,000
  Haemophilus influenzae             Bacterium        1,830       1,703
  Methanococcus jannaschii           Archaeon         1,660       1,738
Viruses
  T4                                 Bacterial virus  172         300
  HCMV (herpes group)                Human virus      229         200
Eukaryotic organelles
  S. cerevisiae mitochondria         Yeast            78          34
  H. sapiens mitochondria            Human            17          37
  Marchantia polymorpha chloroplast  Liverwort        121         136

The number of genes increases with genome size, but the trend is complicated by repetitive DNA and introns. Counting genes is difficult, even in completely sequenced genomes. The figure of 100,000 for humans is substantially inflated.

7. How many genes in the human genome?

Before the human genome was sequenced, the most commonly cited estimate for the number of protein-coding genes in the human genome was 100,000, even though the basis of this figure was somewhat dubious to begin with. In 2001, when the draft sequences of the human genome were announced, the estimate was lowered to between 30,000 and 35,000. The finished sequence, published in 2004, gave an even lower estimate of 20,000 to 25,000 genes. A more recent estimate, based on comparing all human genes with those catalogued for dog and mouse, has brought the number below 20,000.

8. How to count genes

Gene-prediction programs rely heavily on identifying open reading frames. However, large numbers of sequences that have a biological function but do not produce a protein have been found, and several thousand "genes" that do not code for proteins have been reported.
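The first step of the gene-prediction approach mentioned above, scanning for open reading frames (ORFs), can be sketched as follows. This is a simplified illustration under stated assumptions: it scans only the three forward reading frames, treats an ORF as an ATG followed in frame by a stop codon (TAA, TAG or TGA), and uses a hypothetical default minimum length of 100 codons. The function name `find_orfs` is illustrative; real gene-prediction programs also scan the reverse strand and combine ORFs with statistical and comparative evidence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Find ATG...stop open reading frames on the forward strand only.

    Returns (frame, start, end) tuples; coordinates are 0-based and end-exclusive,
    and the end includes the stop codon. min_codons counts codons before the stop
    (the default of 100 is an arbitrary, hypothetical threshold).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the open ATG in this frame, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # look for the next ATG after this stop
    return orfs


# Hypothetical sequence with one short ORF in frame 2: ATG AAA TTT GGG TAA.
seq = "CCATGAAATTTGGGTAACC"
print(find_orfs(seq, min_codons=3))  # [(2, 2, 17)]
```

Because the scan keeps only the first ATG before each stop, it reports the longest ORF ending at that stop. Long random sequences still produce spurious ORFs, one reason why an ORF by itself is weak evidence for a gene.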
Thus an open reading frame "is not enough" to identify a gene, and an integrated catalog of protein-coding genes should rely more on comparative evidence from different genomes.

9. Average gene length

Figure: Intron/exon statistics for various organisms.

10. Plasmid genomes

Bacterial cells isolated from nature often contain small DNA elements that are not essential for the basic operation of the bacterial cell. These elements are called plasmids. Plasmids are symbiotic molecules that cannot survive outside cells. Even though plasmids are not part of the basic operating system of their host cells, some are quite complex and carry many genes, so it is appropriate to refer to their distinctive DNA as a "plasmid genome." Bacterial plasmids often contain genes that are extremely useful to the bacterial host, for example genes that promote conjugation between bacterial cells, confer antibiotic resistance, or produce toxins.

Plasmids are also occasionally found in fungal and plant cells. Most are found inside mitochondria and chloroplasts, but some are found in nuclei or in the cytosol. Unlike the bacterial plasmids mentioned above, these eukaryotic plasmids seem to provide no benefit to their hosts; they seem to exist selfishly, only for the purpose of their own propagation. For their replication and maintenance, plasmids depend on the general cellular machinery encoded by the host genome. Bacterial plasmids are most often circular, but there are linear types too. In fungi and plants, linear plasmids are most common, but circular types are known in fungi.

11. Organellar genomes

Mitochondrial and chloroplast chromosomes consist of double-stranded DNA molecules.
Individual mitochondria and chloroplasts contain multiple identical copies of their chromosomes, and each eukaryotic cell contains several to many of these organelles. The organelle chromosomes contain genes specific to the functions of the organelle concerned. Nevertheless, most of the biological functions that occur inside these organelles are specified by genes in the nuclear genome. There is no overlap in gene content with the nuclear genome. Mitochondria and chloroplasts were probably originally prokaryotic cells that entered another cell and established a symbiotic relationship inside it. During evolution, most of the original prokaryotic genes were transferred to the nuclear genome or lost. Mitochondrial genomes can be eliminated in some organisms, such as yeasts, but most organisms cannot survive without them, so there is still mutual interdependence between the nuclear and organellar subdivisions of the genome. Chloroplasts can be eliminated only in photosynthetic organisms that can survive by taking in preformed nutrients from the environment (that is, organisms that can act as heterotrophs).

12. Most eukaryotic DNA does not include genes

Between genes there is DNA, mostly of unknown function. The size and nature of this DNA vary with the genome: in bacteria and fungi there is little, but in mammals the intergenic regions can be huge. DNA sequences located quite far from a given gene can affect the regulation of that gene. They could thus be considered part of the functional gene unit, even though they are separated from it by long segments of DNA that have nothing to do with the gene in question. In many eukaryotes some of the DNA between genes is repetitive, consisting of several different types of units repeated throughout the genome. Some of the repetitive DNA is dispersed; some is found in contiguous "tandem" arrays. Repetitive DNA is also found in some introns.
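Contiguous "tandem" arrays of a repeat unit can be located by counting consecutive exact copies of that unit. The sketch below is a toy under explicit assumptions: the repeat unit is supplied by the caller, only exact copies are counted, and the two "alleles" are invented sequences; the function name `tandem_arrays` is hypothetical. Dedicated tandem-repeat finders tolerate mismatches and discover the repeat unit themselves.

```python
import re


def tandem_arrays(seq: str, unit: str, min_copies: int = 2) -> list[tuple[int, int]]:
    """Return (start, copies) for each run of at least min_copies adjacent exact copies of unit.

    Start positions are 0-based; runs are found left to right without overlapping.
    """
    unit = unit.upper()
    # e.g. unit "CA", min_copies 2 -> regular expression (?:CA){2,}
    pattern = re.compile(f"(?:{re.escape(unit)}){{{min_copies},}}")
    return [
        (match.start(), len(match.group()) // len(unit))
        for match in pattern.finditer(seq.upper())
    ]


# Two hypothetical alleles of the same locus that differ only in CA repeat number.
allele_1 = "TTCACACAGG"
allele_2 = "TTCACACACACAGG"
print(tandem_arrays(allele_1, "CA"))  # [(2, 3)]
print(tandem_arrays(allele_2, "CA"))  # [(2, 5)]
```

Comparing the copy counts at the same position is the basic idea behind scoring repeat-number differences between individuals.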
The extent of this DNA differs between species, and there is even variation in repeat number within a species.

13. Comparing gene densities

Figure: Schematic diagram of gene topography in four organisms. Light green = introns; dark green = exons; white = intergenic regions.

14. A small fraction of total eukaryotic DNA is coding

In mammals, only a few percent of the DNA is actually coding.

15. Different components of the human genome

Although most prokaryotic chromosomes consist almost entirely of protein-coding genes, such elements make up a small fraction of most eukaryotic genomes. As a prime example, the human genome might contain as few as 20,000 genes, whose coding sequences make up less than 1.5% of the total genome sequence.

16. Junk DNA?

Introns account for more than a quarter of the human genome. Pseudogenes are non-functional copies of coding genes. They include 'classical pseudogenes' (direct DNA-to-DNA duplicates), 'processed pseudogenes' (copies that are reverse transcribed from RNA back into the genome and therefore lack introns), and 'Numts' (nuclear pseudogenes of mitochondrial origin). The human genome is estimated to contain about 19,000 pseudogenes. Transposable elements are divided into Class I elements, which transpose through an RNA intermediate (long interspersed nuclear elements (LINEs), endogenous retroviruses, short interspersed nuclear elements (SINEs), and long terminal repeat (LTR) retrotransposons), and Class II elements, which transpose directly from DNA to DNA (DNA transposons and miniature inverted-repeat transposable elements, or MITEs).

17. Coding sequences are needles in the haystack

It is apparent that coding sequences are only a small part of the genome in most eukaryotes, particularly in humans. Finding these regions is like finding a needle in a haystack. In addition, genes are not uniformly distributed: there are regions of the genome where genes are packed together, and regions where they are sparse, where finding genes is like finding water in a desert.

18. Categorizing the genes in eukaryotic genomes

Classification schemes based on gene function suggest that all eukaryotes possess the same basic set of genes, but that more complex species have a greater number of genes in each category. For example, humans have the greatest number of genes in all but one of the categories used in the figure, the exception being 'metabolism', where Arabidopsis comes out on top because of its photosynthetic capability, which requires a large set of genes not present in the other four genomes included in this comparison. This functional classification reveals other interesting features, notably that C. elegans has a relatively high number of genes involved in cell-cell signaling, which is surprising given that this organism has just 959 cells. Humans, who have 10^13 cells, have only 250 more genes for cell-cell signaling.

19. Overview of the human genome

- Genome size is approximately 3,200 Mb.
- Gene number is approximately 20,000.
- Average gene density is 1 per 100 kb (5% of DNA encodes proteins); some areas are gene-rich, others are gene deserts (0 to 64 genes per 100 kb).
- Average gene size (including introns) is 27 kb; gene regions account for about 25% of the genome.
- The average coding sequence for a polypeptide is 1.3 kb long.
- The fraction of the genome with coding functions is about 1.5%.
- At least 50% of the genome is made of transposable elements (e.g. LINEs and Alus).
- Intron number ranges from 0 (in histone genes) to 234 (titin, a muscle protein).
- Hundreds of genes appear to have been transferred directly from bacteria to vertebrate genomes; the mechanism is unknown.
- Functions have been assigned to 60% of genes.
- The largest human gene is dystrophin (mutated in muscular dystrophy): 2.5 Mb, larger than some bacterial genomes.
- 1,077 blocks of duplicated regions in the human genome (containing 10,000 genes) suggest that genome rearrangements have been common in evolution.
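The uneven distribution of genes (gene-rich regions versus gene deserts) can be quantified by counting genes in fixed windows along a chromosome, using the 100-kb unit quoted in the overview. A minimal sketch, assuming non-overlapping windows, that each gene is counted in the window containing its start coordinate, and invented gene positions; the function names are hypothetical. Real analyses use annotated gene coordinates and often sliding windows.

```python
def genes_per_window(gene_starts: list[int], chrom_length: int, window: int = 100_000) -> list[int]:
    """Count genes per non-overlapping window, assigning each gene to the window holding its start."""
    n_windows = (chrom_length + window - 1) // window  # round up so the last partial window is kept
    counts = [0] * n_windows
    for start in gene_starts:
        if not 0 <= start < chrom_length:
            raise ValueError(f"gene start {start} lies outside the chromosome")
        counts[start // window] += 1
    return counts


def gene_deserts(counts: list[int]) -> list[int]:
    """Return the indices of windows that contain no gene start."""
    return [i for i, n in enumerate(counts) if n == 0]


# Hypothetical 500-kb segment: genes clustered at the two ends, none in the middle.
starts = [5_000, 40_000, 72_000, 98_000, 130_000, 455_000, 470_000, 490_000]
counts = genes_per_window(starts, chrom_length=500_000)
print(counts)                # [4, 1, 0, 0, 3]
print(gene_deserts(counts))  # [2, 3]
```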