March 1st, 2010
Bioe 109
Winter 2010
Lecture 20

Evolutionary Genomics

- we have now entered the genomics age.
- the number of complete genomes continues to rise rapidly each year, now numbering about 200.
- it is shocking to see how far we have come so fast.
- not too long ago (back in the early 1970s), the “C-value paradox” was still raging.
- the “C-value” represents the amount of DNA in a haploid genome.
- since most metazoans are diploid, the C-value represents one half of their total DNA content.
- using C-values allows us to compare haploid and diploid organisms.
- here are some data:

Genus                        C-value (kb)
Navicula (diatom)                  35,000
Drosophila (fruit fly)            180,000
Gallus (chicken)                1,200,000
Cyprinus (carp)                 1,700,000
Boa (snake)                     2,100,000
Rattus (rat)                    2,900,000
Homo (human)                    3,400,000
Schistocerca (locust)           9,300,000
Allium (onion)                 18,000,000
Lilium (lily)                  36,000,000
Ophioglossum (fern)           160,000,000
Amoeba (amoeba)               670,000,000

- if the genomes of most species were composed of single-copy functional genes, then we would predict a strong correlation between the degree of morphological and developmental complexity of an organism and its DNA content.
- the lack of a correlation between complexity and total DNA content gave rise to what was called the “C-value paradox”.
- this paradox was partially resolved when it became evident that the proportion of the genome that encodes structural gene loci is quite small.
- in humans, only about 1.2% of the genome codes for proteins.
- the vast majority of the DNA most organisms lug around is composed of non-functional, non-coding, and parasitic elements.
- the proportion of DNA that is actually coding usually falls well below 5%.
- the C-value paradox has now been replaced by what is called the “C-value enigma”.
- the C-value enigma addresses why the total amount of non-coding DNA varies so dramatically among lineages.
- some questions it addresses are:
  1. What kinds of DNA make up the non-coding majority of different genomes?
  2. How is non-coding DNA gained and lost from genomes over evolutionary time?
  3. Why are some genomes so streamlined while others are so large?
- here are some of the observed ranges of genome sizes:

Group            Genome size range (kb)        Ratio (highest/lowest)
Protists             23,500 – 686,000,000      29,191
Fungi                 8,800 – 1,470,000        167
Molluscs            421,000 – 5,290,000        13
Insects              98,000 – 7,350,000        75
Bony fishes         382,000 – 139,000,000      364
Amphibians          931,000 – 84,300,000       91
Reptiles          1,230,000 – 2,250,000        4
Birds             1,670,000 – 2,250,000        1.3
Mammals           1,420,000 – 5,680,000        4
Angiosperms          50,000 – 125,000,000      2,500
Gymnosperms       4,120,000 – 76,900,000       17

- it is now clear that there is no relationship between genome size and overall complexity, nor between genome size and the total number of genes present in a species.
- Lynch and Conery (2003) have proposed that a reduction in effective population size is responsible for the increase in genome size we typically see between prokaryotes and eukaryotes.
- according to their model, this reduction made natural selection less efficient at removing insertions of transposable elements and gene duplications.
- much of the variation we see within and among groups results from complete genome duplications (polyploidization events), variable numbers of transposable elements, and, in the case of some parasitic groups, the loss of large amounts of DNA.

Complete genome data

- complete genomes have been obtained for about 200 species.

Species                    Number of genes
Haemophilus influenzae       1,743
Escherichia coli             4,288
Baker’s yeast               ~6,200
Fruit fly                  ~14,000
Nematode worm              ~19,000
Human                      ~21,000
Arabidopsis                ~26,000
Rice                       ~37,500

- since the vast majority of genes encode proteins, the next great challenge will be to work out the functional roles of each, how they interact, and how they are regulated.
- the entire collection of proteins that a cell or organism produces is called its proteome.
- the proteome contains a number of distinctly different groups such as enzymes, structural proteins, transport proteins, cell-signaling proteins, etc.
- an extremely important finding is that the proteome is much larger than the genome, i.e., there are many more distinct protein products than genes.
- there are two reasons for this.
- first, genes may undergo alternative splicing where, for example, different protein products may be missing some exons.
- these splicing pathways are commonly cell-specific or differ between developmental stages or environmental conditions.
- second, proteins may undergo post-translational modification that may be either permanent or reversible.
- permanent modifications include proteolytic processing, disulfide bond formation, and the addition of prosthetic groups, carbohydrates, or lipids.
- reversible modifications include phosphorylation, acetylation, and methylation.
- the net outcome is that the same gene can produce many different protein products.
- this fact will greatly complicate the study of the proteome.

Transposable elements

- our genomes are populated by large numbers of “selfish” genetic elements collectively referred to as “transposable elements” (TEs).
- transposable elements have been identified in all organismal groups.
- about 44% of the human genome is composed of transposable elements.
- the widespread presence of TEs shows that genomes are riddled with parasitic elements whose sole purpose is to replicate themselves at the host’s expense.
- there are two basic categories of transposable elements:

Class I elements

- these are called retrotransposons.
- replication is through an RNA intermediate.
- the transposition event is replicative (meaning that the original element remains intact).
- one common class of retrotransposons is called LINEs (long interspersed elements).
- in mammals, LINEs are typically 6-7 kb in length.
- another important category of retrotransposons is characterized by the presence of long terminal repeats (LTRs), which are a characteristic of retroviral genomes.
- this suggests that LTR elements evolved from retroviruses.
- in fact, retrotransposons resemble retroviruses that have lost the ability to make capsule proteins.
- these parasitic elements may thus have evolved to replicate vertically rather than horizontally.
- a second important type of Class I element is called a retrosequence.
- retrosequences do not encode a reverse transcriptase but amplify through RNA intermediates that are reverse transcribed and inserted into new locations in the genome.
- some of the best-studied retrosequences are called SINEs (short interspersed elements).
- SINEs are grouped into different families that show a resemblance to different functional genes (such as tRNAs).
- SINEs are especially abundant in primates; the human genome, for example, has over a million copies of a SINE called Alu.
- in most SINE families there are only one or a few master copies that are actively transposing; the remainder resemble pseudogenes.
- how this replication proceeds is still poorly understood.

Class II elements

- unlike Class I elements, Class II elements replicate via a DNA intermediate and are the most common transposable elements present in bacteria.
- the transposition of Class II elements may be replicative (as in Class I elements) or conservative, where the original element is excised during the move so copy number does not change.
- replicative transposition is analogous to the “copy” and “paste” functions of a word processor; conservative transposition is analogous to “cut” and “paste”.
- when Class II elements contain one or more protein-coding sequences they are called transposons.
- transposons encode a protein called transposase that catalyzes transposition.
- the abundance of various types of transposable elements varies considerably between species (see Table 15.1 in textbook).
- primates in particular have a large number of LINEs and SINEs.
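
The difference between replicative and conservative transposition described under Class II elements can be illustrated with a toy model. This is only a sketch under simple assumptions: the genome is a row of 1,000 numbered sites, an element occupies one site, and each event picks a random donor copy and a random empty target. The names `transpose` and `copy_number_after` and all parameter values are hypothetical and chosen for illustration.

```python
import random


def transpose(sites, mode, rng, genome_length=1000):
    """Apply one transposition event and return the new set of occupied sites.

    sites: set of integer positions currently occupied by the element.
    mode:  "replicative" (the donor copy stays in place) or
           "conservative" (the donor copy is excised and moved).
    """
    if mode not in ("replicative", "conservative"):
        raise ValueError("mode must be 'replicative' or 'conservative'")
    if not sites:
        return set()
    donor = rng.choice(sorted(sites))
    # The target is always an unoccupied site (an assumption of this toy model).
    free_sites = [p for p in range(genome_length) if p not in sites]
    target = rng.choice(free_sites)
    new_sites = set(sites)
    if mode == "conservative":
        new_sites.remove(donor)  # original element is excised
    new_sites.add(target)        # element inserts at the new location
    return new_sites


def copy_number_after(events, mode, start_copies=3, seed=0):
    """Return the element copy number after a number of transposition events."""
    rng = random.Random(seed)
    sites = set(range(start_copies))
    for _ in range(events):
        sites = transpose(sites, mode, rng)
    return len(sites)


print(copy_number_after(10, "replicative"))   # 13: one new copy per event
print(copy_number_after(10, "conservative"))  # 3: copy number unchanged
```

In this model copy number rises by exactly one per replicative event and stays constant under conservative transposition, whatever sites are drawn. Real elements are also lost through deletion and selection, which the sketch ignores.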
- in humans about 20% of the genome consists of LINEs and about 15% consists of SINEs.
- if transposable elements insert into coding DNA regions, they will likely disrupt the function of that gene and experience purifying selection.
- as expected, TEs are most abundant in non-coding heterochromatic regions near centromeres.
- it appears that most species have mechanisms in place to control the spread of TEs.
- one way this occurs is through DNA methylation, a mechanism used to silence gene expression.
- an example of what can happen when this constraint is removed was provided by Waugh O’Neill’s study on the hybrid offspring of two wallaby species.
- for reasons that are not clear, the hybrid’s DNA was virtually unmethylated.
- in the hybrids, a retrotransposon named KERV-1 had exploded in abundance, particularly near centromeric regions (see Fig. 15.1 in textbook).
- by contrast, KERV-1 elements were not detected in either parental species.
- this observation supports the hypothesis that methylation (among other things) serves to protect a host’s genome against uncontrolled expansion of TEs.

Why are transposable elements important?

- TEs can lead to adaptive molecular evolution in a number of ways.
- in bacteria, transposons mobilize genes for antibiotic resistance, heavy-metal tolerance, etc., into plasmids.
- plasmids are autonomously replicating circular DNA molecules that exist within bacterial cells.
- many plasmids also contain genes that enhance their spread between different bacteria.
- plasmids are the major source of the multiple antibiotic-resistance genes found in highly pathogenic strains; these are called resistance transfer factors.
- in eukaryotes, TEs can lead to the formation of novel genes through a process known as exon shuffling.
- LINE elements have been found to insert exons and/or regulatory elements into new locations in the genome.
- in the rice genome, Jiang et al. (2004) found 3,000 copies of a Class II element that contained some fragments of functional genes.
- many of these fragments appeared to be expressed and incorporated into proteins or RNA molecules.
- these results suggest that mixing and matching exons among genes can lead to novel and presumably adaptive new combinations.
- the insertions of TEs have also been found to modify the expression of nearby genes.
- in doing so, the mobilization of TEs is thought to play some role in generating variation in quantitative trait loci (QTLs) that control continuously varying traits.
- finally, TEs can also play an important role in the genesis of major chromosome rearrangements (i.e., translocations, inversions, and Robertsonian fusions and fissions).
- these large-scale changes can have important effects on local gene regulation and lead to problems in chromosome pairing that may contribute to speciation.

Lateral gene transfer

- transposable elements showed that the genomes of most organisms are far more dynamic than previously thought.
- a far more dramatic example was the discovery of lateral gene transfer (LGT) (also called horizontal gene transfer).
- here, genes move “laterally” or “horizontally” between species instead of “vertically” from parent to offspring.
- in many cases, the species involved in the transfer are closely related.
- however, in other cases they can be distantly related.
- one example (see Fig. 15.4 in textbook) involves the lateral transfer of an HMG-CoA reductase gene between a bacterium and an archaeon (Archaeoglobus fulgidus).
- how does lateral gene transfer occur?
- there are four known mechanisms.

1. Viral transfer
- in bacteria, a virus that excises from the host chromosome sometimes carries bacterial DNA with it.
- when the virus infects a different host species, it can transfer the DNA between species.
- this process is also called transduction.

2. Conjugation
- plasmids can move between bacterial cells by conjugation.
- occasionally, conjugation events can occur between bacterial and archaeal species.

3. Transformation
- some bacteria can take up DNA directly from their environment.
- occasionally, the DNA can be incorporated into the host’s chromosome; this process is called transformation.

4. Endosymbiosis
- endosymbiosis is a type of mutualism in which one species takes up residence within another.
- there are many examples of contemporary endosymbioses.
- in these examples, transfer of DNA can occur between the endosymbiont and the host.
- both mitochondria and chloroplasts were once free-living cells that took up permanent residence in eukaryotic cells.
- mitochondria were once free-living α-proteobacteria while chloroplasts were once free-living cyanobacteria.
- although both organelles still possess circular DNA molecules, the majority of their genomes have been transferred to the nucleus.
- most chloroplasts now have about 100 genes, while mitochondria typically have 37.
- detailed searches of the yeast and human genomes have identified about 630 genes with an α-proteobacterial ancestry.
- if we assume that the original endosymbiont had about 4,300 genes, this means that the vast majority of its original genome was lost or slowly taken over by host genes.
- the movement of genes from the mitochondrial genome to the nucleus is still continuing in some plant groups.
- for example, in most plants the cox2 gene (which encodes a component of the electron transport chain) is found in the mitochondrial genome.
- however, in peas there exist copies of cox2 in both the mitochondria and the nucleus.
- in mung beans, there is only a nuclear copy.
- in animals, different subunits of some cytochrome oxidase genes are likewise split between nuclear and mitochondrial locations.
- the movement of random pieces of mitochondrial DNA to the nucleus continues.
- these nuclear mitochondrial DNA segments (Numts) have been discovered in a growing number of species.
- for example, in the domestic cat one Numt is tandemly repeated between 38 and 76 times at a single genomic locus on cat chromosome D2.

Lateral gene transfer in Bacteria and Archaea

- lateral gene transfer is the norm both within and between Bacteria and Archaea.
- for example, E. coli K12 was found to have an estimated 4,288 genes (38% of which are still of unknown function).
- about 18% of E. coli K12’s genome is thought to have been acquired by LGT.
- when a closely related strain (O157:H7) was sequenced, this bacterium was found to have 5,361 genes, of which 1,387 are not present in K12.
- K12 in turn has 528 genes not found in O157:H7.
- what types of genes are typically transferred between species?
- it turns out that most laterally transferred genes encode proteins that control novel types of metabolism or adaptations to specific environments.
- for example, 17% of the genome of an archaeon (Thermoplasma acidophilum) that lives in high temperature (60 °C) and low pH (2.0) conditions is similar to that of a distantly related species living in the same habitats (Sulfolobus solfataricus).
- the genes involved in this transfer function in the uptake and processing of nutrients and were likely swapped in that habitat.
- most bacterial and archaeal species also possess a set of core housekeeping genes (controlling, for example, DNA replication, protein synthesis, etc.) that are rarely transferred.

Comparative genomics

- comparing the genomes of two or more species allows a number of interesting questions to be raised.
- for example, comparing non-coding DNA allows us to identify regions that are highly conserved and thus likely to have some function.
- such ultra-conserved regions have been identified (by David Haussler’s group at UCSC) and have provoked great interest around the world.
- in a similar vein, comparisons between homologous genes allow us to identify loci that are (i) highly conserved (i.e., experiencing strong selective constraint) or (ii) evolving very quickly (i.e., experiencing strong directional selection).
- comparisons between the human and chimpanzee genomes have identified genes expressed in the brain that may have played a major role in the evolution of uniquely human traits (such as language, in the case of the FOXP2 gene).
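
The idea of scanning two aligned sequences for highly conserved and rapidly diverging stretches can be sketched in a few lines. This is an illustrative toy, not the method used by any of the groups mentioned above. The sequences are made up, and the window size, step, and 90% identity threshold are arbitrary assumptions. The function names `window_identity` and `conserved_windows` are hypothetical.

```python
def window_identity(seq_a, seq_b, window=10, step=5):
    """Return (start, fraction_identical) for sliding windows over two
    pre-aligned sequences of equal length. Gap characters ('-') never
    count as matches."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(seq_a) - window + 1, step):
        a = seq_a[start:start + window]
        b = seq_b[start:start + window]
        matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
        results.append((start, matches / window))
    return results


def conserved_windows(seq_a, seq_b, threshold=0.9, window=10, step=5):
    """Return start positions of windows whose identity meets the threshold."""
    return [start
            for start, identity in window_identity(seq_a, seq_b, window, step)
            if identity >= threshold]


# Made-up aligned sequences: identical flanks, fully diverged middle block.
species_1 = "ACGTACGTAC" + "GGGGGGGGGG" + "TTTTTTTTTT"
species_2 = "ACGTACGTAC" + "CCCCCCCCCC" + "TTTTTTTTTT"

print(window_identity(species_1, species_2))
# [(0, 1.0), (5, 0.5), (10, 0.0), (15, 0.5), (20, 1.0)]
print(conserved_windows(species_1, species_2))
# [0, 20]
```

Windows with high identity are candidates for selective constraint, while windows with unusually low identity are candidates for rapid evolution. Real analyses compare many species and correct for the background substitution rate, which this sketch does not attempt.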