March 1st, 2010
Bioe 109
Winter 2010
Lecture 20
Evolutionary Genomics
- we have now entered the genomics age
- the number of complete genomes continues to rise rapidly each year, now numbering about 200.
- it is shocking to see how far we have come so fast.
- not too long ago (back in the early 1970s), the “C-value paradox” was still raging.
- the “C-value” represents the amount of DNA in a haploid genome.
- since most metazoans are diploid, the C-value represents one half of their total DNA content.
- using C-values allows us to compare haploid and diploid organisms.
- here are some data:
Genus                         C-value (kb)
Navicula (diatom)                   35,000
Drosophila (fruit fly)             180,000
Gallus (chicken)                 1,200,000
Cyprinus (carp)                  1,700,000
Boa (snake)                      2,100,000
Rattus (rat)                     2,900,000
Homo (human)                     3,400,000
Schistocerca (locust)            9,300,000
Allium (onion)                  18,000,000
Lilium (lily)                   36,000,000
Ophioglossum (fern)            160,000,000
Amoeba (amoeba)                670,000,000
- if the genomes of most species were composed of single-copy functional genes, then we would
predict a strong correlation between the degree of morphological and developmental complexity
of an organism and its DNA content.
- the lack of a correlation between complexity and total DNA content gave rise to what was
called the “C-value paradox”.
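- a minimal Python sketch of this point, using a few C-values copied from the table above. The dictionary name and the choice of human as the reference are illustrative assumptions:

```python
# C-values in kb per haploid genome, copied from the table above (subset).
c_values_kb = {
    "Drosophila": 180_000,
    "Homo": 3_400_000,
    "Allium": 18_000_000,
    "Lilium": 36_000_000,
    "Ophioglossum": 160_000_000,
    "Amoeba": 670_000_000,
}

human_kb = c_values_kb["Homo"]

# Express each genome as a multiple of the human C-value.
relative_to_human = {
    genus: size_kb / human_kb for genus, size_kb in c_values_kb.items()
}

# Print the genomes from smallest to largest.
for genus, fold in sorted(relative_to_human.items(), key=lambda item: item[1]):
    print(f"{genus:<14}{fold:8.2f} x human")
```

- on these numbers an onion carries roughly five times, and the amoeba roughly 200 times, as much DNA per haploid genome as a human, which is hard to square with any ranking by complexity.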
- this paradox was partially resolved when it became evident that the proportion of the genome
that encodes structural genes is quite small.
- in humans, only about 1.2% codes for proteins.
- the vast majority of the DNA most organisms lug around is composed of non-functional, non-coding, and parasitic elements.
- the proportion of DNA that is actually coding usually falls well below 5%.
- the C-value paradox has now been replaced by what is called the “C-value enigma”.
- the C-value enigma addresses why the total amount of non-coding DNA varies so dramatically
among lineages.
- some questions it addresses are:
1. What kinds of DNA make up the non-coding majority of different genomes?
2. How is non-coding DNA gained and lost from genomes over evolutionary time?
3. Why are some genomes so streamlined while others are so large?
- here are some of the observed ranges of genome sizes:
Group           Genome size range (kb)         Ratio (highest/lowest)
Protists        23,500 – 686,000,000           29,191
Fungi           8,800 – 1,470,000              167
Molluscs        421,000 – 5,290,000            13
Insects         98,000 – 7,350,000             75
Bony fishes     382,000 – 139,000,000          364
Amphibians      931,000 – 84,300,000           91
Reptiles        1,230,000 – 2,250,000          4
Birds           1,670,000 – 2,250,000          1.3
Mammals         1,420,000 – 5,680,000          4
Angiosperms     50,000 – 125,000,000           2,500
Gymnosperms     4,120,000 – 76,900,000         17
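- the ratio column is simply the largest genome divided by the smallest in each group. A short Python sketch that recomputes it for a few rows of the table above (the function and variable names are illustrative, not from the lecture):

```python
# (lowest, highest) genome sizes in kb, copied from the table above.
ranges_kb = {
    "Protists": (23_500, 686_000_000),
    "Fungi": (8_800, 1_470_000),
    "Reptiles": (1_230_000, 2_250_000),
    "Birds": (1_670_000, 2_250_000),
    "Mammals": (1_420_000, 5_680_000),
    "Angiosperms": (50_000, 125_000_000),
}


def size_ratio(lowest_kb, highest_kb):
    """Return how many times larger the largest genome is than the smallest."""
    return highest_kb / lowest_kb


ratios = {group: size_ratio(*bounds) for group, bounds in ranges_kb.items()}

for group, ratio in ratios.items():
    print(f"{group:<12}{ratio:12.1f}")
```

- most recomputed values agree with the listed ratios. The reptile row does not (about 1.8 recomputed against 4 listed), so one of those two entries may have been mis-transcribed.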
- it is now clear that there is no relationship between genome size and overall complexity, nor
between genome size and the total number of genes present in a species.
- Lynch and Conery (2003) have proposed that a reduction in effective population size is
responsible for the increase in genome size we typically see between prokaryotes and eukaryotes.
- according to their model this resulted in a reduced efficiency of natural selection to remove
insertions of transposable elements and gene duplications.
- much of the variation we see within and among groups results from complete genome
duplications (polyploidization events), variable numbers of transposable elements, and, in the
case of some parasitic groups, the loss of large amounts of DNA.
Complete genome data
- complete genomes have been obtained for about 200 species.
Species                      Number of genes
Haemophilus influenzae       1,743
Escherichia coli             4,288
Baker’s yeast                ~6,200
Fruit fly                    ~14,000
Nematode worm                ~19,000
Human                        ~21,000
Arabidopsis                  ~26,000
Rice                         ~37,500
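- gene number and genome size can be combined into a rough gene density. The Python sketch below pairs the approximate gene counts above with the two C-values from the earlier table that have a matching entry here (fruit fly and human). The names and the genes-per-Mb measure are illustrative choices:

```python
# Approximate gene counts and C-values (kb) taken from the two tables in these notes.
genomes = {
    "Fruit fly": {"genes": 14_000, "c_value_kb": 180_000},
    "Human": {"genes": 21_000, "c_value_kb": 3_400_000},
}


def genes_per_mb(genes, c_value_kb):
    """Return the average number of genes per megabase of haploid genome."""
    return genes / (c_value_kb / 1_000)


density = {
    species: genes_per_mb(data["genes"], data["c_value_kb"])
    for species, data in genomes.items()
}

for species, value in density.items():
    print(f"{species:<10}{value:6.1f} genes per Mb")
```

- the human genome is almost 19 times larger than the fly genome but holds only about 1.5 times as many genes, so its gene density is more than ten-fold lower.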
- since the vast majority of genes encode proteins, the next great challenge will be to work out
the functional roles of each, how they interact, and how they are regulated.
- the entire collection of proteins that a cell or organism produces is called its proteome.
- the proteome contains a number of distinctly different groups such as enzymes, structural
proteins, transport proteins, cell-signaling proteins, etc.
- an extremely important finding is that the proteome is much larger than the genome.
- there are two reasons for this.
- first, genes may undergo alternative splicing where, for example, different protein products
may be missing some exons.
- these splicing pathways are commonly cell-specific or differ between developmental stages or
environmental conditions.
- second, proteins may undergo post-translational modification that may either be permanent or
reversible.
- permanent modifications include things such as proteolytic processing, disulfide bond
formation, or the addition of prosthetic groups, carbohydrates or lipids.
- reversible modifications include such things as phosphorylation, acetylation, or methylation.
- the net outcome is that the same gene can produce many different protein products.
- this fact will greatly complicate the study of the proteome.
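- a toy Python sketch of why one gene can yield many products. The gene, its exon names, and the two modification sites are all hypothetical, and the count is an upper bound that assumes every optional exon is skipped independently and every site can be modified in every isoform:

```python
from itertools import product

# Hypothetical gene: (exon name, can this exon be skipped?)
exons = [("E1", False), ("E2", True), ("E3", True), ("E4", False), ("E5", True)]


def splice_isoforms(exons):
    """Return every exon combination obtained by keeping or skipping each optional exon."""
    optional = [name for name, skippable in exons if skippable]
    isoforms = []
    for choices in product([True, False], repeat=len(optional)):
        keep = dict(zip(optional, choices))
        # Constitutive exons are always included; exon order is preserved.
        isoforms.append(
            tuple(name for name, skippable in exons if not skippable or keep[name])
        )
    return isoforms


isoforms = splice_isoforms(exons)

# Hypothetical: two reversible modification sites, each either modified or not.
modification_states = 2 ** 2
protein_forms = len(isoforms) * modification_states

print(len(isoforms), "splice isoforms,", protein_forms, "distinct protein forms")
```

- real genes use only a subset of the possible combinations, but even this small example shows how quickly the number of protein forms can outrun the number of genes.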
Transposable elements
- our genomes are populated by large numbers of “selfish” genetic elements collectively referred
to as “transposable elements (TEs)”.
- transposable elements have been identified in all organismal groups.
- about 44% of the human genome is comprised of transposable elements.
- the widespread presence of TEs shows that genomes are riddled with parasitic elements whose
sole purpose is to replicate themselves at the host’s expense.
- there are two basic categories of transposable elements:
Class I elements
- these are called retrotransposons.
- replication is through an RNA intermediate.
- the transposition event is replicative (meaning that the original element remains intact).
- one common class of retrotransposons is the LINEs (long interspersed elements).
- in mammals, LINEs are typically 6-7 kb in length.
- another important category of retrotransposons is characterized by the presence of long terminal
repeats (LTRs), which are a characteristic of retroviral genomes.
- this suggests that LTR retrotransposons evolved from retroviruses.
- in fact, retrotransposons resemble retroviruses that have lost the ability to make capsule
proteins.
- these parasitic elements may thus have evolved to replicate vertically rather than horizontally.
- a second important type of Class I element is the retrosequence.
- these do not encode a reverse transcriptase but amplify through RNA intermediates that are
reverse transcribed and inserted at new locations in the genome.
- some of the best-studied retrosequences are called SINEs (short interspersed elements).
- SINEs are grouped into different families that show a resemblance to different functional genes
(such as tRNAs).
- SINEs are especially abundant in primates; the human genome, for example, has over a
million copies of a SINE called Alu.
- in most SINE families only one or a few master copies are actively transposing; the
remainder resemble pseudogenes.
- how this replication proceeds is still poorly understood.
Class II elements
- unlike Class I elements, Class II elements replicate via a DNA intermediate and are the most
common transposable elements present in bacteria.
- the transposition of Class II elements may be replicative (as in Class I elements) or
conservative, where the original element is excised during the move so copy number does not
change.
- conservative transposition is analogous to the “cut” and “paste” functions of a word processor,
and replicative transposition to “copy” and “paste”.
- when Class II elements contain one or more protein-coding sequences they are called
transposons.
- transposons encode a protein called transposase that catalyzes transposition.
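- the difference between replicative and conservative transposition can be sketched as a toy model in Python. The genome is reduced to a list of insertion sites, and the function name and representation are hypothetical simplifications, not a model of any real transposase:

```python
def transpose(genome, source, target, replicative):
    """Return a new genome after the element at `source` inserts at `target`.

    `genome` is a list of sites holding either "TE" or None (empty site).
    Replicative moves leave the original copy in place; conservative moves excise it.
    """
    if genome[source] != "TE":
        raise ValueError("no element at the source site")
    if genome[target] is not None:
        raise ValueError("target site is already occupied")
    new_genome = list(genome)
    new_genome[target] = "TE"
    if not replicative:
        new_genome[source] = None  # the original element is excised
    return new_genome


start = ["TE", None, None, None]

after_replicative = transpose(start, source=0, target=2, replicative=True)
after_conservative = transpose(start, source=0, target=2, replicative=False)

print(after_replicative, after_replicative.count("TE"))    # copy number rises
print(after_conservative, after_conservative.count("TE"))  # copy number unchanged
```

- repeated replicative moves increase copy number with every event, which is why Class I elements and replicative Class II elements can come to occupy a large share of a genome.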
- the abundance of various types of transposable elements varies considerably between species
(see Table 15.1 in textbook).
- primates in particular have a large number of LINEs and SINEs.
- in humans about 20% of our genome consists of LINEs and about 15% of SINEs.
- if a transposable element inserts into a coding region, it will likely disrupt the function of
that gene and be removed by purifying selection.
- as expected, TEs are most abundant in non-coding heterochromatic regions near centromeres.
- it appears that most species have mechanisms in place to control the spread of TEs.
- one way this occurs is through DNA methylation – a mechanism used to silence gene
expression.
- an example of what can happen when this constraint is removed was provided by Waugh
O’Neill’s study on the hybrid offspring of two wallaby species.
- for reasons that are not clear, the hybrid’s DNA was virtually unmethylated.
- in the hybrids, a retrotransposon named KERV-1 had exploded in abundance, particularly near
centromeric regions (see Fig. 15.1 in textbook).
- by contrast, there was no detection of KERV-1 elements in either parental species.
- this observation supports the hypothesis that methylation (among other things) serves to protect
a host’s genome against uncontrolled expansion of TEs.
Why are transposable elements important?
- TEs can lead to adaptive molecular evolution in a number of ways.
- in bacteria, transposons mobilize genes for antibiotic resistance, heavy-metal tolerance, etc.,
into plasmids.
- plasmids are autonomously replicating circular DNA molecules that exist within bacterial cells.
- many plasmids also contain genes that enhance their spread between different bacteria.
- plasmids are the major source of multiple antibiotic-resistance genes called resistance transfer
factors found in highly pathogenic strains.
- in eukaryotes, TEs can lead to the formation of novel genes through a process known as exon shuffling.
- LINE elements have been found to insert exons and/or regulatory elements into new locations
in the genome.
- in the rice genome, Jiang et al. (2004) found 3,000 copies of a Class II element that contained
some fragments of functional genes.
- many of these fragments appeared to be expressed and incorporated into proteins or RNA
molecules.
- these results suggest that mixing and matching exons among genes can lead to novel and
presumably adaptive new combinations.
- the insertions of TEs have also been found to modify the expression of nearby genes.
- in doing so the mobilization of TEs is thought to play some role in generating variation in
quantitative trait loci (QTLs) that control continuously varying traits.
- finally, TEs can also play an important role in the genesis of major chromosome rearrangements
(i.e., translocations, inversions, and Robertsonian fusions and fissions).
- these large-scale changes can have important effects on local gene regulation and lead to
problems in chromosome pairing that may contribute to speciation.
Lateral gene transfer
- transposable elements showed that the genomes of most organisms are far more dynamic than
previously thought.
- a far more dramatic example was the discovery of lateral gene transfer (LGT) (also called
horizontal gene transfer).
- here, genes move “laterally” or “horizontally” between species instead of “vertically” from parent to offspring.
- in many cases, the species involved in the transfer are closely related.
- however, in other cases they can be distantly related.
- one example (see Fig. 15.4 in textbook) involves the lateral transfer of an HMG-CoA reductase
gene between a bacterium and an archaeon (Archaeoglobus fulgidus).
- how does lateral gene transfer occur?
- there are four known mechanisms.
1. Viral transfer
- when a virus excises from a bacterial chromosome, it sometimes carries bacterial DNA with it.
- when the virus infects a different host species, it can transfer the DNA between species.
- this process is also called transduction.
2. Conjugation
- plasmids can move between bacterial cells by conjugation.
- occasionally, conjugation events can occur between bacterial and archaeal species.
3. Transformation
- some bacteria can take up DNA directly from their environment.
- occasionally, the DNA can be incorporated into the host’s chromosome; this process is called
transformation.
4. Endosymbiosis
- endosymbiosis is a type of mutualism where one species can take up temporary residence within
another.
- there are many examples of contemporary endosymbioses.
- in these examples, transfer of DNA can occur between the endosymbiont and the host.
- both mitochondria and chloroplasts were once free-living cells that took up permanent
residence in eukaryotic cells.
- mitochondria were once free-living α-proteobacteria while chloroplasts were once free-living
cyanobacteria.
- although both organelles still possess circular DNA molecules, the majority of their genomes
have been transferred to the nucleus.
- most chloroplasts now have about 100 genes, while mitochondria typically have 37.
- detailed searches of the yeast and human genomes have identified about 630 genes with an α-proteobacterial ancestry.
- if we assume that the original endosymbiont had about 4,300 genes, this means that the vast
majority of its original genome was lost or slowly taken over by host genes.
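- the arithmetic behind this statement, as a short Python sketch. The figures are the ones quoted in these notes, and the 4,300-gene ancestor is the stated assumption:

```python
ancestral_genes = 4_300   # assumed gene count of the original endosymbiont
surviving_genes = 630     # genes of proteobacterial ancestry identified in yeast and human

# Share of the assumed ancestral genome that is still recognizable today.
retained_pct = 100 * surviving_genes / ancestral_genes
lost_or_replaced_pct = 100 - retained_pct

print(f"retained: {retained_pct:.1f}%  lost or replaced: {lost_or_replaced_pct:.1f}%")
```

- under that assumption only about 15% of the endosymbiont's genes can still be traced, leaving roughly 85% lost or replaced.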
- the movement of genes from the mitochondrial genome to the nucleus is still continuing in
some plant groups.
- for example, in most plants the cox2 gene (which encodes a component of the electron transport
chain) is found in the mitochondrial genome.
- however, in peas there exist copies of cox2 in both the mitochondria and the nucleus.
- in mung beans, there is only a nuclear copy.
- in animals, the genes for different cytochrome oxidase subunits are likewise split between
nuclear and mitochondrial locations.
- the movement of random pieces of mitochondrial DNA to the nucleus continues.
- these nuclear mitochondrial DNA segments (Numts) have been discovered in a growing number of
species.
- for example, in the domestic cat one Numt is tandemly repeated between 38 and 76 times at a
single genomic locus on cat chromosome D2.
Lateral gene transfer in Bacteria and Archaea
- lateral gene transfer is the norm both among and within the Bacteria and Archaea.
- for example, E. coli K12 was found to have an estimated 4,288 genes (38% of which are still of
unknown function).
- about 18% of E. coli K12’s genome is thought to have been acquired by LGT.
- when a closely related strain was sequenced (O157:H7), this bacterium was found to have 5,361
genes, of which 1,387 are not present in K12.
- K12 in turn has 528 genes not found in O157:H7.
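- a short Python sketch of how different these two strains are, using only the gene counts quoted above (the variable names are illustrative):

```python
k12_genes = 4_288
o157_genes = 5_361
o157_only = 1_387   # present in O157:H7, absent from K12
k12_only = 528      # present in K12, absent from O157:H7

# Shared gene count as seen from each strain's side.
shared_seen_from_o157 = o157_genes - o157_only
shared_seen_from_k12 = k12_genes - k12_only

# Strain-specific genes as a percentage of each genome.
pct_o157_unique = 100 * o157_only / o157_genes
pct_k12_unique = 100 * k12_only / k12_genes

print(shared_seen_from_o157, shared_seen_from_k12)
print(f"{pct_o157_unique:.1f}% of O157:H7 genes and {pct_k12_unique:.1f}% of K12 genes are strain-specific")
```

- about a quarter of the O157:H7 gene set and about an eighth of the K12 gene set have no counterpart in the other strain. The two shared-gene estimates (3,974 and 3,760) do not match exactly; the notes do not say why, though it would be expected if genes are not matched strictly one-to-one between strains.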
- what types of genes are typically transferred between species?
- it turns out that most laterally transferred genes encode proteins that control novel types of
metabolism or adaptations to specific environments.
- for example, 17% of the genome of an archaeon (Thermoplasma acidophilum) that lives in high
temperature (60 °C) and low pH (2.0) conditions is similar to that of a distantly related species
living in the same habitats (Sulfolobus solfataricus).
- the genes involved in this transfer function in the uptake and processing of nutrients and were
likely swapped in that habitat.
- most bacterial and archaeal species also possess a set of core housekeeping genes (controlling
for example DNA replication, protein synthesis, etc.) that are rarely transferred.
Comparative genomics
- comparing the genomes of two or more species allows a number of interesting questions to be
raised.
- for example, comparing non-coding DNA allows us to identify regions that are highly
conserved and thus likely to have some function.
- such ultra-conserved regions have been identified (by David Haussler’s group at UCSC) and
have provoked great interest around the world.
- in a similar vein, comparisons between homologous genes allow us to identify loci that are (i)
highly conserved (i.e., experiencing strong selective constraint) and (ii) evolving very quickly
(i.e., experiencing strong directional selection).
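- one simple way to look for conserved stretches is to slide a window along two aligned sequences and record the fraction of matching positions. The Python sketch below is only an illustration: the sequences are invented, the alignment is assumed to be gap-free and already done, and the window size and threshold are arbitrary choices:

```python
def window_identity(seq_a, seq_b, window):
    """Return the fraction of identical positions in each sliding window."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    scores = []
    for start in range(len(seq_a) - window + 1):
        pairs = zip(seq_a[start:start + window], seq_b[start:start + window])
        matches = sum(base_a == base_b for base_a, base_b in pairs)
        scores.append(matches / window)
    return scores


# Invented, pre-aligned sequences from two hypothetical species.
species_1 = "ACGTACGTAC"
species_2 = "ACGTTCGAAC"

scores = window_identity(species_1, species_2, window=4)

# Window start positions whose identity meets an arbitrary threshold.
conserved_starts = [i for i, score in enumerate(scores) if score >= 0.9]

print(scores)
print(conserved_starts)
```

- windows with unusually high identity are candidates for selective constraint and windows with unusually low identity for rapid evolution, although real analyses compare against an expected neutral rate rather than a fixed cutoff.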
- comparisons between the human and chimpanzee genomes have identified genes expressed in
the brain that may have played a major role in the evolution of uniquely human traits (such as
language – the FOXP2 gene).