Chapter 13
The Structure of Genomes
Variations in genome anatomy in different organisms
© 2005 Prentice Hall Inc. / A Pearson Education Company / Upper Saddle River, New Jersey 07458

Contents
Differences in gene structure among the life domains
Variations in genome size
Isochores
Homology in noncoding regions
Noncoding DNA
  Pseudogenes
  Repeats
  Transposons
Function of noncoding DNA

Variation in genome structure
The genetic code is almost universal, but genome structure varies considerably from organism to organism
Genomics reveals a whole-genome view of organisms
Sources of genome variation
  Duplication events (pseudogenes; genome duplications in poplar, rice, and Arabidopsis; polyploidy)
  Transposons (sequence elements that jump around the genome)
  Mutations (e.g., microsatellites): DNA replication is prone to create and then delete base-pair repeats
  Biophysical constraints (repeats at centromeres and telomeres have structural roles)

The three-domain system is a biological classification introduced by Carl Woese in 1990, based on 16S rRNA

Comparison of genome structure between life domains
Bacteria
  No introns
  Circular chromosomes with plasmids
Archaea
  Some introns
  TATA box–like binding sites (like Eukarya, unlike Bacteria)
  Circular chromosomes with plasmids
Eukarya
  Many introns and exons
  Chromosomes located inside the nucleus
  Chromosomal DNA tightly bound to histones

Range of genome size
Eukaryotic genome sizes vary by a factor of 200,000
Genomes of flowering plants vary a thousandfold
Greater than two hundred–fold variation in genome size among vertebrates
Prokaryotic genomes vary by a factor of only 10
[Figure: C-value (pg) on a log scale from 10^-4 to 10^3 for prokaryotes and eukaryotes (algae, protozoa, fungi, vascular plants, arthropods, chordates; fish, amphibians; birds, mammals, reptiles)]

Genome size and complexity
Prokaryotes
  Encompass the life domains of Archaea and Bacteria
  Linear relationship between gene number and genome size
  [Figure: gene number (0 to 8,000 genes) against genome size (0 to 8 Mb)]
Eukaryotes
  No apparent relationship between complexity and genome size
  Simplicity due to advancement?
  Amoeba dubia: 6 × 10^11 bp
  Homo sapiens: 3 × 10^9 bp

Deletional bias in prokaryotic genomes
Comparison of pseudogenes between prokaryotic taxa shows more deletions than insertions over time
Conclusion: areas of the genome not subject to selective pressure get smaller over time
[Figure: insertions and deletions across prokaryotic genera; axis in bp on a log scale]

The C-value paradox
Are bigger genomes richer in genes?
Why is there a lack of correlation between genome size and complexity in eukaryotes?
Some things do correlate with genome size
  Duration of the cell cycle
  Minimum cell volume
  Presence of LTR transposon sequences in various species of grasses increases their genome size

Hypotheses to explain the C-value paradox
Some hypotheses to explain the paradox
  Junk DNA
  Selfish DNA
  Nucleoskeletal hypothesis: balanced growth
  Nucleotypic hypothesis
  Developmental control
  Transposon hypothesis
All are overlapping theories for deciphering the puzzle

Caveats
Junk DNA and selfish DNA
  Lack of constant accumulation in many organisms with large genomes
Nucleoskeletal hypothesis
  Not compatible with sudden changes in genome size
Nucleotypic hypothesis: cell volume and genome size
  Not always true across taxa
Transposons
  Relationship not quite linear

The human genome organization
Human genome: 3,000 Mb
  Genes and related sequences: 900 Mb (30%)
    Coding DNA: 90 Mb (3%)
    Noncoding DNA: 810 Mb (27%)
      Pseudogenes
      Introns, leaders, trailers
  Extragenic DNA: 2,100 Mb (70%)
    Repetitive DNA: 420 Mb (14%)
      Tandemly repeated DNA: satellite, minisatellite, microsatellite
      Interspersed repeats: LTRs, TEs
    Low-copy or unique DNA: 1,680 Mb (56%)

Common eukaryotic genome features
Variations in base-pair content
  Isochores (large GC-rich or AT-rich regions)
Duplicated genes
  Paralogous genes: gene duplication
  Pseudogenes: lost function
Repetitive sequences
  Minisatellites (100 bp) and microsatellites (1–6 bp)
  Transposon sequences
Foreign DNA
  DNA from other free-living organisms
  Viral DNA remnants

Isochores (1970s)
Isochore structure
  Variation of G+C content over large genomic areas (>300 kb)
  Typical of mammals and birds
Why do isochores exist?
  High GC stabilizes genomes?
  Extra thermal stability for homeotherms (warm-blooded animals: mammals and birds)?
  But bacterial experiments do not bear out a link between GC content and optimal growth conditions
[Figure: G+C fraction (roughly 0.3 to 0.6) along 4,000 kb of sequence]

Genome duplications
A driving force in evolution?
  Large-scale duplication of genes
  Differentiation of function in duplicated genes over time
Evidence
  Polyploidy: more than two copies of each chromosome
  Endopolyploidy: cotyledons, salivary glands of fruit flies, and megakaryocytes (platelet precursors) with up to 64 copies of the genome
  Evidence of past genome duplications in diploid organisms

Polyploidy in plants and animals
Plants
  Wheat (6n)
  Banana (3n)
  ~50% of naturally occurring flowering plants are polyploid
Polyploidy in animals
  Fish
  Amphibians
  Red viscacha rat (4n)
[Figure: comparison of red viscacha rat sperm (left) to sperm of other rodent species]

Evidence of genome duplication in Arabidopsis
The Arabidopsis genome shows evidence of 4+ large genome duplications
Compare blocks of genes that have related sequences
Blocks imply genome, rather than gene, duplication
Distribution of block ages points to multiple duplications
[Figure: related gene blocks across chromosomes 1 to 5]

Junk DNA?
Junk DNA
  Sequences that serve no apparent evolutionary function
  Is noncoding DNA junk?
Evidence against junk DNA
  Statistical evidence of selection acting on noncoding regions (high identity between mouse and human)
  Biological functionality for noncoding regions
Comparison of human and mouse intergenic regions:
  AGACCAGGAACTTACAGCGACCTTGAACTGTTCCATTGCTCTTTTCCTGGGGCGG-GGGC
  ||||||||||||    ||| || |||||||||||||||||||||||||||||||| |||
  AGACCAGGAACTCGTGGCGGCCGTGAACTGTTCCATTGCTCTTTTCCTGGGGCGGAGGGA
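
The "high identity between mouse and human" point can be made concrete by counting matching columns in the alignment printed on the slide. The following is a minimal sketch, not the method used by the slide's authors: the function name `percent_identity` and the choice to count a gap column as a non-match are assumptions made for illustration, and the slide does not say which row is human and which is mouse, so the rows are labeled neutrally.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-"):
    """Return (matches, columns, percent identity) for two aligned rows.

    Assumption: a column containing a gap counts as a non-match.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    matches = sum(1 for a, b in zip(row_a, row_b) if a == b and a != gap)
    columns = len(row_a)
    return matches, columns, 100.0 * matches / columns


# The two aligned rows exactly as printed on the slide.
row_a = "AGACCAGGAACTTACAGCGACCTTGAACTGTTCCATTGCTCTTTTCCTGGGGCGG-GGGC"
row_b = "AGACCAGGAACTCGTGGCGGCCGTGAACTGTTCCATTGCTCTTTTCCTGGGGCGGAGGGA"

matches, columns, pct = percent_identity(row_a, row_b)
print(f"{matches} of {columns} columns identical ({pct:.1f}%)")
```

For this 60-column fragment the count is 52 identical columns, about 87%. A single short fragment is only an illustration; the statistical argument on the slide rests on many such regions.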

Variations in the amount of noncoding DNA
Prokaryotes
  Bacteria: ~15%
Eukaryotes
  Yeast (S. cerevisiae): 30%
  Malarial parasite (P. falciparum): 50%
  Flowering plant (A. thaliana): 70%
  Nematode worm (C. elegans): 70%
  Human (H. sapiens): 95%

Classification of noncoding DNA
Pseudogenes
  Duplicated genes that have accumulated too many deleterious mutations to function
Repeats
  Repeated units of DNA 1–200 bp long
Transposable elements
  Pieces of DNA that have the ability to jump from place to place in the genome

Pseudogenes
Pseudogenes can exist because a single functioning copy of a gene is sufficient
Processed pseudogenes lack a promoter and introns
  Believed to derive from an mRNA copy
  Reverse transcribed into cDNA
  Reintroduced into the genome
Pseudogenes can be quite common
  In C. elegans, there is one pseudogene for every eight genes

Minisatellites
Density centrifugation in the 1960s
Tandem repeats 7–100 bp long
Generally GC rich
Highly polymorphic (differ between individual organisms)
Minisatellites occur in coding and noncoding regions
  Example: apolipoprotein family
  Association with a specific lipoprotein depends on the number of repeats in the coding minisatellite
Minisatellites can also affect regulation of nearby genes
Found in a wide range of eukaryotic organisms

Instability in human minisatellites
Minisatellites in humans are hypermutable
  Mutation rate > 0.5% per sperm per allele
  Not the case in the mouse, rat, or pig genome
  Insertion of human minisatellites into the mouse genome does not increase the germ-line mutation rate
Differences between minisatellites in humans and other mammals
  90% of human minisatellites are in highly recombinant regions
  66% of pig, 30% of rat, and 15% of mouse minisatellites are in similar regions

Minisatellite mutations
Mutations involving minisatellites can be complex
Mechanism not fully understood
Minisatellites are associated with fragile sites on chromosomes
[Figure: a minisatellite mutation shown in three steps]

Applications using minisatellites
DNA fingerprinting
  Highly polymorphic, thus ideal for identifying individuals
Studying the effects of mutagens
  Used to count germ-line mutations in children near Chernobyl
  Chernobyl children had double the mutation rate of the controls
[Figure: DNA fingerprint gel (markers from 4 to 12 kb) comparing udder, cultured cells, and Dolly]

Forensics
Alec Jeffreys and his technician Vickie Wilson found minisatellites by chance in 1984. The story is told in “Genome” by Matt Ridley (p. 132). Jeffreys and colleagues were studying gene evolution, using the muscle protein myoglobin from humans and seals, when they found a 12-bp repeat in the middle of the gene. They found that these repeats vary in number between individuals and can be used as a genetic fingerprint, like a barcode. They set aside the myoglobin work and turned to minisatellites. Immigration authorities became interested in using the method to check whether new immigrants have undisclosed relatives in the country.

Actual story
On August 2, 1986, a young girl, Dawn Ashworth, was found raped and murdered in the small town of Narborough. A week later the police arrested a man named Richard Buckland, who confessed to the murder. The case should have been closed, but the police saw similarities between the Ashworth murder and the unsolved murder of Lynda Mann three years earlier. The cases were so similar that the police were sure the same man had committed both, but Buckland would not confess to the earlier murder. So they decided to use DNA fingerprinting. All DNA samples were sent to Jeffreys, who did the minisatellite testing in a week. What he found was most bizarre. The two semen samples were identical, but Buckland's DNA did not match the semen found at either murder site. So his confession was false, and he was cleared of both murders. This was the first time a man was released on the basis of DNA evidence.
So who did it? The police thought Jeffreys must have made a mistake, so they repeated the test in their own lab, and the result was the same. They did not give up. They took blood samples from 5,500 men in that village and did large-scale fingerprinting. None matched the semen, so this had to be the work of an outsider. Then a bakery worker in another town, Ian Kelly, boasted to his colleagues that he had taken the blood test for a man named Colin Pitchfork, who lived in Narborough. Pitchfork had asked Kelly to help him, claiming the police were trying to frame him. Word got out, and Pitchfork's DNA sample matched the semen. Case closed! Pitchfork was sentenced to life in prison in January 1988. The murderer was now behind bars!

More applications
In Britain, 320,000 people have been typed, and 28,000 were linked to a crime. However, 60,000 were cleared of a crime.
Minisatellites and microsatellites are commonly used and give definitive results.
Josef Mengele’s remains were exhumed and confirmed!
Presidential semen was confirmed in the Monica Lewinsky case.
In the World Trade Center attacks, DNA fingerprinting was used to provide death certificates.
Illegitimate children of Thomas Jefferson were identified.
Services such as “IdentiGene” and “DNA diagnostics” are flourishing for paternity tests, which may let some men wriggle out of paying child support.
Detecting pathogens in biological warfare also uses another kind of DNA fingerprinting.
Molecular biologists can become defense lawyers, patent attorneys, and DNA experts.
The O. J. Simpson case!

Microsatellites (SSR)
Definition
  Tandem repeats of 1–6 bp
  Also called simple sequence repeats (SSRs)
  Found in prokaryotes and eukaryotes
Slipped-strand mispairing is the biological process that creates microsatellites
Examples from the human genome
  Common    Rare
  A         C
  AT        CG
  AAC       ACG
  AGAT      CCCG

Microsatellite utility in bacteria
Microsatellites are unstable because of susceptibility to slipped-strand mispairing
Slipped-strand mispairing causes mutations that alter protein function
Virulent bacteria have microsatellites associated with multiple genes (contingency genes)
By maintaining genetic diversity within a small population, bacteria can increase adaptability
Found in the bacterium N. gonorrhoeae, which causes the sexually transmitted disease gonorrhea

An example of contingency genes
The bacterium N. gonorrhoeae causes gonorrhea
Opa genes code for surface proteins
  Allow infection by facilitating adhesion to epithelial cells
  But also make bacteria susceptible to human phagocytes
  Contain microsatellites with CTCTT repeats
Repeat deletions cause frame-shift mutations
  Surface proteins are no longer “sticky”
  N. gonorrhoeae can thereby avoid lethal phagocytes
1 in 100–1,000 new cells will have a CTCTT mutation
Therefore, there is always a population of evasive cells and a population of infectious cells

Microsatellites in the human genome
More or less evenly distributed among chromosomes
Some repeats are more common than others
Trinucleotide repeats are responsible for a number of diseases
[Figure: SSR density (bp/Mb) for each chromosome]

Trinucleotide repeats and human disease
Expansion of trinucleotide repeats is responsible for over a dozen neurological diseases
  Examples: Huntington’s disease, fragile X syndrome (intellectual disability)
Repeat motifs
  Mutations involving CGG, GCC, GAA, CTG, and CAG account for 14 diseases
  Some are located in noncoding areas
  Others are located in coding regions

Trinucleotide expansion in Huntington’s disease
Huntington’s disease
  Midlife onset of dementia (progressive decline in cognitive function due to damage or disease in the brain), followed by death
  Caused by expansion of CAG repeats in the coding region of Huntingtin
  CAG repeats are translated into a polyglutamine tract (QQQ)
    6–35 repeats: no disease
    36–121 repeats: Huntington’s disease
    > 70 repeats: Huntington’s disease with juvenile onset
Other diseases are also caused by CAG expansion in a coding region
The polyglutamine tract is believed to cause neurotoxic aggregates of protein to form inside neurons
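
The repeat thresholds on the Huntington’s slide amount to a simple lookup once the CAG repeats have been counted. The sketch below is illustrative only and is not a diagnostic procedure: the function names `longest_cag_run` and `classify_cag_count` are hypothetical, the test sequence is a made-up toy string rather than real Huntingtin sequence, and counts outside the ranges quoted on the slide are reported as such rather than guessed at.

```python
import re


def longest_cag_run(seq: str) -> int:
    """Number of repeats in the longest uninterrupted run of CAG."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify_cag_count(repeats: int) -> str:
    """Map a repeat count onto the ranges quoted on the slide."""
    if 6 <= repeats <= 35:
        return "no disease"
    if 36 <= repeats <= 70:
        return "Huntington's disease"
    if 71 <= repeats <= 121:
        return "Huntington's disease with juvenile onset"
    return "outside the ranges listed on the slide"


# Toy sequence (hypothetical): 40 CAG repeats between two flanking codons.
toy_sequence = "ATG" + "CAG" * 40 + "CCG"
repeat_count = longest_cag_run(toy_sequence)
print(repeat_count, classify_cag_count(repeat_count))
```

Counting only uninterrupted runs is a simplifying choice; a real analysis would also have to decide how to treat interrupted tracts and sequencing errors.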

Transposable elements
DNA regions flanked by repeats that can jump around the genome
Transposable-element (TE) insertion can be faster than chromosome replication
  Allows TEs to rapidly accumulate in the genome
  44% of the human genome sequence is TEs
TEs can have different effects on the host
  Insertion into a gene can destroy its function
  A TE can evolve into a beneficial gene (insertion in a lethal gene)
  A TE can affect gene expression by inserting into a regulatory region
  But most TEs will have a neutral effect

TE content among species
TE content varies greatly among species
  Humans: 44%
  Maize: > 50%
  Drosophila: 15%
  Arabidopsis, C. elegans, yeast: < 5%
TEs tend to be concentrated in particular genomic regions: human X chromosomes have high TE content
In maize, TEs have doubled the size of the genome over the last few million years

Intraspecies variations in TE content
Wild barley plants were examined in an Israeli canyon containing distinct microclimates (center of origin)
Copies of the BARE-1 transposon vary among individual wild barley plants
  8,300–22,000 copies per plant (1.8%–4.7% of the total genome)
  BARE-1 copy number (and genome size) found to correlate positively with dryness and altitude
Biological significance
  Large genome correlates with increased cell volume
  Large cell volumes make growth more efficient in the cooler weather of higher altitudes, where cell division is slower
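
The BARE-1 figures can be cross-checked with a little arithmetic. The sketch below assumes, purely for illustration, that every copy is a full-length element of about 8.9 kb (the length quoted for BARE-1 on the LTR retrotransposon slide); real copies may be truncated, so the result is only an order-of-magnitude check. The function name `implied_genome_size` is hypothetical.

```python
BARE1_LENGTH_BP = 8_900  # "~8.9 kb" as quoted on the LTR retrotransposon slide


def implied_genome_size(copies: int, genome_fraction: float,
                        element_bp: int = BARE1_LENGTH_BP) -> float:
    """Genome size in bp implied by a copy number and its genome fraction.

    Assumption: every copy is full length, which is unlikely to hold exactly.
    """
    return copies * element_bp / genome_fraction


low_end = implied_genome_size(8_300, 0.018)    # 8,300 copies = 1.8% of genome
high_end = implied_genome_size(22_000, 0.047)  # 22,000 copies = 4.7% of genome
print(f"{low_end / 1e9:.2f} Gb and {high_end / 1e9:.2f} Gb")
```

Under that assumption both ends of the range imply a genome a little over 4 Gb, so the copy numbers and percentages on the slide are at least mutually consistent.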

Autonomous, nonautonomous, and inactive transposons
Autonomous
  Has all genes required for transposition
Nonautonomous
  Degraded autonomous sequence
  Moves around using the proteins of autonomous elements
Inactive (relics)
  Stationary
  Sequence too degraded for transposition

Class I transposable elements
Transpose through an RNA intermediate
Types
  Long terminal repeat (LTR) retrotransposons
  Non-LTR retrotransposons
Class I transposable elements account for ~42% of the total human genome sequence
While a reverse transcriptase gene has been found in almost all eukaryotic genomes examined to date, it is found in only a minority of Bacteria and in almost no Archaea

LTR retrotransposons (autonomous)
Flanked by long terminal repeats (LTRs)
Genes code for transposition machinery
  Gag: capsid-like protein
  Pol: polymerase (reverse-transcriptase activity) and protease activity
[Figure: BARE-1 (~8.9 kb): long terminal repeats flanking Gag and Pol (RT and protease)]

Non-LTR retrotransposons (autonomous)
Flanked by short, 6- to 20-bp target-site duplications (TSDs)
Also have 5’ and 3’ untranslated regions (UTRs)
Transposition machinery encoded by ORF-1 and ORF-2
L1 retrotransposons make up over 15% of the human genome. Most of them are nonfunctional, but of the 500,000 total copies, 40–60 are considered to be active. In contrast, the mouse genome has up to 5,000 active non-LTR retrotransposons, which accounts for the fact that roughly 10% of spontaneous mutations in mice are due to TE insertions.
3’ UTR characterized by AATAAA and a poly(A) tail
ORF-2 contains reverse transcriptase
[Figure: L1 (LINE-1) element (6 kb): TSD, 5’ UTR, ORF 1, ORF 2 (RT), 3’ UTR, TSD]

Nonautonomous retrotransposons
Alu elements (no function known)
  Type of short interspersed nuclear element (SINE)
  Transposed by the L1 retrotransposon
  TSDs like non-LTR transposons
  Left (L) and right (R) monomers and a poly(A) tail
  One million copies in the human genome
[Figure: Alu element (0.3 kb): TSD, L monomer, R monomer, A(n), TSD]

Effects of active retrotransposons
Cis retrotransposition (an element copies and inserts itself throughout the genome)
Trans retrotransposition (an element copies others)
  L1 retrotransposons replicate Alu elements
Insertional mutagenesis
  Retrotransposon inserts into the coding region of a gene
Unequal homologous recombination
  Duplications and deletions in chromosome regions
Effects on gene expression
  Some L1 sequences have acquired enhancer function

Class II transposable elements
Use a DNA-based method of transposition, not RNA
DNA transposons are typically unable to copy themselves
Approximately 200,000 copies of this type of TE in the human genome
Class II also includes P-element transposons in Drosophila and activation-dissociation (Ac/Ds) elements in maize
Transposase functions in the excision of the TE from one region and its integration into another
[Figure: Tc1-mariner (1.4 kb): TSD, inverted terminal repeat, transposase, inverted terminal repeat, TSD]
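
The inverted terminal repeats in the Tc1-mariner diagram have a simple sequence signature: the start of the element reads the same as the reverse complement of its end. The sketch below shows one way to measure that on a toy sequence. It is not a transposon-annotation method; the function names and the example sequence are hypothetical, and only exact matches are counted.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string made of A, C, G, T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def inverted_terminal_repeat_length(seq: str, max_len: int = 50) -> int:
    """Length of the exact inverted repeat shared by the two ends of seq."""
    seq = seq.upper()
    flipped = reverse_complement(seq)  # starts with the revcomp of the 3' end
    limit = min(max_len, len(seq) // 2)
    length = 0
    while length < limit and seq[length] == flipped[length]:
        length += 1
    return length


# Toy element (hypothetical): an 8-bp end, an internal region, and the
# reverse complement of that 8-bp end on the other side.
left_end = "CAGTGCTA"
toy_element = left_end + "GGATCCAAG" + reverse_complement(left_end)
itr_length = inverted_terminal_repeat_length(toy_element)
print(itr_length)
```

Real terminal inverted repeats are often imperfect, so a practical tool would tolerate mismatches; exact matching is used here only to keep the idea visible.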

Miniature inverted-repeat transposable elements (MITEs) (unusual)
Commonly found in plants (6% of the rice genome), insects, nematodes, and humans (high copy number)
Nonautonomous class II transposons
Small (< 500 bp) and less disruptive; size possibly responsible for the high copy number

PING and PONG
For many years, MITEs were a mystery to biologists, because their high copy number implied active (or autonomous) elements, but none had been found. The sequencing of the rice genome gave researchers the opportunity to intensify the search for active MITEs, and in 2003, three separate groups of researchers published proof that MITEs can move about the rice genome. Plant MITEs fall into two categories: stowaway-like and tourist-like. Stowaway-like MITEs are moved about the genome by autonomous class II transposons similar to the Tc1-mariner element described in the previous slide. Tourist-like elements have as their autonomous partners a newly described class of active MITE called PIF/Pong.
The figure in the slide compares the sequences of Pong and a nonautonomous MITE called Ping. The red and green regions represent homologous terminal regions among a variety of degraded Ping elements (mPings in the figure) and Pong elements. Unlike active Pong sequences, Ping elements have degraded ORF-1 genes. While the function of ORF-1 is not fully understood, the ORF-2 gene is believed to code for the transposase that is responsible for making Ping elements mobile.
The high copy number of MITEs is very unusual for a class II transposon. The small size of MITEs (typically less than 500 bp) is believed to be the reason that these transposons are so common. Insertions of MITEs in the genome would presumably be less disruptive than insertions of larger TEs and hence would not be selected against as strongly.

TEs and X-chromosome inactivation
X-chromosome inactivation (mammalian)
  One X chromosome in each somatic cell is inactivated
  Occurs during embryonic development in female mammals
Regions of the X chromosome that are inactivated are rich in L1 elements
The 10% of the X chromosome that escapes inactivation lies in regions with low L1 density
L1 elements on the X chromosome are dated to 100 million years ago (emergence of mammals)
Correlations suggest L1 elements might be involved in X-chromosome inactivation

Summary I
Genome structure varies tremendously
Genome size and complexity
  Linear relationship in prokaryotes
  No clear relationship in eukaryotes
Genome features
  Statistical features
  Isochores: regions of elevated G+C content

Summary II
Coding regions
  Bacteria: no introns
  Archaea: some introns, TATA boxes
  Eukarya: many introns and exons, TATA boxes
Noncoding regions
  Pseudogenes
  Repetitive sequences
    Minisatellites
    Microsatellites
  Endogenous viral sequences

Summary III
Transposons
  Class I
    RNA intermediate
    Frequent replication
  Class II
    DNA intermediate
    Rare replication
  MITEs
    DNA intermediate
    High copy number in plants

Summary IV
Forces affecting genome structure
  Point mutations
  Gene and genome duplications
  Viral infection
  Transposable elements
  Tandem repeat–associated mutations
Functions of noncoding DNA
  Gene regulation
  Adaptation to environmental conditions
  Co-opting for new gene function