Alu elements and splicing events
Bat Sheva Workshop
Can junk DNA be exapted?
Dan Graur

Can straw (junk DNA) be spun into gold (genes)?

Exaptation

15 February 2001

The human genome is disappointing:
• It is small
• It is empty
• It is unoriginal
• It is repetitive

K-value paradox: Complexity does not correlate with chromosome number.
Homo sapiens: 46
Lysandra atlantica: 250
Ophioglossum reticulatum: 1260

C-value paradox: Complexity does not correlate with genome size.
Homo sapiens: 3.4 × 10^9 bp
Amoeba dubia: 6.7 × 10^11 bp

N-value paradox: Complexity does not correlate with gene number.
~31,000 genes; ~26,000 genes; ~50,000 genes

The genome is empty.
1.5% exons; introns (junk); intergenic regions (junk)

The genome contains a large number of genetic “corpses” (pseudogenes).

L-gulono-γ-lactone oxidase deficiency

There are gene-dense (urban centers) and gene-poor (deserts) chromosomes: from 23 genes per million base pairs on chromosome 19 (3%) to only 5 genes per million base pairs on chromosome 13 (0.7%).

How can we be sure that the genome is empty? Isn’t it possible that the emptiness is a mere artifact of our ignorance?

959 cells; 1,031 cells; ~10^8 cells
19,000 genes; 13,600 genes

The gene number game: GeneSweep©
July 2000: Bets: 165; Mean: 61,710; Lowest: 27,462; Highest: 153,478
July 2001: Bets: 281; Median: 61,302; Lowest: 27,462; Highest: 212,278

Humans are not at all original in comparison with other vertebrates.

Mouse-human synteny. Human chromosomes can be cut into ~150 pieces, then shuffled into a reasonable approximation of the mouse genome.

2 solutions to the N-value paradox:
• What looks empty isn’t.
• What looks functional is more so.

Junk DNA
Junk can sometimes be useful:
• spare parts (modules)
• motif donors (exon shuffling)
• molds (gene conversion)

Eukaryotic genes (exons & introns)
Splicing
Translation

Alternative splicing: One gene, several proteins!
Alternative splicing: mature splice variant I; mature splice variant II

Types of alternative splicing

Cassette exon or internal-exon skipping

Deduction of internal-exon skipping through mRNA sequence alignment

Large-scale multiple alignment of expressed sequences
Databases:
• tens of thousands of mRNAs
• millions of ESTs
From large-scale alignments, it is known that 40-60% of all human genes undergo alternative splicing.

GenCarta (Compugen): Alignment of expressed sequences to genomic sequences

Alternative splicing:
Alternative splicing may be unconditional, i.e., two or more mRNA variants are produced in all tissues expressing the gene.
Alternative splicing may be conditional, i.e., tissue specific, developmental-stage specific or physiological-state specific.

Initial goal: Identifying sequence elements that regulate alternative splicing
• Compile a database of skipped exons.
• Compile a database of constitutive exons.
• Characterize diagnostic features of alternative splicing versus constitutive splicing.

Initial results
4,151 constitutive exons.
1,182 alternative exons.
A motif searching program was run on each set.
A strong motif, found in some of the alternative exons, was not found in the constitutive ones.
The motif turned out to be part of an Alu element.

Exaptation case report: Alus

Alu elements
• Length = ~300 bp
• Repetitive: > 1,000,000 times in the human genome
• Constitute >10% of the human genome
• Found mostly in intergenic regions and introns
• Propagate in the genome through retroposition (RNA intermediates).

Repetitive DNA
Alus are like that!
interspersed | in tandem

Evolution of Alu elements

Master-gene model for Alu proliferation in the genome
• Master gene A
• Replicatively incompetent progeny
• Progeny undergoes multiple independent mutations
• Mutation renders A nonfunctional & creates new master gene B
• Mutation renders B nonfunctional & creates new master gene C

Alu elements can be divided into subfamilies.
The subfamilies are distinguished by ~16 diagnostic positions.

Signals of splicing
[Figure: exon 1, donor site CAG/GTRAGT, branch point A, pyrimidine tract, acceptor site YYYYYYYYYNCAG/G, exon 2; lariat intermediate.]

Because mRNAs and Alus are frequently reverse transcribed and incorporated into the genome, pyrimidine tracts are ubiquitous.
The complementary strand of polyA is polyT = pyrimidine tract.

Our findings
Out of 1,182 alternatively spliced cassette exons, 62 have a significant hit to an Alu sequence.
Out of 4,151 constitutively spliced exons, none has a significant hit to an Alu sequence.
All Alu-containing exons are alternatively spliced.

Retention Ratio
Retention ratio = number of mRNA molecules containing the alternatively spliced exon divided by total number of mRNA molecules.
Retention ratio for Alu-containing exons was ~10%.
Retention ratio for alternatively spliced exons that do not contain Alu was ~45%.
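The retention-ratio definition above can be written as a one-line calculation. The following is a minimal Python sketch, not the method used in the talk: the function name `retention_ratio` and the transcript counts are hypothetical, chosen only so that the toy inputs reproduce the ~10% and ~45% figures quoted on the slide.

```python
def retention_ratio(including: int, total: int) -> float:
    """Fraction of mRNA molecules that contain the alternatively spliced exon.

    `including` = transcripts that retain the exon,
    `total`     = all transcripts observed for the gene.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= including <= total:
        raise ValueError("including must lie between 0 and total")
    return including / total


# Hypothetical counts (not data from the talk), picked to mirror the slide.
alu_exon = retention_ratio(including=2, total=20)      # Alu-containing exon
non_alu_exon = retention_ratio(including=9, total=20)  # exon without Alu

print(f"Alu-containing exon: {alu_exon:.0%}")
print(f"Non-Alu alternative exon: {non_alu_exon:.0%}")
```

In practice the counts would presumably come from the mRNA and EST alignments described earlier, but how transcripts were counted is not stated on the slides.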
Alu elements: Definitions
[Figure: the + strand of an Alu carries poly-A runs; the – strand carries the complementary poly-T runs.]

The minus strand of Alu elements contains “near” splice sites
The minus strand of Alu contains ~3 sites that resemble the acceptor recognition site:
Consensus acceptor site: YYYYYYNCAG/R
Alu-J (127-114): TTTTTTGtAG/A
The minus strand of Alu contains ~9 sites that resemble the consensus donor site:
Consensus donor site: CAG/GTRAGT
Alu-J (25-17): CAG/GTGtGA

The plus strand of Alu elements does not contain “near” acceptor splice sites

Exonization of a minus strand (all is Alu)
[Figure labels: Donor, Exon, Alu, Acceptor]

Exonization of a plus strand (3’ of Alu is “in”)
[Figure labels: Donor, Alu, Exon, Acceptor]

Alus within alternatively spliced exons
[Table with columns – strand and + strand. Cell order as extracted: 50, 1 | 3’ | 1, 6 | 5’ | 3, 1 | middle of exon | 0, 0 | Alu occupies entire exon]

Proposed model for Alu exonization
[Figure labels: Exon, Exon]

Does Exonization Represent Functionalization?
1. Alus are only found in alternative exons.
– Alu-containing constitutive exons cannot be created by mutation.
– Alu-containing constitutive exons are deleterious and, therefore, selected against.
Constitutive Alu-containing exons are known and they are invariably deleterious.
2. Alus are only found in alternative exons with low retention indices.
Highly expressed alternative Alu-containing exons are deleterious.
3. Eighty-four percent of all Alu-containing exons cause frameshifts or premature termination.
Alu-containing exons are unlikely to contribute to the proteome.
4. There are reasons to believe that many identifications of alternative splicing are spurious.
The contribution of alternative splicing to the proteomic repertoire may be vastly overestimated.

Conclusion?
Alu elements increase coding and regulatory versatility of the transcriptome, while maintaining the intactness of the genomic repertoire.

Conclusion: No exaptation

Exaptation case report: numts*
*pronounced “new mights”

Numts (nuclear mitochondrial DNA sequences) are a type of promiscuous DNA, i.e., nuclear sequences of organelle (e.g., mitochondrial) origin.

Numts: Evolution’s misplaced witnesses

The transfer of functional genes from the mitochondria to the nucleus is thought to have stopped in evolution after the emergence of animals (~1,000 MYA).

The reason is thought to be the differences between the nuclear and mitochondrial genetic codes.

The transfer of nonfunctional pieces of mitochondrial genetic information continues to this day.

Numts have been found so far in 83 eukaryote species.

Most species whose genomes have been completely sequenced contain very few numts.
Saccharomyces cerevisiae: 17 numts
Caenorhabditis elegans: 3 numts
Drosophila melanogaster: 3 numts
Plasmodium falciparum: 3 numts

In the human genome we find ~1,000 numts.
Total length = 831 Kb
~0.02% of the nuclear genome

We found 82 numts larger than 1,000 bp in the human genome.

Numts were found on all chromosomes.
Numts larger than 1,000 bp were found on 21 chromosomes.

The newest numt was found on chromosome 6.
Length = ~6,000 bp (35% of the human mitochondrial genome)
Similarity = 98.2% DNA identity.
The longest numt was found on chromosome 5.
Length = ~16,000 bp (an entire mitochondrial genome)
Similarity = 88.8% DNA identity.

The largest documented nonhuman numt is a 7.9-Kb fragment in the nuclear genome of the domestic cat.

The 82 numts contain a total of 362 complete mitochondrial genes (of which 108 are protein-coding genes).
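The ~0.02% figure for the numt share of the nuclear genome can be checked with simple arithmetic. The sketch below is only an illustration: it assumes the 3.4 × 10^9 bp genome size quoted for Homo sapiens earlier in the talk, and the names `NUMT_TOTAL_BP`, `GENOME_BP` and `percent_of_genome` are hypothetical.

```python
NUMT_TOTAL_BP = 831_000  # total numt length from the slide (831 Kb)
GENOME_BP = 3.4e9        # human genome size quoted earlier in the talk (assumed denominator)


def percent_of_genome(length_bp: float, genome_bp: float = GENOME_BP) -> float:
    """Express a sequence length as a percentage of the genome size."""
    return 100.0 * length_bp / genome_bp


numt_percent = percent_of_genome(NUMT_TOTAL_BP)
print(f"numts: {numt_percent:.3f}% of the nuclear genome")
```

Under this assumed denominator the result rounds to the slide's ~0.02%; a slightly different genome-size estimate would shift the third decimal place but not the rounded figure.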
With the exception of the D-loop, which is variable and difficult to detect by similarity, all other regions of the mtDNA are represented in numts at frequencies that do not deviate significantly from the random expectation.

Only 4 numts retained an intact reading frame.
They are annotated as putative protein-coding genes.

In all cases the gene is NADH dehydrogenase subunit 4L (ND4L).

ND4L is also the only mitochondrial gene that can be translated “without incident” by the nuclear genetic code.

Conclusion: No exaptation
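The point about the two genetic codes can be made concrete. In the vertebrate mitochondrial code, four codons are read differently than in the standard nuclear code: TGA (Trp instead of stop), ATA (Met instead of Ile), and AGA/AGG (stop instead of Arg). The Python sketch below lists such codons in a coding sequence. It assumes that translation “without incident” means the reading frame contains none of them, which is an interpretation rather than something the slides state; the function name `code_conflicts` and the toy sequence are hypothetical.

```python
# Codons read differently by the two codes (DNA alphabet):
# codon -> (vertebrate mitochondrial meaning, standard nuclear meaning)
REASSIGNED = {
    "TGA": ("Trp", "Stop"),
    "ATA": ("Met", "Ile"),
    "AGA": ("Stop", "Arg"),
    "AGG": ("Stop", "Arg"),
}


def code_conflicts(cds: str):
    """Return (codon_index, codon, mito_meaning, nuclear_meaning) for every
    in-frame codon whose meaning differs between the two genetic codes."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    hits = []
    for start in range(0, usable, 3):
        codon = cds[start:start + 3]
        if codon in REASSIGNED:
            mito, nuclear = REASSIGNED[codon]
            hits.append((start // 3, codon, mito, nuclear))
    return hits


# Toy sequence, not a real gene: ATG TGA ATA CTT
toy = "ATGTGAATACTT"
for index, codon, mito, nuclear in code_conflicts(toy):
    print(f"codon {index}: {codon} is {mito} in mitochondria, {nuclear} in the nucleus")
```

An in-frame TGA is the most disruptive case for a numt, because the nuclear machinery would read it as a premature stop; an empty result from `code_conflicts` is what one would expect for a gene that both codes translate identically.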