Genome evolution: a sequence-centric approach
Lecture 7: Brief evolutionary history of everything

Course roadmap
(Probability, calculus/matrix theory, some graph theory, some statistics)
• Probabilistic models: simple tree models; HMMs and variants; PhyloHMM, DBN; context-aware MM; factor graphs
• Inference: DP; sampling; variational approximation; LBP
• Parameter estimation: EM; generalized EM (optimize free energy)
• Genome structure; mutations; population; inferring selection

Genome Structure, Genome Information
[Diagram labels: Mutation, Genome structure, Genomic information, Selection]
• Diversity: brief description of the tree of life
• Genome structure: size, key features, mobile elements
• Genome information: protein/RNA genes, regulatory elements
Today: a lot of terminology, basic overview

Timeline
[Diagram labels: ?; RNA-based genomes; ribosome, proteins, genetic code; DNA-based genomes; membranes (?); diversity!]
• 3.4 – 3.8 BYA – fossils??
• 3.2 BYA – good fossils
• 3 BYA – methanogenesis
• 2.8 BYA – photosynthesis
• ..
• 1.7 – 1.5 BYA – eukaryotes
• ..
• 0.55 BYA – Cambrian explosion
• 0.44 BYA – jawed vertebrates
• 0.4 BYA – land plants
• 0.14 BYA – flowering plants
• 0.10 BYA – mammals
Tree of life (Ciccarelli et al. 2005)
• Curated set of universal proteins
• Eliminating lateral transfer
• Multiple alignment and removal of bad domains
• Maximum likelihood inference, with 4 classes of rate and a fixed matrix
• Bootstrap validation

Eukaryotes vs. prokaryotes
EUKARYOTES | PROKARYOTES
Presence of a nuclear membrane | (Also present in the Planctomycetes)
Organelles derived from endosymbionts | (Also in β-proteobacteria)
Cytoskeleton and vesicle transport | Tubulin-related protein, no microtubules
Trans-splicing | -
Introns in protein coding genes, spliceosome | Rare – almost never in coding
Expansion of untranslated regions of transcripts | Short UTRs
Translation initiation by scanning for start | Ribosome binds directly to a Shine-Dalgarno sequence
mRNA surveillance | Nonsense mediated decay pathway is absent
Multiple linear chromosomes, telomeres | Single linear chromosomes in a few eubacteria
Mitosis, meiosis | Absent
Gene number expansion | -
Expansion of cell size | Some exceptions, but cells are small

Eukaryotes: Unikonts and Bikonts
• Unikonts – one flagellum at some developmental stage
– Fungi
– Animals
– Animal parasites
– Amoebas
• Bikonts – ancestrally two flagella
– Green plants
– Red algae
– Ciliates, Plasmodium
– Brown algae
– More amoebae
Strange biology!
A big bang phylogeny: speciations across a short time span?
Ambiguity – and not much hope for really resolving it

Vertebrates
• Fossil based, large scale phylogeny
• Sequenced genomes phylogeny
[Figure: primate phylogeny (Human, Chimp, Gorilla, Orangutan, Gibbon, Baboon, Macaque, Marmoset) with divergence labels 0.5%, 9%, 1.2%, 0.8%, 3%, 1.5%, 0.5%. Further panels: Flies, Yeasts]

Genome Size
Why larger genomes?
• Selfish DNA
– Larger genomes are a result of the proliferation of selfish DNA
– Proliferation stops only when it becomes too deleterious
• Bulk DNA
– Genome content is a consequence of natural selection
– A larger genome is needed to allow larger cell size, larger nuclear membrane etc.
Why smaller genomes?
• Metabolic cost: maybe cells lose excess DNA for energetic efficiency
– But DNA is only 2-5% of the dry mass
– No correlation between genome size and replication time in prokaryotes
– Replication is much faster than transcription (10-20 times in E. coli)

Mutational balance
• Balance between deletions and insertions
– May be different between species
– Different balances may have evolved
• In flies, yeast laboratory evolution
– 4-fold more 4kb spontaneous insertions
• In mammals
– More small deletions than insertions

Mutational hazard
Can we model genome size evolution in a quantitative way?
• No loss of function for inert DNA
– But is it truly not functional?
• Gain of function mutations are still possible:
– Transcription
– Regulation
• Differences in population size may make DNA purging more effective for prokaryotes and small eukaryotes
• Differences in regulatory sophistication may make the DNA mutational hazard less of a problem for metazoans

Genome structural features: centromeres/telomeres
[Figure: karyotypes. Human; Rat – partly acrocentric]
• Centromeres are essential and universally important for proper cell division, but are highly diverged among species
• Satellites and repeats
• Pericentromeric regions – more repeats
• Telomeres are critical for genome maintenance
• Subtelomeric regions – also repetitive
• May be key to nuclear structure?
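
The population-size argument from the mutational hazard slide can be illustrated numerically. The sketch below is not taken from the lecture. It assumes Kimura's diffusion approximation for the fixation probability of a new mutation in a diploid population, treats census and effective population size as equal, and uses an illustrative selection coefficient for a slightly deleterious insertion. The function names are hypothetical.

```python
import math


def fixation_probability(s, n_e):
    """Probability that a new mutation (initial frequency 1/(2*n_e)) fixes.

    Assumption (not from the lecture): Kimura's diffusion approximation,
    P = (1 - exp(-2s)) / (1 - exp(-4*n_e*s)), for a diploid population.
    """
    if s == 0.0:
        return 1.0 / (2.0 * n_e)  # neutral limit
    exponent = -4.0 * n_e * s
    if exponent > 700.0:
        return 0.0  # strongly deleterious: exp() would overflow, P is ~0
    # expm1(x) = exp(x) - 1; the two minus signs cancel in the ratio
    return math.expm1(-2.0 * s) / math.expm1(exponent)


def relative_to_neutral(s, n_e):
    """Fixation probability divided by that of a neutral mutation."""
    return fixation_probability(s, n_e) * 2.0 * n_e


if __name__ == "__main__":
    s = -1e-6  # illustrative cost of carrying a stretch of excess DNA
    for n_e in (1e4, 1e6, 1e8):
        rel = relative_to_neutral(s, n_e)
        print(f"Ne = {n_e:.0e}: fixation relative to neutral = {rel:.3g}")
```

With these illustrative numbers the insertion behaves almost neutrally at Ne = 10^4 but is essentially never fixed at Ne = 10^8, which is the qualitative point of the slide: the same excess DNA can accumulate in small populations and be purged in large ones.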
Genome structural features: nuclear organization
• The nucleus must be organized to allow functional transcription and replication
• Incredibly dense mesh of chromosomes, cytoskeleton, membranes
• Transcription factories / chromosomal territories
• "Spacer DNA" may affect physical organization in unexpected ways
• Inter- and intra-chromosomal interactions
• The entire genome may participate in regulating interactions

Genomic information: protein coding genes
• Modeling protein coding genes
• Modeling protein structure/function
• Structure is complex
• Dependencies are not confined to the gene's linear coding sequence
http://predictioncenter.org/

Genomic information: the gene repertoire is evolving by duplication and loss

Genome information: introns/exons

Genome information: RNA genes
• mRNA – messenger RNA. Mature gene transcripts after introns have been processed out of the mRNA precursor.
• miRNA – micro-RNA. 20-30 bp in length, processed from transcribed "hairpin" precursor RNAs. Regulate gene expression by binding nearly perfect matches in the 3' UTR of transcripts.
• siRNA – small interfering RNAs. 20-30 bp in length, processed from double stranded RNA by the RNAi machinery. Used for posttranscriptional silencing.
• rRNA – ribosomal RNA, part of the ribosome machine (with proteins).
• snRNA – small nuclear RNAs. Heterogeneous set with function confined to the nucleus, including RNAs involved in the spliceosome machinery.
• snoRNA – small nucleolar RNA. Involved in the chemical modifications made in the construction of ribosomes. Often encoded within the introns of ribosomal protein genes.
• tRNA – transfer RNA. Delivers amino acids to the ribosome.
• piRNA – ???
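
To make the miRNA entry above concrete, here is a minimal sketch of scanning a 3' UTR for candidate target sites. It is an illustration rather than the lecture's method. It assumes that targets are recognized through perfect complementarity to a short "seed" (nucleotides 2-8 of the miRNA, a common convention that the slides do not state), and the sequences and function names are made up for the example.

```python
# Watson-Crick complement table for RNA
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA string over A, C, G, U."""
    return rna.translate(COMPLEMENT)[::-1]


def seed_sites(mirna, utr, seed_start=1, seed_end=8):
    """Return 0-based UTR positions that pair perfectly with the miRNA seed.

    Assumption: the seed is miRNA nucleotides 2-8 (slice [1:8]); real target
    prediction also weighs site context and conservation.
    """
    seed = mirna[seed_start:seed_end]
    target = reverse_complement(seed)  # what the seed pairs with in the UTR
    sites = []
    pos = utr.find(target)
    while pos != -1:
        sites.append(pos)
        pos = utr.find(target, pos + 1)  # allow overlapping matches
    return sites


if __name__ == "__main__":
    toy_mirna = "UGAGGUAGUAGGUUGUAUAGUU"  # illustrative 22-nt sequence
    toy_utr = "AAACUACCUCAGGGCUACCUCUU"   # made-up UTR with two seed matches
    print(seed_sites(toy_mirna, toy_utr))
```

Exact string search is enough here because the seed is short and the match is required to be perfect. Allowing mismatches or G:U wobble pairs would need a scoring scheme instead.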
miRNA clusters

snRNA works by binding other RNAs

RNA structure affects function

Computational perspective: finding and understanding RNAs and their evolution

Ultra-high throughput sequencing is transforming all aspects of biology

Genome information: regulatory elements
• Specialized proteins can bind DNA in a sequence specific fashion
• Genomes can therefore control the level of affinity of each region to a large set of DNA binding proteins
• DNA binding sites are typically short (<20 bp)
• Multiple binding sites at different affinities participate in regulation

Computational perspective: finding and understanding TFBSs
• The regulatory process is likely to be less deterministic and discrete than this beautiful idealized sea urchin regulatory network
• Each regulatory interaction is parameterized, and many additional weak interactions participate in the process
• Evolution of regulatory regions involves more than a small set of discrete 20 bp sites

Chromatin immunoprecipitation is mapping DNA binding sites

Structure meets information: packaging and chromosomal interactions are critical for proper genome function

Structure meets information: HOX clusters as an example
• Hox genes are important developmental regulators
• Present in linear clusters, preserving order
• Their expression is frequently coordinated with the gene order
• 4 HOX clusters are present in the human genome
• Additional gene clusters: protocadherins, olfactory receptors, MAGE genes, zinc fingers
• Additional smaller groups of related regulators are co-located

Mapping chromosomal interactions: 4C

Repeats: selfish DNA
Repetitive elements in the human genome
Class | Copies | Genome fraction
LINEs | 868,000 | 20.4% (only ~100 active!!)
SINEs | 1,558,000 (70% Alu) | 13.1%
LTR elements | 443,000 | 8.3%
Transposons | 294,000 | 2.8%

Retrotransposition via RNA

Repeats: short tandems, satellites
• DNA-based transposons do not involve an RNA intermediate, and are quite rare.
• Satellite DNA duplicates by replication slippage, which is enhanced for specific sequences. Abundant near telomeres and centromeres. Some of these are still a mystery.
• Retrotransposition is generally sloppy and noisy, so elements die out quickly.
• Element proliferation appears in evolutionary bursts.

Pseudogenes
• Genes that are becoming inactive due to mutations are called pseudogenes
• mRNAs that jump back into the genome are called processed pseudogenes (they therefore lack introns)

Summary
• History/Phylogeny
– Early phylogenetic relationships can be inferred using genome sequences, but conclusions are not always reliable
– Maximum likelihood models sometimes depend on the gene/genomic region analyzed; the genome is highly heterogeneous at all levels
– The major clades, phylogeny of model organisms and sequenced genomes
• Genome structure
– Size and its consequences
– Packaging and nuclear organization
– Mutational effects and differences
– Selfish DNA
• Genome information
– Protein coding genes
– RNA genes
– Transcription factor binding sites
– Chromosomal organization and DNA codes that affect it