Journal Club
Jenny Gu
October 24, 2006

Introduction
* Defining the subset of superfamilies in LUCA (the last universal common ancestor).
* Examine the adaptability and expansion of particular LUCA superfamilies in relation to function and genome size.
* Challenged Woese’s annealing hypothesis.

Methods: 3-D Structural Comparison
* Domain similarity defined by:
  * SSAP: dynamic-programming-based structure comparison algorithm.
  * CORA: comparison to 3D templates for each superfamily.
  * Manual inspection.
  * Profile-based approaches: detect sequence patterns between relatives.
  * Functional information: public resources (COGs, GO, KEGG) and literature.
  * Expert curators.

Methods: Genome Structural Annotation and Occurrence Profiles
* Dataset: 114 complete genomes.
  * 100 prokaryotic genomes: 85 bacterial and 15 archaeal species.
  * 14 eukaryotic genomes.
* Structural annotation: CATH HMMs -> Gene3D database.
* Superfamily domain occurrence profiles (prokaryotes):
  * 940 of 1278 CATH domain superfamilies present in at least one genome.
  * Annotation coverage: 50% of genes.

Methods: Ancestral Superfamily Set Selection
* Defined by:
  * Present in at least 90% of species from all kingdoms.
  * Present in at least 70% of archaeal and eukaryotic species.
* The definition avoids selecting superfamilies that are overrepresented in Bacteria but poorly represented in smaller groups.
* Leaves flexibility for the false-negative prediction error of the sequence-based approach.
* Guarantees selection of families in LUCA.
* Eliminates error introduced by horizontal gene transfer.

Methods: Functional Annotation
* Automatic functional annotation with COG for the 940 structural superfamilies annotated in the 100 prokaryotic species.
* Each superfamily is functionally classified according to its statistically most represented COG functional subcategory.
* 726 of 940 superfamilies annotated in COG (5% or more of species, at least 5 genes).
* For ancestral superfamilies, further annotation with Pfam and literature.
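The ancestral-set selection rule in the methods above can be sketched in a few lines. This is only an illustration under stated assumptions: the 90% threshold is read as applying to all species pooled, the 70% threshold as applying to Archaea and Eukaryotes separately, and the names `presence_fraction`, `select_ancestral`, and the toy superfamily and species identifiers are hypothetical, not taken from CATH or Gene3D.

```python
from typing import Dict, List

# superfamily id -> {species name: number of domains annotated in that genome}
Occurrence = Dict[str, Dict[str, int]]


def presence_fraction(counts: Dict[str, int], species: List[str]) -> float:
    """Fraction of `species` in which the superfamily occurs at least once."""
    if not species:
        return 0.0
    present = sum(1 for s in species if counts.get(s, 0) > 0)
    return present / len(species)


def select_ancestral(occurrence: Occurrence,
                     kingdom_of: Dict[str, str],
                     overall_min: float = 0.90,
                     minority_min: float = 0.70) -> List[str]:
    """Return superfamilies passing both thresholds.

    Assumption: `overall_min` is checked on the pooled species list,
    `minority_min` on Archaea and on Eukaryotes separately.
    """
    all_species = list(kingdom_of)
    archaea = [s for s, k in kingdom_of.items() if k == "archaea"]
    eukaryotes = [s for s, k in kingdom_of.items() if k == "eukaryota"]

    selected = []
    for superfamily, counts in occurrence.items():
        if (presence_fraction(counts, all_species) >= overall_min
                and presence_fraction(counts, archaea) >= minority_min
                and presence_fraction(counts, eukaryotes) >= minority_min):
            selected.append(superfamily)
    return sorted(selected)


# Hypothetical toy data: 6 bacteria, 2 archaea, 2 eukaryotes.
toy_kingdoms = {
    "bac1": "bacteria", "bac2": "bacteria", "bac3": "bacteria",
    "bac4": "bacteria", "bac5": "bacteria", "bac6": "bacteria",
    "arc1": "archaea", "arc2": "archaea",
    "euk1": "eukaryota", "euk2": "eukaryota",
}
toy_occurrence = {
    # present everywhere
    "sf_universal": {s: 1 for s in toy_kingdoms},
    # 9 of 10 species overall, but only 1 of 2 archaea
    "sf_bacterial": {s: 3 for s in toy_kingdoms if s != "arc2"},
    # present in two species only
    "sf_rare": {"bac1": 1, "euk1": 2},
}

ancestral = select_ancestral(toy_occurrence, toy_kingdoms)
print(ancestral)
```

In the toy data, `sf_bacterial` reaches the pooled 90% threshold but fails the 70% archaeal threshold, which is the case the second criterion is meant to catch.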

Methods: Definition of the Superfamily Functional Groups
* COG has six functional groups:
  * Translation
  * Replication
  * Metabolism
  * Cellular process
  * Transcription
  * Poorly characterized
* Not considered:
  * RNA processing and modification
  * Chromatin structure and dynamics

Results and Discussion: Superfamily Functional Distribution in the Ancestral Domain Set
* 140 superfamilies found in all organisms of the three main kingdoms (Bacteria, Archaea, and Eukaryotes).
* They account for 15% of superfamilies, 55% of all domains in bacterial genes, and 18% of all domains in eukaryotes.
* Representatives in all six COG functional groups.
* Translation (48 superfamilies) and Metabolism (46 superfamilies) make up the majority of the ancestral domains.
* Metabolism (385 superfamilies) has undergone a higher expansion than Translation (90 superfamilies).

Results and Discussion: Analysis of the Cellular Functions of Ancestral CATH Superfamilies in the LUCA
* Two issues in defining ancestry:
  * Domain ubiquity through all species.
  * Probable functions such domains could have performed in LUCA.
* Metabolic functions represented:
  * Interconversion of sugars and synthesis of polysaccharides.
  * Synthesis of ATP and partial equilibrium of NAD/NADH.
  * Part of the Calvin cycle.
  * Pentose phosphate pathway.
  * Acetyl-CoA for cholesterol and/or steroids, and synthesis and degradation of fatty acids.
  * Part of the Krebs cycle.
* Nucleotide metabolism is incomplete. Two alternatives for LUCA:
  * It synthesized nucleotides by de novo pathways.
  * It incorporated them from the surrounding soup.
* Enzymes for the interconversion of nucleoside monophosphates are present.

Results and Discussion: Analysis of the Cellular Functions of Ancestral CATH Superfamilies in the LUCA (cont.)
* DNA synthesis, repair, ligation, and modification are represented.
* RNA synthesis and DNA transcription are represented.
* Domains related to ribosomal particles and protein synthesis are abundant.
* Methyl transfer proteins.
* Membrane and cell wall biogenesis.
* Transduction of protein-protein signals and gene regulation.
* Protein signal recognition for protein transport.
* Cell division.
* Electron transport and ATP synthase.

Methods: Universal Distribution Percentage of Superfamilies
* Universal distribution percentages are computed from superfamily occurrence profiles derived from the prokaryotic sample (Archaea and Bacteria).
* 100% = superfamily present in all species.
* 0% = superfamily has a highly specific distribution in just a few species.

Results and Discussion: Ancestry and Evolutionary Temperature
[Figure slides; no text.]

Results and Discussion: Superfamily Duplication Rates and Functional Diversification
* Another measure to gauge evolutionary temperature: the number of homologues within a superfamily.
* High correlation observed between duplication and functional diversification.
* Highly universal superfamilies span more functional subcategories.
* Metabolism has a higher duplication rate and functional diversification than Translation.

Methods: Genome Size Correlation and the Coefficient of Interspecies Gene Variation (CIGV) of Superfamilies
* Domain occurrence profiles from the sample of 100 prokaryotes.
* Correlation coefficients between occurrence and genome size (compared with a randomly generated null model).
* CIGV calculated by dividing the standard deviation over all values of the occurrence profile for a given superfamily.
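The three per-superfamily quantities in the methods above (universal distribution percentage, genome size correlation, and CIGV) can be illustrated with a small sketch. Several points are assumptions rather than details from the slides: CIGV is read as a coefficient of variation (standard deviation divided by the mean of the occurrence profile), the population standard deviation is used, the correlation is taken to be Pearson's, and all function names and numbers are hypothetical. The null-model comparison is not shown.

```python
import math
from statistics import mean, pstdev
from typing import Sequence


def universality_percent(profile: Sequence[int]) -> float:
    """Percentage of species in which the superfamily occurs at least once."""
    return 100.0 * sum(1 for c in profile if c > 0) / len(profile)


def cigv(profile: Sequence[int]) -> float:
    """Assumed definition: population standard deviation divided by the mean."""
    m = mean(profile)
    if m == 0:
        return 0.0  # superfamily absent everywhere: no variation to report
    return pstdev(profile) / m


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


# Hypothetical data: gene counts per genome and three occurrence profiles.
genome_sizes = [1000, 2000, 3000, 4000]
flat_profile = [2, 2, 2, 2]      # same copy number in every genome
scaling_profile = [1, 2, 3, 4]   # copy number grows with genome size
patchy_profile = [0, 0, 5, 0]    # present in a single genome

for name, profile in [("flat", flat_profile),
                      ("scaling", scaling_profile),
                      ("patchy", patchy_profile)]:
    print(name,
          round(universality_percent(profile), 1),
          round(pearson(genome_sizes, profile), 3),
          round(cigv(profile), 3))
```

Under these assumptions a flat profile has a CIGV of zero and no genome size correlation, while a profile that scales with genome size has a correlation close to one and a larger CIGV.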

Methods: Statistical Analysis of Superfamily Distributions
* Kolmogorov-Smirnov two-sample test, in the two-tailed version for large samples.
* Compared pairs of distributions between different functional groups.

Results and Discussion: Superfamily Occurrence Profiles and Genome Size Correlation
[Figure slides; no text.]

Results and Discussion: Superfamily Coefficient of Interspecies Gene Variation
* High CIGV values = more adaptable; hotter evolutionary temperature.
* Low CIGV values = less adaptable.

Results and Discussion: Rates of Superfamily Innovation in the Functional Groups
[Figure slide; panel labels: High Innovation, Poor Innovation.]

Conclusions
* A more realistic distribution of superfamilies in distant species.
* Life achieved modern cellular status long before the separation of the three kingdoms.
* Woese’s annealing hypothesis is called into question.
* Superfamily evolution appears to be a function of specific features and adaptabilities rather than of time.
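For reference, the two-sample Kolmogorov-Smirnov test named in the statistical methods above can be sketched with the standard library alone. This is a minimal illustration, not the analysis pipeline used in the study: the p-value uses the usual large-sample (asymptotic) two-sided Kolmogorov series, and the function names, the series cutoff, and the sample data are hypothetical choices.

```python
import math
from bisect import bisect_right
from typing import Iterable, Tuple


def ks_statistic(a: Iterable[float], b: Iterable[float]) -> float:
    """Largest absolute difference between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)
    d = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for x in a_sorted + b_sorted:
        fa = bisect_right(a_sorted, x) / n
        fb = bisect_right(b_sorted, x) / m
        d = max(d, abs(fa - fb))
    return d


def ks_pvalue_asymptotic(d: float, n: int, m: int, terms: int = 100) -> float:
    """Two-sided large-sample p-value: 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 lam^2)."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # The series converges slowly here and the p-value is negligibly below 1.
        return 1.0
    total = 0.0
    for k in range(1, terms + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        total += sign * math.exp(-2.0 * k * k * lam * lam)
    return min(1.0, max(0.0, 2.0 * total))


def ks_two_sample(a: Iterable[float], b: Iterable[float]) -> Tuple[float, float]:
    """Return (D statistic, asymptotic two-sided p-value)."""
    a_list, b_list = list(a), list(b)
    d = ks_statistic(a_list, b_list)
    return d, ks_pvalue_asymptotic(d, len(a_list), len(b_list))


# Hypothetical example: two clearly separated samples of 100 values each.
d_sep, p_sep = ks_two_sample(range(100), range(100, 200))
print(d_sep, p_sep)
```

In the study's setting the two samples would be the values of one measure (for example CIGV) for the superfamilies of two functional groups; a small p-value would indicate that the two distributions differ.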