Transcript
Surveys of a Finite Parts List
Mark Gerstein
Molecular Biophysics & Biochemistry and Computer Science, Yale University
H Hegyi, J Lin, B Stenger, P Harrison, N Echols, J Qian, A Drawid, D Greenbaum, R Jansen
Transcriptome 2000, Paris, 8 November 2000
(c) Mark Gerstein, 2000, Yale, bioinfo.mbb.yale.edu

Analysis of Genomes & Transcriptomes in terms of the Occurrence of Parts and Features

Genomes highlight the Finiteness of the “Parts” in Biology
1995: Bacteria, 1.6 Mb, ~1600 genes [Science 269: 496]
1997: Eukaryote, 13 Mb, ~6K genes [Nature 387: 1]
1998: Animal, ~100 Mb, ~20K genes [Science 282: 1945]
2000?: Human, ~3 Gb, ~100K genes [???] (‘98 spoof; real thing, Apr ‘00)

Simplifying the Complexity of Genomes: Global Surveys of a Finite Set of Parts from Many Perspectives
[Figure: ~100000 genes (human), ~1000 genes (T. pallidum), ~1000 folds]
Same logic for sequence families, blocks, orthologs, motifs, pathways, functions....
Functions picture from www.fruitfly.org/~suzi (Ashburner); pathways picture from ecocyc.pangeasystems.com/ecocyc (Karp, Riley).
Related resources: COGS, ProDom, Pfam, Blocks, Domo, WIT, CATH, Scop....

A Parts List Approach to Bike Maintenance
What are the shared parts (bolt, nut, washer, spring, bearing), unique parts (cogs, levers)?
What are the common parts - types of parts (nuts & washers)?
Where are the parts located?
Which parts interact?
How many roles can these play?
How flexible and adaptable are they mechanically?

Analysis of Genomes & Transcriptomes in terms of the Occurrence of Parts & Features
1. Using Parts to Interpret Genomes. Shared and/or unique parts. Venn diagrams, fold tree with all-β diff. Ortholog tree. Top-10 folds.
2. Using Parts to Interpret Pseudogenomes. In worm, top Ψ-folds (DNAse, hydrolase) v top-folds (Ig). chr. IV enriched, dead and dying families (90 ΨG v 1 G).
3. Using Parts to Interpret Transcriptomes: Expression & Structure. Top-10 parts in mRNA. Enriched in transcriptome: αβ folds, energy, synthesis, TIM fold, VGA. Depleted: TMs, transport, transcription, Leu-zip, NS. Compare with prot. abundance.
4. Expression & Localization. Enriched: Cytoplasmic. Depleted: Nuclear. Bayesian localizer.
5. Expression & Function. Expression relates to structure & localization but to function, globally? P-value formalism. Weak relation to protein-protein interactions.
[Margin annotations: 5(G)1(T); 7(G)15(T); 1(G)2(Y)]
bioinfo.mbb.yale.edu
H Hegyi, J Lin, B Stenger, P Harrison, N Echols, R Jansen, A Drawid, J Qian, D Greenbaum, M Snyder

One approach of many... Much previous work on Sequence & Structure Clustering
ProtoMap, CATH, Blocks, FSSP, Interpro, eMotif, Prosite, CDD, Pfam, Prints, VAST, TOGA…
Remington, Matthews ‘80; Taylor, Orengo ‘89, ‘94; Thornton, CATH; Artymiuk, Rice, Willett ‘89; Sali, Blundell, ‘90; Vriend, Sander ‘91; Russell, Barton ‘92; Holm, Sander ‘93+ (FSSP); Godzik, Skolnick ‘94; Gibrat, Bryant ‘96 (VAST); F Cohen, ‘96; Feng, Sippl ‘96; G Cohen ‘97; Singh & Brutlag, ‘98
Finding parts in genome sequences: blast, ψ-blast, fasta, TM, low-complexity, &c (Altschul, Pearson, Wootton)
Part occurrence profiles

Integrated Analysis System: X-ref Parts with Genomes
Folds: scop+automatic
Orthologs: COGs
“Families”: homebrew
[Venn diagram: Shared Folds among E. coli, yeast and worm, of 339; region counts 21, 43, 42, 149, 16, 8, 35]

Cluster Trees Grouping Initial Genomes on Basis of Shared Folds
20 Genomes: Cele, Mthe, Mjan, Phor, Cpne, Ctra, Tpal, Aful, Bbur, Aaeo, Mpne, Syne, Mgen, Rpro, Hpyl, Mtub, Hinf, Bsub, Ecol, Scer
[Figure: “Classic” Tree and Fold Tree; scale bar 0.1]
D = shared fold dist. betw. 2 genomes
D = S/T, where S = # shared folds and T = total # folds in both
Example with fold counts 20, 10 (shared), 30: D = 10/(20+10+30)

Distribution of Folds in Various Classes
Fold Tree: unusual distribution of all-beta folds

Compare with Ortholog Occurrence Trees (Part = ortholog v fold)
Ortholog Tree, Fold Tree (based on COGs scheme of Koonin & Lipman, similar approaches by Dujon, Bork, &c.)

Common Folds in Genome, Varies Betw. Genomes
Depends on comparison method, DB, sfams v folds, &c (new top superfamilies via ψ-Blast; intersection of top-10 to get shared and common)

Top-10 Worm Folds (fold; class; num. matches in worm genome (N); frac. all worm dom. (F); in EC?; in SC?)
Ig; B; 830; 1.7%; 18; 4
Knottins; SML; 565; 1.1%; 0; 3
Protein kinases (cat. core); MULT; 472; 0.9%; 1; 142
C-type lectin-like; A+B; 322; 0.6%; 0; 1
corticoid recep. (DNA-bind dom.); SML; 276; 0.5%; 1; 10
Ligand-bind dom. nuc. receptor; A; 257; 0.5%; 0; 0
alpha-alpha superhelix; A; 247; 0.5%; 6; 114
C2H2 Zn finger; SML; 239; 0.5%; 0; 78
P-loop NTP Hydrolase; A/B; 235; 0.5%; 72; 133
Ferredoxin; A+B; 207; 0.4%; 83; 114

Top superfamilies per genome, ranks 1 to 6 (superfamily, # matches). Organisms shown: M. genitalium, B. subtilis, M. thermoautotrophicum, E. coli, S. cerevisiae, A. fulgidus. [The transcript does not preserve which table belongs to which organism; tables are given in extraction order.]
Total ORFs 479, with common superfamilies 105 (22%): P-loop hydrolase 60; SAM methyltransferase 16; Rossmann domain 13; Class I synthetase 12; Class II synthetase 11; Nucleic acid binding dom. 11
Total ORFs 4268, with common superfamilies 465 (11%): P-loop hydrolase 173; Rossmann domain 165; Phosphate-binding barrel 79; PLP-transferase 44; CheY-like domain 36; SAM methyltransferase 30
Total ORFs 4268, with common superfamilies 458 (11%): P-loop hydrolase 191; Rossmann domain 158; Phosphate-binding barrel 64; PLP-transferase 38; CheY-like domain 36; Ferredoxins 35
Total ORFs 1869, with common superfamilies 252 (14%): P-loop hydrolase 93; Phosphate-binding barrel 54; Rossmann domains 53; Ferredoxins 48; SAM methyltransferase 17; PLP-transferases 15
Total ORFs 2409, with common superfamilies 309 (13%): P-loop hydrolase 118; Rossmann domain 104; Phosphate-binding barrel 56; Ferredoxins 49; SAM methyltransferase 24; PLP-transferases 18
Total ORFs 6218, with common superfamilies 560 (9%): P-loop hydrolase 249; Protein kinase 123; Rossmann domain 90; RNA-binding domain 75; SAM methyltransferase 63; Ribonuclease H-like 57

Common, Shared Folds: βαβ structure
All share α/β structure with repeated R.H. βαβ units connecting adjacent strands or nearly so (18+4+2 of 24)
HI, MJ, SC vs scop 1.32
P-loop hydrolase, Flavodoxin like, Rossmann Fold, Thiamin Binding, TIM-barrel
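The shared-fold measure from the cluster-tree slide (D = S/T, with S the number of shared folds and T the total number of folds in both genomes) can be sketched in a few lines. This is a minimal illustration, not the talk's actual pipeline: the function names and the toy fold lists are hypothetical, and the slide does not say how D is turned into branch lengths for the fold tree, so that step is left out.

```python
from itertools import combinations


def shared_fold_measure(folds_a, folds_b):
    """D = S / T, with S = folds found in both genomes and
    T = distinct folds found in either genome."""
    a, b = set(folds_a), set(folds_b)
    total = len(a | b)
    if total == 0:
        raise ValueError("both genomes have empty fold lists")
    return len(a & b) / total


def pairwise_measures(genome_folds):
    """Return {(genome_i, genome_j): D} for every pair of genomes."""
    return {
        (g1, g2): shared_fold_measure(genome_folds[g1], genome_folds[g2])
        for g1, g2 in combinations(sorted(genome_folds), 2)
    }


# Hypothetical fold lists mirroring the slide's example:
# 20 folds only in X, 30 only in Y, 10 in both.
shared = {f"shared_{i}" for i in range(10)}
genome_folds = {
    "X": shared | {f"x_only_{i}" for i in range(20)},
    "Y": shared | {f"y_only_{i}" for i in range(30)},
}
d_xy = pairwise_measures(genome_folds)[("X", "Y")]
print(f"D(X, Y) = {d_xy:.3f}")  # 10 / (20 + 10 + 30)
```

The pairwise dictionary is the kind of input a standard hierarchical clustering routine would need to regroup genomes by fold content.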
Pseudogenomics: Surveying “Dead” Parts
(Our def’n: ΨG = obvious homolog to known protein with frameshift or stop in mid-domain)
ΨG identification pipeline
Example of a potential ΨG with frameshift in mid-domain

Folds in Pseudogenes
Summary of Pseudogenes in worm: G=19K, GE=8K, ΨG=4K (2K)

Most Common Worm “Pseudofolds” #1

Most Common Worm “Pseudofolds” #2
[Chart labels: ΨG, G]

Pseudogene Distribution on Chromosomes
~50% ΨG in terminal 3Mb vs ~30% G
16% (min), 29% (max)

Decayed Lines of Genes?
D1022.6 has 90 dead fragments of itself – a disused line of chemoreceptors?
[Plot: num. pseudogenes in family (1 to 90) vs num. genes in family]

Completely Dead Families
(rank; number of matches; organism of closest match*; PROTOMAP family representative; notes on representative)
#1; 7; Yeast; YJA7_YEAST; Hypothetical protein in yeast
#2=; 5; Human; XPD_MOUSE; Xeroderma pigmentosum group D complementing protein
#2=; 5; Cow; CPSA_BOVIN; Cleavage and polyadenylation specificity factor
#4=; 4; Frog; THB_RANCA; Thyroid hormone receptor beta
#4=; 4; Human; SEX_HUMAN; SEX gene
#4=; 4; Fly; MDR1_RAT; Multidrug resistance protein 1
#7=; 3; Vaccinia virus; YVFB_VACCC; Hypothetical vaccinia virus protein
#7=; 3; Fly; VHRP_VACCC; Host range protein from vaccinia
#7=; 3; Human; IF4V_TOBAC; Eukaryotic initiation factor 4A
#7=; 3; E. coli; ACRR_ECOLI; Acrab operon repressor

Gene Expression Datasets: the Yeast Transcriptome
Yeast Expression Data: 6000 levels!
Young, Church…: Affymetrix GeneChips, Abs. Exp.
Brown: μarrays, Rel. Exp. over Timecourse
Also: SAGE (mRNA); 2D gels for Protein Abundance (Aebersold, Futcher)
Snyder: Transposons, Protein Abundance
Integrated Gene Expression Analysis System: X-ref. Parts and Features against expression data...

Common Parts: the Transcriptome
Top-10 folds by transcriptome composition (fold; class; rep. PDB; transcriptome [%]; genome [%]; rel. diff. [%])
TIM barrel; α/β; 1byb; 8.3; 4.2; +98
P-loop NTP hydrolases; α/β; 1gky; 5.2; 5.8; -11
Ferredoxin like; αβ; 1fxd; 3.4; 3.9; -14
Rossmann fold; α/β; 1xel; 3.3; 3.3; 0
7-bladed beta-propeller; β; 1mda*; 2.9; 6.4; -55
alpha-alpha superhelix; α; 2bct; 2.7; 4.4; -37
Thioredoxin fold; α/β; 2trx; 2.7; 1.7; +63
G3P dehydrogenase-like; αβ; 1drw†; 2.7; 0.2; +1316
beta grasp; αβ; 1igd; 2.6; 0.6; +348
HSP70 C-term. fragment; multi; 1dky; 2.6; 0.8; +231
Further folds listed: long helices oligomers, Leu-zipper (1zta); alpha/beta hydrolases (α/β, 2ace; 2.2, 0.9); Zn2/C6 DNA-bind. dom. (sml, 1aw6; 2.6, 0.3); Protein kinases (cat. core) (multi, 1hcl). Other values in this block: 1.6, 6.8, 2.1, 3.8; rel. diff. -46, -77, -62, -89.
[Per-dataset rank columns (Genome, Young, Samson, Church-a, Church-alpha, Church-gal, Church-heat, SAGE-GM, SAGE-L, SAGE-S) are not recoverable from the transcript.]

Composition Change: Common Folds, Changing Folds
Folds that change a lot in frequency are not common
[Plot axis: Freq.]

Most Versatile Folds – Relation to Interactions
Similar results: Martin et al. (1998)
The number of interactions for each fold = the number of other folds it is found to contact in the PDB

Composition of Transcriptome in terms of Functional Classes
[Chart: Transcriptome Enrichment by Functional Category (MIPS): unclassified, transcription, transport, signaling, Prot. Syn., cell structure, energy; further labels: TMs, αβ]

Composition of Genome vs. Transcriptome
[Chart: Genome Composition, Transcriptome Composition and Transcriptome Enrichment by Amino Acid; labels: VGA, NS]

Relating the Transcriptome to Cellular Protein Abundance (Translatome)
2D-gel electrophoresis data sets: Futcher (71), Aebersold (156), scaled set with 171 proteins
New effect is dealing with gene selection bias
What is Proteome? Protein complement in genome or cellular protein population?

Amino Acid Enrichment
Simple story is translatome is enriched in same way as transcriptome
[Chart: mRNA, Protein]

Amino Acid Enrichment – Complexities
[Chart: mRNA, Protein]
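The genome-versus-transcriptome comparison in the slides above comes down to two compositions and a relative difference. The sketch below assumes that genome composition counts each gene once, that transcriptome composition weights each gene by its expression level, and that Rel. Diff. = 100 × (transcriptome − genome) / genome. The last of these agrees with the percentages quoted for the top folds up to rounding, but the weighting scheme is an assumption, and the gene names and expression values are hypothetical.

```python
from collections import defaultdict


def relative_difference(transcriptome_pct, genome_pct):
    """Relative change (in %) going from genome to transcriptome composition."""
    return 100.0 * (transcriptome_pct - genome_pct) / genome_pct


def composition(gene_feature, weights=None):
    """Percentage of the (optionally weighted) gene population per feature.

    With weights=None every gene counts once (genome composition);
    with expression levels as weights this gives a transcriptome composition.
    """
    totals = defaultdict(float)
    grand_total = 0.0
    for gene, feature in gene_feature.items():
        w = 1.0 if weights is None else weights[gene]
        totals[feature] += w
        grand_total += w
    return {feature: 100.0 * t / grand_total for feature, t in totals.items()}


# Hypothetical toy data: four genes, two folds, made-up expression levels.
gene_fold = {"g1": "TIM barrel", "g2": "Leu-zipper",
             "g3": "Leu-zipper", "g4": "TIM barrel"}
expression = {"g1": 30.0, "g2": 1.0, "g3": 1.0, "g4": 8.0}

genome = composition(gene_fold)
transcriptome = composition(gene_fold, expression)
enrichment = {fold: relative_difference(transcriptome[fold], genome[fold])
              for fold in genome}
for fold, change in sorted(enrichment.items()):
    print(f"{fold}: genome {genome[fold]:.1f}%, "
          f"transcriptome {transcriptome[fold]:.1f}%, rel. diff. {change:+.0f}%")
```

The same `composition` function works for any categorical feature attached to genes (functional class, localization, amino acid counts would need per-residue weights instead).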
Composition of Transcriptome in terms of Broad Structural Classes
[Charts: Transcriptome Enrichment by Fold Class of Soluble Proteins (label: αβ protein) and by # TM helices in yeast protein (Membrane (TM) Protein)]

Expression Level is Related to Localization

Distributions of Expression Levels
~6000 yeast genes with expression levels, but only ~2000 with localization….

Bayesian System for Localizing Proteins
Represent localization of each protein by the state vector P(loc) and each feature by the feature vector P(feature|loc). Use Bayes rule to update.
18 Features: Expression Level (absolute and fluctuations), signal seq., KDEL, NLS, Essential?, aa composition
[Diagram: State Vects, loc = …; Feature Vects P(feature|loc)]

Results on Testing Data
Individual proteins: 75% with cross-validation
Carefully clean training dataset to avoid circular logic
Testing, training data, priors: ~2000 proteins from Swiss-Prot Master List. Also, YPD, MIPS, Snyder Lab

Results on Testing Data #2
Compartment Populations. Like QM, directly sum state vectors to get population. Gives 96% pop. similarity.
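The state-vector update described for the Bayesian localizer can be sketched as below. This is only an illustration under stated assumptions: features are applied one at a time as if conditionally independent given the compartment (a naive-Bayes simplification the slides do not spell out), and the compartments, feature names and probabilities are hypothetical, not the 18 features or trained values from the talk.

```python
def bayes_update(state, likelihood):
    """One Bayes-rule step: posterior(loc) is proportional to
    P(feature | loc) * prior(loc), renormalized to sum to 1."""
    unnormalized = {loc: state[loc] * likelihood[loc] for loc in state}
    norm = sum(unnormalized.values())
    if norm == 0:
        raise ValueError("feature has zero likelihood in every compartment")
    return {loc: value / norm for loc, value in unnormalized.items()}


def localize(prior, observed_features, feature_vectors):
    """Start from the prior state vector and fold in each observed feature."""
    state = dict(prior)
    for feature in observed_features:
        state = bayes_update(state, feature_vectors[feature])
    return state


def compartment_populations(states):
    """Sum state vectors over proteins to estimate compartment populations."""
    populations = {}
    for state in states:
        for loc, p in state.items():
            populations[loc] = populations.get(loc, 0.0) + p
    return populations


# Hypothetical prior and feature vectors P(feature | loc).
prior = {"cytoplasmic": 0.5, "nuclear": 0.3, "membrane": 0.2}
feature_vectors = {
    "has_NLS": {"cytoplasmic": 0.05, "nuclear": 0.60, "membrane": 0.05},
    "high_expression": {"cytoplasmic": 0.40, "nuclear": 0.10, "membrane": 0.10},
}

state_a = localize(prior, ["has_NLS"], feature_vectors)
state_b = localize(prior, ["high_expression"], feature_vectors)
populations = compartment_populations([state_a, state_b])
print(max(state_a, key=state_a.get), max(state_b, key=state_b.get))
print({loc: round(p, 3) for loc, p in populations.items()})
```

Summing the full state vectors, rather than counting each protein's single best compartment, is what the "directly sum state vectors" remark on the results slide refers to.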
Extrapolation to Compartment Populations of Whole Yeast Genome: ~4000 predicted + ~2000 known
[Chart: compartment populations, e.g. cytoplasmic, membrane, nuclear]

Do Expression Clusters Relate to Protein Function?
Can they predict functions?
• Clustering of expression profiles
• Grouping functionally related genes together (?)
• Botstein (Eisen), Lander, Haussler, and Church groups, Eisenberg
[Diagram: Func. A, Func. B]

Distributions of Gene Expression Correlations, for All Possible Gene Groupings
Sample for Diauxic shift Expt. (Brown)
Ex. R_avg,G=3 = [ R(gene-1,gene-3) + R(gene-1,gene-4) + R(gene-5,gene-7) ] / 3
P-value for specific 10-gene func. group

Based on Distributions, Correlation of Established Functional Categories, Computer Clusterings
Correlation: Always Significant; Sometimes Significant (depends on expt.); Never Significant

Protein-Protein Interactions & Expression
Use same formalism to assess how closely related expression timecourses to sets of known p-p interactions
Sets of interactions between selected expression timecourses in CDC28 expt. (Davis): (all pairs) (control); (from MIPS); (Uetz et al.); (strong interaction, clearly diff.)

Relation of P-P Interactions to Abs. Expression Level
Distribution of Normalized Expression Levels for Sets of Interacting Proteins

Can we define FUNCTION well enough to relate to expression?
Fold, Localization, Interactions & Regulation are attributes of proteins that are much more clearly defined
Problems defining function:
Multi-functionality: 2 functions/protein (also 2 proteins/function)
Conflating of Roles: molecular action, cellular role, phenotypic manifestation
Non-systematic Terminology: ‘suppressor-of-white-apricot’ & ‘darkener-of-apricot’

Functional Classification
ENZYME (SwissProt Bairoch/Apweiler, just enzymes, cross-org.)
COGs (cross-org., just conserved, NCBI Koonin/Lipman)
GenProtEC (E. coli, Riley)
Also: Other SwissProt Annotation
“Fly” (fly, Ashburner), now extended to GO (cross-org.)
MIPS/PEDANT (yeast, Mewes)
WIT, KEGG (just pathways)
TIGR EGAD (human ESTs)

Phenotype ORF Clusters from Transposon Expt.
Transposon insertions into (almost) each yeast gene to see how yeast is affected in 20 conditions. Generates a phenotype pattern vector, which can be treated similarly to expression data.
k-means clustering of ORFs based on “phenotype patterns,” cross-ref. to MIPS Functional Classes
Cluster showing cold phenotype (containing genes most necessary in cold) is enriched in metabolic functions
[Figure: 28 ORFs in cluster × 20 conditions; labels: Cold, Metabolism]
M Snyder, A Kumar, et al….

GeneCensus
bioinfo.mbb.yale.edu
Detailed Tables, Alignment Server, ORF Query, Ranks, Trees, PDB Query, Alignment Database
J Qian, B Stenger, J Lin....
PartsList Ranking Viewers
Rank Folds by Genome Occurrence, Expression, Fold Clustering, Length, &c

Surveying a Finite PartsList from Many Perspectives

GeneCensus Dynamic Tree Viewers
Recluster organisms based on folds, composition, &c and compare to traditional taxonomy
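The P-value formalism used in the expression and function part of the talk (compare the average expression correlation of a functional gene group with the distribution over other gene groupings of the same size) can be sketched as follows. Several things here are assumptions rather than the talk's method: the group score averages all pairwise Pearson correlations within the group, the null distribution is sampled with random groups instead of enumerating all possible groupings, and the expression profiles are synthetic toy data.

```python
import random
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_correlation(profiles, genes):
    """Mean pairwise correlation over all gene pairs in the group."""
    return mean(pearson(profiles[a], profiles[b])
                for a, b in combinations(genes, 2))


def empirical_p_value(profiles, group, n_random=500, seed=0):
    """Fraction of random same-size groups scoring at least as high."""
    rng = random.Random(seed)
    observed = average_correlation(profiles, group)
    all_genes = sorted(profiles)
    hits = sum(
        average_correlation(profiles, rng.sample(all_genes, len(group))) >= observed
        for _ in range(n_random)
    )
    return observed, (hits + 1) / (n_random + 1)


# Synthetic data: three co-regulated genes plus 30 unrelated background genes.
base = [1.0, 2.0, 4.0, 8.0, 6.0, 3.0, 2.0, 1.0]
profiles = {
    "geneA": base,
    "geneB": [2.0 * v + 1.0 for v in base],
    "geneC": [0.5 * v + 3.0 for v in base],
}
noise = random.Random(1)
for i in range(30):
    profiles[f"bg{i}"] = [noise.gauss(0.0, 1.0) for _ in range(len(base))]

observed, p_value = empirical_p_value(profiles, ["geneA", "geneB", "geneC"])
print(f"average correlation = {observed:.3f}, empirical P-value = {p_value:.4f}")
```

The same scoring function could be applied to sets of interacting protein pairs, which is how the slides describe reusing the formalism for protein-protein interactions.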