Computational Exploration of Metabolic Networks with Pathway Tools
Part 2: APIs & Examples
Randy Gobbel, Ph.D.
Bioinformatics Research Group, SRI International
http://BioCyc.org/
SRI International Bioinformatics

Computing with Pathway Tools: APIs
• Generic functions with a consistent naming scheme
  • Basic frame access functions
  • Built-in functions for analysis and global statistics
• Simultaneous access to multiple KBs
  • Cross-species comparisons
  • Specialized KBs: MetaCyc, SchemaBase

Computing with Pathway Tools: APIs
• PerlCyc interface: http://aracyc.stanford.edu/~mueller/perlcyc/
  • Library of Perl functions for querying PGDBs via a socket connection
  • Database access functions: Select_Organism, All_Pathways
  • Functions for performing inference / hardwired queries: Genes_Of_Reaction, Genes_Of_Pathway, Transcription_Unit_Transcription_Factors, Enzyme_P
• JavaCyc interface also in progress
• Lisp API: http://bioinformatics.ai.sri.com/ptools/ptools-resources.html

PerlCyc and JavaCyc
• Interface to a running Pathway Tools image through TCP
• Names are translated to Perl and Java conventions
• Object references are supported by means of unique frame names

Pathway Tools API Functions
• get_class_all_instances(Class): returns the instances of Class
• Key Pathway Tools classes: Genetic-Elements, Genes, Proteins, Polypeptides, Protein-Complexes, Pathways, Reactions, Compounds-And-Elements, Enzymatic-Reactions, Transcription-Units, Promoters, DNA-Binding-Sites

Pathway Tools API Functions
Notation: Frame.Slot means a specified slot of a specified frame
• get_slot_value(Frame, Slot): returns the first value of Frame.Slot
• get_slot_values(Frame, Slot): returns all values of Frame.Slot
• slot_has_value_p(Frame, Slot): returns true if Frame.Slot has at least one value
• member_slot_value_p(Frame, Slot, Value): returns true if Value is one of the values of Frame.Slot

Additional Pathway Tools Functions: Semantic Inference Layer
• Built-in functions encode commonly used queries that compute indirect DB relationships
  • genes_of_pathway, substrates_of_pathway
  • all_transcription_factors, regulon_of_protein
• See http://bioinformatics.ai.sri.com/ptools/ptoolsfns.html for more information

Computing with Pathway Tools: Flat Files
• Two file formats: tab-delimited and attribute-value
• One file in each format for each datatype
• Specification: http://bioinformatics.ai.sri.com/ptools/flatfile-format.html
• Examples:
  • Pathways.col: pathways and the genes encoding their enzymes
  • Enzymes.col: enzymes and the reactions they catalyze
  • Pathways.dat: full data on each pathway
  • Reactions.dat: full data on each reaction

Example Flat File
UNIQUE-ID - P107-PWY
TYPES - Energy-Metabolism
COMMON-NAME - RuMP cycle and formaldehyde assimilation
REACTION-LIST - FORMATEDEHYDROG-RXN
REACTION-LIST - FORMALDEHYDE-DEHYDROGENASE-RXN
REACTION-LIST - 6PGLUCONDEHYDROG-RXN
REACTION-LIST - R84-RXN
REACTION-LIST - PGLUCISOM-RXN
REACTION-LIST - R12-RXN
REACTION-LIST - R10-RXN
SYNONYMS - ribulose-monophosphate cycle
SYNONYMS - formaldehyde oxidation
//

Example Flat File: Reactions.dat
UNIQUE-ID - R84-RXN
TYPES - EC-1.1.1
EC-NUMBER - 1.1.1.
IN-PATHWAY - P122-PWY
IN-PATHWAY - P107-PWY
LEFT - GLC-6-P
LEFT - NAD
OFFICIAL-EC? - NO
RIGHT - 6-P-GLUCONATE
RIGHT - NADH
RIGHT - PROTON
//

Example Flat File: Compounds.dat
UNIQUE-ID - GLC-6-P
TYPES - Carbohydrate-Derivatives
COMMON-NAME - glucose-6-phosphate
CAS-REGISTRY-NUMBERS - 56-73-5
CHEMICAL-FORMULA - (C 6)
CHEMICAL-FORMULA - (H 13)
CHEMICAL-FORMULA - (O 9)
CHEMICAL-FORMULA - (P 1)
MOLECULAR-WEIGHT - 260.137
SYNONYMS - D-glucose-6-P
SYNONYMS - glucose-6-P
SYNONYMS - α-D-glucose-6-phosphate
SYNONYMS - α-D-glucose-6-P
SYNONYMS - D-glucose-6-phosphate
//

Bioinformatics Results: Algorithms
• Query and visualization environment for genome and pathway information
• PathoLogic algorithm predicts the metabolic network of an organism from its genome
• Algorithm for global characterization of a metabolic network
• Algorithms under development for qualitative modeling of the cell

The Pathway Tools KB as a "virtual cell"
• Detailed representation of proteins, including subunits
• Protein complexes and modifications
• Links from genome, through proteins, to pathways and superpathways

Computing with the Metabolic Network
• Comparative analysis of metabolic networks
• Visualization of expression data
• Correlation of metabolism and transport
• Connectivity analysis of the metabolic network
• Forward propagation of metabolites
• Verification of known growth media against the metabolic network

Computational Exploration of PGDBs
• Infer the metabolic network from the genome (Bioinformatics 18:705, 2002)
• Global properties of the metabolic network (Genome Research 10:568, 2000)
• Global properties of the genetic network
• Comparison
• Consistency of the whole metabolic network of a PGDB with respect to known growth-media requirements
• Search for gaps in the metabolic network (Pacific Symp Biocomputing 2001:471)

Example Studies
• Relationship of protein subunits to gene positions
• Global properties of the E. coli metabolic network
  • Reactions catalyzed by more than one enzyme
  • Enzymes that catalyze more than one reaction
  • Reactions participating in more than one pathway
• Automatic detection of intersection points in the metabolic network
• Nutrient analyses
  • Forward propagation: given a set of nutrients, what compounds will be produced by the metabolic network?
  • Backtracking: given a forward propagation result and a set of essential compounds that are not included in that result, what precursors must be supplied to produce those compounds?
• Operon prediction

Protein Subunits and Linked Genes
Question: are protein subunits coded by neighboring genes?
• Proteins are linked to genes, and gene positions are recorded in the KB
Procedure:
• Fetch all protein complexes
• Subunits are stored in the ‘components’ slot
• Each component has a ‘gene’ slot
• Genes have ‘left-end-position’ and ‘right-end-position’ slots
Results:
• Protein subunits of >90% of heteromeric enzymes are encoded by neighboring genes

Global properties: How many reactions are catalyzed by more than one enzyme?
Procedure:
• get_class_all_instances(‘Reactions’)
• We are interested only in reactions with at least one value in their ‘enzymatic-reaction’ slot
• result = reactions with more than one value in their ‘enzymatic-reaction’ slot
Results:
• About 10% of reactions are catalyzed by more than one enzyme
• Two classes of multi-enzyme reactions:
  • Homologous enzymes
  • “Easy” reactions

Global properties: Multifunctional enzymes (how many enzymes catalyze more than one reaction?)
Procedure:
• get_class_all_instances(‘Proteins’)
• result = proteins with more than one value in the ‘catalyzes’ slot
Results:
• 100 out of 607 enzymes catalyze multiple reactions
• This is significantly more than predicted by genome sequencing projects

Global properties: Reactions in multiple pathways
Procedure:
• get_class_all_instances(‘Reactions’)
• result = reactions with more than one value in the ‘in-pathway’ slot
Significance:
• Reactions that appear in multiple pathways correspond to intersection points in the metabolic network
• They could be used to identify candidate reactions for drug targets

Metabolic Overview Queries
• Species comparison: highlight reactions that are shared / not shared with any one / all of a specified set of species
• Overlay expression data
  • Absolute or relative expression levels
  • Reaction colors reflect expression level

[Figure: C. crescentus cell cycle gene expression]

Global Consistency Checking of Biochemical Network
Given:
• A PGDB for an organism
• A set of initial metabolites
Infer:
• What set of products can be synthesized by the small-molecule metabolism of the organism?
• Can a known growth medium yield the known essential compounds?
(Pacific Symposium on Biocomputing 2001, p. 471)

Algorithm: Forward Propagation
[Diagram: Nutrient set, Metabolite set, PGDB reaction pool, Reactants, “Fire” reactions, Products]

Results
• Phase I: Forward propagation
  • 21 initial compounds yielded only half of the 38 essential compounds for E. coli
• Phase II: Manually identify
  • Bugs in EcoCyc (e.g., two objects for tryptophan)
  • Missing initial protein substrates (e.g., ACP)
  • Missing pathways in EcoCyc
• Phase III: Forward propagation with 11 more initial metabolites
  • Yielded all 38 essential compounds

Initial Metabolites (Total: 21 compounds)
• Nutrients (8) (M61 minimal growth medium): H+, Fe2+, Mg2+, K+, NH3, SO4(2-), PO4(2-), glucose
• Nutrients (10) (growth conditions): water, oxygen, trace elements (Mn2+, Co2+, Mo2+, Ca2+, Zn2+, Cd2+, Ni2+, Cu2+)
• Bootstrap compounds (3): ATP, NADP, CoA

Nutrient-Related Analysis: Validation of the EcoCyc Database
Results on EcoCyc, Phase I:
• Essential compounds: 19 produced, 19 not produced
• Total compounds produced: 28%
• Reactions fired: 31%

Missing Essential Compounds Due To:
• Bugs in EcoCyc
• Narrow conceptualization of the problem
• Protein substrates
• Incomplete biochemical knowledge

Nutrient-Related Analysis: Validation of the EcoCyc Database
Results on EcoCyc, Phase II (after adding 11 extra metabolites):
• Essential compounds: 38 produced, 0 not produced
• Total compounds: 49% produced, 51% not produced
• Reactions: 58% fired, 42% not fired

Operon Prediction
• Based on the method of Moreno-Hagelsieb et al., Bioinformatics 18 Suppl. 1 (2002)
  • Distance between genes
  • Functional classification
  • Correctly predicts 75% of transcription units and 65% of operons
• Additional information available in the PGDB
  • Pathways
  • Protein complexes
  • Transporters
• Improved prediction performance: 80% of transcription units, 69% of operons
• Detailed paper in preparation

Visualization of Genetic Network
• Operon display window
• Transcription factor display window
• Highlight regulon on Overview diagram
• Paint expression data onto Overview diagram
  • Database adapter mechanism: MAGE-ML intermediate form
  • Adapter defined for SMD
  • Animation
  • User-specified mapping of color ranges
  • Import of SAM files (next release)
  • List of significantly +/- genes
• Display full genetic network (later release)

Acknowledgements
SRI: Peter Karp, Suzanne Paley, Pedro Romero, John Pick, Randy Gobbel, Cindy Krieger, Martha Arnaud
EcoCyc Project: Julio Collado-Vides, Ian Paulsen, Monica Riley, Milton Saier
MetaCyc Project: Sue Rhee, Lukas Mueller, Peifen Zhang, Chris Somerville
Stanford: Gary Schoolnik, Harley McAdams, Lucy Shapiro, Russ Altman, Iwei Yeh
Funding sources:
• NIH National Center for Research Resources
• NIH National Institute of General Medical Sciences
• NIH National Human Genome Research Institute
• Department of Energy Microbial Cell Project
• DARPA BioSpice, UPC
BioCyc.org
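The attribute-value flat files illustrated by the Pathways.dat, Reactions.dat and Compounds.dat examples can be read without a running Pathway Tools image. The following is a minimal Python sketch, not part of Pathway Tools, that assumes only the two line types visible in those examples: `ATTRIBUTE - value` lines and `//` record terminators. Comment lines, continuation lines and any other features described in the official flat-file specification are not handled. The sample text is an excerpt of the R84-RXN record.

```python
from collections import defaultdict


def parse_dat(text):
    """Parse attribute-value records into a list of {attribute: [values]} dicts.

    Sketch only: handles 'ATTRIBUTE - value' lines and '//' record
    terminators; comments and continuation lines are not handled.
    """
    records = []
    current = defaultdict(list)
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        if line == "//":  # end of one record
            if current:
                records.append(dict(current))
            current = defaultdict(list)
            continue
        attribute, sep, value = line.partition(" - ")
        if sep:  # repeated attributes (e.g. LEFT, RIGHT) accumulate in order
            current[attribute].append(value)
    if current:  # tolerate a missing final '//'
        records.append(dict(current))
    return records


SAMPLE = """\
UNIQUE-ID - R84-RXN
TYPES - EC-1.1.1
IN-PATHWAY - P122-PWY
IN-PATHWAY - P107-PWY
LEFT - GLC-6-P
LEFT - NAD
OFFICIAL-EC? - NO
RIGHT - 6-P-GLUCONATE
RIGHT - NADH
RIGHT - PROTON
//
"""

rxn = parse_dat(SAMPLE)[0]
print(rxn["UNIQUE-ID"], rxn["IN-PATHWAY"])
# ['R84-RXN'] ['P122-PWY', 'P107-PWY']
```

Every value is kept in a list because attributes such as LEFT, RIGHT and REACTION-LIST repeat; single-valued attributes are simply one-element lists.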
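The global-property studies (reactions catalyzed by more than one enzyme, multifunctional enzymes, reactions in more than one pathway) share one pattern: enumerate a class with get_class_all_instances, then count the values in one slot of each instance. The Python sketch below illustrates that pattern against a hypothetical in-memory class, FrameKB, whose method names mirror the Pathway Tools API functions listed in the slides. It is not the PerlCyc, JavaCyc or Lisp API. The helpers `instances_with_many` and `multi_enzyme_fraction`, and all toy frames except the IN-PATHWAY values of R84-RXN, are invented for illustration.

```python
class FrameKB:
    """Hypothetical in-memory stand-in for a PGDB (not PerlCyc or JavaCyc).

    Each frame maps slot names to lists of values; method names mirror
    the Pathway Tools API functions listed on the slides.
    """

    def __init__(self):
        self.frames = {}   # frame name -> {slot name: [values]}
        self.classes = {}  # class name -> [frame names]

    def add_frame(self, name, cls, slots):
        self.frames[name] = {slot: list(vals) for slot, vals in slots.items()}
        self.classes.setdefault(cls, []).append(name)

    def get_class_all_instances(self, cls):
        return list(self.classes.get(cls, []))

    def get_slot_values(self, frame, slot):
        return list(self.frames[frame].get(slot, []))

    def get_slot_value(self, frame, slot):
        values = self.get_slot_values(frame, slot)
        return values[0] if values else None

    def slot_has_value_p(self, frame, slot):
        return bool(self.get_slot_values(frame, slot))

    def member_slot_value_p(self, frame, slot, value):
        return value in self.get_slot_values(frame, slot)


def instances_with_many(kb, cls, slot):
    """Instances of cls having more than one value in slot (hypothetical helper)."""
    return [f for f in kb.get_class_all_instances(cls)
            if len(kb.get_slot_values(f, slot)) > 1]


def multi_enzyme_fraction(kb):
    """Among reactions with at least one enzymatic reaction, the share with more than one."""
    catalyzed = [r for r in kb.get_class_all_instances("Reactions")
                 if kb.slot_has_value_p(r, "enzymatic-reaction")]
    if not catalyzed:
        return 0.0
    multi = instances_with_many(kb, "Reactions", "enzymatic-reaction")
    return len(multi) / len(catalyzed)


# Toy data: only the in-pathway values of R84-RXN come from the slides.
kb = FrameKB()
kb.add_frame("R84-RXN", "Reactions", {
    "enzymatic-reaction": ["ENZRXN-A"],
    "in-pathway": ["P122-PWY", "P107-PWY"]})
kb.add_frame("TOY-RXN-1", "Reactions", {
    "enzymatic-reaction": ["ENZRXN-B", "ENZRXN-C"],
    "in-pathway": ["P107-PWY"]})
kb.add_frame("TOY-RXN-2", "Reactions", {"in-pathway": ["P107-PWY"]})
kb.add_frame("TOY-PROT-1", "Proteins", {"catalyzes": ["ENZRXN-A", "ENZRXN-B"]})
kb.add_frame("TOY-PROT-2", "Proteins", {"catalyzes": ["ENZRXN-C"]})

print(multi_enzyme_fraction(kb))                           # 0.5
print(instances_with_many(kb, "Proteins", "catalyzes"))    # ['TOY-PROT-1']
print(instances_with_many(kb, "Reactions", "in-pathway"))  # ['R84-RXN']
```

Filtering on slot_has_value_p before dividing follows the slide's note that only reactions with at least one enzymatic reaction are of interest; without it, reactions with no known enzyme would dilute the percentage.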
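Forward propagation, as described for the global consistency check, repeatedly "fires" any reaction whose reactants are all present and adds its products to the metabolite set until nothing new appears. The Python sketch below assumes a simplified representation: each reaction is a pair of reactant and product sets, read left to right only, with no reversibility, compartments or stoichiometry. The toy network is hypothetical apart from R84-RXN, which follows the Reactions.dat example. The `missing_essentials` helper performs the consistency check: which essential compounds a given nutrient set fails to produce.

```python
def forward_propagate(nutrients, reactions):
    """Fire reactions until no new compound appears.

    reactions maps a reaction name to (reactants, products) sets. A reaction
    fires once all of its reactants are in the metabolite set; its products
    are then added. Reactions are treated as left-to-right only (assumption).
    Returns (metabolite set, set of fired reaction names).
    """
    metabolites = set(nutrients)
    fired = set()
    changed = True
    while changed:
        changed = False
        for name, (reactants, products) in reactions.items():
            if name not in fired and reactants <= metabolites:
                fired.add(name)
                metabolites |= products
                changed = True
    return metabolites, fired


def missing_essentials(nutrients, reactions, essentials):
    """Essential compounds that forward propagation fails to produce."""
    produced, _ = forward_propagate(nutrients, reactions)
    return sorted(essentials - produced)


# Toy network: R84-RXN follows the Reactions.dat example; the other two
# reactions and the TOY-LIPID compound are invented for illustration.
REACTIONS = {
    "TOY-KINASE-RXN": ({"GLC", "ATP"}, {"GLC-6-P", "ADP"}),
    "R84-RXN": ({"GLC-6-P", "NAD"}, {"6-P-GLUCONATE", "NADH", "PROTON"}),
    "TOY-ACP-RXN": ({"6-P-GLUCONATE", "ACP"}, {"TOY-LIPID"}),
}
NUTRIENTS = {"GLC", "ATP", "NAD"}
ESSENTIALS = {"6-P-GLUCONATE", "TOY-LIPID"}

print(missing_essentials(NUTRIENTS, REACTIONS, ESSENTIALS))            # ['TOY-LIPID']
print(missing_essentials(NUTRIENTS | {"ACP"}, REACTIONS, ESSENTIALS))  # []
```

Supplying the protein substrate ACP in the second call mirrors, in miniature, the slides' finding that adding initial metabolites such as ACP allowed propagation to reach essential compounds it had previously missed.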